BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finans_sas
Quartz | Level 8

Dear SAS Community Member,

I have attached a preview of the data set I am going to ask you about. First, below is a brief description:

Data Description:

  • inames= industry names associated with each id
  • myear=month-year pair
  • id=firm id
  • returns= returns associated with a given firm id
  • last three columns give us industry names with their returns in a given month (independent of firm id. Originally matched with firm ids based on myear.)

I actually have 5000 firm_ids, 49 industry names, myear goes up 200812.

My Goal:

  • Run a simple time series regressions, regressing returns on each industry returns
  • Exclude cases where a firm's industry (inames) is the same as the industry in the columns. For example, in the data set provided, run id=2's returns against agric and chem, but banks. On the other hand, run id=1's returns against all of the three industries.
  • Save R-squared (and others) from each regression and choose the model with the highest R-squared.

My Progress:

  • Initially, I paired each stock id with all of the industry returns (72 months X 5000 companies X 49 industries) and ended up with millions of observations. Then, I removed the cases where inames are the same as the industry names, using datastep.
  • This appeared to be a very tedious way of accomplishing my goal and I started writing a SAS macro, where I got stock. Below is the outline of the program. It does not work at all, but I just wanted to give it a try. I would really appreciate your help to get it work.

*First impose the exclusion criterion via arrays;

data example ;

  set example;

  array industry(*) agric--chem;

  do i=1 to dim(industry);

    if input(vname(industry(i)) eq inames then call missing(industry(i));     *There seems to be an issue at this step. ;

  end;

  run;

%macro industry;

        %do m=1 %to 2;

         data reg&m ; set example; where id=&m;  *create a separate data set for each id;

                    array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

                     do i=1 to dim(industry);

                    proc reg data=reg&m noprint outest=out tableout;

                    model returns =industry(i) ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

                                          data adjr t r; set out;

                                         if _type_= 'adjrsq' then output adjr;      .

                                         if _type_='parms' then output p;

                                         if _type_='t' then output t;

                                         if _type_='rsq' then output r;

                                        data &myoutput; set adjr p t r ; industry;

                      end; run;

                                      proc append base=stats data=&myoutput; run;

       

      %end;

%mend;

%industry;

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run;
proc sort;
  
by id inames _name_;
   run;
proc reg data=taller  outest=est noprint;
  
by id inames _name_;
   model returns = col1 / edf;
  
run;
proc print;
  
run;
proc summary data=est nway;
  
class id inames;
   output out=maxrsq
     
idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run;
proc print;
  
run;

View solution in original post

3 REPLIES 3
Ksharp
Super User

Suggest you change the structure of Data firstly .then run REG by using BY statement.

%macro industry;

        %do m=1 %to 2;

         data reg&m ;

          set example;

               where id=&m;  *create a separate data set for each id;

              array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

                     do i=1 to dim(industry);

                        _industry=industry(i);

                        output;

                     end;

         keep i _industry returns;

run;

proc sort data=reg&m ; by i ; run;

                    proc reg data=reg&m noprint outest=out tableout;

                      by i ;

                    model returns =_industry ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

                                          data adjr t r; set out;

                                         if _type_= 'adjrsq' then output adjr;      .

                                         if _type_='parms' then output p;

                                         if _type_='t' then output t;

                                         if _type_='rsq' then output r;

                                        data &myoutput; set adjr p t r ; industry;

                      end; run;

                                      proc append base=stats data=&myoutput; run;

      

      %end;

%mend;

Ksharp

data_null__
Jade | Level 19

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run;
proc sort;
  
by id inames _name_;
   run;
proc reg data=taller  outest=est noprint;
  
by id inames _name_;
   model returns = col1 / edf;
  
run;
proc print;
  
run;
proc summary data=est nway;
  
class id inames;
   output out=maxrsq
     
idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run;
proc print;
  
run;
finans_sas
Quartz | Level 8

Thank you Ksharp and data_null for your help. The second solution was a lot easier to implement than SAS macros. This is really a great community.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 871 views
  • 5 likes
  • 3 in conversation