BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finans_sas
Quartz | Level 8

Dear SAS Community Member,

I have attached a preview of the data set I am going to ask you about. First, below is a brief description:

Data Description:

  • inames= industry names associated with each id
  • myear=month-year pair
  • id=firm id
  • returns= returns associated with a given firm id
  • last three columns give us industry names with their returns in a given month (independent of firm id. Originally matched with firm ids based on myear.)

I actually have 5000 firm_ids, 49 industry names, myear goes up 200812.

My Goal:

  • Run a simple time series regressions, regressing returns on each industry returns
  • Exclude cases where a firm's industry (inames) is the same as the industry in the columns. For example, in the data set provided, run id=2's returns against agric and chem, but banks. On the other hand, run id=1's returns against all of the three industries.
  • Save R-squared (and others) from each regression and choose the model with the highest R-squared.

My Progress:

  • Initially, I paired each stock id with all of the industry returns (72 months X 5000 companies X 49 industries) and ended up with millions of observations. Then, I removed the cases where inames are the same as the industry names, using datastep.
  • This appeared to be a very tedious way of accomplishing my goal and I started writing a SAS macro, where I got stock. Below is the outline of the program. It does not work at all, but I just wanted to give it a try. I would really appreciate your help to get it work.

*First impose the exclusion criterion via arrays;

data example ;

  set example;

  array industry(*) agric--chem;

  do i=1 to dim(industry);

    if input(vname(industry(i)) eq inames then call missing(industry(i));     *There seems to be an issue at this step. ;

  end;

  run;

%macro industry;

        %do m=1 %to 2;

         data reg&m ; set example; where id=&m;  *create a separate data set for each id;

                    array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

                     do i=1 to dim(industry);

                    proc reg data=reg&m noprint outest=out tableout;

                    model returns =industry(i) ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

                                          data adjr t r; set out;

                                         if _type_= 'adjrsq' then output adjr;      .

                                         if _type_='parms' then output p;

                                         if _type_='t' then output t;

                                         if _type_='rsq' then output r;

                                        data &myoutput; set adjr p t r ; industry;

                      end; run;

                                      proc append base=stats data=&myoutput; run;

       

      %end;

%mend;

%industry;

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run;
proc sort;
  
by id inames _name_;
   run;
proc reg data=taller  outest=est noprint;
  
by id inames _name_;
   model returns = col1 / edf;
  
run;
proc print;
  
run;
proc summary data=est nway;
  
class id inames;
   output out=maxrsq
     
idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run;
proc print;
  
run;

View solution in original post

3 REPLIES 3
Ksharp
Super User

Suggest you change the structure of Data firstly .then run REG by using BY statement.

%macro industry;

        %do m=1 %to 2;

         data reg&m ;

          set example;

               where id=&m;  *create a separate data set for each id;

              array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

                     do i=1 to dim(industry);

                        _industry=industry(i);

                        output;

                     end;

         keep i _industry returns;

run;

proc sort data=reg&m ; by i ; run;

                    proc reg data=reg&m noprint outest=out tableout;

                      by i ;

                    model returns =_industry ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

                                          data adjr t r; set out;

                                         if _type_= 'adjrsq' then output adjr;      .

                                         if _type_='parms' then output p;

                                         if _type_='t' then output t;

                                         if _type_='rsq' then output r;

                                        data &myoutput; set adjr p t r ; industry;

                      end; run;

                                      proc append base=stats data=&myoutput; run;

      

      %end;

%mend;

Ksharp

data_null__
Jade | Level 19

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run;
proc sort;
  
by id inames _name_;
   run;
proc reg data=taller  outest=est noprint;
  
by id inames _name_;
   model returns = col1 / edf;
  
run;
proc print;
  
run;
proc summary data=est nway;
  
class id inames;
   output out=maxrsq
     
idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run;
proc print;
  
run;
finans_sas
Quartz | Level 8

Thank you Ksharp and data_null for your help. The second solution was a lot easier to implement than SAS macros. This is really a great community.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1108 views
  • 5 likes
  • 3 in conversation