Solved: Re: Macros for Regressions

finans_sas · Posted 10-11-2012 11:33 AM

Dear SAS Community Member,

I have attached a preview of the data set I am going to ask you about. First, below is a brief description:

Data Description:

inames= industry names associated with each id
myear=month-year pair
id=firm id
returns= returns associated with a given firm id
last three columns give us industry names with their returns in a given month (independent of firm id. Originally matched with firm ids based on myear.)

I actually have 5000 firm_ids, 49 industry names, myear goes up 200812.

My Goal:

Run a simple time series regressions, regressing returns on each industry returns
Exclude cases where a firm's industry (inames) is the same as the industry in the columns. For example, in the data set provided, run id=2's returns against agric and chem, but banks. On the other hand, run id=1's returns against all of the three industries.
Save R-squared (and others) from each regression and choose the model with the highest R-squared.

My Progress:

Initially, I paired each stock id with all of the industry returns (72 months X 5000 companies X 49 industries) and ended up with millions of observations. Then, I removed the cases where inames are the same as the industry names, using datastep.
This appeared to be a very tedious way of accomplishing my goal and I started writing a SAS macro, where I got stock. Below is the outline of the program. It does not work at all, but I just wanted to give it a try. I would really appreciate your help to get it work.

*First impose the exclusion criterion via arrays;

data example ;

set example;

array industry(*) agric--chem;

do i=1 to dim(industry);

if input(vname(industry(i)) eq inames then call missing(industry(i)); *There seems to be an issue at this step. ;

end;

run;

%macro industry;

%do m=1 %to 2;

data reg&m ; set example; where id=&m; *create a separate data set for each id;

array industry(*) agric--chem; *introducing arrays in each data set, but I do not know how exactly I should do that;

do i=1 to dim(industry);

proc reg data=reg&m noprint outest=out tableout;

model returns =industry(i) ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

data adjr t r; set out;

if _type_= 'adjrsq' then output adjr; .

if _type_='parms' then output p;

if _type_='t' then output t;

if _type_='rsq' then output r;

data &myoutput; set adjr p t r ; industry;

end; run;

proc append base=stats data=&myoutput; run;

%end;

%mend;

%industry;

data_null__ · Posted 10-12-2012 09:01 AM

I don't think you need ANY macros. If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run; 
proc sort; 
   by id inames _name_;
   run; 
proc reg data=taller  outest=est noprint; 
   by id inames _name_;
   model returns = col1 / edf; 
   run; 
proc print; 
   run; 
proc summary data=est nway; 
   class id inames;
   output out=maxrsq
      idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run; 
proc print; 
   run; 

View solution in original post

Ksharp · Posted 10-12-2012 01:03 AM

Suggest you change the structure of Data firstly .then run REG by using BY statement.

%macro industry;

%do m=1 %to 2;

data reg&m ;

set example;

where id=&m; *create a separate data set for each id;

array industry(*) agric--chem; *introducing arrays in each data set, but I do not know how exactly I should do that;

do i=1 to dim(industry);

_industry=industry(i);

output;

end;

keep i _industry returns;

run;

proc sort data=reg&m ; by i ; run;

proc reg data=reg&m noprint outest=out tableout;

by i ;

model returns =_industry ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

data adjr t r; set out;

if _type_= 'adjrsq' then output adjr; .

if _type_='parms' then output p;

if _type_='t' then output t;

if _type_='rsq' then output r;

data &myoutput; set adjr p t r ; industry;

end; run;

proc append base=stats data=&myoutput; run;

%end;

%mend;

Ksharp

data_null__ · Posted 10-12-2012 09:01 AM

I don't think you need ANY macros. If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
   by id inames myear returns;
   var agric--chem;
   run; 
proc sort; 
   by id inames _name_;
   run; 
proc reg data=taller  outest=est noprint; 
   by id inames _name_;
   model returns = col1 / edf; 
   run; 
proc print; 
   run; 
proc summary data=est nway; 
   class id inames;
   output out=maxrsq
      idgroup(max(_RSQ_) out(_name_ _rsq_)=)
      ;
   run; 
proc print; 
   run; 

finans_sas · Posted 10-13-2012 09:14 PM

Thank you Ksharp and data_null for your help. The second solution was a lot easier to implement than SAS macros. This is really a great community.

Macros for Regressions

Re: Macros for Regressions

Re: Macros for Regressions

Re: Macros for Regressions

Re: Macros for Regressions

Register Today!

SAS Training: Just a Click Away