Solved
Contributor
Posts: 57

# Macros for Regressions

Dear SAS Community Member,

I have attached a preview of the data set I am going to ask you about. First, below is a brief description:

Data Description:

• inames= industry names associated with each id
• myear=month-year pair
• id=firm id
• returns= returns associated with a given firm id
• last three columns give us industry names with their returns in a given month (independent of firm id. Originally matched with firm ids based on myear.)

I actually have 5000 firm_ids, 49 industry names, myear goes up 200812.

My Goal:

• Run a simple time series regressions, regressing returns on each industry returns
• Exclude cases where a firm's industry (inames) is the same as the industry in the columns. For example, in the data set provided, run id=2's returns against agric and chem, but banks. On the other hand, run id=1's returns against all of the three industries.
• Save R-squared (and others) from each regression and choose the model with the highest R-squared.

My Progress:

• Initially, I paired each stock id with all of the industry returns (72 months X 5000 companies X 49 industries) and ended up with millions of observations. Then, I removed the cases where inames are the same as the industry names, using datastep.
• This appeared to be a very tedious way of accomplishing my goal and I started writing a SAS macro, where I got stock. Below is the outline of the program. It does not work at all, but I just wanted to give it a try. I would really appreciate your help to get it work.

*First impose the exclusion criterion via arrays;

data example ;

set example;

array industry(*) agric--chem;

do i=1 to dim(industry);

if input(vname(industry(i)) eq inames then call missing(industry(i));     *There seems to be an issue at this step. ;

end;

run;

%macro industry;

%do m=1 %to 2;

data reg&m ; set example; where id=&m;  *create a separate data set for each id;

array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

do i=1 to dim(industry);

proc reg data=reg&m noprint outest=out tableout;

model returns =industry(i) ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

data adjr t r; set out;

if _type_='parms' then output p;

if _type_='t' then output t;

if _type_='rsq' then output r;

data &myoutput; set adjr p t r ; industry;

end; run;

proc append base=stats data=&myoutput; run;

%end;

%mend;

%industry;

Accepted Solutions
Solution
‎10-12-2012 09:01 AM
Posts: 3,852

## Re: Macros for Regressions

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
by id inames myear returns;
var agric--chem;
run;
proc sort;

by id inames _name_;
run;
proc reg data=taller  outest=est noprint;

by id inames _name_;
model returns = col1 / edf;

run;
proc print;

run;
proc summary data=est nway;

class id inames;
output out=maxrsq

idgroup(max(_RSQ_) out(_name_ _rsq_)=)
;
run;
proc print;

run;

All Replies
Super User
Posts: 10,770

## Re: Macros for Regressions

Suggest you change the structure of Data firstly .then run REG by using BY statement.

%macro industry;

%do m=1 %to 2;

data reg&m ;

set example;

where id=&m;  *create a separate data set for each id;

array industry(*) agric--chem;  *introducing arrays in each data set, but I do not know how exactly I should do that;

do i=1 to dim(industry);

_industry=industry(i);

output;

end;

keep i _industry returns;

run;

proc sort data=reg&m ; by i ; run;

proc reg data=reg&m noprint outest=out tableout;

by i ;

model returns =_industry ; *I want to store r-squared and others for each industry-id pair, but again I am not sure how to accomplish it;

data adjr t r; set out;

if _type_='parms' then output p;

if _type_='t' then output t;

if _type_='rsq' then output r;

data &myoutput; set adjr p t r ; industry;

end; run;

proc append base=stats data=&myoutput; run;

%end;

%mend;

Ksharp

Solution
‎10-12-2012 09:01 AM
Posts: 3,852

## Re: Macros for Regressions

I don't think you need ANY macros.  If you get the data in the proper structure the entire analysis can be done with one call to PROC REG followed by a PROC SUMMARY to find the models(variables) of interest with max Rsquare.

proc transpose data=work.example out=taller(where=(inames ne _name_));
by id inames myear returns;
var agric--chem;
run;
proc sort;

by id inames _name_;
run;
proc reg data=taller  outest=est noprint;

by id inames _name_;
model returns = col1 / edf;

run;
proc print;

run;
proc summary data=est nway;

class id inames;
output out=maxrsq

idgroup(max(_RSQ_) out(_name_ _rsq_)=)
;
run;
proc print;

run;
Contributor
Posts: 57