BookmarkSubscribeRSS Feed
Amalik
Calcite | Level 5

Hi, I have data in the following form,

 

Year

Mkt-RF

SMB

HML

RF

Y1

Y2

and so on

200801

       

200802

       

and so on

       

I want to run a regression in the following form;

 (Yi – RF) = α + Mkt-RF +   SMB + HML     

Where i=1 to 2000

This essentially means that I want 2000 different regressions as my dependent variable is changing everytime.

Also my data only contains Yi, and I have to compute Yi-RF for each regression and finally I want one summary file that extracts the intercept term of each regression.

I am unsure on how to go about it. I can run individual regressions in SAS but generating thousands of regressions of this sort, I am not getting how to do it. Can anyone help me on this?

9 REPLIES 9
PaigeMiller
Diamond | Level 26

I always point out:

 

Just because you CAN run 1000s of regressions doesn't mean you SHOULD run 1000s of regressions. 

 

It's not clear why you want to do this, but usually there are better methods to accomplish your goals than brute force running 1000s of regressions.

--
Paige Miller
Amalik
Calcite | Level 5

Hi Miler,

 

I want to run this regression because I want to extract intercept term of each regression to be used as input for another data set. Do you have an easier suggestion?

 

 

 

Amalik
Calcite | Level 5

Hi Miler,

 

I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?

Kurt_Bremser
Super User

See this simple example code:

data have;
input var1 $ var2 $ y1 y2 y3;
cards;
x y 1 2 3
a b 4 5 6
;
run;

proc transpose data=have out=int (rename=(col1=y));
by var1 var2 notsorted;
var y:;
run;

You can now sort dataset int by _name_ and use _name_ in a by statement in your regression analysis.

PaigeMiller
Diamond | Level 26

@Amalik wrote:

Hi Miller,

 

I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?


Given the very limited explanation ("used as input for another data set" and then how is it used?), there's no way I can give any useful suggestions.

 

In fact, you seem to be implying that you have Y1-Y2000 (that's two thousand Y variables), is that right? They're not independent of one another, they are correlated, perhaps some method that takes into account the correlation would help.

--
Paige Miller
Rick_SAS
SAS Super FREQ

If you need to use PROC REG, use KSharp's suggestion and read that blog post.

 

However, if you only want the intercepts, not the other parameter estimates, you don't need to use PROC REG or even transpose the data. If you have SAS/IML, you can obtain the intercept estimates from the matrix of crossproducts by applying a SWEEP operator to the explanatory variables (plus intercept column). The intercept estimates are then in the first row of the swept matrix:

 

data a;
call streaminit(1);
array y[5];
array diff[5];
do time = 1 to 100;
   RF  = rand("Normal");
   Mkt = rand("Normal");
   SMB = rand("Normal");
   HML = rand("Normal");
   MktRF = Mkt - RF;
   do i = 1 to dim(y);
      y[i] = rand("Normal");
      diff[i] = Y[i] - RF;
   end;
   output;
end;

ods exclude all;
proc reg data=A plots=none;
var Y1-Y5 Diff1-Diff5 MktRF SMB HML;
Diff1: model Diff1 = MktRF SMB HML;
Diff2: model Diff2 = MktRF SMB HML;
Diff3: model Diff3 = MktRF SMB HML;
Diff4: model Diff4 = MktRF SMB HML;
Diff5: model Diff5 = MktRF SMB HML;
ods output ParameterEstimates=PE;
run;
ods exclude none;

/* print just the intercept estimates */
proc print data=PE(where=(Variable="Intercept"));run;

proc iml;
XNames = {MktRF SMB HML};
YNames = "Diff1":"Diff5";
varNames = XNames || YNames;
use A; read all var varNames into X; close;

/* add intercept column */
XNames = "Intercept" || XNames;
varNames = XNames || YNames;
X = j(nrow(X), 1, 1) || X;

/* for crossproducts and sweep the explanatory columns */
XpX = X`*X;
S = sweep(XpX, 1:ncol(XNames));
/* 1st row contains intercepts; get column names for Y variables */
InterceptOnly = S[1, ncol(XNames)+1: ncol(varNames)]; 
print InterceptOnly[c=YNames];
Rick_SAS
SAS Super FREQ

Now that I read the OP's question more carefully, I think this is simple to implement if the data does not have many missing values. The OP has multiple response variables. You just need to create the differences (Y[i] - RF) and then put all the difference variables on the left side of the equal sign on the MODEL statement. There is no need to transpose the data:

 

data Have;   /* simulate data */
call streaminit(1);
array y[3];
array x[4];
do i = 1 to 100;
   do j = 1 to dim(x);
      x[j] = rand("Normal");
   end;
   do j = 1 to dim(y);
      y[j] = -1 + 2*x[1] - 3*x[2] + 4*x[3] + rand("Normal");
   end;
   output;
end;


%let p = 3;   /* <== make this 2000 if the variables are Y1-Y2000 */
data Diff / view=Diff;  /* create view for the differences */
set Have;
array y[*] y1-y&p;
array Diff[*] Diff1-Diff&p;
do i = 1 to dim(y);
   Diff[i] = y[i] - X1;
end;
run;

proc reg data=Diff noprint outest=PE;
   model Diff1-Diff&p = x1-x4;  /* all responses on left-hand side */
run;

proc print data=PE;
   var _DEPVAR_ Intercept;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2355 views
  • 6 likes
  • 5 in conversation