Contributor
Posts: 50

# running thousands of regression

Hi, I have data in the following form,

 Year Mkt-RF SMB HML RF Y1 Y2 and so on 200801 200802 and so on

I want to run a regression in the following form;

(Yi – RF) = α + Mkt-RF +   SMB + HML

Where i=1 to 2000

This essentially means that I want 2000 different regressions as my dependent variable is changing everytime.

Also my data only contains Yi, and I have to compute Yi-RF for each regression and finally I want one summary file that extracts the intercept term of each regression.

I am unsure on how to go about it. I can run individual regressions in SAS but generating thousands of regressions of this sort, I am not getting how to do it. Can anyone help me on this?

Super User
Posts: 9,874

## Re: running thousands of regression

Transpose to a long format and use by?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,686

Posts: 2,806

## Re: running thousands of regression

I always point out:

Just because you CAN run 1000s of regressions doesn't mean you SHOULD run 1000s of regressions.

It's not clear why you want to do this, but usually there are better methods to accomplish your goals than brute force running 1000s of regressions.

--
Paige Miller
Contributor
Posts: 50

## Re: running thousands of regression

Hi Miler,

I want to run this regression because I want to extract intercept term of each regression to be used as input for another data set. Do you have an easier suggestion?

Contributor
Posts: 50

## Re: running thousands of regression

Hi Miler,

I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?

Super User
Posts: 9,874

## Re: running thousands of regression

See this simple example code:

``````data have;
input var1 \$ var2 \$ y1 y2 y3;
cards;
x y 1 2 3
a b 4 5 6
;
run;

proc transpose data=have out=int (rename=(col1=y));
by var1 var2 notsorted;
var y:;
run;``````

You can now sort dataset int by _name_ and use _name_ in a by statement in your regression analysis.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Posts: 2,806

## Re: running thousands of regression

@Amalik wrote:

Hi Miller,

I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?

Given the very limited explanation ("used as input for another data set" and then how is it used?), there's no way I can give any useful suggestions.

In fact, you seem to be implying that you have Y1-Y2000 (that's two thousand Y variables), is that right? They're not independent of one another, they are correlated, perhaps some method that takes into account the correlation would help.

--
Paige Miller
SAS Super FREQ
Posts: 4,171

## Re: running thousands of regression

If you need to use PROC REG, use KSharp's suggestion and read that blog post.

However, if you only want the intercepts, not the other parameter estimates, you don't need to use PROC REG or even transpose the data. If you have SAS/IML, you can obtain the intercept estimates from the matrix of crossproducts by applying a SWEEP operator to the explanatory variables (plus intercept column). The intercept estimates are then in the first row of the swept matrix:

``````data a;
call streaminit(1);
array y[5];
array diff[5];
do time = 1 to 100;
RF  = rand("Normal");
Mkt = rand("Normal");
SMB = rand("Normal");
HML = rand("Normal");
MktRF = Mkt - RF;
do i = 1 to dim(y);
y[i] = rand("Normal");
diff[i] = Y[i] - RF;
end;
output;
end;

ods exclude all;
proc reg data=A plots=none;
var Y1-Y5 Diff1-Diff5 MktRF SMB HML;
Diff1: model Diff1 = MktRF SMB HML;
Diff2: model Diff2 = MktRF SMB HML;
Diff3: model Diff3 = MktRF SMB HML;
Diff4: model Diff4 = MktRF SMB HML;
Diff5: model Diff5 = MktRF SMB HML;
ods output ParameterEstimates=PE;
run;
ods exclude none;

/* print just the intercept estimates */
proc print data=PE(where=(Variable="Intercept"));run;

proc iml;
XNames = {MktRF SMB HML};
YNames = "Diff1":"Diff5";
varNames = XNames || YNames;
use A; read all var varNames into X; close;

XNames = "Intercept" || XNames;
varNames = XNames || YNames;
X = j(nrow(X), 1, 1) || X;

/* for crossproducts and sweep the explanatory columns */
XpX = X`*X;
S = sweep(XpX, 1:ncol(XNames));
/* 1st row contains intercepts; get column names for Y variables */
InterceptOnly = S[1, ncol(XNames)+1: ncol(varNames)];
print InterceptOnly[c=YNames];``````
SAS Super FREQ
Posts: 4,171

## Re: running thousands of regression

Now that I read the OP's question more carefully, I think this is simple to implement if the data does not have many missing values. The OP has multiple response variables. You just need to create the differences (Y[i] - RF) and then put all the difference variables on the left side of the equal sign on the MODEL statement. There is no need to transpose the data:

``````data Have;   /* simulate data */
call streaminit(1);
array y[3];
array x[4];
do i = 1 to 100;
do j = 1 to dim(x);
x[j] = rand("Normal");
end;
do j = 1 to dim(y);
y[j] = -1 + 2*x[1] - 3*x[2] + 4*x[3] + rand("Normal");
end;
output;
end;

%let p = 3;   /* <== make this 2000 if the variables are Y1-Y2000 */
data Diff / view=Diff;  /* create view for the differences */
set Have;
array y[*] y1-y&p;
array Diff[*] Diff1-Diff&p;
do i = 1 to dim(y);
Diff[i] = y[i] - X1;
end;
run;

proc reg data=Diff noprint outest=PE;
model Diff1-Diff&p = x1-x4;  /* all responses on left-hand side */
run;

proc print data=PE;
var _DEPVAR_ Intercept;
run;
``````
Discussion stats
• 9 replies
• 160 views
• 6 likes
• 5 in conversation