04-11-2018 08:34 PM
Hi, I have data in the following form,
Year | Mkt-RF | SMB | HML | RF | Y1 | Y2 | and so on |
200801 | |||||||
200802 | |||||||
and so on |
I want to run a regression in the following form;
(Yi – RF) = α + Mkt-RF + SMB + HML
Where i=1 to 2000
This essentially means that I want 2000 different regressions as my dependent variable is changing everytime.
Also my data only contains Yi, and I have to compute Yi-RF for each regression and finally I want one summary file that extracts the intercept term of each regression.
I am unsure on how to go about it. I can run individual regressions in SAS but generating thousands of regressions of this sort, I am not getting how to do it. Can anyone help me on this?
04-12-2018 02:45 AM
Transpose to a long format and use by?
04-12-2018 08:46 AM
04-12-2018 08:59 AM
I always point out:
Just because you CAN run 1000s of regressions doesn't mean you SHOULD run 1000s of regressions.
It's not clear why you want to do this, but usually there are better methods to accomplish your goals than brute force running 1000s of regressions.
04-12-2018 06:28 PM
Hi Miler,
I want to run this regression because I want to extract intercept term of each regression to be used as input for another data set. Do you have an easier suggestion?
04-12-2018 06:29 PM
Hi Miler,
I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?
04-13-2018 01:54 AM
See this simple example code:
data have;
input var1 $ var2 $ y1 y2 y3;
cards;
x y 1 2 3
a b 4 5 6
;
run;
proc transpose data=have out=int (rename=(col1=y));
by var1 var2 notsorted;
var y:;
run;
You can now sort dataset int by _name_ and use _name_ in a by statement in your regression analysis.
04-13-2018 10:14 AM
@Amalik wrote:
Hi Miller,
I want to use intercept term of each regression to be used as input for another data set. Do you have another suggestion?
Given the very limited explanation ("used as input for another data set" and then how is it used?), there's no way I can give any useful suggestions.
In fact, you seem to be implying that you have Y1-Y2000 (that's two thousand Y variables), is that right? They're not independent of one another, they are correlated, perhaps some method that takes into account the correlation would help.
04-13-2018 11:08 AM
If you need to use PROC REG, use KSharp's suggestion and read that blog post.
However, if you only want the intercepts, not the other parameter estimates, you don't need to use PROC REG or even transpose the data. If you have SAS/IML, you can obtain the intercept estimates from the matrix of crossproducts by applying a SWEEP operator to the explanatory variables (plus intercept column). The intercept estimates are then in the first row of the swept matrix:
data a;
call streaminit(1);
array y[5];
array diff[5];
do time = 1 to 100;
RF = rand("Normal");
Mkt = rand("Normal");
SMB = rand("Normal");
HML = rand("Normal");
MktRF = Mkt - RF;
do i = 1 to dim(y);
y[i] = rand("Normal");
diff[i] = Y[i] - RF;
end;
output;
end;
ods exclude all;
proc reg data=A plots=none;
var Y1-Y5 Diff1-Diff5 MktRF SMB HML;
Diff1: model Diff1 = MktRF SMB HML;
Diff2: model Diff2 = MktRF SMB HML;
Diff3: model Diff3 = MktRF SMB HML;
Diff4: model Diff4 = MktRF SMB HML;
Diff5: model Diff5 = MktRF SMB HML;
ods output ParameterEstimates=PE;
run;
ods exclude none;
/* print just the intercept estimates */
proc print data=PE(where=(Variable="Intercept"));run;
proc iml;
XNames = {MktRF SMB HML};
YNames = "Diff1":"Diff5";
varNames = XNames || YNames;
use A; read all var varNames into X; close;
/* add intercept column */
XNames = "Intercept" || XNames;
varNames = XNames || YNames;
X = j(nrow(X), 1, 1) || X;
/* for crossproducts and sweep the explanatory columns */
XpX = X`*X;
S = sweep(XpX, 1:ncol(XNames));
/* 1st row contains intercepts; get column names for Y variables */
InterceptOnly = S[1, ncol(XNames)+1: ncol(varNames)];
print InterceptOnly[c=YNames];
04-13-2018 02:05 PM
Now that I read the OP's question more carefully, I think this is simple to implement if the data does not have many missing values. The OP has multiple response variables. You just need to create the differences (Y[i] - RF) and then put all the difference variables on the left side of the equal sign on the MODEL statement. There is no need to transpose the data:
data Have; /* simulate data */
call streaminit(1);
array y[3];
array x[4];
do i = 1 to 100;
do j = 1 to dim(x);
x[j] = rand("Normal");
end;
do j = 1 to dim(y);
y[j] = -1 + 2*x[1] - 3*x[2] + 4*x[3] + rand("Normal");
end;
output;
end;
%let p = 3; /* <== make this 2000 if the variables are Y1-Y2000 */
data Diff / view=Diff; /* create view for the differences */
set Have;
array y[*] y1-y&p;
array Diff[*] Diff1-Diff&p;
do i = 1 to dim(y);
Diff[i] = y[i] - X1;
end;
run;
proc reg data=Diff noprint outest=PE;
model Diff1-Diff&p = x1-x4; /* all responses on left-hand side */
run;
proc print data=PE;
var _DEPVAR_ Intercept;
run;