I am trying to run a regression on a monthly data set with more than 3 million observations. The model is linear, and what I need are the coefficients of the independent variables for lags 7 through 42. I know about PROC REG, but my problem is that I don't know how to use lag 7 through lag 42 in the regression.
For example: I have the data set below (1980-2017), and the model is "VWRETD=SMB HML Mkt_RF". But in the regression I should use lag7 through lag42 of every variable.
Thank you in advance for your help.
PROC AUTOREG, with the NLAG= option specified.
Or compute the lag values as individual columns in a SAS data set before you run PROC REG, then include these columns in your MODEL statement in PROC REG.
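A minimal sketch of the second suggestion, in case it helps. It assumes a hypothetical input data set HAVE that holds a firm identifier (called PERMNO here) and is already sorted by firm and then chronologically; the macro name MAKE_LAGS, the output data set HAVE_LAGS, and the counter _N_IN_GROUP are made up for illustration. The LAGn functions are called on every row, and lags that would reach back into the previous firm are set to missing.

```sas
/* Assumption: HAVE is sorted by PERMNO and then by date (oldest first). */
%macro make_lags(vars=SMB HML Mkt_RF, from=7, to=42);
  %local i k v;
  %do i = 1 %to %sysfunc(countw(&vars, %str( )));
    %let v = %scan(&vars, &i, %str( ));
    %do k = &from %to &to;
      /* LAGn must run on every row, so the assignment is unconditional */
      &v._lag&k = lag&k(&v);
      /* discard values that come from a different PERMNO */
      if _n_in_group <= &k then &v._lag&k = .;
    %end;
  %end;
%mend make_lags;

data have_lags;
  set have;
  by PERMNO;
  /* row counter within each PERMNO */
  if first.PERMNO then _n_in_group = 0;
  _n_in_group + 1;
  %make_lags()
  drop _n_in_group;
run;
```

This creates columns such as SMB_lag7 through Mkt_RF_lag42 (36 lags for each of the 3 variables), which can then be listed in the MODEL statement.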
I have never heard of a modeling situation where you use LAG 7, but not lag 1 through lag 6.
Thank you for your help and suggestions. I have tried the first suggestion, but unfortunately it doesn't work. The second suggestion makes the data and model very big. I have 3 independent variables. Based on your suggestion, I need to calculate lag 7 through lag 42 for each of the variables (42-7+1 = 36 lags, times 3 variables = 108 columns). It is almost impossible.
Do you think there is another way to solve this problem?
Almost impossible? I don't think so.
Would it take an hour of coding and typing? Possibly, much less if you can write a macro.
You say that it makes the model very big. That's because you are asking how to fit a model with all these terms. You are, in your original question, asking for a very big model. So I don't understand your point. Maybe you don't really need all of those lag columns...
And I don't really think SAS has problems fitting such a model; however the correlation between the variables will be a problem.
Maybe PROC ARIMA can be used for this, but this is beyond my skill level.
Thank you for your prompt reply. It seems my question was unclear. To make sure you understand my problem, let me explain it one more time.
I am using monthly value-weighted stock return data to compute monthly expected returns (based on the Fama-French size and market-to-book factors), the same as in the paper by Daniel and Titman (1997). I have to run a regression for every single observation (model: monthly stock return (DV) = size + market-to-book ratio (IV)) to get the coefficients, and then use them to compute the expected return for every firm-month. In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations.
My problem is writing SAS code that builds the data set for each regression.
I hope my explanation is clear.
I don't have the paper you mentioned, so that reference doesn't help.
"In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations."
If I interpret this literally (which is the only way I can do things), you want each regression to have its own specific data set. I'm not sure that's necessary here. This seems like unnecessary work that you don't need to do (which is a redundant statement, but I state it this way for emphasis). SAS allows you to perform many regressions on a single data set with a BY statement, and also by allowing (in PROC REG) multiple MODEL statements.
It also seems like you want each specific data set to have lag7 through lag42 (all of them). That's the literal meaning of your words. Yet somehow I get the feeling (although you don't say this) that you want one data set with lag7 (but no other lags) and another data set with lag8 (but no other lags) and so on until you get to the 36th data set which has lag42 (but no other lags).
So, which is it?
Thanks again Paige for your reply.
This is exactly what I am looking for. I want to use PROC REG to do it, and it should be lag 7 through lag 42 (-7, -8, -9, ..., -42).
Best,
So, create one data set with all the lags, and of course any other variables that will be used.
proc reg data=whatever;
lag7: model VWRETD=SMB HML Mkt_RF lag7;
lag8: model VWRETD=SMB HML Mkt_RF lag8;
lag9: model VWRETD=SMB HML Mkt_RF lag9;
...
run;
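If typing one MODEL statement per lag is tedious, a macro loop can generate them. This is only a sketch under the same assumptions as the code above: the data set is still the placeholder WHATEVER, the lag columns are assumed to be named lag7 through lag42, and the macro name LAG_MODELS is hypothetical.

```sas
%macro lag_models(from=7, to=42);
  %local k;
  proc reg data=whatever;
    %do k = &from %to &to;
      /* one labeled model per lag, e.g. lag7: ... lag7; */
      lag&k: model VWRETD = SMB HML Mkt_RF lag&k;
    %end;
  run;
  quit;
%mend lag_models;

%lag_models()
```

Adding OUTEST= to the PROC REG statement would collect the coefficients of all the models in one data set, with the model label in the _MODEL_ column.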
Hey Miller,
Thank you for your suggestion, but the suggested code hasn't solved my problem. When I run PROC REG by year, month and PERMNO, it gives me missing values. Because the BY groups are limited to one year, month and PERMNO, only one observation remains per model. That is why previous papers recommend using the monthly lags 7 through 42 in the model.
What I need is a PROC REG step for one model by year, month and PERMNO, where the lags are treated as the observations of the model.
My model is: model VWRETD=SMB HML Mkt_RF
My data is monthly.
I need to have the "coefficient var" of the model per month and per PERMNO.
Please look at the following code. I also added ridge to the model, but I am not sure it is the right way to get the "coeff var".
proc sort data=a;
by PERMNO month year;
run;
proc reg data=a outvif
outest=b ridge=0 to 0.02 by .002;
m1: model VWRETD=SMB HML Mkt_RF;
by PERMNO month year;
run;
proc print data=b;
run;
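One possible way to treat the lags as observations is to build, for every PERMNO and target month, a window holding that firm's rows from 42 to 7 months earlier, and then run PROC REG with a BY statement on the windowed data. The sketch below is not taken from the paper; it assumes that YEAR and MONTH in data set A are numeric, and the names A_IDX, WIN, MIDX, B_ROLL and the aliases T and H are made up for illustration.

```sas
/* Running month index so windows can cross year boundaries */
data a_idx;
  set a;
  midx = year*12 + month;
run;

/* For each target row t, attach the same firm's rows h from
   months t-42 through t-7. Only the target's keys are kept
   from t; the regression variables come from h. */
proc sql;
  create table win as
  select t.PERMNO, t.year, t.month,
         h.VWRETD, h.SMB, h.HML, h.Mkt_RF
  from a_idx as t inner join a_idx as h
    on t.PERMNO = h.PERMNO
   and h.midx between t.midx - 42 and t.midx - 7
  order by t.PERMNO, t.year, t.month;
quit;

/* One regression per firm-month, each on up to 36 observations */
proc reg data=win outest=b_roll noprint;
  by PERMNO year month;
  m1: model VWRETD = SMB HML Mkt_RF;
run;
quit;
```

B_ROLL should then contain one row of coefficients per PERMNO, year and month. Firm-months with very few earlier observations will give missing or unreliable estimates, and with 3 million input rows the WIN table can grow to roughly 36 times that size, so it may be worth testing on a few firms first.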