BookmarkSubscribeRSS Feed
kandelous
Calcite | Level 5

I am trying to do a regression for the monthly data-set with more than 3 million data. The model is linear, and what I need is the coefficient of the independent variables, for the lag between 7 to 42. I know about proc reg data, but my problem is that I don't know how to use the lag of 7 to lag of 42 in the regression.

For example: I have the bellow data-set (1980-2017), and the model is "VWRETD=SMB HML Mkt_RF". But in the regression I should use the lag7 to lag42 of every variable.

 

Thank you in advance for your help.

8 REPLIES 8
PaigeMiller
Diamond | Level 26

PROC AUTOREG, with the NLAG option specified

 

Or compute the lag values as individual columns in a SAS data set before your run PROC REG, then include these columns into your MODEL statement in PROC REG.

 

I have never heard of a modeling situation where you use LAG 7, but not lag 1 through lag 6.

--
Paige Miller
kandelous
Calcite | Level 5

Thank you for your help and suggestions. Actually I have tried the first suggestion, unfortunately, it doesn't work. The second suggestion makes the data and model very big. I have 3 independent variables. Base of your suggestion, I need to calculate lag7 to lag 42 for each of the variables (42-7=36*3=108 columns). It is almost impossible.
Do you think there is another way to solve this problem?

PaigeMiller
Diamond | Level 26

Almost impossible? I don't think so.

 

Would it take an hour of coding and typing? Possibly, much less if you can write a macro. 

 

You say that it makes the model very big. That's because you are asking how to fit a model with all these terms. You are, in your original question, asking for a very big model. So I don't understand your point. Maybe you don't really need all of those lag columns...

 

And I don't really think SAS has problems fitting such a model; however the correlation between the variables will be a problem.

 

Maybe PROC ARIMA can be used for this, but this is beyond my skill level.

--
Paige Miller
kandelous
Calcite | Level 5

Thank You for your prompt reply. Seems my question was unclear. To make sure you understand my problem, let me explain it one more time.

I am using monthly value-weighted stock return data to compute monthly expected return (base of its size and market-to- book ratio of Fama French) same as the paper by Daniel and Titman (1997). I have to run regression for every single observation ( Model: monthly stock return (DV) = size + market to book ratio (IV)) to reach the coefficients; and then use them to compute expected return for every month-firm. In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations.

My problem is to write sas code to specify data set for each regressions.

I hope my explanation is clear. 

 

PaigeMiller
Diamond | Level 26

I don't have the paper you mentioned, so that reference doesn't help.

 

In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations.

 

If I interpret this literally (which is the only way I can do things), you want each regression to have its own specific data set. I'm not sure that's necessary here. This seems like unnecessary work that you don't need to do (which is a redundant statement, but I state it this way for emphasis). SAS allows you to perform many regressions on a single data set with a BY command, and also by allowing (in PROC REG) multiple different model statements.

 

It also seems like you want each specific data set to have lag7 through lag42 (all of them). That's the literal meaning of your words. Yet somehow I get the feeling (although you don't say this) that you want one data set with lag7 (but no other lags) and another data set with lag8 (but no other lags) and so on until you get to the 36th data set which has lag42 (but no other lags).

 

So, which is it? 

--
Paige Miller
kandelous
Calcite | Level 5

Thanks again Paige for your reply.

 

 

This is exactly what I am looking for. I want to use PROC REG to do it. and  it should be lag 7 through lag 42 (-7, -8, -9,...., -42).

 

Best,

 

 

 

 

PaigeMiller
Diamond | Level 26

So, create one data set with all the lags, and of course any other variables that will be used.

 

proc reg data=whatever;
    lag7: model VWRETD=SMB HML Mkt_RF lag7;
    lag8: model VWRETD=SMB HML Mkt_RF lag8;
    lag9: model VWRETD=SMB HML Mkt_RF lag9;
...
run;
--
Paige Miller
kandelous
Calcite | Level 5

Hey Miller,

Thank You for your suggestion. But your suggested commend hasn't solved my problem. When I run proc reg by year month permno, It gives me missing. Because of limiting to year month and permno, only one observation remain per model. That is why, previous  papers recommend to use monthly lag 7 through lag 42 on the model. 

What I need is to write a commend of "proc reg" for one model  by year month permno when lags consider as observations on the model.

 

My model is :   model VWRETD=SMB HML Mkt_RF

My data is monthly.

I need to have "coefficient var" of the  model per month and per permno.

 

 

please look at the following commends. I also add ridge to model. but I am not sure, it is right to reach "coeff var"

 

proc sort data=a;
by PERMNO month year;
run;

 

proc reg data=a outvif
outest=b ridge=0 to 0.02 by .002;
m1: model VWRETD=SMB HML Mkt_RF;
by PERMNO month year;
run;

 

proc print data=b;
run; 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Discussion stats
  • 8 replies
  • 2507 views
  • 0 likes
  • 2 in conversation