How to get regression

Posted 05-12-2018 12:59 PM

I am trying to run a regression on a monthly data set with more than 3 million observations. The model is linear, and what I need are the coefficients of the independent variables for lags 7 through 42. I know about PROC REG, but my problem is that I don't know how to use lag 7 through lag 42 in the regression.

For example, I have the data set below (1980-2017), and the model is "VWRETD=SMB HML Mkt_RF". But in the regression I should use lag7 through lag42 of every variable.

Thank you in advance for your help.

8 replies


PROC AUTOREG, with the NLAG option specified.

Or compute the lag values as individual columns in a SAS data set before you run PROC REG, then include these columns in your MODEL statement in PROC REG.
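A minimal sketch of the second option, assuming the input data set is named `a` (as in the code posted later in this thread) with one row per PERMNO and month. The column names `SMB_lag7`, `SMB_lag8` and the counter `_n` are hypothetical. LAGn returns the value from n executions earlier, so without a reset the first rows of one firm would pick up values from the previous firm; the counter blanks those out.

```sas
/* Assumption: data set A has one row per PERMNO and month */
proc sort data=a;
  by PERMNO year month;   /* chronological order within each firm */
run;

data a_lags;
  set a;
  by PERMNO;
  /* Call each LAGn on every row so its queue stays aligned with the data */
  SMB_lag7 = lag7(SMB);
  SMB_lag8 = lag8(SMB);
  /* _n counts rows within the current firm (sum statement retains it) */
  if first.PERMNO then _n = 0;
  _n + 1;
  /* Blank out lags that would reach back into the previous firm */
  if _n <= 7 then SMB_lag7 = .;
  if _n <= 8 then SMB_lag8 = .;
  drop _n;
run;

proc reg data=a_lags;
  model VWRETD = SMB_lag7 SMB_lag8;
run;
quit;
```

This assumes no firm has missing months; with gaps, a lag of 7 rows is no longer a lag of 7 months.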

I have never heard of a modeling situation where you use LAG 7, but not lag 1 through lag 6.

--

Paige Miller


Do you think there is another way to solve this problem?


Almost impossible? I don't think so.

Would it take an hour of coding and typing? Possibly; much less if you can write a macro.
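A macro of the kind mentioned here could look like the sketch below. The macro name `make_lags` and its parameters are hypothetical, and it assumes the input is sorted by PERMNO and by date within PERMNO. Because the macro writes out a separate `lagN(...)` call for every column, each call keeps its own queue, which avoids the classic mistake of calling LAG inside a run-time DO loop.

```sas
/* Hypothetical macro: adds lags MINLAG..MAXLAG of each variable in VARS.
   Assumes &IN is sorted by PERMNO and by date within PERMNO. */
%macro make_lags(in=, out=, vars=, minlag=7, maxlag=42);
  %local i j v;
  data &out;
    set &in;
    by PERMNO;
    if first.PERMNO then _n = 0;
    _n + 1;                              /* row number within the firm */
    %do j = 1 %to %sysfunc(countw(&vars));
      %let v = %scan(&vars, &j);
      %do i = &minlag %to &maxlag;
        /* generates e.g. SMB_lag7 = lag7(SMB), one LAGn call per column */
        &v._lag&i = lag&i(&v);
        if _n <= &i then &v._lag&i = .;  /* no lags across firms */
      %end;
    %end;
    drop _n;
  run;
%mend make_lags;

%make_lags(in=a, out=a_lags, vars=SMB HML Mkt_RF)
```

With three variables and lags 7 through 42 this creates 3 × 36 = 108 new columns, which illustrates why the resulting model is so large.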

You say that it makes the model very big. That's because you are asking how to fit a model with all these terms. You are, in your original question, asking for a very big model. So I don't understand your point. Maybe you don't really need all of those lag columns...

And I don't think SAS will have problems fitting such a model; however, the strong correlation among the lagged variables (multicollinearity) will be a problem.
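To see how severe that correlation is, PROC REG's VIF and COLLIN model options print variance inflation factors and collinearity diagnostics. A minimal sketch, assuming a data set `a_lags` that already holds lag columns with hypothetical names such as `SMB_lag7`:

```sas
/* Assumption: A_LAGS already contains the lag columns listed below */
proc reg data=a_lags;
  /* VIF:    variance inflation factor for each regressor
     COLLIN: eigenvalue-based collinearity diagnostics */
  model VWRETD = SMB_lag7 SMB_lag8 SMB_lag9 / vif collin;
run;
quit;
```

A common rule of thumb treats VIF values above about 10 as a sign of serious collinearity.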

Maybe PROC ARIMA can be used for this, but this is beyond my skill level.

--

Paige Miller


Thank you for your prompt reply. It seems my question was unclear. To make sure you understand my problem, let me explain it once more.

I am using monthly value-weighted stock return data to compute the monthly expected return (based on size and market-to-book ratio, following Fama and French), as in the paper by Daniel and Titman (1997). I have to run a regression for every single observation (model: monthly stock return (DV) = size + market-to-book ratio (IV)) to obtain the coefficients, and then use them to compute the expected return for every firm-month. In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations.

My problem is writing SAS code to specify the data set for each regression.

I hope my explanation is clear.


I don't have the paper you mentioned, so that reference doesn't help.

> In order to run each regression I should design its own specific data set by using the observation + lag7 of the observations through lag42 of the observations.

If I interpret this literally (which is the only way I can do things), you want each regression to have its own specific data set. I'm not sure that's necessary here. This seems like unnecessary work that you don't need to do (which is a redundant statement, but I state it this way for emphasis). SAS allows you to perform many regressions on a single data set with a BY statement, and also (in PROC REG) by allowing multiple MODEL statements.
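Both mechanisms can be combined in a single step. A minimal sketch, assuming a data set `a_lags` with hypothetical lag columns such as `SMB_lag7` and `HML_lag7`: the BY statement fits every model separately for each PERMNO, and each labeled MODEL statement adds one more model, so no per-regression data sets are needed.

```sas
/* Assumption: A_LAGS holds hypothetical lag columns SMB_lag7, HML_lag7, ... */
proc sort data=a_lags;
  by PERMNO;
run;

proc reg data=a_lags outest=est noprint;
  by PERMNO;                                       /* one fit per firm   */
  m_lag7: model VWRETD = SMB_lag7 HML_lag7 Mkt_RF_lag7;  /* per model   */
  m_lag8: model VWRETD = SMB_lag8 HML_lag8 Mkt_RF_lag8;
run;
quit;
```

The OUTEST= data set `est` then has one row per PERMNO and model, with the label stored in `_MODEL_` and the coefficients in columns named after the regressors.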

It also seems like you want each specific data set to have lag7 through lag42 (all of them). That's the literal meaning of your words. Yet somehow I get the feeling (although you don't say this) that you want one data set with lag7 (but no other lags) and another data set with lag8 (but no other lags) and so on until you get to the 36th data set which has lag42 (but no other lags).

So, which is it?

--

Paige Miller


Thanks again Paige for your reply.

This is exactly what I am looking for. I want to use PROC REG to do it, and it should be lag 7 through lag 42 (-7, -8, -9, ..., -42).

Best,


So, create one data set with all the lags, and of course any other variables that will be used.

```
proc reg data=whatever;
lag7: model VWRETD=SMB HML Mkt_RF lag7;
lag8: model VWRETD=SMB HML Mkt_RF lag8;
lag9: model VWRETD=SMB HML Mkt_RF lag9;
/* ...continue the same pattern through lag42 */
run;
quit;
```
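Writing out all 36 MODEL statements by hand is tedious; a macro %DO loop can generate them. A sketch, assuming (as in the code above) that the data set already contains columns named `lag7` through `lag42`; the macro name `lag_models` and the output data set `est_by_lag` are hypothetical.

```sas
/* Hypothetical macro: writes one labeled MODEL statement per lag.
   Assumes &DATA already contains columns LAG7 ... LAG42. */
%macro lag_models(data=, minlag=7, maxlag=42);
  %local i;
  proc reg data=&data outest=est_by_lag noprint;
    %do i = &minlag %to &maxlag;
      lag&i: model VWRETD = SMB HML Mkt_RF lag&i;
    %end;
  run;
  quit;
%mend lag_models;

%lag_models(data=whatever)
```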

--

Paige Miller


Hey Miller,

Thank you for your suggestion, but your suggested code hasn't solved my problem. When I run PROC REG by year, month and PERMNO, it gives me missing values. Because the BY groups are limited to year, month and PERMNO, only one observation remains per model. That is why previous papers recommend using monthly lags 7 through 42 in the model.

What I need is PROC REG code that fits one model per year, month and PERMNO, where the lagged months are used as the observations for that model.

My model is: model VWRETD=SMB HML Mkt_RF

My data is monthly.

I need the "coefficient var" of the model per month and per PERMNO.

Please look at the following code. I also added ridge to the model, but I am not sure it is the right way to get "coeff var".

```
proc sort data=a;
  by PERMNO month year;
run;

proc reg data=a outvif
         outest=b ridge=0 to 0.02 by .002;
  m1: model VWRETD=SMB HML Mkt_RF;
  by PERMNO month year;
run;

proc print data=b;
run;
```
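One way to get what the last post describes is a rolling-window regression: for each PERMNO and target month t, pool the months t-42 through t-7 and fit one model. A minimal sketch, assuming data set `a` has one row per PERMNO-month with numeric YEAR and MONTH (1-12), and assuming the window is exactly t-42 to t-7 (the thread does not make clear whether month t itself belongs in it). The names `month_id`, `target_id`, `windows` and `coefs` are hypothetical. Each row is copied into every window that contains it, so each PROC REG BY group is a whole window rather than a single month.

```sas
/* Assumption: A has one row per PERMNO-month with YEAR, MONTH (1-12),
   VWRETD, SMB, HML and Mkt_RF. Window for target month t: t-42 .. t-7. */
data windows;
  set a;
  month_id = year*12 + month;              /* consecutive month counter */
  /* This row belongs to every target t with t-42 <= month_id <= t-7 */
  do target_id = month_id + 7 to month_id + 42;
    output;
  end;
run;

proc sort data=windows;
  by PERMNO target_id;
run;

/* One regression per firm and target month; OUTEST= keeps the coefficients */
proc reg data=windows outest=coefs noprint;
  by PERMNO target_id;
  model VWRETD = SMB HML Mkt_RF;
run;
quit;

/* Convert the month counter back to calendar year and month */
data coefs;
  set coefs;
  target_year  = floor((target_id - 1) / 12);
  target_month = target_id - 12*target_year;
run;
```

The copy step multiplies the data by 36 (roughly 108 million rows for 3 million observations), so check disk space first. Windows near the start of a firm's history, or beyond its last observed month, contain only a few months; you may want to count rows per PERMNO and `target_id` and drop thin windows before PROC REG. OUTEST= keeps the intercept, the SMB, HML and Mkt_RF coefficients and `_RMSE_` for each window. If "coefficient var" means PROC REG's Coeff Var statistic (100 × Root MSE / mean of the dependent variable), it is not stored in OUTEST= and has to be computed separately.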
