turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Rolling regression with conditions

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-13-2017 07:16 AM - edited 01-08-2018 03:56 AM

Hi.

I have a (company ID-time) panel data. Now I have to do 5-year rolling regression of COGS (as dependent variable) for each firm with 3 independent variables:

- COGS's first lag (COGS (t-1))

- current SALE (SALE (t))

- SALE's first lag (SALE (t-1))

and **retain all of the coefficients for each date that has rolling window with >= 10 observations after filtering for outliers**

**The problem is:**

+/ The rolling windows for each regression must have at least 10 observations (if any variable has more than 10 missing values in a rolling window, then skip the regression for that window)

+/ Quarterly growth rates of COGS, SALEQ and ASSET are no more than plus or minus 75% (if they are >75% or <-75%, then drop these observations from regressions)

.

**The sample data is like this:**

| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-13-2017 08:59 AM

Please. Anything could help

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-13-2017 09:28 AM

If you drop quarters with greater than a 75% change in cogsq or sale, then how do you define lag(cogsq) and lag(sale) for the subsequent quarter.

If Q3 has 80% change vs Q2, then what values will you use as lag of coqsq and sale in Q4?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-13-2017 09:46 AM

Good question. For example, If cogs in Q3 has 80% change vs Q2, then the lag of cogs in Q4 would be cogs in Q2 (or equivalently the nearest quarter that has cogs growth rate within (-75%, +75%).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-13-2017 10:38 AM

So does this mean that the concept of lagged values of cogsq and sale can represent varying time spans? Let's say you have successive changes in sales of +80%, -%78%, and +90%, for a highly seasonal company. Then your lagged value of cogsq and sale will represent one-year old values.

And more generally, I don't get this notion of dropping dramatic changes anyway, since it is likely to introduce even more dramatic changes in your analysis data. Let's say a company is in a growth spurt, with successive changes in sales of 78% (for Q2) and 20% (for Q3). By dropping Q2 and inserting Q1 sales as the lagged value for Q3, you will have introduced a "change" of 96% between current and "preceding" value of the variable. That would increase the beta coefficient for lagged sale.

What you MIGHT want to do instead is insert a dummy variable indicating records with large absolute proportional changes, and keep those records.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-13-2017 06:12 PM

Thanks for your advice. But how can I insert this dummy variable and other conditions in the rolling regression?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-14-2017 09:17 AM

Anyone has any idea?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

12-14-2017 10:43 AM

This program should give you then needed infrastructure:

```
data need1 (keep=window data cogsq sale lag_: outlier_:) /view=need1;
do n=1 by 1 until (last.gvkey);
set have;
by gvkey;
lag_cogsq=lag(cogsq);
lag_sale=lag(sale);
if n=1 then call missing(of lag_:);
else do;
outlier_cogsq=1-(0.25<cogsq/lag_cogsq<1.75);
outlier_sale=1-(0.25<sale/lag_sale<1.75);
end;
array var {*} data cogsq sale lag_cogsq lag_sale outlier_cogsq outlier_sale;
array data {100,7}; /*Up to 100 historic records for 7 vars*/
do v=1 to 7; data{n,v}=var{v};end
end;
if n>10 then do end=11 to n;
beg=max(2,n-59);
window=end-11;
do row=beg to end;
do v=1 to 7; var{v}=data{row,v};end;
output;
end;
end;
run;
proc reg data=need;
......
run;
quit;
```

Remember data set NEED will be roughly 20 times the size of HAVE (you want 5-year rolling quarterly data). So I made NEED a data set VIEW instead of a data set FILE. It's only activated when a subsequent PROC calls for NEED, and the data is streamed directly to the proc instead of a disk file.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

01-08-2018 03:55 AM

Hi everyone.

I have describe the data in more detail. Could anyone has an idea? Because previous codes do not work

I have describe the data in more detail. Could anyone has an idea? Because previous codes do not work

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to trungcva112

01-08-2018 10:53 AM

- The phrase "do not work" is not a usefull description of the problem. What results did you get vs what you expected? Please show those results.

- Test this with a sample data set with two gvkeys and just a couple years of data. And if you are testing my suggestion, change the dataset view VNEED to a data set file. Then you can examine the intermediate data set to confirm it is constructed as intended.