Contributor
Posts: 42

# Rolling regression with conditions


Hi.

I have panel data (company ID by time). I need to run a 5-year rolling regression of COGS (the dependent variable) for each firm, with 3 independent variables:

- COGS's first lag (COGS (t-1))

- current SALE (SALE (t))

- SALE's first lag (SALE (t-1))

and retain all of the coefficients for each date whose rolling window has >= 10 observations after filtering for outliers.

The problem is:

+/ The rolling window for each regression must have at least 10 observations (if any variable has more than 10 missing values in a rolling window, skip the regression for that window).

+/ Quarterly growth rates of COGS, SALEQ and ASSET must be within plus or minus 75% (if a growth rate is >75% or <-75%, drop that observation from the regressions).
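The growth-rate condition could be flagged along the lines below. This is only a rough sketch: it assumes the input data set is called HAVE, is sorted by gvkey and date, and holds numeric cogsq, sale and asset. The names FLAGGED, prev_*, g_* and drop_obs are made up for illustration.

```sas
/* Sketch only: assumes HAVE is sorted by gvkey and date and that
   cogsq, sale and asset are numeric. FLAGGED, prev_*, g_* and
   drop_obs are hypothetical names. */
data flagged;
  set have;
  by gvkey;

  /* LAG runs on every record so its queue stays in step with the data */
  prev_cogsq = lag(cogsq);
  prev_sale  = lag(sale);
  prev_asset = lag(asset);

  /* A firm's first record has no previous quarter */
  if first.gvkey then call missing(prev_cogsq, prev_sale, prev_asset);

  /* Quarterly growth rates; left missing when the previous value is
     missing or not positive */
  if prev_cogsq > 0 then g_cogsq = cogsq / prev_cogsq - 1;
  if prev_sale  > 0 then g_sale  = sale  / prev_sale  - 1;
  if prev_asset > 0 then g_asset = asset / prev_asset - 1;

  /* 1 when any available growth rate is above +75% or below -75% */
  drop_obs = (abs(g_cogsq) > 0.75)
          or (abs(g_sale)  > 0.75)
          or (abs(g_asset) > 0.75);
run;
```

Rows with drop_obs = 1 could then be excluded with something like `where drop_obs = 0;` in a later step. Rows whose growth rate cannot be computed are kept here, which is a choice rather than a requirement.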


The sample data is like this:

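| gvkey | date | Company | COGSQ | SALE | Asset |
|---|---|---|---|---|---|
| 1001 | 19830331 | A | 1.258 | 2 | 10 |
| 1001 | 19830630 | A | 1.4 | 2.3 | 12 |
| 1001 | 19830930 | A | 1.5 | 3 | 14 |
| 1001 | 19831231 | A | 1.6 | 4 | 16 |
| … | … | … | … | … | … |
| 1002 | 19860331 | B | 2.3 | 5 | 20 |
| 1002 | 19860630 | B | 3.5 | 6 | 22 |
| 1002 | 19861031 | B | 6.5 | 6.5 | 25 |
| 1002 | 19860131 | B | 7 | 9 | 29 |
| … | … | … | … | … | … |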

The output is like this:

| gvkey | date | Company | Intercept | Coef_lagcogs | Coef_sale | Coef_lagsale |
|---|---|---|---|---|---|---|
| 1001 | 19830331 | A | … | … | … | … |
| 1001 | 19830630 | A | … | … | … | … |
| 1001 | 19830930 | A | … | … | … | … |
| 1001 | 19831231 | A | … | … | … | … |
| … | … | … | … | … | … | … |
| 1002 | 19860331 | B | … | … | … | … |
| 1002 | 19860630 | B | … | … | … | … |
| 1002 | 19861031 | B | … | … | … | … |
| 1002 | 19860131 | B | … | … | … | … |
| … | … | … | … | … | … | … |

Contributor
Posts: 42

## Re: Rolling regression with conditions

Posted in reply to trungcva112
Please, anything would help.
Posts: 1,312

## Re: Rolling regression with conditions

Posted in reply to trungcva112

If you drop quarters with greater than a 75% change in cogsq or sale, then how do you define lag(cogsq) and lag(sale) for the subsequent quarter?

If Q3 has an 80% change vs Q2, then what values will you use as the lag of cogsq and sale in Q4?

Contributor
Posts: 42

## Re: Rolling regression with conditions

Posted in reply to trungcva112
Good question. For example, if cogs in Q3 has an 80% change vs Q2, then the lag of cogs in Q4 would be cogs in Q2 (or equivalently, the nearest quarter that has a cogs growth rate within (-75%, +75%)).
Posts: 1,312

## Re: Rolling regression with conditions

Posted in reply to trungcva112

So does this mean that the concept of lagged values of cogsq and sale can represent varying time spans? Let's say you have successive changes in sales of +80%, -78%, and +90%, for a highly seasonal company. Then your lagged values of cogsq and sale will represent one-year-old values.

And more generally, I don't get this notion of dropping dramatic changes anyway, since it is likely to introduce even more dramatic changes in your analysis data. Let's say a company is in a growth spurt, with successive changes in sales of 78% (for Q2) and 20% (for Q3). By dropping Q2 and inserting Q1 sales as the lagged value for Q3, you will have introduced a "change" of about 114% (1.78 × 1.20 ≈ 2.14) between the current and "preceding" value of the variable. That would increase the beta coefficient for lagged sale.

What you MIGHT want to do instead is insert a dummy variable indicating records with large absolute proportional changes, and keep those records.

Contributor
Posts: 42

## Re: Rolling regression with conditions

Posted in reply to trungcva112
Thanks for your advice. But how can I insert this dummy variable and other conditions in the rolling regression?
Contributor
Posts: 42

## Re: Rolling regression with conditions

Posted in reply to trungcva112
Does anyone have any ideas?
Posts: 1,312

## Re: Rolling regression with conditions

Posted in reply to trungcva112

This program should give you the needed infrastructure:

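```
/* Assumes HAVE is sorted by gvkey and date, and that date is numeric */
data need (keep=gvkey window date cogsq sale lag_: outlier_:) / view=need;

  array hist {100,7} _temporary_;  /* Up to 100 historic records for 7 vars */

  /* Read every record of one gvkey, saving each row in HIST */
  do n=1 by 1 until (last.gvkey);
    set have;
    by gvkey;
    lag_cogsq=lag(cogsq);
    lag_sale=lag(sale);
    if n=1 then call missing(of lag_:);
    else do;
      /* 1 when the quarterly change is outside +/-75%, else 0 */
      outlier_cogsq=1-(0.25<cogsq/lag_cogsq<1.75);
      outlier_sale=1-(0.25<sale/lag_sale<1.75);
    end;

    array vars {*} date cogsq sale lag_cogsq lag_sale outlier_cogsq outlier_sale;
    do v=1 to 7; hist{n,v}=vars{v}; end;
  end;

  /* One window per ending row from row 11 on. Row 1 has no lags, so a
     window starts at row 2 at the earliest and spans at most 20 quarters. */
  if n>10 then do w_end=11 to n;
    w_beg=max(2,w_end-19);
    window=w_end-11;
    do row=w_beg to w_end;
      do v=1 to 7; vars{v}=hist{row,v}; end;
      output;
    end;
  end;
run;

proc reg data=need;
......
run;
quit;
```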

Remember data set NEED will be roughly 20 times the size of HAVE (you want 5-year rolling quarterly data). So I made NEED a data set VIEW instead of a data set FILE.  It's only activated when a subsequent PROC calls for NEED, and the data is streamed directly to the proc instead of a disk file.
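As a rough sketch of how the coefficients might then be retained, something like the following could work. It assumes NEED carries gvkey and window along with the regression variables and the two outlier dummies, and that rows arrive ordered by gvkey and window. The names COEFS, NOBS and WANT are made up for illustration.

```sas
/* Sketch only: COEFS, NOBS and WANT are hypothetical names. Assumes the
   view NEED carries gvkey, window, cogsq, sale, lag_cogsq, lag_sale,
   outlier_cogsq and outlier_sale, ordered by gvkey and window. */

/* One regression per firm-window; OUTEST= saves the coefficients */
proc reg data=need outest=coefs noprint;
  by gvkey window;
  /* The outlier dummies are kept as regressors instead of dropping rows.
     A dummy that never varies within a window makes the model less than
     full rank, so its estimate is not meaningful for that window. */
  model cogsq = lag_cogsq sale lag_sale outlier_cogsq outlier_sale;
run;
quit;

/* Count the rows in each firm-window with no missing regression variable */
proc sql;
  create table nobs as
  select gvkey, window, count(*) as n_used
  from need
  where cogsq is not missing
    and lag_cogsq is not missing
    and sale is not missing
    and lag_sale is not missing
  group by gvkey, window
  order by gvkey, window;
quit;

/* Keep only windows with at least 10 usable rows, and rename the
   coefficient columns to match the requested output layout */
data want;
  merge coefs (in=in_coefs
               rename=(lag_cogsq=Coef_lagcogs sale=Coef_sale
                       lag_sale=Coef_lagsale))
        nobs;
  by gvkey window;
  if in_coefs and n_used >= 10;
run;
```

WANT identifies each window by its index rather than by date. The date of the last row in each window could be merged in from NEED afterwards if the output has to be keyed by date.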

Contributor
Posts: 42

## Re: Rolling regression with conditions

Hi everyone.

I have described the data in more detail. Does anyone have an idea? The previous code does not work.