If you drop quarters with greater than a 75% change in cogsq or sale, then how do you define lag(cogsq) and lag(sale) for the subsequent quarter.
If Q3 has 80% change vs Q2, then what values will you use as lag of coqsq and sale in Q4?
So does this mean that the concept of lagged values of cogsq and sale can represent varying time spans? Let's say you have successive changes in sales of +80%, -%78%, and +90%, for a highly seasonal company. Then your lagged value of cogsq and sale will represent one-year old values.
And more generally, I don't get this notion of dropping dramatic changes anyway, since it is likely to introduce even more dramatic changes in your analysis data. Let's say a company is in a growth spurt, with successive changes in sales of 78% (for Q2) and 20% (for Q3). By dropping Q2 and inserting Q1 sales as the lagged value for Q3, you will have introduced a "change" of 96% between current and "preceding" value of the variable. That would increase the beta coefficient for lagged sale.
What you MIGHT want to do instead is insert a dummy variable indicating records with large absolute proportional changes, and keep those records.
This program should give you then needed infrastructure:
data need1 (keep=window data cogsq sale lag_: outlier_:) /view=need1;
do n=1 by 1 until (last.gvkey);
set have;
by gvkey;
lag_cogsq=lag(cogsq);
lag_sale=lag(sale);
if n=1 then call missing(of lag_:);
else do;
outlier_cogsq=1-(0.25<cogsq/lag_cogsq<1.75);
outlier_sale=1-(0.25<sale/lag_sale<1.75);
end;
array var {*} data cogsq sale lag_cogsq lag_sale outlier_cogsq outlier_sale;
array data {100,7}; /*Up to 100 historic records for 7 vars*/
do v=1 to 7; data{n,v}=var{v};end
end;
if n>10 then do end=11 to n;
beg=max(2,n-59);
window=end-11;
do row=beg to end;
do v=1 to 7; var{v}=data{row,v};end;
output;
end;
end;
run;
proc reg data=need;
......
run;
quit;
Remember data set NEED will be roughly 20 times the size of HAVE (you want 5-year rolling quarterly data). So I made NEED a data set VIEW instead of a data set FILE. It's only activated when a subsequent PROC calls for NEED, and the data is streamed directly to the proc instead of a disk file.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.