BookmarkSubscribeRSS Feed
vl12
Calcite | Level 5

how to run the regression on a rolling basis as follows? for each firm (gvkey) and date(datadate), include all the observations for this firm before this date but require at least 8 observations for this regression? the regression code is as follows

data erc;
	set data.erctest1;

proc sort data=erc nodupkey;
by gvkey;

proc reg data=erc outest=cfsde adjrsq noprint;

	model  car= ue;
	by gvkey;

output out=varone r=residone;

the data is

data WORK.ERCTEST;

infile datalines dsd truncover;

input gvkey:$6. datadate:YYMMDDN8. ue:32. car:32.;

format datadate YYMMDDN8.;

label gvkey="Global Company Key" datadate="Data Date";

datalines;

001690 20070331 . .

001690 20070630 0.0004525206 0.4670161755

001690 20070930 0.0006423839 0.2271965327

001690 20071231 0.0038899365 0.0535349058

001690 20080331 -0.004237645 -0.04493543

001690 20080630 0.0001820518 0.1184123378

001690 20080930 0.0006338696 -0.242361881

001690 20081231 0.0147242809 -0.06718815

001690 20090331 -0.006772771 0.4468636412

001690 20090630 0.0016303555 0.0926635129

001690 20090930 0.0042211539 0.0569200846

001690 20091231 0.0044297225 0.0694110274

001690 20100331 -0.001422126 0.1062908265

001690 20100630 0.0007790463 0.1138357995

001690 20100930 0.0040591522 0.1280910542

001690 20101231 0.0057087261 -0.024709269

001690 20110331 -0.000052753 -0.053340722

001690 20110630 0.0042457609 0.1448117306

001690 20110930 -0.001933107 0.2521745565

001690 20111231 0.0170601425 -0.080865297

001690 20120331 -0.002572389 0.2810152156

001690 20120630 -0.005111778 0.0876576022

001690 20120930 -0.000959221 -0.032848439

001690 20121231 0.0097159076 -0.22854538

001690 20130331 -0.008485084 -0.223406274

001690 20130630 -0.007348195 -0.014162502

001690 20130930 0.0014275724 0.1887835903

001690 20131231 0.0111048837 0.0384054444

001690 20140331 -0.00615956 -0.078924939

001690 20140630 -0.004446851 0.1953453542

001690 20140930 0.0012165497 0.1066788925

001690 20141231 0.0148603652 0.0470219108

001690 20150331 -0.006213387 0.1303489251

001690 20150630 -0.004041365 0.0175582589

001690 20150930 0.0007264318 -0.071783252

001690 20151231 0.012400347 -0.069109732

001690 20160331 -0.013138595 -0.071681597

001690 20160630 -0.005275545 -0.119072014

001690 20160930 0.0020190515 0.2109433449

001690 20161231 0.0145839438 -0.026547759

001690 20170331 -0.009175424 0.1675285819

001690 20170630 -0.003105223 0.0070201798

001690 20170930 0.0025276878 0.0806324686

001690 20171231 0.010873663 -0.077033442

001690 20180331 -0.007527275 0.0769450604

001690 20180630 -0.002568958 0.1174891488

012141 20070331 . .

012141 20070630 -0.006840826 0.039301915

012141 20070930 0.0045501006 0.0791492442

012141 20071231 0.00125861 0.2159948102

012141 20080331 -0.001207337 -0.091396305

012141 20080630 -0.000361478 -0.013992142

012141 20080930 0.0003172004 0.2467016235

012141 20081231 -0.001151606 -0.171371867

012141 20090331 -0.00732306 -0.058326322

012141 20090630 0.0003211438 0.1484024492

012141 20090930 0.0023109721 -0.000563749

012141 20091231 0.0114983925 0.1113820094

012141 20100331 -0.010350052 -0.038707234

012141 20100630 0.0025670507 -0.094669369

012141 20100930 0.0042540329 -0.06558027

012141 20101231 0.005218999 0.0450195831

012141 20110331 -0.006549471 -0.113316172

012141 20110630 0.0029479832 0.0599936985

012141 20110930 -0.000649708 0.1114515078

012141 20111231 0.0040717526 -0.048890014

012141 20120331 -0.005595293 0.0330296286

012141 20120630 -0.021843021 -0.007530868

012141 20120930 0.0197814607 -0.068648255

012141 20121231 0.0085439504 -0.105270689

012141 20130331 -0.001348278 0.0355996234

012141 20130630 -0.00378879 0.1366469038

012141 20130930 0.0010044828 -0.111795675

012141 20131231 0.0042318432 -0.009889401

012141 20140331 -0.002652274 0.0791397488

012141 20140630 -0.003050357 0.066885466

012141 20140930 -0.000188136 0.0338976295

012141 20141231 0.0034658358 -0.001447478

012141 20150331 -0.002661945 -0.113894517

012141 20150630 -0.023081782 0.11049779

012141 20150930 0.0229078022 0.0872105183

012141 20151231 0.0002638288 0.204585892

012141 20160331 -0.002903418 -0.039893154

012141 20160630 -0.001586843 -0.090012729

012141 20160930 0.0056762626 0.0723214928

012141 20161231 0.0012491095 0.0365159269

012141 20170331 -0.001535477 0.033560407

012141 20170630 0.0048615462 0.0584440048

012141 20170930 -0.002596238 0.0345231002

012141 20171231 -0.019539189 0.1052017727

012141 20180331 0.0195564312 0.0792267377

012141 20180630 0.0019140615 0.0157609209

;;;;

5 REPLIES 5
mkeintz
PROC Star

First, the code

data erc;
	set data.erctest1;

proc sort data=erc nodupkey;
by gvkey;

has a major inefficiency, and also a major mistake.

 

 

The inefficiency.   You don't need the first data step.  You can sort from one dataset to another.  I.e. fro data.erctest1 to erc, as in

proc sort data=data.erctest1 out=erc ;
  by gvkey datadate;
run;

Also you had "nodupkey" and "by gvkey" which would generate 1 record per gvkey.  You want one record per gvkey/date, correct?  Hence, I have "by gvkey datadate".  And I didn't bother with nodupkey, assuming you only have one record per gvkey/datadate.

 

 

Now, to get rolling window regressions, I see that you want minimum rolling window size of 8, but what is the maximum size rolling window that you want?  You could do this by brute force, by repeating each observation a sufficient number of times (and then another sort), but I suspect you'll be better off generating rolling sum-of-squares-and-cross-products, and submitting that to the regression.  But first, please provide the maximum window size you want.

 

And also, have you insured that there are no holes in your time series?  If there are, do you want window size based on date-range, or based simply on record count?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
vl12
Calcite | Level 5

thank you! You are right that there is a mistake in the proc sort. Should be proc sort by gvkey  datadate.

 

Yes I'd like the minimum # of obs in a regression of 8 observations and I'd like to use all the observations prior to the certain 'gvkey datadate'. For example, if for one gvkey datadate, I have 9 quarter of data prior to this date, I'd like to put them all in the regression. And if I have 10, I'd like to use all of them PRIOR to this date. The only requirement is I 'd like to have at least 8 observations.

 

How can I do this?

mkeintz
PROC Star

If you have sas/ets (which includes proc expand) you could consider the appendix of the paper @PGStats referred to.  I'm "familiar" with the paper since I wrote it.  In short the appendix makes a series of "rolling" window sums-of-squares-and-cross-products which can be submitted to proc reg, instead of the original data.  At larger window sizes, this is an efficient approach.  Although your 11 years of quarterly data suggest your average window size is a relatively small 26 observations (min size 8, max size 44).

 

What the program does is:

 

  1. Makes a set of variables representing the products and cross products of your variables of interest for the regression.
  2. Run proc expand to make the rolling sums of squares and cross products.  However, in the example code it has "where _n=90".  You would need "where _n>=8".   And it also has "movsum 90" which can be left as is as long as you have no windows longer than 90 observations.  If you do change the 90 to a number you are sure exceeds the length of your longest window.
  3. Reshape the proc expand output into a form of SSCP acceptable to proc reg.
  4. run the proc reg.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Sorry Mark for not giving you credit. I hadn't checked the name of the author. Cheers.

PG
PGStats
Opal | Level 21

For a very efficient method to do all those regressions with proc expand and proc reg, look at Appendix 1 in

 

https://lexjansen.com/nesug/nesug12/fi/fi08.pdf

 

 

PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1098 views
  • 2 likes
  • 3 in conversation