OK, here is a program that breaks the task into separate steps, at some cost in efficiency (for instance, it does not combine generating the monthly SSCP with accumulating the 12-month rolling SSCP).
Notes:
The program is untested, so I recommend you test it on 2 permnos with 13 months each, i.e. use something like
where permno in (12404,16801) and date between '01jan2011'd and '31jan2012'd;
in the first data step. That gives 4 rolling 12-month windows (JAN-DEC and FEB-JAN for each permno). Then you can test the regression from this program against a direct PROC REG. Just run the direct PROC REGs as follows:
proc reg data=have; where permno=12404 and date between '01jan2011'd and '31dec2011'd; model ... quit;
proc reg data=have; where permno=12404 and date between '01feb2011'd and '31jan2012'd; model ... quit;
Do the same for permno 16801. I suggest you not only compare the PROC REG results, but also take a look at the intermediate datasets to see the work that is taking place.
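The test described above could be set up roughly as follows. This is a sketch, not tested code: TEST and DIRECT_EST are hypothetical data set names, HAVE is assumed to be sorted by permno and date, and the MODEL statement simply reuses the placeholder model from the end of the program.

```sas
/* Sketch only: TEST and DIRECT_EST are hypothetical names */
/* Small test sample: 2 permnos, 13 months each */
data test;
  set have;
  where permno in (12404,16801)
    and date between '01jan2011'd and '31jan2012'd;
run;

/* Direct regression for one window (permno 12404, JAN2011-DEC2011) */
/* EDF adds _EDF_ and _RSQ_ to the OUTEST= data set */
proc reg data=test noprint outest=direct_est edf;
  where permno=12404 and date between '01jan2011'd and '31dec2011'd;
  model ret=factor1 factor2 factor3;
run;
quit;

proc print data=direct_est;
run;
```

To compare, point the SET statement of the VTEMP view at TEST instead of HAVE, run the rest of the program, and look at the results for permno 12404 with month_end_date 20111231. The coefficients and R-square should agree with DIRECT_EST.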
When you run the real data, do NOT run the PROC REG at the end as I have specified it. You'll need at least to put in a NOPRINT option and also some keywords to output the R-square, total residuals, and estimated coefficients to a data set. I have it there just as a placeholder.
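One possible form of that final step, as a sketch under assumptions: EST and EST2 are hypothetical data set names, and "total residuals" is taken here to mean the error sum of squares. OUTEST= stores the coefficients and _RMSE_, and the EDF option adds _EDF_ and _RSQ_.

```sas
/* Sketch only: EST and EST2 are hypothetical names */
proc reg data=sscp_final noprint outest=est edf;
  by permno month_end_date;
  model ret=factor1 factor2 factor3;
run;
quit;

/* Recover the error sum of squares from RMSE and error d.f. */
/* (RMSE = sqrt(SSE/EDF), so SSE = EDF * RMSE**2)            */
data est2;
  set est;
  sse=_edf_*_rmse_**2;
run;
```

Because the input is an SSCP data set rather than raw observations, PROC REG cannot produce observation-level residuals here, only summary statistics like these.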
Notice you have to specify two macrovars:
VARNAMES .. a list of variables that MIGHT be in the model. You can list, say, 10 variables, and later on when you run PROC REGs, you can specify a subset in your MODEL statement. I've just put in %let varnames=ret factor1 factor2 factor3; as an example.
NM .. number of months in your rolling windows. I put in %let NM=12; A third macro variable, NRC, is created by the program. NRC is the number of rows (and number of columns) in the generated SSCP matrix. It equals the number of vars in VARNAMES plus 1 (for the intercept term).
The sequence of steps is this:
data vtemp / view=vtemp. Make a data set with a fixed value MONTH_END_DATE for every record in a given month. This is needed by the BY statement in the next proc.
PROC REG. Notice that this reg procedure does NOT estimate models. All it does is generate an SSCP for each permno/month_end_date. These are the values that need to be aggregated into 12-month rolling windows.
DATA SSCP_FINAL. This one aggregates the rolling windows and puts them into a data set (SSCP_FINAL), for submission to any regression you want to run. SSCP_FINAL has the "TYPE=SSCP" data set option. Setting this attribute for the data set is necessary for subsequent PROC REGs to recognize that these are not regular data files. Also note it has the "if month_n>=&nm then output;" statement. This prevents outputting windows before the first 12 months have been read in. IMPORTANT, IMPORTANT: This program assumes there are no "holes" in any window, i.e. a given permno has at least one active trading date in every month from its first month to its last month. This is probably a safe assumption in most cases, but remember: a stock can be temporarily delisted from an exchange, often when its price goes below a certain value. If it regains value it can be relisted.
PROC REG. This is where you run your model or models. Note this can be done later, because once you have generated SSCP_FINAL, you have the 12-month rolling SSCP values for all the estimations you need. Just remember that all the PROC REGs you run against SSCP_FINAL have to have a "by permno month_end_date;" statement. And as mentioned earlier, you probably want to put a NOPRINT option on this PROC REG, and then use various options on it to store your R-squared, total residuals, and estimated coefficients in a separate data set.
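The "no holes" assumption in the SSCP_FINAL step can be checked before you trust the rolling windows. Here is a sketch, assuming the SSCP data set from the first PROC REG already exists; HOLES and GAP are hypothetical names. It lists every permno/month that follows one or more missing months.

```sas
/* Sketch only: HOLES and GAP are hypothetical names */
data holes;
  set sscp (keep=permno month_end_date);
  by permno month_end_date;
  if first.month_end_date;          /* keep one record per permno-month */
  /* months elapsed since the previous permno-month that was kept */
  gap=intck('month',lag(month_end_date),month_end_date);
  /* ignore the first month of each permno; flag gaps of 2+ months */
  if not first.permno and gap>1;
run;
```

If HOLES comes back empty, every permno has consecutive months. If not, the rolling totals for the affected permnos span more than 12 calendar months and should not be used as they stand.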
Again, the efficiency loss here is that the data are passed through 2 times to get SSCP_FINAL. Both tasks could be done in a single DATA step, but the program simplification here might be worth it.
Regards,
Mark
Edited addition. Notice that in the SSCP_FINAL step the LAG depth is &nrc*&nrc*&nm (LAG300 in this case), held in the macrovar LAGDEPTH. The reason is that every "record" in the incoming SSCP data set is a single row of the 5*5 matrix: one row for _NAME_="Intercept", one for _NAME_="ret", and one each for _NAME_="FACTOR1" through "FACTOR3". This means that each month has 5 records. In addition, the LAG call sits inside the DO COL loop, so the single LAG queue receives 5 values per record, i.e. 25 values per month. To retrieve the same cell of the matrix from 12 months earlier (the month that drops out of the window), the queue therefore has to be 25*12=300 deep. Of course, if the user specifies, say, 8 variables in macrovar VARNAMES, then there are 9 rows per month and 81 values per month. That's why this program computes the lag depth from &nrc and &nm: it automatically adjusts for the size of the SSCP matrix and the length of the window.
MK
/* Names of variables that might be part of models*/
%let varnames=ret factor1 factor2 factor3;
%let NM=12; /* Number of months per rolling window */
/* Get size of SSCP matrix (one row/col per variable & 1 row/col for intercept)*/
%let nrc=%eval(1+ %sysfunc(countw(&varnames,%str( ))));
/* LAG depth: one LAG call per cell, &nrc*&nrc cells per month, &nm months back */
%let lagdepth=%eval(&nrc*&nrc*&nm);
/* Make a dataset view with fixed value (month_end_date) for each month*/
data vtemp / view=vtemp;
  set have;
  month_end_date=intnx('month',date,0,'end');
  format month_end_date yymmddn8.;
run;
/* Use proc reg to make SSCP for each month */
/* Notice there is no MODEL statement */
proc reg data=vtemp noprint outsscp=sscp (where=(_type_='SSCP')) ;
  var &varnames;
  by permno month_end_date;
run;
/* Now accumulate rolling total 12-month SSCP values */
data sscp_final (type=sscp drop=row col month_n);
  array total_sscp{&nrc,&nrc} _temporary_ ;
  /* Reset the running totals at the start of each permno */
  do row=1 to &nrc; do col=1 to &nrc; total_sscp{row,col}=0; end; end;
  do month_n=1 by 1 until (last.permno);
    do row=1 to &nrc;
      set sscp;
      by permno;
      array vars {*} intercept &varnames;
      /* Add this month's cell; subtract the same cell from &nm months ago. */
      /* IFN evaluates all its arguments, so the LAG queue is fed on every pass. */
      do col=1 to &nrc;
        total_sscp{row,col}=total_sscp{row,col}+vars{col}-ifn(month_n>&nm,lag&lagdepth(vars{col}),0);
      end;
      /* Replace the monthly values with the rolling totals for output */
      do col=1 to &nrc;
        vars{col}=total_sscp{row,col};
      end;
      if month_n>=&nm then output;
    end;
  end;
run;
/* And run the regression for each permno/month_end_date */
proc reg data=sscp_final ;
by permno month_end_date;
var &varnames;
model ret=factor1 factor2 factor3 ;
quit;