Hi,
I am trying to run PROC corr in order to obtain some rolling correlations based on 3 years. My main problem is that there is no enough internal space as the utility temporary file is really big ( I am dealing with more than 100M datapoints). Is there any way to avoid this utility temporary file to be saved on the SAS temporary files? I tried many things but neither of them worked.
This is my code
DATA rwin / view=rwin;
array _X {1116} _temporary_ ;
array _Y {1116} _temporary_ ;
array _D {1116} _temporary_ ;
set CORRELATIONS3;
by PERMNO;
retain N 0;
N = ifn(first.PERMNO,1,N+1);
I=mod(N-1,1116)+1;
_X{I}=marketRET1;
_Y{I}=stockRET1;
_D{I}=day;
if N>=1116 then do I= 1 to 1116;
x=_X{I};
y=_Y{I};
d=_D{I};
output;
end;
run;
proc corr data=rwin noprint NOPROB NOSIMPLE outp=Babcorr1(where=(_type_='CORR'));
by permno N;
var x;
with y;
quit;
Are you actually using the _X _Y and _D variables later? If not you might consider dropping them.
If you do but don't want them around for the Coor try
proc corr data=rwin (keep = permno n x y) <remaining code for CORR>
to reduce the number of variables read into the proc.
Another might be to break up the analysis into chunks of Permno or N values, create a separate output set for each and recombine afterwards.
Usually it makes sense to create the big dataset and then do the calculations in one pass.
In this case, if you're running out of space, it may be easier to write a macro to do each loop and then apppend the results into a table.
Can you provide a few lines of data from the correlations3 data set, at least the variables used in making rwin?
And explain where the magic number 1116 comes from?
I tried simulating a data set but the results I get for rwin have an extremely large number of missing values for x and y so I likely have not guessed correctly quite what role 1116 plays.
Hi,
I am trying to compute the BAB factor in the Betting against Beta paper. I am using overlapping 3 days log returns of the market returns and the stock returns. I am assuming that every month has 31 days and as they look for at least 3 years of data this is the way I arrive to 1116. I impose to get at least 750 trading days with non-missing returns as they require in the paper. Thanks to assuming this, I do not have problems when computing the rolling windows. The last step is to use the proc corr in order to arrive to the correlations. I am following what some of you recommend me I am geting rid of the \view and trying to run the proc corr for the essential elements I need x y PERMNO and N
First, months may have 30 to 31 calendar days,but they typically have only 20 trading days. Three years of trading with daily data is actually closer to 3*12*20=720 daily records, not 1116.
Second you only want PERMNO's with three years or more of trading. Don't use the daily returns with the array to find out which permno's qualify. Go to the other CRSP datasets. There's one out there that has the earliest and latest trading dates for each permno. You can use that to identify the ones you want to keep.
Or use PROC SUMMARY on your returns data to get the date range and count of records for each PERMNO, as in
proc summary data=mydata noprint n min max;
class permno;
var date;
output n(date)=ndates min(date)=mindate max(date)=maxdate out=permno_stats;
run;
Then you can merge that with your data and filter which permno's to process using conditions on NDATES or on MINDATE/MAXDATE.
But most important - don't drop the view technique. Instead drop the brute force calculation of correlations by providing complete windows of original returns data. Instead use proc expand to create rolling windows of SSCP (preceded by a very helpful data set view). From the sscp data, it's easy to calculate correlations.
My example using sashelp.stocks shows my recommendation. It has 5 steps:
This should eliminate a lot of calculations - primarily because proc expand will generate rolling sscp's by adding the latest squares and cross-products while subtracting the 12-month old equivalents. Not by reprocessing all the records in the new window. The larger the window size, the more beneficial this will be.
/* Get monthly stock data for ibm, intel,msft */
proc sort data=sashelp.stocks (keep=stock date open close)
out=stocks (rename=(open=x close=y));
by stock date;
run;
/* Get the profiles of each stock */
proc summary data=sashelp.stocks noprint n min max nway;
class stock;
var date;
output n(date)=ndates min(date)=mindate max(date)=maxdate
out=stock_stats (keep=stock ndates mindate maxdate);
run;
/* Prepare to use PROC EXPAND to generate rolling SSCP values */
/* and simultaneously only keep stock with the desired date range */
data need /view=need;
merge stocks stock_stats;
by stock;
if ndates>200;
xx=x*x; xy=x*y; yx=y*x; yy=y*y;
run;
/* Now make rolling 12-month windows with sums of square and cross products*/
/* and also rolling means and std */
proc expand data=need out=sscpdata (where=(_n=12)) method= none;
by stock;
id date;
convert y=_n / transformin=(*0) transformout=(+1 MOVSUM 12);
convert x=xsum y=ysum xy=xysum xx=xxsum yy=yysum / transformout=(MOVSUM 12);
convert x=xmean y=ymean / transformout=(MOVAVE 12);
convert x=xstd y=ystd / transformout=(MOVSTD 12);
run;
/* Now get the correlation values */
data corrdata;
set sscpdata;
corrxy=(xysum-_n*xmean*ymean)/((_n-1)*xstd*ystd);
run;
Just a comment on mkeintz's excellent response. The trick of using PROC EXPAND and the DATA step to create a rolling correlation is explained in Vora (2008) "Easy Rolling Statistics with PROC EXPAND."
Dear all,
Thanks for all your comments. I found that this temporary utility problem is mainly coming from the use of \view. I considered now all the observations and I did not have any problem about running proc corr statement (It ony took more time to do it). On the other hand, I used this 31 days assumption in order to take into account missing returns in some days. At the end, I am always imposing to get at least 750 trading days per stock but this way it is very easy to create the rolling windows. The only thing I do is to use a macro in order to compute the number of non-missing returns in these windows and impuse at least 750 days.
Thanks a lot to all of you
Best
Pedro
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.