Dear community,
I would be glad if someone could help me understand the following issue (and, in fact, assure me that nothing dubious is going on behind the scenes that I, in my naivety, am not able to notice).
I have a data set of roughly 8,000 monthly stock return observations.
I would like to calculate their covariance matrix.
Using PROC CORR, I receive "ERROR: The SAS System stopped processing this step because of insufficient memory." even though the machine has 64 GB of RAM.
Out of desperation, I loaded the data into a matrix in PROC IML and used its COV function: in less than two seconds, I get a covariance matrix that is about 500 MB in size.
Using only two stocks, PROC CORR and PROC IML yield the same results (in fact, PROC IML displays one more decimal place).
When using PROC CORR, I specify NOPRINT and, just in case, also suppress ODS output with the help of this wonderful tool I am deeply grateful for (or am trying to use, rather).
Furthermore, I only use stocks with non-missing data over the whole time span considered.
In my workflow, however, going through PROC IML is a detour.
Is this detour correct? If so, why is PROC IML capable of handling the data while PROC CORR is not?
Yours sincerely,
Sinistrum
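A minimal sketch of the PROC IML detour described in the question, assuming the returns sit in a data set with one numeric column per stock and one row per month. The data set name Returns, the toy values, and the output name CovMat are hypothetical stand-ins, not the poster's actual code.

```sas
/* Hypothetical toy data: 4 months (rows) x 3 stocks (columns r1-r3) */
data Returns;
   input r1 r2 r3;
   datalines;
 0.01 0.02 -0.01
 0.03 0.01  0.02
-0.02 0.00  0.01
 0.02 0.03  0.00
;

proc iml;
   /* Read every numeric column into matrix X; assumes the data set holds
      only return columns (drop any date variable beforehand) */
   use Returns;
   read all var _NUM_ into X[colname=stocks];
   close Returns;

   S = cov(X);                      /* k x k sample covariance matrix */

   /* Save S; row i of CovMat corresponds to column i (stocks[i]) */
   create CovMat from S[colname=stocks];
   append from S;
   close CovMat;
quit;
```

For 8,000 stock columns, S is an 8,000 × 8,000 matrix, which is why the resulting data set is large.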
What's your MEMSIZE set to?
Although you have 64 GB of RAM, is SAS set up to use it?
*display memsize option in the log;
proc options option=memsize;
run;
How big is your data (number of rows/columns)?
You say 8,000 monthly stock returns, so that's 8,000 columns with months in the rows? How many months?
You didn't show the code, but I suspect the issue is that PROC CORR is trying to display the huge correlation and covariance matrices in ODS. You don't say what you are trying to do with these huge matrices, but I'm guessing you don't need them displayed to the screen, so use OUTP=dataset to save the correlation/covariances to a SAS data set and use the NOPRINT option to suppress the output:
proc corr data=sashelp.class cov outp=CorrCov noprint;
run;
You can use WHERE clauses such as
where _Type_ = "COV";
to access only the relevant data.
The above code saves the CORR and COV. If you only want the COV, use
proc corr data=sashelp.class cov NOPRINT
outp=CorrCov(where=(_TYPE_^="CORR"));
run;
PS. Glad you liked my macro to suppress ODS output!
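If the full matrix is needed as an input for later calculations, one possible way (not shown in the thread) to get it from the OUTP= data set into a matrix is to read only the _TYPE_="COV" rows in PROC IML. This sketch uses two SASHELP.CLASS variables purely for illustration; the names CorrCov and vars are placeholders.

```sas
/* Illustration only: two numeric variables from SASHELP.CLASS */
proc corr data=sashelp.class cov outp=CorrCov noprint;
   var Height Weight;
run;

proc iml;
   /* Keep only the covariance rows of the OUTP= data set */
   use CorrCov where(_TYPE_ = "COV");
   read all var _NUM_ into S[colname=vars];   /* numeric columns = analysis vars */
   close CorrCov;
   print S[colname=vars rowname=vars];
quit;
```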
Also, if you know the stocks have nonmissing data, you can speed up the computation by using the NOMISS option:
proc corr nomiss noprint data=...;
Thank you for the quick responses!
Reeza wrote:
What's your MEMSIZE set to?
Although you have 64 GB of RAM, is SAS set up to use it?
SAS says "MEMSIZE=2147483648".
Reeza wrote:
How big is your data (number of rows/columns)?
I started with only six months of data to write the program and see how it works, since later on I want to use daily data over this interval. Thus, the data set I used is only 1.1 MB in size: it has 6 rows and roughly 8,000 columns.
Rick_SAS wrote:
You didn't show the code, but I suspect the issue is that PROC CORR is trying to display the huge correlation and covariance matrices in ODS. You don't say what you are trying to do with these huge matrices, but I'm guessing you don't need them displayed to the screen
You are completely right. I need the estimated covariances as input parameters to calculate portfolio variances, given different sets of weights.
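In matrix form, the portfolio variance this relies on is the standard quadratic form. Here S (notation introduced for illustration) is the set of stocks a portfolio holds, w_S their weights, and Σ_{S,S} the corresponding submatrix of the full covariance matrix Σ:

```latex
\sigma_p^2 = \mathbf{w}_S^{\top}\,\boldsymbol{\Sigma}_{S,S}\,\mathbf{w}_S
           = \sum_{i \in S}\sum_{j \in S} w_i\, w_j\, \sigma_{ij}
```

This is why one full Σ per formation date can suffice: each portfolio formed at that date only needs the rows and columns of Σ for the stocks it holds.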
It would be convenient to have the whole covariance matrix, so that for each portfolio whose variance I need to compute, I would "just" have to pick out the elements of the matrix relevant to that particular portfolio.
Alternatively, I could loop over all portfolios, i.e., for each one pick from the list of stock returns only those stocks actually included in the portfolio and then calculate the covariance matrix in that loop step.
But as there are 52 points in time at which portfolios are formed, with roughly 500 portfolios at each point in time (about 52 × 500 = 26,000 covariance estimates), I would like to avoid that.
With the approach I have in mind, I would only need to estimate 52 covariance matrices.
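A minimal SAS/IML sketch of the "pick out the relevant elements" idea, assuming the full covariance matrix for one formation date is already in memory. The stock names, the 3 × 3 matrix, the holdings, and the output data set PortVar are all hypothetical.

```sas
proc iml;
   /* Hypothetical full covariance matrix for one formation date; in practice
      it would come from COV() or from the COV rows of PROC CORR's OUTP= */
   stocks = {"r1" "r2" "r3"};
   S = {0.0004 0.0001 0.0000,
        0.0001 0.0009 0.0002,
        0.0000 0.0002 0.0016};

   /* Hypothetical portfolio: holds r3 and r1 with weights 0.4 and 0.6 */
   held = {"r3" "r1"};
   w    = {0.4, 0.6};

   /* Position of each held stock in S, in the same order as w;
      assumes every held stock appears in stocks */
   idx = j(1, ncol(held), .);
   do k = 1 to ncol(held);
      idx[k] = loc(stocks = held[k]);
   end;

   Sp = S[idx, idx];          /* covariance submatrix for the held stocks */
   portVar = w` * Sp * w;     /* portfolio variance w' * Sigma * w */
   print portVar;

   create PortVar var {portVar};
   append;
   close PortVar;
quit;
```

Because S is estimated once per formation date, the roughly 500 portfolios at that date would each need only this submatrix lookup, for example inside an outer DO loop over portfolios.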
My code was this (thank you for posting yours):
%ODSoff;
proc corr data=in noprint cov;
   ods output cov=cov;
run;
%ODSon;
I am deeply grateful for your blog and stunned that I am indeed in a conversation with you.
It is hard to quantify how much your PROC IML guidance has helped me get into SAS/IML.
SAS says "MEMSIZE=2147483648".
That's only 2 GB. If you have a system with 64 GB and want SAS to use more, you need to increase the MEMSIZE option.
I don't think you can do that with an OPTIONS statement, though; you need to set it in the configuration file.
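As a sketch, assuming a standard SAS 9.4 installation: MEMSIZE is a startup option, so it goes into the SAS configuration file (or on the command line as -MEMSIZE 32G when SAS is invoked). The value 32G is only an example, and the file name and location depend on the installation; if the file already contains a -MEMSIZE line, edit that line.

```text
/* Excerpt of a SAS configuration file such as sasv9.cfg (example value) */
-MEMSIZE 32G
```

After restarting SAS, PROC OPTIONS OPTION=MEMSIZE reports the value actually in effect.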
Thank you, I have tried it immediately and it worked.
If you are using NOPRINT, you don't need to use %ODSOFF.
The ODS OUTPUT statement is not doing anything because no output is produced.
Use the OUTP= option, as I showed.
Yes, sorry, I posted the wrong code. I was actually running without NOPRINT.
Your OUTP= solution is much quicker than suppressing ODS and using ODS OUTPUT (which takes six times longer in this particular setting).
Still, it requires increasing MEMSIZE, whereas PROC IML does not.
Thank you very much indeed, both of you!
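For scale, the sizes quoted in the thread are consistent with simple arithmetic on dense matrices of 8-byte doubles:

```latex
\begin{aligned}
8000 \times 8000 \times 8\ \text{bytes} &= 5.12 \times 10^{8}\ \text{bytes} \approx 488\ \text{MiB} \\
\texttt{MEMSIZE} = 2147483648\ \text{bytes} &= 2^{31}\ \text{bytes} = 2\ \text{GiB}
\end{aligned}
```

A single covariance matrix therefore fits under the default limit, which matches the observation that PROC IML's COV succeeded. PROC CORR with the COV option writes both a correlation and a covariance matrix (about 1 GB together), and whatever extra working storage it allocates is not quantified in this thread, so the exact reason it exceeded 2 GiB remains a plausible reading rather than a measured fact.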
Yes. As I say in my article "What is the best way to suppress ODS output in SAS?", "the NOPRINT option is the most efficient way to suppress output."