I am running PROC MEANS on a large dataset (100 GB) and keep getting insufficient-memory errors. I read this article about in-database processing, which could be my solution, but I do not really know how to implement it. Does anyone know how to deal with this issue?
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003331709.htm
Running out of memory? It's possible that in-database processing will help. Assuming that you have in-database processing available, you would simply have to switch from PROC MEANS to PROC HPSUMMARY. The syntax is pretty much the same.
It's also possible that you can control this without running in-database. Show us the PROC MEANS step that you are trying to run. Also (and this is unlikely if running in-database is even a possibility), is there a sorted order to your data?
Where and how is this data set located?
Try specifying
options sqlgeneration="dbms";
before your PROC MEANS run.
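As a rough sketch of what that could look like, assuming the table lives in a DBMS that SAS reaches through a SAS/ACCESS engine: the libref MYDB, the table BIG_TABLE, and the variables REGION and AMOUNT are all made-up names, and the option is written in its unquoted form. MSGLEVEL=I is added only so the log carries extra notes that may help you see whether the work was passed to the database.

```sas
/* Assumption: MYDB is a libref already assigned to a supported DBMS   */
/* through a SAS/ACCESS engine. BIG_TABLE, REGION, and AMOUNT are      */
/* hypothetical names used only for illustration.                      */
options sqlgeneration=dbms msglevel=i;

proc means data=mydb.big_table n mean min max;
  class region;   /* few unique values keeps the returned result small */
  var amount;
run;
```

If the data is an ordinary SAS data set on disk rather than a DBMS table, this option will not change anything.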
The documentation does say:
"In-database processing can greatly reduce the volume of data transferred to the procedure if there are no class variables (one row is returned) or if the selected class variables have a small number of unique values. However, because PROC MEANS loads the result set into its internal structures, the memory requirements for the SAS process will be equivalent to what would have been required without in-database processing."
Switching from CLASS to BY processing would most likely reduce memory requirements, but your data would need to be properly sorted or indexed.
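If sorting is not practical, an index on the BY variable is the other route. Here is a small sketch using a copy of SASHELP.CLASS as a stand-in for your data; the choice of AGE, HEIGHT, and WEIGHT is only for illustration.

```sas
/* Work on a copy, because SASHELP.CLASS itself should not be modified */
data work.class;
  set sashelp.class;
run;

/* Create a simple index on the BY variable */
proc datasets library=work nolist;
  modify class;
  index create age;
quit;

/* The data is not sorted by AGE, but the index lets BY processing run */
proc means data=work.class n mean;
  by age;
  var height weight;
run;
```

Reading a very large data set through an index can be slow, so sorting once may still be the better choice if you have the disk space for it.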
Thanks, but I read somewhere that using BY is more efficient with large datasets, and so I am confused.
BY processing is more efficient.
What are YOU doing?
Post your code. Memory requirements vary depending on the number of unique values of your CLASS statement variables. Do you know how many unique values you have?
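One way to get those counts, sketched here against SASHELP.CLASS with AGE and SEX standing in for your own CLASS variables:

```sas
/* Count the distinct levels of each candidate CLASS variable */
proc sql;
  select count(distinct age) as n_age_levels,
         count(distinct sex) as n_sex_levels
  from sashelp.class;
quit;
```

With several CLASS variables, the number of cells PROC MEANS may have to hold can approach the product of these counts, so a few high-cardinality variables add up quickly.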
Not only should you post your code, but 100gb is meaningless in this context. We need to know the number of observations, and the number of variables that you are computing means for, and probably the number of BY groups.
@somebody wrote:
Thanks, but I read somewhere that using BY is more efficient with large datasets, and so I am confused.
And sometimes directing the output to a data set instead of the output/results window helps if you are generating lots of output in a table.
But since we haven't seen any actual code or log, it is hard to be specific.
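A minimal sketch of sending the statistics to a data set instead of the Results window, again using SASHELP.CLASS as a stand-in; the output data set name and the MEAN_ variable names are just placeholders:

```sas
/* BY processing needs the data sorted (or indexed) by the BY variable */
proc sort data=sashelp.class out=work.class;
  by age;
run;

/* NOPRINT suppresses the displayed tables; OUTPUT writes the results  */
proc means data=work.class noprint;
  by age;
  var height weight;
  output out=work.class_means mean=mean_height mean_weight;
run;
```

You can then look at WORK.CLASS_MEANS with PROC PRINT or a viewer, a piece at a time if it is large.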
See this code:
proc sort data=sashelp.class out=work.class;
   by age;
run;

proc means data=work.class;
   by age;
run;

proc means data=work.class;
   class age;
run;
Notice that the tables displayed in the Results window for the BY version take more "space" than the single table from the CLASS version. The Results window tries to accumulate everything in memory to create the output tables. At a minimum, the repeated header rows for each BY group add to the memory requirement.
If you have a largish number of other variables coupled with many requested statistics and many values of the by variables you might be hitting the display memory limit.