06-15-2018 05:21 PM
I am running PROC MEANS on a large data set (100 GB) and keep getting insufficient-memory errors. I read an article about in-database processing, which could be my solution, but I don't really know how to implement it. Does anyone know how to deal with this issue?
06-15-2018 05:39 PM
Running out of memory? It's possible that in-database processing will help. Assuming that you have in-database processing available, you would simply have to switch from PROC MEANS to PROC HPSUMMARY. The syntax is pretty much the same.
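A minimal sketch of the PROC MEANS to PROC HPSUMMARY switch, assuming PROC HPSUMMARY is available in your installation. SASHELP.CLASS and its variables SEX, HEIGHT and WEIGHT are stand-ins for the real table and columns. Whether the step actually runs in-database depends on where the data lives and how your site is licensed and configured, which this sketch does not show.

```sas
/* Stand-in for the large table: SASHELP.CLASS.
   CLASS/VAR work as in PROC MEANS; results go to a data set
   through the OUTPUT statement. */
proc hpsummary data=sashelp.class;
   class sex;
   var height weight;
   output out=work.hp_stats
          mean(height weight)=mean_height mean_weight
          n(height)=n_height;
run;
```

The statistic keywords and the OUT= option are the same ones PROC MEANS uses, which is why the switch is mostly a change of procedure name.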
It's also possible that you can control this without running in-database. Show us the PROC MEANS step that you are trying to run. Also (and this is unlikely if running in-database is even a possibility), is there a sorted order to your data?
06-15-2018 06:04 PM
The documentation does say:
"In-database processing can greatly reduce the volume of data transferred to the procedure if there are no class variables (one row is returned) or if the selected class variables have a small number of unique values. However, because PROC MEANS loads the result set into its internal structures, the memory requirements for the SAS process will be equivalent to what would have been required without in-database processing."
Switching from CLASS to BY processing would most likely reduce memory requirements, but your data would need to be properly sorted or indexed.
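A sketch of the CLASS-to-BY switch, using SASHELP.CLASS as a stand-in and SEX as a placeholder grouping variable. The data must be sorted (or indexed) by the BY variable first. PROC MEANS then processes one BY group at a time, so it does not need to keep summary structures for every group in memory at once.

```sas
/* Step 1: sort by the grouping variable */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* Step 2: summarize one BY group at a time; NOPRINT plus
   OUTPUT OUT= writes the statistics to a data set */
proc means data=work.class_sorted noprint;
   by sex;
   var height weight;
   output out=work.by_stats mean= n= / autoname;
run;
```

On a 100 GB table the sort itself is expensive, so this pays off mainly when the data is already stored in (or can be indexed on) the BY order.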
06-15-2018 07:09 PM
Post your code. Memory requirements vary depending on the number of unique values of your CLASS statement variables. Do you know how many unique values you have?
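One way to answer the unique-values question is a PROC SQL count of distinct values per CLASS variable. SEX and AGE in SASHELP.CLASS are placeholders for your own CLASS variables and table.

```sas
/* Count distinct levels of each candidate CLASS variable */
proc sql;
   create table work.levels as
   select count(distinct sex) as n_sex,
          count(distinct age) as n_age
   from sashelp.class;
quit;

proc print data=work.levels;
run;
```

The product of these counts gives a rough upper bound on how many CLASS combinations PROC MEANS may have to hold in memory.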
06-15-2018 09:18 PM
Not only should you post your code, but 100 GB is meaningless in this context. We need to know the number of observations, the number of variables that you are computing means for, and probably the number of BY groups.
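The observation and variable counts can be read from PROC CONTENTS. SASHELP.CLASS stands in for the real table here.

```sas
/* Displays the observation count, variable count and file size */
proc contents data=sashelp.class;
run;

/* Same metadata as a data set: one row per variable, with the
   table's observation count repeated in the NOBS column */
proc contents data=sashelp.class
              out=work.meta(keep=name type length nobs) noprint;
run;
```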
06-18-2018 03:55 PM
Thanks, but I read somewhere that using BY is more efficient with large data sets, so I am confused.
And sometimes directing the output to a data set instead of the output/results window helps if you are generating lots of output in a table.
But we haven't seen any actual code or log, so we can't be more specific.
See this code:
proc sort data=sashelp.class out=work.class;
   by age;
run;

/* BY: one table per AGE group */
proc means data=work.class;
   by age;
run;

/* CLASS: one table, one row per AGE level */
proc means data=work.class;
   class age;
run;
Notice that the BY-group tables in the Results window take more "space" than the single CLASS table. The Results window tries to accumulate everything in memory to build the output tables; at a minimum, the header rows repeated for each BY group add to the memory requirement.
If you have a largish number of analysis variables, many requested statistics, and many values of the BY variables, you might be hitting the display memory limit.
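A minimal sketch of directing the results to a data set instead of the Results window, with SASHELP.CLASS and its variables as placeholders. NOPRINT suppresses the displayed report, OUTPUT OUT= writes the statistics to a data set, and NWAY keeps only the rows for the full CLASS combination (no overall or partial summaries).

```sas
/* No displayed tables: statistics go to WORK.CLASS_STATS */
proc means data=sashelp.class noprint nway;
   class sex age;
   var height weight;
   output out=work.class_stats mean= / autoname;
run;
```

Requesting only the statistics and variables you actually need keeps the output data set small as well.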