PROC MEANS with large dataset (100gb)

Contributor
Posts: 61

I am running PROC MEANS on a large dataset (100gb) and keep getting insufficient-memory errors. I read this article about in-database processing, which could be my solution, but I do not really know how to implement it. Does anyone know how to deal with this issue?

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003331709.htm

Super User
Posts: 6,785

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

Running out of memory?  It's possible that in-database processing will help.  Assuming that you have in-database processing available, you would simply have to switch from PROC MEANS to PROC HPSUMMARY.  The syntax is pretty much the same.

 

It's also possible that you can control this without running in-database.  Show us the PROC MEANS step that you are trying to run.  Also (and this is unlikely if running in-database is even a possibility), is there a sorted order to your data?

PROC Star
Posts: 1,283

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

Where and how is this data set located?

 

Try specifying 

 

options sqlgeneration=dbms;

before your PROC MEANS run.

Esteemed Advisor
Posts: 5,540

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

The documentation does say:

 

"In-database processing can greatly reduce the volume of data transferred to the procedure if there are no class variables (one row is returned) or if the selected class variables have a small number of unique values. However, because PROC MEANS loads the result set into its internal structures, the memory requirements for the SAS process will be equivalent to what would have been required without in-database processing."

 

Switching from CLASS to BY processing would most likely reduce memory requirements, but your data would need to be properly sorted or indexed.
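
As a rough sketch of the "indexed" route (assuming the small sample table SASHELP.CLASS stands in for the real data, and that WORK.CLASS and WORK.CLASS_STATS are just illustrative names), an index on the grouping variable lets BY processing run without a prior PROC SORT:

```sas
/* Copy a small sample table so the sketch is self-contained. */
data work.class;
   set sashelp.class;
run;

/* Build a simple index on the grouping variable instead of sorting. */
proc datasets library=work nolist;
   modify class;
   index create age;
quit;

/* BY processing can use the index to read rows in AGE order,
   so only one BY group is summarized at a time. */
proc means data=work.class noprint;
   by age;
   var height weight;
   output out=work.class_stats mean= / autoname;
run;
```

Whether an index or a sort is cheaper on a 100gb table depends on the data and the storage; this only illustrates the syntax.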

PG
Contributor
Posts: 61

Re: PROC MEANS with large dataset (100gb)

Thanks, but I read somewhere that using BY is more efficient with large datasets, so I am confused.

 

Esteemed Advisor
Posts: 5,540

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

BY processing is more efficient.

 

What are YOU doing?

PG
Super User
Posts: 3,926

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

Post your code. Memory requirements vary depending on the number of unique values of your CLASS statement variables. Do you know how many unique values you have?
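
One possible way to count the unique values, sketched here on SASHELP.CLASS (the variables AGE and SEX and the output name WORK.CLASS_LEVELS are placeholders for your own CLASS variables and table names):

```sas
/* NLEVELS reports the number of distinct values per variable.
   NOPRINT on TABLES suppresses the full frequency tables. */
ods output nlevels=work.class_levels;
proc freq data=sashelp.class nlevels;
   tables age sex / noprint;
run;
```

On a very large table this step has its own cost, so it may be worth trying on a subset first.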

Respected Advisor
Posts: 3,055

Re: PROC MEANS with large dataset (100gb)

Not only should you post your code, but 100gb is meaningless in this context. We need to know the number of observations, and the number of variables that you are computing means for, and probably the number of BY groups.

--
Paige Miller
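
For a SAS data set, the observation and variable counts can usually be read from the dictionary tables without scanning the data. A minimal sketch, assuming the table is SASHELP.CLASS (substitute your own libref and member name, in uppercase):

```sas
/* NOBS and NVAR come from the data set header, not from reading the rows. */
proc sql;
   select libname, memname, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memname = 'CLASS';
quit;
```

For tables that live in an external database, NOBS may be missing, in which case a count query against the database would be needed instead.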
Super User
Posts: 13,583

Re: PROC MEANS with large dataset (100gb)

Posted in reply to ducman1611

@ducman1611 wrote:

Thanks, but I read somewhere that using BY is more efficient with large datasets, so I am confused.

 


And sometimes directing the output to a data set instead of the output/results window helps if you are generating lots of output in a table.

But we haven't seen any actual code or log, so it is hard to be specific.

 

See this code:

proc sort data=sashelp.class 
   out=work.class;
   by age;
run;

proc means data=work.class;
   by age;
run;

proc means data=work.class;
   class age;
run;

Notice that the tables displayed in the Results window for the BY version take more "space". The Results window tries to accumulate everything in memory to create the output tables. At a minimum, the repeated header rows for each BY group add to the memory requirement.

 

If you have a largish number of other variables coupled with many requested statistics and many values of the by variables you might be hitting the display memory limit.
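
The suggestion above about directing output to a data set could look roughly like this (a sketch on SASHELP.CLASS; WORK.CLASS_SUMMARY and the chosen statistics are placeholders):

```sas
/* NOPRINT suppresses the displayed tables entirely;
   NWAY keeps only the rows for the full CLASS combination. */
proc means data=sashelp.class noprint nway;
   class age;
   var height weight;
   /* AUTONAME builds output names such as Height_Mean and Weight_StdDev. */
   output out=work.class_summary mean= std= / autoname;
run;
```

The same NOPRINT and OUTPUT statements work with BY in place of CLASS, provided the data are sorted or indexed by the BY variable.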
