Error using proc SQL in a very large dataset

Reply
Occasional Contributor
Posts: 17

Error using proc SQL in a very large dataset

I am using proc sql with the group by option in a large dataset (more than 1TB) to calculate means and get the following error:

"error occurred while reading from temporary sort file"

 

I know I have enough space on the temporary folder (approximately 36TB free) to generate the new dataset. I have extended the memsize and sortsize options, and also tried using the option extendobscounter=yes for the library (although it's probably irrelevant for 64-bit environment).

 

Thanks a lot in advance

Grand Advisor
Posts: 17,333

Re: Error using proc SQL in a very large dataset

Can you use one of the SAS procs directly? Proc Means?
Occasional Contributor
Posts: 17

Re: Error using proc SQL in a very large dataset

[ Edited ]

Thanks for your suggestion, I'll check if proc means can get me where I want. My concerns is that I'm calculating means under specific conditions, so I'll need an extra data step afterwards to make adjustments. The generated file is approximately 5TB, and I was trying to minimize the steps as much as possible. 

 

Do you know though why SAS gives this error even though the server has both the power and enough space to do this using proc sql?

 

Thanks again.

 

 

Respected Advisor
Posts: 3,063

Re: Error using proc SQL in a very large dataset

Can you post your SQL code so we can get a better idea what you are doing?

 

Also what version of SAS are you using? I've had some issues with SQL in SAS 9.4 M2 although not with tables as big as yours.

Occasional Contributor
Posts: 17

Re: Error using proc SQL in a very large dataset

Using SAS 9.4 1M3. The code is a simple sql like this:
proc sql;
create table data_want as
select distinct *, case when mean(pdw_ij) ^= 0 then pdw_ij/mean(pdw_ij) else 0 end as pwcorr_ij_pc
/* plus more cases like the one above */
from data_have
group by cust_id1, cust_id2
;
quit;
Grand Advisor
Posts: 10,210

Re: Error using proc SQL in a very large dataset

 
Esteemed Advisor
Posts: 6,646

Re: Error using proc SQL in a very large dataset

select distinct *

forces a sort by ALL variables to check for distinct values. This needs LOTS of disk space.

If your original file was stored with the compress option, utility files will be a multiple in size.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Grand Advisor
Posts: 17,333

Re: Error using proc SQL in a very large dataset

Why is the generated file bigger than the source file?If you're summarizing I assume it would be less...in fact, only the variables you really need and it should be way less.

Occasional Contributor
Posts: 17

Re: Error using proc SQL in a very large dataset

This is because I do not collapse the data. I need to keep it for subsequent grouping in the next step (in which observations do drop, but not yet)
Grand Advisor
Posts: 17,333

Re: Error using proc SQL in a very large dataset

I think it's an issue of RAM not disk space per se. 

Esteemed Advisor
Posts: 6,646

Re: Error using proc SQL in a very large dataset


ioannis wrote:

Do you know though why SAS gives this error even though the server has both the power and enough space to do this using proc sql?

 


You get this error because you do not have the space and resources. The fact that you have XXX TB free before the SQL step does not preclude the step running out of space. So, if you have such problems, open a commandline connection (SSH) to the server and use the OS tools to monitor disk space and CPU/RAM usage while that particular step is running. You may even search for your temporary WORK directory and watch the files grow there.

You may run into limitations caused by the filesystem architecture (maximum filesize being less than available diskspace), by limits for the specific user (UNIXens can set a maximum filesize per user), or by quotas set for the user in the filesystem.

Or you simply need more space for the SQL utility file(s) than is available. SQL is a notorious resource hog.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 854

Re: Error using proc SQL in a very large dataset

This is a remedial QC test, but have you pulled a smaller size (obs=100) and run the same code to make sure it works, proving that it's the size of the data that's causing the issue?

Occasional Contributor
Posts: 17

Re: Error using proc SQL in a very large dataset

I tried with half the sample and it worked fine. Seems like the problem comes up when I use a large sample

Occasional Contributor
Posts: 17

Re: Error using proc SQL in a very large dataset

Answer I got from SAS (for those interested)

[response from sas starts here]

There is an open defect with R&D to address this issue.  However, at this time there is no fix available.  Based on recent information added to the defect, it appears that the problem has been identified and a fix should be available with the next release of SAS.

 

The only current known workarounds to this issue are listed below:

 

1.  Presort the data with a Proc Sort in the order PROC SQL would most likely use.  This is most likely going to be a sort on the key columns if you are joining tables in the PROC SQL query.  

2.  Break the data sets up into smaller data sets and rerun the query.

3.  Try a Data step merge, however this could conceivably throw the same error.

 

 

Grand Advisor
Posts: 9,576

Re: Error using proc SQL in a very large dataset

Can you use PROC FEDSQL ?

Or make an index for those group variables before run your SQL.

 

Ask a Question
Discussion stats
  • 23 replies
  • 744 views
  • 2 likes
  • 8 in conversation