BookmarkSubscribeRSS Feed
caveman529
Calcite | Level 5

I have a large dataset that requires sorting while using proc sql.  SAS run out of disk space always.  If I use proc sort tagsort, things are much easier.  But is there such an option in the proc sql to force sas use tagsort?  Also, can I sort before using proc sql so that proc sql won't have to do sorting again?  Thanks!

7 REPLIES 7
Tom
Super User Tom
Super User

Could you create an index on the BIG dataset?

caveman529
Calcite | Level 5

Hi, Tom.  Thank you for your reply!

I summarize the data using proc sql by different "level" of data depending on situation.  For example, I have the following variables in my dataset

key1 key2 key3 key4 var1 var2 var3 var4

I sometimes have to group and sort by key1 and key2.  Sometimes, I have to group and sort by key3 and key4.  Sometimes, just by key1. 

Does creating index with key1 to key4 help in this situation.  Is proc sql intelligent enough to know that there is an index it can use?  Thank you!

I also heard that proc summary doesn't require sorting to do the job.  Am I correct on this?  Thanks!

Tom
Super User Tom
Super User

PROC SUMMARY does not require that CLASS variables be sorted.  But it will require enough memory to store all of the bins.

proc summary data=have nway ;

  class key1 key2 ;

  var var1 var2 ;

  output out=want sum= ;

run;

will create the sums of VAR1 and VAR2 for all combinations of KEY1 crossed with KEY2 .

You can also create many other combinations of the class variables.

caveman529
Calcite | Level 5

I plan to summarize the data at different classes.  Is creating an index for all the key variables useful?  For example, I have:

key1 key2 key3    var1 var2 var3

I use proc sql to summarize by key1 sometimes and by key1 and key3 sometime.  Can I create a index for key1 - key3?  Is the proc sql intelligent enough to know that i have created index on key1 - key3 already and don't sort the data?

Alternatively, can I create index for each situation?  For example, one index for key1, one index for key1 and key3, ... etc.  When I summarize the data using proc sql under different class (by var1, var3 vs. by var1), is the proc sql clever enough to know which index to use?  Thank you!

Tom
Super User Tom
Super User

Index might not help if you are summarizing all of the data.  Would help if you are applying WHERE clause.

Also index might help if you have a lot of levels as you could then use PROC SUMMARY with a BY statement without running out of memory.

proc summary data=have nway ;

by key1 ;

class key2 ;

var var1 var2 ;

output out=want sum=;

run;

caveman529
Calcite | Level 5

I'll give that method a try.

The strange things is that the temporary/scratch drive has over 3 TB in space.  But when I see the log, I saw the following:

ERROR: Error occurred while reading from temporary sort file.

This seem to be different from insufficient space error.  Before this failure, the program was able to successfully complete a much larger sorting and summarizing.  I did a quick search on the Google and find nothing.  Thanks!

Tom
Super User Tom
Super User

Report it to SAS support.  Check the file structure used by the disk, perhaps it has problems with large files?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2102 views
  • 6 likes
  • 2 in conversation