BookmarkSubscribeRSS Feed
Haydn
Quartz | Level 8

I have the below code which will sort by "Ref" and in the following datastep I use if first.Ref to return a Unique count.

My question is. Is there a better more efficient way to do the proc sort or another way altogether to achieve what I doing.

 

The Proc Sort takes a while to run as I have over 17 million rows in the data I'm using.

 

 

proc sort data=TestI;
	by Ref;
run;


data TestII;
	set TestI;
	by  Ref;
	if first. Ref then Unique_Count=1;
	if Balance >0 then IDB=Ref; 
	if Balance >0 then Balance_Count =1;

run;;
4 REPLIES 4
Kurt_Bremser
Super User

Bottom line: proc sort is the quickest way to get a dataset sorted in SAS.

But there are some tunable things.

 

Run the sort with

options fullstimer;

so you get a clear picture if you're running out of CPU power or suffer form I/O waits.

 

Separate your WORK and utility location (use the utilloc= option in your configuration file), so that they sit on physically separate volumes, if you use disks internal to the server.

If your disks sit on a SAN, check with the SAN admin for possible hostspots, or other ways to improve storage performance.

 

If you work with a compressed dataset (and a considerable compression rate), try the tagsort option in the proc sort statement. This prevents writing of an uncompressed utility file.

 

Some summary-related things can be done easier with proc summary/means. Depending on the cardinality of Ref, you might be better off with using proc summary with

class Ref;

avoiding the sort altogether. It's just that the summary structures for all unique values of Ref need to fit into available memory.

 

Astounding
PROC Star

A couple of things to consider ...

 

First, where are you headed with the results?  Depending on the planned analysis steps, there may be another way to get there.  It's easy enough to add to a subsequent analysis step:  where balance > 0;

 

Second, does the subsequent analysis require sorted data, or does it only require that the three new variables exist?  SQL (or even PROC FREQ) can get a count of the number of distinct values of REF.

 

A good hash programmer (not me) should be able to create the three variables without sorting.  Well, any SAS programmer could create two out of the three.  But Unique_Count could be created by:

 

Create a HASH table based on REF.  Along the way, for each observation check whether that REF value already exists in the hash table.  If not, give Unique_Count a value of 1, and add that REF value to the hash table.

Ksharp
Super User

Here are some option to fast PROC SORT.

 

options bufno=100 bufsize=128k;
proc sort data=sashelp.class out=class sortsize=max noequals;
by sex; run;

Another fast way is spliting the big table into smalle table ,and combine them together.

You can write a macro to automatic it.

data F M;
 set sashelp.class;
if sex='F' then output F;
 else if sex='M' then output M;
run;

data want;
 set F M;
run;
DaveHorne
SAS Employee

Depending on your platform, there are also host specific options that may help.  For example, on Windows Server I've used SGIO (for mulit-gigabyte sized SAS data sets) and MEMLIB (to process the data in memory).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1326 views
  • 4 likes
  • 5 in conversation