BookmarkSubscribeRSS Feed
ybz12003
Rhodochrosite | Level 12

Hi experts,

 

I would like to remove the PROC SORT steps below to eliminate my run time in SAS; the three sorting steps take hours and hours to complete with humongous datasets, test1 and test2.   PROC SQL might be one way to approach this cause it doesn't need to sort to merge.  Please let me know if there is any alternative path to get the result.  Thank you.

 

proc sort data=Test1 tagsort; by specimentype; 
proc sort data=Test2 out=spec_delete (where=(Keep_spec="No"))tagsort; by specimentype;
proc sort data=Test2 out=spec_keep (where=(Keep_spec="Yes"))tagsort; by specimentype;
quit;

data x1_spec spec_new other_spec;
	merge Test1 (in=a) spec_delete (in=b) spec_keep (in=c);
	by specimentype;

	if (a=1 and b=1) or (a=0 and (b=1 or c=1)) or specimentype = " " then delete; 
	else if a=1 and c=1 then output x1_spec;
	else if a=1 /*and b=0 and c=0*/ then output spec_new;
	else output other_spec;

run;
4 REPLIES 4
Patrick
Opal | Level 21

SQL will sort implicitly for joins and though won't perform better but depending on your SQL potentially even worse.

There is certainly no need to sort your table Test2 twice. But it would be worth to have the where clause with the exclusion (specimentype ne " ") on the source and not the target table prior to sorting.

Not sure that tagsort will improve things but make sure the sort is multithreaded.

 

What really avoids any sorting are data step hash lookups. What are the volumes of your tables Test1 and Test2? and what's the relationship between the tables when joined over key specimentype (1:1, 1:m)?

Can you share a proc contents of these tables(with only the columns that you need in x1_spec and spec_new).

And how much memory do you have? 

 

Looking through your logic I believe below code would return the exactly same result (untested of course as there is no sample data).

I also believe that in your code the other case for table other_spec will never apply and though this table would always have zero rows.

proc sort data=Test1(where=(specimentype ne " ")) out=test1_sorted; 
  by specimentype; 
run;
proc sort data=Test2(where=(Keep_spec="Yes" and specimentype ne " ")) out=test2_sorted ; 
  by specimentype;
run;

data x1_spec spec_new;
	merge Test1_sorted (in=a) test2_sorted (in=b);
	by specimentype;

	if a=1; 

  if b=1 then output x1_spec;
  else output spec_new;
run;

Now if the rows of test1_sorted or test2_sorted can fit into memory then we can get rid of sorting and use a single data step with a hash lookup table.

SASKiwi
PROC Star

TAGSORT slows down sorting and is usually used only if you are short of disk space - is that the case? Try removing TAGSORT. Also you create three datasets in your DATA step but PROC SQL can only create one so I doubt SQL will improve processing time if you need to run three CREATE TABLE steps.

Patrick
Opal | Level 21

To add to my earlier post: If sort memory is the bottleneck and not throughput to the source data or disk space for utility files then you could try and increase the sort memory. 

To see what memory you've got allocated run below code (please share the result from the log as this also tells us how much memory you've got available for loading a hash table).

proc options group=memory;
run;

 

If you want to increase the sort memory then you could use code as below

%let sv_sortsize=%sysfunc(getoption(sortsize,keyword));
options sortsize=max;
/* your proc sorts */
options sortsize=&sv_sortsize;

 

Kurt_Bremser
Super User

It is a mistake to think that PROC SQL does not need sorting. It just does the sorting "under the hood", and often in a less efficient way than PROC SORT.

TAGSORT only makes sense when you are running out of space in WORK (or UTILLOC), or when you need to sort heavily compressed datasets.

Maxim 3: Know Your Data.

Decide which variables are needed for your final result, and keep only those by using dataset options for the input datasets in PROC SORT. Which means you may also need the OUT= option in your sort of test1.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 783 views
  • 0 likes
  • 4 in conversation