DATA Step, Macro, Functions and more

sortedby=

Accepted Solution Solved
Reply
Valued Guide
Posts: 860
Accepted Solution

sortedby=

I'm searching a large dataset for a specific zip code.  I have this dataset sorted by prop_zip_code which is 9 digits, there is another field us_zip5 that it is not sorted on.  (next month I will sort on both)  If I add sortedby in the set statement does this option tell SAS to treat the search as if it is sorted by us_zip5?  I cannot find any clear documentation on that, any clarification or other suggestions for speed are appreciated:

 

data buckingham;
set mdj.monthly_1mm_zips_sort(sortedby=us_zip5);
where us_zip5 = '18912';
run;

 

Thanks,

 

Mark


Accepted Solutions
Solution
‎10-02-2015 12:12 PM
Super User
Posts: 11,343

Re: sortedby=

Posted in reply to Steelers_In_DC

I don't believe that Sortedby is going to have any speed effect on an input data set though it may have an impact on a BY statement. The option is used to set the indicator stored in the dataset for sort order and would be useful for an output set. Normally a data step has no guarantee of what the output order might be but if you have made sure the data is in a specific order setting the option on the output data set would speed things that examine sort order.

 

 

View solution in original post


All Replies
Solution
‎10-02-2015 12:12 PM
Super User
Posts: 11,343

Re: sortedby=

Posted in reply to Steelers_In_DC

I don't believe that Sortedby is going to have any speed effect on an input data set though it may have an impact on a BY statement. The option is used to set the indicator stored in the dataset for sort order and would be useful for an output set. Normally a data step has no guarantee of what the output order might be but if you have made sure the data is in a specific order setting the option on the output data set would speed things that examine sort order.

 

 

Valued Guide
Posts: 860

Re: sortedby=

Thanks for the info.  It took an hour and two minutes to run.  I just added a by statement and am running it again.  I'll reply again if there is any improvement.

Super User
Posts: 5,513

Re: sortedby=

Posted in reply to Steelers_In_DC

Although SORTEDBY will not help, if your data set is in order, you are allowed to use a BY statement:

 

 

 

data buckingham;
set mdj.monthly_1mm_zips_sort;
where us_zip5 = '18912';

by us_zip5;
run;

 

It doesn't matter whether PROC SORT performed the sorting or not, it only matters that the observations are in order.

 

This in itself will not speed up the process, but it may set the stage for other experimentation.  For example, you could try:

 

data buckingham;
set mdj.monthly_1mm_zips_sort;
where us_zip5 = '18912';

by us_zip5;

output;

if last.us_zip5 then stop;
run;

 

 

Using the WHERE statement saves the software from reading observations into the PDV.  But the software still has to search through blocks of data (not read into the PDV) to check for additional "18912" observations.  So adding the STOP logic saves the software from having to search through remaining blocks of data.  The savings really depend on how far into the data "18912" appears, but you should be able to measure it and it might be considerable.

 

Let us know what you discover.

 

Good luck.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 503 views
  • 0 likes
  • 3 in conversation