BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Steelers_In_DC
Barite | Level 11

I'm searching a large dataset for a specific zip code.  I have this dataset sorted by prop_zip_code which is 9 digits, there is another field us_zip5 that it is not sorted on.  (next month I will sort on both)  If I add sortedby in the set statement does this option tell SAS to treat the search as if it is sorted by us_zip5?  I cannot find any clear documentation on that, any clarification or other suggestions for speed are appreciated:

 

data buckingham;
set mdj.monthly_1mm_zips_sort(sortedby=us_zip5);
where us_zip5 = '18912';
run;

 

Thanks,

 

Mark

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I don't believe that Sortedby is going to have any speed effect on an input data set though it may have an impact on a BY statement. The option is used to set the indicator stored in the dataset for sort order and would be useful for an output set. Normally a data step has no guarantee of what the output order might be but if you have made sure the data is in a specific order setting the option on the output data set would speed things that examine sort order.

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

I don't believe that Sortedby is going to have any speed effect on an input data set though it may have an impact on a BY statement. The option is used to set the indicator stored in the dataset for sort order and would be useful for an output set. Normally a data step has no guarantee of what the output order might be but if you have made sure the data is in a specific order setting the option on the output data set would speed things that examine sort order.

 

 

Steelers_In_DC
Barite | Level 11

Thanks for the info.  It took an hour and two minutes to run.  I just added a by statement and am running it again.  I'll reply again if there is any improvement.

Astounding
PROC Star

Although SORTEDBY will not help, if your data set is in order, you are allowed to use a BY statement:

 

 

 

data buckingham;
set mdj.monthly_1mm_zips_sort;
where us_zip5 = '18912';

by us_zip5;
run;

 

It doesn't matter whether PROC SORT performed the sorting or not, it only matters that the observations are in order.

 

This in itself will not speed up the process, but it may set the stage for other experimentation.  For example, you could try:

 

data buckingham;
set mdj.monthly_1mm_zips_sort;
where us_zip5 = '18912';

by us_zip5;

output;

if last.us_zip5 then stop;
run;

 

 

Using the WHERE statement saves the software from reading observations into the PDV.  But the software still has to search through blocks of data (not read into the PDV) to check for additional "18912" observations.  So adding the STOP logic saves the software from having to search through remaining blocks of data.  The savings really depend on how far into the data "18912" appears, but you should be able to measure it and it might be considerable.

 

Let us know what you discover.

 

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1928 views
  • 0 likes
  • 3 in conversation