NKormanik
Barite | Level 11

Some of the websites out there talk of using the "most efficient" form for writing a program.  Please take a look at the following:

 

data SASData__N (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
set Moments_Combined (where = (Label1='N'));
run;

 

I was very glad simply to get the above program to run.  Does it look "efficient" enough?  Should I try some other route?

 

Thanks!

 

Nicholas Kormanik

 

ACCEPTED SOLUTION
Astounding
PROC Star

Moving KEEP= to the SET statement will make the program run faster:

 

data SASData__N;
  /* KEEP= on the input data set: only the listed variables are read */
  set Moments_Combined (where = (Label1='N')
                        keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

 

Assuming that there are other variables in the incoming data set, the SET statement no longer has to read in those other variables.

 

In most cases, WHERE runs faster than a subsetting IF.  The most important factors are the percentage of observations that meet the WHERE condition, and the number of variables in the data set.  If most observations have Label1='N', you could try this and compare the speed:

data SASData__N;
  set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
  /* subsetting IF: every observation is read, then tested */
  if Label1='N';
run;

 

It is likely to be slower, but running it is the sure way to find out.  

 

Finally, if your data sets are small, you might not be able to measure which technique is faster.  You would probably need a minimum of tens of thousands of observations to notice a difference (possibly more than that). 
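One way to run that comparison is to turn on the FULLSTIMER system option and read the real-time and CPU-time notes that each step writes to the log. This is only a sketch: the output names Test_Where and Test_If are made up for the test, and timings vary from run to run, so repeating each step a few times gives a fairer picture.

```sas
options fullstimer;  /* detailed real-time and CPU-time notes in the log */

/* Version 1: WHERE= filters observations as the data set is read */
data work.Test_Where;  /* hypothetical output name */
  set Moments_Combined (where = (Label1='N')
                        keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

/* Version 2: subsetting IF tests each observation after it is read */
data work.Test_If;     /* hypothetical output name */
  set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
  if Label1='N';
run;

options nofullstimer;  /* back to the shorter log notes */
```

Compare the "real time" and "cpu time" lines that follow each step in the log.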


REPLIES
Reeza
Super User

The only efficiency here is adding the WHERE clause to your SET statement.

 

The KEEP= on the output data set vs. on the SET statement doesn't make a difference AFAIK.

 

PS. Please post your code in a code block to make it easier to read. See the little running man icon on the editor. 

 

 

NKormanik
Barite | Level 11

Thank you Astounding and Reeza.  So glad I asked, and you were so available.  I tried experimenting with other formats, but failed.  Great to have the answer.

 

Nicholas

 

Reeza
Super User

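Moving KEEP= to the SET statement gives the same output only if you're not calculating new variables, which it appears you aren't in this case. KEEP= on SET limits the variables that are read in, so any variable created in the step would still be written to the output data set.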
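If a step did create a new variable, one possible pattern is to use KEEP= in both places: on the SET statement to limit what is read, and on the DATA statement to limit what is written. The sketch below is only an illustration. The variable nDiff is hypothetical, and it assumes nValue1 and nValue2 are numeric, which the thread does not confirm.

```sas
/* KEEP= on the DATA statement controls what is written out */
data SASData__N (keep = Combo_Plus i1_X i2_X nValue1 nValue2 nDiff);
  /* KEEP= on SET controls what is read in; Label1 stays in the */
  /* list because the WHERE= condition refers to it             */
  set Moments_Combined (where = (Label1='N')
                        keep = Combo_Plus i1_X i2_X Label1 nValue1 nValue2);
  nDiff = nValue1 - nValue2;  /* hypothetical derived variable */
run;
```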

 

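Another way to make this faster, if required, may be to use PROC COPY, because it doesn't read the data set observation by observation but can copy it over in blocks. That only fits when the step does no processing: PROC COPY copies whole data sets, so the WHERE= and KEEP= subsetting here would have to be dropped or done elsewhere.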
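A minimal sketch of the PROC COPY idea, assuming an unfiltered copy of the whole data set is what is wanted and that Moments_Combined lives in WORK. The libref archive and its path are placeholders, not anything from this thread; PROC COPY needs the input and output libraries to be different.

```sas
/* hypothetical target library; substitute a real path */
libname archive "/path/to/archive";

proc copy in=work out=archive memtype=data;
  select Moments_Combined;  /* copy only this member */
run;
```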

 

 
