DATA Step, Macro, Functions and more

"Efficient" form for SAS data step

Some of the websites out there talk of using the "most efficient" form for writing a program.  Please take a look at the following:

 

data SASData__N (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    set Moments_Combined (where = (Label1 = 'N'));
run;

 

I was very glad simply to get the above program to run.  Does it look "efficient" enough?  Should I try some other route?

 

Thanks!

 

Nicholas Kormanik

 


Accepted Solutions
Solution
‎06-22-2016 12:18 AM
Super User
Posts: 5,080

Re: "Efficient" form for SAS data step

Moving KEEP= to the SET statement will make the program run faster:

 

data SASData__N;
    set Moments_Combined (where = (Label1 = 'N')
                          keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

 

Assuming that there are other variables in the incoming data set, the SET statement no longer has to read in those other variables.

 

In most cases, WHERE runs faster than a subsetting IF.  The most important factors are the percentage of observations that meet the WHERE condition, and the number of variables in the data set.  If most observations have Label1='N', you could try this and compare the speed:

data SASData__N;
    set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    if Label1 = 'N';
run;

 

It is likely to be slower, but running it is the sure way to find out.  

 

Finally, if your data sets are small, you might not be able to measure which technique is faster.  You would probably need a minimum of tens of thousands of observations to notice a difference (possibly more than that). 
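
If you want to run that comparison yourself, here is a minimal sketch. It assumes Moments_Combined is in WORK and is large enough for the timings to mean anything. The two output names (SASData__N_where and SASData__N_if) are made up for the test so that neither run overwrites the other. FULLSTIMER only makes the log report more detail (real time, CPU time, memory) for each step.

```sas
/* Sketch only: time WHERE= against a subsetting IF on the same input. */
/* FULLSTIMER writes real time, CPU time and memory use to the log.    */
options fullstimer;

/* Version 1: WHERE= filters observations before they enter the step.  */
data SASData__N_where;   /* hypothetical output name for the test */
    set Moments_Combined (keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2
                          where = (Label1 = 'N'));
run;

/* Version 2: every observation is read, then the IF discards the rest. */
data SASData__N_if;      /* hypothetical output name for the test */
    set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    if Label1 = 'N';
run;

/* Switch back to the shorter log notes. */
options nofullstimer;
```

Compare the "real time" and "cpu time" notes for the two steps in the log. Run each version more than once, since the first read of a data set can be slower than later reads.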


All Replies
Super User
Posts: 17,784

Re: "Efficient" form for SAS data step

The only efficiency here is adding the WHERE clause to your SET statement.

 

KEEP= on the output data set versus on the SET statement doesn't make a difference, AFAIK.

 

PS. Please post your code in a code block to make it easier to read. See the little running man icon on the editor. 

 

 

Regular Contributor
Posts: 212

Re: "Efficient" form for SAS data step

Thank you Astounding and Reeza.  So glad I asked, and you were so available.  I tried experimenting with other formats, but failed.  Great to have the answer.

 

Nicholas

 

Super User
Posts: 17,784

Re: "Efficient" form for SAS data step

Moving KEEP= to the SET statement gives the same output only if you're not calculating new variables, which it appears you aren't in this case.
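
To illustrate the point about new variables, here is a small sketch. It assumes nValue1 and nValue2 are numeric, and nDiff is a made-up variable used only for the example. KEEP= on SET limits what is read from the input, so anything created inside the step still goes to the output unless you also restrict the output data set.

```sas
/* Sketch only: KEEP= on SET controls what is READ, not what is WRITTEN. */
data SASData__N;
    set Moments_Combined (keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2
                          where = (Label1 = 'N'));
    /* Hypothetical new variable; assumes nValue1 and nValue2 are numeric. */
    nDiff = nValue1 - nValue2;
run;
/* SASData__N now holds the seven variables kept on SET plus nDiff.      */
/* To leave nDiff out, add DROP= or KEEP= on the DATA statement as well. */
```

Also note that any input variable a calculation needs has to be in the KEEP= list on SET, otherwise it is never read in.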

 

Another way to make this faster, if required, may be to use PROC COPY because it doesn't read the data set line by line but can copy it over in blocks. It doesn't look like you're doing any processing in the step.
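
A rough sketch of the PROC COPY idea, in case it helps. The TARGET libref and its path are placeholders, not anything from the original post. As far as I know PROC COPY copies the whole data set as-is, so it does not apply the WHERE= or KEEP= from the original step and only fits when you want a full copy.

```sas
/* Sketch only: copy a data set to another library without a DATA step. */
/* TARGET and its path are placeholders; point them at a real location. */
libname target "/path/to/target/library";

proc copy in=work out=target memtype=data;
    select Moments_Combined;   /* copies every observation and variable */
run;
```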

 

 
