NKormanik
Barite | Level 11

Some of the websites out there talk of using the "most efficient" form for writing a program.  Please take a look at the following:

 

data SASData__N (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
set Moments_Combined (where = (Label1='N'));
run;

 

I was very glad simply to get the above program to run.  Does it look "efficient" enough?  Should I try some other route?

 

Thanks!

 

Nicholas Kormanik

 

1 ACCEPTED SOLUTION
Astounding
PROC Star

Moving KEEP= to the SET statement will make the program run faster:

 

data SASData__N;
set Moments_Combined (where = (Label1='N') keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

 

Assuming that there are other variables in the incoming data set, the SET statement no longer has to read in those other variables.

 

In most cases, WHERE runs faster than a subsetting IF.  The most important factors are the percentage of observations that meet the WHERE condition, and the number of variables in the data set.  If most observations have Label1='N', you could try this and compare the speed:

data SASData__N;
set Moments_Combined (keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
if Label1='N';
run;

 

It is likely to be slower, but running it is the sure way to find out.  

 

Finally, if your data sets are small, you might not be able to measure which technique is faster.  You would probably need a minimum of tens of thousands of observations to notice a difference (possibly more than that). 
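One way to run the comparison suggested above is to turn on the FULLSTIMER system option, which writes real time and CPU time for each step to the log. The following is only a minimal sketch: the output names Test_Where and Test_If are made up for the test, and the timings depend on your system and on the size of Moments_Combined.

```sas
/* Write detailed timing (real time, CPU time, memory) for each step to the log */
options fullstimer;

/* Variant 1: WHERE= and KEEP= are both applied as the data set is read */
data Test_Where;   /* hypothetical output name */
  set Moments_Combined (where = (Label1='N')
                        keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

/* Variant 2: KEEP= on input, subsetting IF applied after each observation is read */
data Test_If;      /* hypothetical output name */
  set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
  if Label1='N';
run;

/* Switch back to the shorter timing notes */
options nofullstimer;
```

Running each variant more than once is worthwhile, since file caching can make whichever step runs second look faster than it really is.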


4 REPLIES
Reeza
Super User

The only efficiency here is adding the WHERE clause to your SET statement.

 

Whether KEEP= goes on the output data set or on the SET statement doesn't make a difference, AFAIK.

 

PS. Please post your code in a code block to make it easier to read. See the little running man icon on the editor. 

 

 

NKormanik
Barite | Level 11

Thank you Astounding and Reeza.  So glad I asked, and you were so available.  I tried experimenting with other formats, but failed.  Great to have the answer.

 

Nicholas

 

Reeza
Super User

Moving KEEP= to the SET statement only works if you're not calculating new variables, which it appears you aren't in this case.
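To illustrate the case where a calculation is involved, here is a hedged sketch. It assumes nValue1 and nValue2 are numeric, and nDiff is a made-up variable that is not part of the original program. KEEP= on the SET statement has to list every variable the calculation reads, and a DROP= (or KEEP=) on the DATA statement then controls what is written out.

```sas
/* DROP= on the DATA statement controls which variables reach the output */
data SASData__N (drop = nValue2);
  /* KEEP= on SET must list every variable the step reads, including
     nValue2, even though it is not wanted in the output */
  set Moments_Combined (where = (Label1='N')
                        keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
  /* nDiff is a hypothetical new variable; assumes both inputs are numeric */
  nDiff = nValue1 - nValue2;
run;
```

If nValue2 were left off the SET statement's KEEP= list, it would not be read from Moments_Combined, so nDiff would be missing on every observation and the log would show an uninitialized-variable note.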

 

Another way to make this faster, if required, may be to use PROC COPY because it doesn't read the data set line by line but can copy it over in blocks. It doesn't look like you're doing any processing in the step.

 

 
