Some of the websites out there talk of using the "most efficient" form for writing a program. Please take a look at the following:
data SASData__N (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
set Moments_Combined (where = (Label1='N'));
run;
I was very glad to simply get the above program to run. Does it look "efficient" enough? Should I try some other route?
Thanks!
Nicholas Kormanik
Moving KEEP= to the SET statement will make the program run faster:
data SASData__N;
set Moments_Combined (where = (Label1='N') keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;
Assuming that there are other variables in the incoming data set, the SET statement no longer has to read in those other variables.
In most cases, WHERE runs faster than a subsetting IF. The most important factors are the percentage of observations that meet the WHERE condition, and the number of variables in the data set. If most observations have Label1='N', you could try this and compare the speed:
data SASData__N;
set Moments_Combined (keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
if Label1='N';
run;
It is likely to be slower, but running it is the sure way to find out.
Finally, if your data sets are small, you might not be able to measure which technique is faster. You would probably need a minimum of tens of thousands of observations to notice a difference (possibly more than that).
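If you want to run that comparison, here is a minimal sketch of one way to do it. It assumes the FULLSTIMER system option is available in your session, and the output names SASData__N_where and SASData__N_if are made up just to keep the two results apart:

```sas
options fullstimer;  /* write detailed timing and memory statistics to the log */

/* Version 1: WHERE= and KEEP= on the SET statement */
data SASData__N_where;   /* hypothetical output name */
  set Moments_Combined (where = (Label1 = 'N')
                        keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

/* Version 2: KEEP= on the SET statement plus a subsetting IF */
data SASData__N_if;      /* hypothetical output name */
  set Moments_Combined (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
  if Label1 = 'N';
run;

options nofullstimer;    /* switch the extra log detail back off */
```

Compare the "real time" and "cpu time" notes that each step writes to the log. Timings vary from run to run, so it is worth running each version more than once before drawing a conclusion.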
The only efficiency here is adding the WHERE clause to your SET statement.
Putting KEEP= on the output data set vs. on the SET statement doesn't make a difference, AFAIK.
PS. Please post your code in a code block to make it easier to read. See the little running man icon on the editor.
Thank you Astounding and Reeza. So glad I asked, and you were so available. I tried experimenting with other formats, but failed. Great to have the answer.
Nicholas
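Moving KEEP= to the SET statement only gives the same result if you're not calculating new variables, which it appears you aren't in this case. KEEP= on SET controls which variables are read in, while KEEP= on the DATA statement controls which variables are written out.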
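To illustrate the point about new variables, here is a rough sketch. The variable Total is hypothetical, and it assumes nValue1 and nValue2 are numeric, which the thread does not confirm:

```sas
data SASData__N (keep = Combo_Plus Label1 Total);   /* controls what is written out */
  set Moments_Combined (where = (Label1 = 'N')
                        keep  = Combo_Plus Label1 nValue1 nValue2);  /* controls what is read in */
  Total = sum(nValue1, nValue2);   /* hypothetical new variable, assumes numeric inputs */
run;
```

The calculation needs nValue1 and nValue2, so they have to be in the KEEP= list on SET even though they are not wanted in the output. Without the KEEP= on the DATA statement, they would end up in SASData__N alongside Total.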
Another way to make this faster, if required, may be to use PROC COPY, because it doesn't read the data set observation by observation but can copy it over in blocks. It doesn't look like you're doing any processing in the step.
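A minimal PROC COPY sketch, for reference. The libref target and its path are placeholders, not anything from this thread, and it assumes Moments_Combined lives in WORK:

```sas
libname target "/path/to/target";   /* hypothetical library; replace with a real path */

proc copy in=work out=target memtype=data;
  select Moments_Combined;          /* copy only this data set */
run;
```

Note that this copies the data set as a whole, under the same name, into a different library. As far as I'm aware it does not apply a WHERE condition or a KEEP list, so it only fits when you need the full data set and not the Label1='N' subset.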