options stimer;
data du;
do until(age=15);
set sashelp.class;
end;
run;
data ifst;
set sashelp.class;
if age=15 then output;
run;
This is likely faster:
data wh15;
set sashelp.class;
where age=15;
run;
This is because the WHERE statement exports the filtering to the data engine. Unlike your two alternatives, the data step never sees the unwanted observations, so never needs to populate the program data vector with them.
Of course you will need a lot larger dataset than sashelp.class to find any difference.
Also if you are doing a test of input speed, don't contaminate the test with (implied) output operations. Use data _null_, as in:
data _null_;
do until(age=15);
set sashelp.class;
end;
run;
data _null_;
set sashelp.class;
if age=15 then output;
run;
INCORRECT STATETENTS
Also note that both steps are not equivalent:
If age=15 doesn't exist in sashelp.class,
dataset du will contain the last observation of sashelp.class
but dataset ifst will be empty.
If age=15 occurs more than once in sashelp.class,
dataset du will contain the first such observation
but dataset ifst will contain all of them.
EDIT: See @Quentin's comment below. My statements above were INCORRECT. I should have tested. Sorry.
@PGStats , they looks equivalent to me. I don't think either of your statements about what happens if age=15 doesn't exist, or if there are multiple records with age=15, are correct.
1 2 data du; 3 do until(age=99); 4 set sashelp.class; 5 end; 6 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.DU has 0 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.00 seconds 7 8 9 data ifst; 10 set sashelp.class; 11 if age=99 then output; 12 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.IFST has 0 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 13 14 15 data du; 16 do until(age=15); 17 set sashelp.class; 18 end; 19 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.DU has 4 observations and 5 variables. NOTE: Compressing data set WORK.DU increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 20 21 22 data ifst; 23 set sashelp.class; 24 if age=15 then output; 25 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.IFST has 4 observations and 5 variables. NOTE: Compressing data set WORK.IFST increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
There might be a little extra overhead to setup the DO loop, but really both are reading the same observations and performing the same comparison of AGE to the constant 15.
It is probably going to be even better to use a WHERE statement, especially if your source dataset is some remote database since then SAS will not even transfer the other observations from the database to SAS.
Which one to use is based on WHY you are doing it and what the rest of the data step is actually doing.
You probably want to use the version that makes understanding, creating, validating, and maintaining the code the easiest.
In my experience, if you are doing simple, sequential DATA step reads, experimenting with DO loops is a waste of time - stick with just a SET statement as adding DO's will have minimal impact on performance. If you want to optimize your code focus on other areas, especially any really slow processes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.