BookmarkSubscribeRSS Feed
Suryak
Calcite | Level 5
options stimer;

data du;
do until(age=15);
set sashelp.class;
end;
run;


data ifst;
set sashelp.class;
if age=15 then output;
run;
5 REPLIES 5
mkeintz
PROC Star

This is likely faster:

 

data wh15;
  set sashelp.class;
  where age=15;
run;

This is because the WHERE statement exports the filtering to the data engine.  Unlike your two alternatives, the data step never sees the unwanted observations, so never needs to populate the program data vector with them.

 

Of course you will need a lot larger dataset than sashelp.class to find any difference.

 

Also if you are doing a test of input speed, don't contaminate the test with (implied) output operations.  Use data _null_, as in:

 

data _null_;
do until(age=15);
set sashelp.class;
end;
run;


data _null_;
set sashelp.class;
if age=15 then output;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

INCORRECT STATETENTS

 

Also note that both steps are not equivalent:

 

If age=15 doesn't exist in sashelp.class,
dataset du will contain the last observation of sashelp.class
but dataset ifst will be empty.

 

If age=15 occurs more than once in sashelp.class,
dataset du will contain the first such observation
but dataset ifst will contain all of them.

 

EDIT: See @Quentin's comment below. My statements above were INCORRECT. I should have tested. Sorry.

PG
Quentin
Super User

@PGStats , they looks equivalent to me.  I don't think either of your statements about what happens if age=15 doesn't exist, or if there are multiple records with age=15, are correct.

 

1
2    data du;
3    do until(age=99);
4    set sashelp.class;
5    end;
6    run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.DU has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds


7
8
9    data ifst;
10   set sashelp.class;
11   if age=99 then output;
12   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.IFST has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


13
14
15   data du;
16   do until(age=15);
17   set sashelp.class;
18   end;
19   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.DU has 4 observations and 5 variables.
NOTE: Compressing data set WORK.DU increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


20
21
22   data ifst;
23   set sashelp.class;
24   if age=15 then output;
25   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.IFST has 4 observations and 5 variables.
NOTE: Compressing data set WORK.IFST increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

Tom
Super User Tom
Super User

There might be a little extra overhead to setup the DO loop, but really both are reading the same observations and performing the same comparison of AGE to the constant 15. 

It is probably going to be even better to use a WHERE statement, especially if your source dataset is some remote database since then SAS will not even transfer the other observations from the database to SAS.

 

Which one to use is based on WHY you are doing it and what the rest of the data step is actually doing.

You probably want to use the version that makes understanding, creating, validating, and maintaining the code the easiest.  

 

SASKiwi
PROC Star

In my experience, if you are doing simple, sequential DATA step reads, experimenting with DO loops is a waste of time - stick with just a SET statement as adding DO's will have minimal impact on performance. If you want to optimize your code focus on other areas, especially any really slow processes.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1298 views
  • 2 likes
  • 6 in conversation