@data_null__: I don''t think so. In the test below, the difference in the run times for the two cases are within the margin of confounding variations dictated by competing processes other than SAS. On the other hand, in both cases there's a great deal of difference between whether the subsetting condition is true or false because if true, the record has to be moved from the buffer - and otherwise it has not. Also, IF evaluates this simple condition about 3 times faster than WHERE does it in the buffer, which is why IF works much faster if this advantage is not offset by not having to move many records from the buffer (if WHERE evaluates the condition false). The advantage of IF grows with the complexity of the condition to be evaluated. Conversely, the advantage of WHERE grows with the relative number of unmoved records plus the record length, since the longer the record, the more work it takes to move it. Not surprisingly, IF performs about the same regardless of whether the condition is true or false, as it works only after every record has been already moved, anyway. 1 data sex ;
2 retain sex "F" ;
3 do _n_ = 1 to 1E8 ;
4 output ;
5 end ;
6 run ;
NOTE: The data set WORK.SEX has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.22 seconds
7 data _null_ ;
8 set sex (where = (sex eq "F")) ;
9 run ;
NOTE: There were 100000000 observations read from the data set WORK.SEX.
WHERE sex='F';
NOTE: DATA statement used (Total process time):
real time 9.38 seconds
10 data _null_ ;
11 set sex ; where sex eq "F" ;
12 run ;
NOTE: There were 100000000 observations read from the data set WORK.SEX.
WHERE sex='F';
NOTE: DATA statement used (Total process time):
real time 9.48 seconds
13 data _null_ ;
14 set sex (where = (sex ne "F")) ;
15 run ;
NOTE: There were 0 observations read from the data set WORK.SEX.
WHERE sex not = 'F';
NOTE: DATA statement used (Total process time):
real time 1.85 seconds
cpu time 1.85 seconds
16 data _null_ ;
17 set sex ; where sex ne "F" ;
18 run ;
NOTE: There were 0 observations read from the data set WORK.SEX.
WHERE sex not = 'F';
NOTE: DATA statement used (Total process time):
real time 1.86 seconds
19 data _null_ ;
20 set sex ; if sex eq "F" ;
21 run ;
NOTE: There were 100000000 observations read from the data set WORK.SEX.
NOTE: DATA statement used (Total process time):
real time 3.52 seconds
cpu time 3.52 seconds
22 data _null_ ;
23 set sex ; if sex ne "F" ;
24 run ;
NOTE: There were 100000000 observations read from the data set WORK.SEX.
NOTE: DATA statement used (Total process time):
real time 3.47 seconds I've also repeated the test for 100 million records, and the results are, relatively speaking, the same. I recall many spears broken over WHERE over IF on SAS-L of yore and elsewhere; yet in the end, it's merely the balance between the time needed to evaluate a subsetting condition (IF is faster than WHERE the more, the more complex the condition is) and the amount of work needed to move stuff from the buffer (depends on how often WHERE evaluates the condition false and the record length). Paul D.
... View more