The two code examples you provide should give identical results, though WHERE may be faster to run. However, there are many cases where WHERE and a subsetting IF do not give identical output. Compare:

    data test;
      length x $5;
      input x;
      cards;
    One
    Two
    Three
    Four
    Five
    ;
    run;

    data subset_if;
      set test;
      sequence_number=_n_;
      previous_x=lag(x);
      if not(x="Three");
    run;

    data subset_where;
      set test;
      sequence_number=_n_;
      previous_x=lag(x);
      where not(x="Three");
    run;

In creating SUBSET_IF, we read in and process every observation from TEST, and only then drop the third one. Even though the third observation isn't output directly, it still affects what is output: sequence_number goes 1,2,4,5 and previous_x goes " ", "One", "Three", "Four".

In creating SUBSET_WHERE, the observation with x="Three" is removed at an earlier stage, before it ever reaches the data step, so there's no sign in the output that it existed: sequence_number goes 1,2,3,4 and previous_x goes " ", "One", "Two", "Four".
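To check this yourself, one option is to print both data sets after running the steps above. This is only a sketch: it assumes TEST, SUBSET_IF and SUBSET_WHERE were created exactly as shown, and the titles and comments are mine, describing the values I would expect from that code.

```sas
/* Assumes the three data steps above have already been run. */

title "Subsetting IF: filter applied after each observation is processed";
proc print data=subset_if noobs;
run;
/* Expected: x = One, Two, Four, Five                        */
/*           sequence_number = 1, 2, 4, 5                    */
/*           previous_x = (blank), One, Three, Four          */

title "WHERE: filter applied as observations are read in";
proc print data=subset_where noobs;
run;
/* Expected: x = One, Two, Four, Five                        */
/*           sequence_number = 1, 2, 3, 4                    */
/*           previous_x = (blank), One, Two, Four            */

title; /* clear the title so it does not carry over to later output */
```

Both data sets contain the same four values of x, so the difference only shows up in variables such as sequence_number and previous_x that depend on which observations the data step actually processed.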