BookmarkSubscribeRSS Feed
mnew
Calcite | Level 5
Greetings:

Does any of your experts know whether a where statement in data step for subsetting is processed differently from where = dataset option, when there is only one input dataset?

Here is the only quote from SAS help that I found.

"The WHERE statement applies to all input data sets, whereas the WHERE= data set option selects observations only from the data set for which it is specified."

My friend also read from somewhere that a where = option is more efficient as it only brings the qualified observations into PDV, while a where statement will read in all the observations then discard the unqualified data. What he mentioned also sounds like the difference between IF and Where statement for subsetting, so I'm a bit confused.

Thank you!
6 REPLIES 6
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
The WHERE statement and WHERE= data set option are equivalent, for example, when you have only one dataset referenced on the SET statement (or when using a PROC that accepts WHERE filtering, when loading input data into the PDV. I find the WHERE statement easier to code.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

+"where statement" +where +"data set option" site:sas.com Message was edited by: sbb
Robert_Bardos
Fluorite | Level 6
Additionally the WHERE statement can be augmented ( by using WHERE ALSO). A technique that is sometimes (or for some people) easier to read/understand than complex AND constructs.

Sample code snippet:
[pre]
where age gt 12 and age lt 15 ;
where also weight gt 60 and weight lt 80 ;
[/pre]
mnew
Calcite | Level 5
First time heard of where also. This is better than putting a group of conditions on a new line (my clumsy way to chop them up). Thank you!
mnew
Calcite | Level 5
This is very helpful. I often could not find what I was looking for on sas.com. Now I can try this search trick you taught me!
Doc_Duke
Rhodochrosite | Level 12
To follow-up on Scott's post, because the WHERE statement is a "declarative" statement, it is addressed at the data step compile time and would have the same effect as a WHERE =.

A subsetting IF statement is an "executable" statement and it's location in a data step can make a significant difference in how fast a program runs.

See
http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001225397.htm

Doc Muhlbaier
Duke
mnew
Calcite | Level 5
Wish I had the web page you recommended handy when I was taking the programming one class. Thanks!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2304 views
  • 0 likes
  • 4 in conversation