BookmarkSubscribeRSS Feed
mnew
Calcite | Level 5
Greetings:

Does any of your experts know whether a where statement in data step for subsetting is processed differently from where = dataset option, when there is only one input dataset?

Here is the only quote from SAS help that I found.

"The WHERE statement applies to all input data sets, whereas the WHERE= data set option selects observations only from the data set for which it is specified."

My friend also read from somewhere that a where = option is more efficient as it only brings the qualified observations into PDV, while a where statement will read in all the observations then discard the unqualified data. What he mentioned also sounds like the difference between IF and Where statement for subsetting, so I'm a bit confused.

Thank you!
6 REPLIES 6
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
The WHERE statement and WHERE= data set option are equivalent, for example, when you have only one dataset referenced on the SET statement (or when using a PROC that accepts WHERE filtering, when loading input data into the PDV. I find the WHERE statement easier to code.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

+"where statement" +where +"data set option" site:sas.com Message was edited by: sbb
Robert_Bardos
Fluorite | Level 6
Additionally the WHERE statement can be augmented ( by using WHERE ALSO). A technique that is sometimes (or for some people) easier to read/understand than complex AND constructs.

Sample code snippet:
[pre]
where age gt 12 and age lt 15 ;
where also weight gt 60 and weight lt 80 ;
[/pre]
mnew
Calcite | Level 5
First time heard of where also. This is better than putting a group of conditions on a new line (my clumsy way to chop them up). Thank you!
mnew
Calcite | Level 5
This is very helpful. I often could not find what I was looking for on sas.com. Now I can try this search trick you taught me!
Doc_Duke
Rhodochrosite | Level 12
To follow-up on Scott's post, because the WHERE statement is a "declarative" statement, it is addressed at the data step compile time and would have the same effect as a WHERE =.

A subsetting IF statement is an "executable" statement and it's location in a data step can make a significant difference in how fast a program runs.

See
http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001225397.htm

Doc Muhlbaier
Duke
mnew
Calcite | Level 5
Wish I had the web page you recommended handy when I was taking the programming one class. Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 826 views
  • 0 likes
  • 4 in conversation