BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DUBS
Fluorite | Level 6

I got question 13 from the practice exam because I filtered the data with the statement "where group in ('A','B');"  instead of "if group in ('A','B');" .  Can you please help me to understand the difference between the IF and WHERE statements and when each one should be used? 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

There are many cases where either WHERE or IF could be used, and the results would be the same.  There are some cases where the results would be different.  If you can share the text of the question, it might help us explain why using a WHERE might have been wrong for this example.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

5 REPLIES 5
antonbcristina
SAS Employee

I like to think of the subsetting WHERE statement like a pre-processing of the data, and the subsetting IF statement as a post-processing of the data. You can only filter using WHERE on variables that already exist in the Program Data Vector (PDV) at the beginning of the DATA step. To filter on any variables created in the DATA step, you must use IF. The other important thing to note is that the IF statement can only be used inside of the DATA step, whereas the WHERE statement can be used in both DATA and PROC steps. 

 

Tom
Super User Tom
Super User

Let's fix this misstatement:

You can only filter using WHERE on variables that already exist in the Program Data Vector (PDV)

You can only filter using WHERE on variables that already exist in the DATASET being read.

 

That is because the WHERE filters the data BEFORE it is read into the PDV.

 

antonbcristina
SAS Employee
That's right @Tom, thanks for clarifying!
Kurt_Bremser
Super User

WHERE is handled by the part of SAS reading the dataset(s), that's why you can (and have to) use SQL syntax, and can only use variables present in incoming datasets. It is also the reason why WHERE can be used in any procedure that reads data from datasets, without involvement of the DATA step compiler.

The subsetting IF is part of the DATA step, can use all variables in the PDV (incoming and newly created), but is not available anywhere else.

Quentin
Super User

There are many cases where either WHERE or IF could be used, and the results would be the same.  There are some cases where the results would be different.  If you can share the text of the question, it might help us explain why using a WHERE might have been wrong for this example.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

LIBNAME 101

Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3665 views
  • 15 likes
  • 5 in conversation