BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DUBS
Fluorite | Level 6

I got question 13 from the practice exam because I filtered the data with the statement "where group in ('A','B');"  instead of "if group in ('A','B');" .  Can you please help me to understand the difference between the IF and WHERE statements and when each one should be used? 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

There are many cases where either WHERE or IF could be used, and the results would be the same.  There are some cases where the results would be different.  If you can share the text of the question, it might help us explain why using a WHERE might have been wrong for this example.

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

5 REPLIES 5
antonbcristina
SAS Employee

I like to think of the subsetting WHERE statement like a pre-processing of the data, and the subsetting IF statement as a post-processing of the data. You can only filter using WHERE on variables that already exist in the Program Data Vector (PDV) at the beginning of the DATA step. To filter on any variables created in the DATA step, you must use IF. The other important thing to note is that the IF statement can only be used inside of the DATA step, whereas the WHERE statement can be used in both DATA and PROC steps. 

 

Tom
Super User Tom
Super User

Let's fix this misstatement:

You can only filter using WHERE on variables that already exist in the Program Data Vector (PDV)

You can only filter using WHERE on variables that already exist in the DATASET being read.

 

That is because the WHERE filters the data BEFORE it is read into the PDV.

 

antonbcristina
SAS Employee
That's right @Tom, thanks for clarifying!
Kurt_Bremser
Super User

WHERE is handled by the part of SAS reading the dataset(s), that's why you can (and have to) use SQL syntax, and can only use variables present in incoming datasets. It is also the reason why WHERE can be used in any procedure that reads data from datasets, without involvement of the DATA step compiler.

The subsetting IF is part of the DATA step, can use all variables in the PDV (incoming and newly created), but is not available anywhere else.

Quentin
Super User

There are many cases where either WHERE or IF could be used, and the results would be the same.  There are some cases where the results would be different.  If you can share the text of the question, it might help us explain why using a WHERE might have been wrong for this example.

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
LIBNAME 101

Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 4004 views
  • 15 likes
  • 5 in conversation