Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Learn SAS
- /
- SAS Certification
- /
- WHERE versus IF IN subsetting unexpectedly different

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-28-2023 12:33 PM
(853 views)

I understand that WHERE and IF IN can produce different results depending on where each statement is used. However, one of the questions on the (official) SAS 9.4 Certification Practice Exam has me puzzled. The question read:

This project will use data set

cert.input36. At any time, you may save your program asprogram36incert\programs.

Write a SAS program that will clean the data incert.input36as follows:

- Step 1:

- create a temporary data set,
cleandata36.- In this data set, convert all
groupvalues to upper case.- Then keep only observations with
groupequal to 'A' or 'B'.

- Step 2:

- Determine the MEDIAN value for the
Kilogramsvariable for eachgroup(A,B) in thecleandata36data set. Round MEDIAN to the nearest whole number.

- Step 3:

- create
results.output36fromcleandata36- Ensure that all values for variable
Kilogramsare between 40 and 200, inclusively.- If the value is missing or out of range, replace the value with the MEDIAN
Kilogramsvalue for the respectivegroup(A,B) calculated in step 2.

Run the program and use the results to answer the next 3 questions.

Their suggested code:

```
data work.cleandata36;
set cert.input36;
group=upcase(group);
if group in ('A','B');
run;
proc means data=work.cleandata36 median;
class group;
var kilograms;
run;
data results.output36;
set cleandata36;
if Kilograms < 40 or Kilograms > 200 then do;
if group='A' then kilograms=79;
else kilograms=89;
end;
run;
proc contents data=results.output36;
run;
```

My code:

```
data work.cleandata36;
set cert.input36;
group = upcase(group);
where group = 'A' or group = 'B';
run;
proc means data=work.cleandata36 mean min max median maxdec=1;
class group;
run;
data results.output36;
set cleandata36;
if kilograms < 40 or kilograms > 200 then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
if kilograms =. then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
run;
proc print data = results.output36;
run;
proc means data=results.output36 mean min max maxdec=1;
class group;
run;
```

I've attached input36.sas7bdate in case anyone wanted to test.

When asked how many observations are in results.output36, my code (with the WHERE statement) resulted in 4897 observations. This was wrong, as their code (using IF IN) resulted in 4992. I tested my code using IF IN instead of WHERE and got the same result they did, 4992. But I'm not sure why. It would seem that both statements would subset to the same number of observations.

Also, their code is missing the portion that accounts for the instruction to **replace missing values** with the median -- I'm surprised they wouldn't include it in their suggested answer.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

ah! It took me few times rereading the documentation, but I think light finally dawned:

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations. Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case.

12 REPLIES 12

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

IF and WHERE in a data step work differently. This is explained in the documentation: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/lestmtsref/p1cxl8ifdt8u0gn12wqbji8o5fq1.htm#p08fp...

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@TheresaM Here is a bit more of a hint: Notice this bullet from step 1:

*In this data set, convert all group values to upper case.*

Then learn the difference between how IF and WHERE process data using the resource from @PaigeMiller and I think you will see why your code produces a different result.

SAS How To Videos:youtube.com

Protecting the integrity of your SAS credential

Protecting the integrity of your SAS credential

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you, @PaigeMiller and @Mark2010 . I'll take some more time to understand the documentation!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

One way to explore this is to make a small test dataset, like:

```
data play ;
input group $1. ;
cards ;
A
B
C
D
.
a
b
c
d
;
```

Then you could use that as an input to your WHERE data step and your subsetting IF data step and see how the results differ.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

ah! It took me few times rereading the documentation, but I think light finally dawned:

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations. Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Bingo. Congrats on working through it.

SAS How To Videos:youtube.com

Protecting the integrity of your SAS credential

Protecting the integrity of your SAS credential

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You got it. So you could use WHERE, like:

`where upcase(group) = 'A' or upcase(group) = 'B';`

or:

`where upcase(group) IN ('A' , 'B');`

And it would select the same records as your IF. You would still need to use UPCASE inside the data step to upcase the values.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@TheresaM I suggest you select your own answer as the solution. @Mark2010 gave you a good hint, but you came up with the solution and wrote it as an answer. You also did a good job of phrasing the question clearly, and showing sample code. So if anyone finds it in the future, it would be good for them to see your answer as the solution. You should be able to click on Mark's answer and see an option to "unmark as correct" or something like that.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @Quentin

Thank you for this. This actually made me really happy to read! Everything I thought I understood became much more of a struggle as I actually try to put it in practice (or answer practice exam questions). So, hearing that I actually "came up with a solution" gives me hope. 😊🎉

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Paige,

I thought I understood all those differences as outlined in the page you reference. However, the way I read all that -- it would seem that my use of WHERE would produce the larger subset (since it selects observations before they're brought into PDV), yet somehow IF IN subsetted a larger number of observations.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I know this is a (very) lagged response. But ...

I find it most useful to regard the WHERE statement (or where option as a dataset name parameter) as a * process outsourced to the data engine*.

That's why WHERE can be used in PROC's as well as DATA steps - unlike IF.

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @mkeintz ,

Thanks for this too! Where vs IF, when to use the = sign vs when not to, DO while vs DO until... so much to remember!! I know it will become easier to remember the more programs that I write, but until then, every tip helps. So thanks for responding to my post 🙂 🙂

This is a knowledge-sharing community for SAS Certified Professionals and anyone who wants to learn more about becoming SAS Certified. Ask questions and get answers fast. Share with others who are interested in certification and who are studying for certifications.To get the most from your community experience, use these getting-started resources:

Community Do's and Don'ts

How to add SAS syntax to your post

How to get fast, helpful answers

3 ways to show off your SAS skills

Upcoming Events