Obsidian | Level 7

## WHERE versus IF IN subsetting unexpectedly different

I understand that WHERE and IF IN can produce different results depending on where each statement is used. However, one of the questions on the (official) SAS 9.4 Certification Practice Exam has me puzzled.  The question read:

This project will use data set cert.input36. At any time, you may save your program as program36 in cert\programs.

Write a SAS program that will clean the data in cert.input36 as follows:

• Step 1:
• create a temporary data set, cleandata36.
• In this data set, convert all group values to upper case.
• Then keep only observations with group equal to 'A' or 'B'.
• Step 2:
• Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number.
• Step 3:
• create results.output36 from cleandata36
• Ensure that all values for variable Kilograms are between 40 and 200, inclusively.
• If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2.

Run the program and use the results to answer the next 3 questions.

Their suggested code:

``````data work.cleandata36;
set cert.input36;
group=upcase(group);
if group in ('A','B');
run;

proc means data=work.cleandata36 median;
class group;
var kilograms;
run;

data results.output36;
set cleandata36;
if Kilograms < 40 or Kilograms > 200 then do;
if group='A' then kilograms=79;
else kilograms=89;
end;
run;

proc contents data=results.output36;
run;``````

My code:

``````data work.cleandata36;
set cert.input36;
group = upcase(group);
where group = 'A' or group = 'B';
run;

proc means data=work.cleandata36 mean min max median maxdec=1;
class group;
run;

data results.output36;
set cleandata36;
if kilograms < 40 or kilograms > 200 then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
if kilograms =. then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
run;

proc print data = results.output36;
run;

proc means data=results.output36 mean min max maxdec=1;
class group;
run;``````

I've attached input36.sas7bdate in case anyone wanted to test.

When asked how many observations are in results.output36, my code (with the WHERE statement)  resulted in 4897 observations.  This was wrong, as their code (using IF IN) resulted in 4992.  I tested my code using IF IN instead of WHERE and got the same result they did, 4992. But I'm not sure why. It would seem that both statements would subset to the same number of observations.

Also,  their code is missing the portion that accounts for the instruction to replace missing values with the median -- I'm surprised they wouldn't include it in their suggested answer.

1 ACCEPTED SOLUTION

Accepted Solutions
Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

ah!  It took me few times rereading the documentation, but I think light finally dawned:

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations.  Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case.

12 REPLIES 12
Diamond | Level 26

## Re: WHERE versus IF IN subsetting unexpectedly different

IF and WHERE in a data step work differently. This is explained in the documentation: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/lestmtsref/p1cxl8ifdt8u0gn12wqbji8o5fq1.htm#p08fp...

--
Paige Miller
SAS Super FREQ

## Re: WHERE versus IF IN subsetting unexpectedly different

@TheresaM Here is a bit more of a hint: Notice this bullet from step 1:

• In this data set, convert all group values to upper case.

Then learn the difference between how IF and WHERE process data using the resource from @PaigeMiller and I think you will see why your code produces a different result.

Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

Thank you, @PaigeMiller and @Mark2010 .  I'll take some more time to understand the documentation!

PROC Star

## Re: WHERE versus IF IN subsetting unexpectedly different

One way to explore this is to make a small test dataset, like:

``````data play ;
input group \$1. ;
cards ;
A
B
C
D
.
a
b
c
d
;``````

Then you could use that as an input to your WHERE data step and your subsetting IF data step and see how the results differ.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

ah!  It took me few times rereading the documentation, but I think light finally dawned:

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations.  Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case.

SAS Super FREQ

## Re: WHERE versus IF IN subsetting unexpectedly different

Bingo. Congrats on working through it.

PROC Star

## Re: WHERE versus IF IN subsetting unexpectedly different

You got it.  So you could use WHERE, like:

``where upcase(group) = 'A' or upcase(group) = 'B';``

or:

``where upcase(group) IN ('A' , 'B');``

And it would select the same records as your IF.   You would still need to use UPCASE inside the data step to upcase the values.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
PROC Star

## Re: WHERE versus IF IN subsetting unexpectedly different

@TheresaM I suggest you select your own answer as the solution.  @Mark2010 gave you a good hint, but you came up with the solution and wrote it as an answer.  You also did a good job of phrasing the question clearly, and showing sample code.  So if anyone finds it in the future, it would be good for them to see your answer as the solution.  You should be able to click on Mark's answer and see an option to "unmark as correct" or something like that.

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

Hi @Quentin

Thank you for this.  This actually made me really happy to read!  Everything I thought I understood became much more of a struggle as I actually try to put it in practice (or answer practice exam questions).  So, hearing that I actually "came up with a solution" gives me hope. 😊🎉

Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

Hi Paige,

I thought I understood all those differences as outlined in the page you reference. However, the way I read all that -- it would seem that my use of WHERE would produce the larger subset (since it selects observations before they're brought into PDV), yet somehow IF IN subsetted a larger number of observations.

## Re: WHERE versus IF IN subsetting unexpectedly different

I know this is a (very) lagged response.   But ...

I find it most useful to regard the WHERE statement (or where option as a dataset name parameter) as a process outsourced to the data engine

That's why WHERE can be used in PROC's as well as DATA steps - unlike IF.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: WHERE versus IF IN subsetting unexpectedly different

Hi @mkeintz ,

Thanks for this too!  Where vs IF, when to use the = sign vs when not to, DO while vs DO until... so much to remember!!  I know it will become easier to remember the more programs that I write, but until then, every tip helps.  So thanks for responding to my post 🙂 🙂

Discussion stats
• 12 replies
• 854 views
• 7 likes
• 5 in conversation