BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
TheresaM
Obsidian | Level 7

I understand that WHERE and IF IN can produce different results depending on where each statement is used. However, one of the questions on the (official) SAS 9.4 Certification Practice Exam has me puzzled.  The question read:

 

This project will use data set cert.input36. At any time, you may save your program as program36 in cert\programs.

Write a SAS program that will clean the data in cert.input36 as follows:

 

    • Step 1:
      • create a temporary data set, cleandata36.
      • In this data set, convert all group values to upper case.
      • Then keep only observations with group equal to 'A' or 'B'.
    • Step 2:
      • Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number.
    • Step 3:
      • create results.output36 from cleandata36
      • Ensure that all values for variable Kilograms are between 40 and 200, inclusively.
      • If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2.

 

Run the program and use the results to answer the next 3 questions.

Their suggested code:

 

data work.cleandata36;
   set cert.input36;
   group=upcase(group);
   if group in ('A','B');
run;

proc means data=work.cleandata36 median;
  class group;
  var kilograms;
run;

data results.output36;
  set cleandata36;
  if Kilograms < 40 or Kilograms > 200 then do;
    if group='A' then kilograms=79;
    else kilograms=89;
  end;
run;

proc contents data=results.output36;
run;

My code:

data work.cleandata36;
	set cert.input36;
	group = upcase(group);
	where group = 'A' or group = 'B';
run;

proc means data=work.cleandata36 mean min max median maxdec=1;
	class group;
run;

data results.output36;
	set cleandata36;
	if kilograms < 40 or kilograms > 200 then do;
		if group = 'A' then kilograms = 79;
		else kilograms = 89;
	end;
	if kilograms =. then do;
		if group = 'A' then kilograms = 79;
		else kilograms = 89;
	end;
run;
	
proc print data = results.output36;
run;

proc means data=results.output36 mean min max maxdec=1;
	class group;
run;

 

I've attached input36.sas7bdate in case anyone wanted to test.

 

When asked how many observations are in results.output36, my code (with the WHERE statement)  resulted in 4897 observations.  This was wrong, as their code (using IF IN) resulted in 4992.  I tested my code using IF IN instead of WHERE and got the same result they did, 4992. But I'm not sure why. It would seem that both statements would subset to the same number of observations.

 

Also,  their code is missing the portion that accounts for the instruction to replace missing values with the median -- I'm surprised they wouldn't include it in their suggested answer.

1 ACCEPTED SOLUTION

Accepted Solutions
TheresaM
Obsidian | Level 7

ah!  It took me few times rereading the documentation, but I think light finally dawned:

 

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations.  Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case. 

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

IF and WHERE in a data step work differently. This is explained in the documentation: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/lestmtsref/p1cxl8ifdt8u0gn12wqbji8o5fq1.htm#p08fp...

--
Paige Miller
Mark2010
SAS Employee

@TheresaM Here is a bit more of a hint: Notice this bullet from step 1: 

  • In this data set, convert all group values to upper case.

Then learn the difference between how IF and WHERE process data using the resource from @PaigeMiller and I think you will see why your code produces a different result.

 

TheresaM
Obsidian | Level 7

Thank you, @PaigeMiller and @Mark2010 .  I'll take some more time to understand the documentation!

Quentin
Super User

One way to explore this is to make a small test dataset, like:

 

data play ;
  input group $1. ;
  cards ;
A
B
C
D
.
a
b
c
d
;

Then you could use that as an input to your WHERE data step and your subsetting IF data step and see how the results differ.

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
TheresaM
Obsidian | Level 7

ah!  It took me few times rereading the documentation, but I think light finally dawned:

 

WHERE subsetted the data before UPCASE happened ("The WHERE statement selects observations before they are brought into the program data vector. "), and so even though I changed all values of "group" to upper case, it happened after WHERE selected its observations.  Thus, my WHERE subset is smaller because it didn't have access to all the instances of "group" where 'a' and 'b' had been converted to upper case. 

Quentin
Super User

You got it.  So you could use WHERE, like:

where upcase(group) = 'A' or upcase(group) = 'B';

or:

where upcase(group) IN ('A' , 'B');

And it would select the same records as your IF.   You would still need to use UPCASE inside the data step to upcase the values.

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

@TheresaM I suggest you select your own answer as the solution.  @Mark2010 gave you a good hint, but you came up with the solution and wrote it as an answer.  You also did a good job of phrasing the question clearly, and showing sample code.  So if anyone finds it in the future, it would be good for them to see your answer as the solution.  You should be able to click on Mark's answer and see an option to "unmark as correct" or something like that.  

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
TheresaM
Obsidian | Level 7

Hi @Quentin 

 

Thank you for this.  This actually made me really happy to read!  Everything I thought I understood became much more of a struggle as I actually try to put it in practice (or answer practice exam questions).  So, hearing that I actually "came up with a solution" gives me hope. 😊🎉 

TheresaM
Obsidian | Level 7

Hi Paige,

 

I thought I understood all those differences as outlined in the page you reference. However, the way I read all that -- it would seem that my use of WHERE would produce the larger subset (since it selects observations before they're brought into PDV), yet somehow IF IN subsetted a larger number of observations.

mkeintz
PROC Star

@TheresaM 

 

I know this is a (very) lagged response.   But ...

 

I find it most useful to regard the WHERE statement (or where option as a dataset name parameter) as a process outsourced to the data engine

 

That's why WHERE can be used in PROC's as well as DATA steps - unlike IF.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheresaM
Obsidian | Level 7

Hi @mkeintz ,

Thanks for this too!  Where vs IF, when to use the = sign vs when not to, DO while vs DO until... so much to remember!!  I know it will become easier to remember the more programs that I write, but until then, every tip helps.  So thanks for responding to my post 🙂 🙂 

Welcome to the Certification Community

 

This is a knowledge-sharing community for SAS Certified Professionals and anyone who wants to learn more about becoming SAS Certified. Ask questions and get answers fast. Share with others who are interested in certification and who are studying for certifications.To get the most from your community experience, use these getting-started resources:

Community Do's and Don'ts
How to add SAS syntax to your post
How to get fast, helpful answers

 

Why Get SAS Certified.jpg

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2572 views
  • 7 likes
  • 5 in conversation