I understand that WHERE and IF IN can produce different results depending on where each statement is used. However, one of the questions on the (official) SAS 9.4 Certification Practice Exam has me puzzled. The question read: This project will use data set cert.input36. At any time, you may save your program as program36 in cert\programs. Write a SAS program that will clean the data in cert.input36 as follows: Step 1: create a temporary data set, cleandata36. In this data set, convert all group values to upper case. Then keep only observations with group equal to 'A' or 'B'. Step 2: Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number. Step 3: create results.output36 from cleandata36 Ensure that all values for variable Kilograms are between 40 and 200, inclusively. If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2. Run the program and use the results to answer the next 3 questions. Their suggested code: data work.cleandata36;
set cert.input36;
group=upcase(group);
if group in ('A','B');
run;
proc means data=work.cleandata36 median;
class group;
var kilograms;
run;
data results.output36;
set cleandata36;
if Kilograms < 40 or Kilograms > 200 then do;
if group='A' then kilograms=79;
else kilograms=89;
end;
run;
proc contents data=results.output36;
run; My code: data work.cleandata36;
set cert.input36;
group = upcase(group);
where group = 'A' or group = 'B';
run;
proc means data=work.cleandata36 mean min max median maxdec=1;
class group;
run;
data results.output36;
set cleandata36;
if kilograms < 40 or kilograms > 200 then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
if kilograms =. then do;
if group = 'A' then kilograms = 79;
else kilograms = 89;
end;
run;
proc print data = results.output36;
run;
proc means data=results.output36 mean min max maxdec=1;
class group;
run; I've attached input36.sas7bdate in case anyone wanted to test. When asked how many observations are in results.output36, my code (with the WHERE statement) resulted in 4897 observations. This was wrong, as their code (using IF IN) resulted in 4992. I tested my code using IF IN instead of WHERE and got the same result they did, 4992. But I'm not sure why. It would seem that both statements would subset to the same number of observations. Also, their code is missing the portion that accounts for the instruction to replace missing values with the median -- I'm surprised they wouldn't include it in their suggested answer.
... View more