Hello,
I have a doubt in the number of observations .
question : create a temporary data set, cleandata36.
In this data set, convert all group values to upper case.
Then keep only observations with group equal to 'A' or 'B'.
when I run below code
data cleandata36;
set cert.input36 ;
if upcase(group) in ('A','B');
run;
then number of observations are different
but when I run
data cleandata36;
set cert.input36 ;
group=upcase(group);
where group in ('A','B');
run;
the number of observations are different.
but the logic I want to implement is maybe same. Can please anyone tell me that why there is mismatch of observations 🙂
The Where Statement is a compile-time statement. It controls what observations are being read into the data step. Meaning that in the code below, it considers the values read - NOT the upcased values.
The Subsetting If Statement is considered at run-time. Therefore, in the code below, it considers the upcased values.
Hope this makes sense 🙂
data have;
input group :$1;
datalines;
A
a
B
b
;
data test1;
set have;
group = upcase(group);
where group in ('A', 'B');
run;
data test2;
set have;
group = upcase(group);
if group in ('A', 'B');
run;
The WHERE statement is applied to the input dataset, and therefore to the raw values. To make it work, use the function in the statement:
where upcase(group) in ('A','B');
The Where Statement is a compile-time statement. It controls what observations are being read into the data step. Meaning that in the code below, it considers the values read - NOT the upcased values.
The Subsetting If Statement is considered at run-time. Therefore, in the code below, it considers the upcased values.
Hope this makes sense 🙂
data have;
input group :$1;
datalines;
A
a
B
b
;
data test1;
set have;
group = upcase(group);
where group in ('A', 'B');
run;
data test2;
set have;
group = upcase(group);
if group in ('A', 'B');
run;
The position of a WHERE statement in the code is irrelevant, as it is applied in the read engine for the incoming dataset. A WHERE statement has the same function that a WHERE= dataset option has, and in a certain sense you can consider it to work "outside" the data step.
I also recommend that you study the documentation of the WHERE Statement thoroughly.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.