Hi,
Can anybody explain the reason for, when the code colored orange is executed it returns 5 records. Whereas code colored blue is executed it returns all records i.e. 7 records (Name and TotalMarks are missing)
Data ds_grade;
infile cards;
input grades $6. @;
*BELOW IF RETURNS 5 RECORDS;
if grades = 'Grade1' or grades = 'Grade2';
input name : $6. Total_Marks;
*BELOW IF THEN RETURNS ALL 7 RECORDS BUT NAME AND MARKS FOR GRADE3 AND GRADE4 MISSING;
/*
if grades = 'Grade1' or grades = 'Grade2' then
input name : $6. Total_Marks;
*/
cards;
Grade1 abcdef 500
Grade3 pqrstu 375
Grade1 xyz 510
Grade2 abc123 430
Grade2 eeeee 450
Grade4 hhhh 270
Grade1 jjjjjj 505
;
run;
Thanks in advance.
Anand
The first if statement you have operates like a where clause:
if grades = 'Grade1' or grades = 'Grade2'; input name : $6. Total_Marks;
It is a bit like saying: only keep records where grades is Grade1 or Grade2. Hence you get fewer records.
The second if statement:
if grades = 'Grade1' or grades = 'Grade2' then input name : $6. Total_Marks;
Is considered to be a conditional read. I.e. every row is read in, but the input only reads the data elements into the variables if the grades is Grade1 or Grade2.
So the first only records rows where the if is true, the second reads all rows, but populates variables only when the condition is true.
@RW9 wrote:
The first if statement you have operates like a where clause:
if grades = 'Grade1' or grades = 'Grade2'; input name : $6. Total_Marks;It is a bit like saying: only keep records where grades is Grade1 or Grade2. Hence you get fewer records.
That might be considered true in this example but subsetting IF and WHERE should not be considered synonymous.
The subsetting IF the one with NO THEN is like a gate. When true the gate is closed and execution is "returned to the top of the implied data step loop".
It's possible that I'm saying nothing new here ... just saying it differently.
As you have observed:
Using IF without THEN, you would expect to find only observations having GRADES of "Grade1" or "Grade2". All others would be deleted.
Using IF/THEN, you would expect to find all the observations that appear in the incoming data. However, the extra variables (NAME and TOTAL_MARKS) would be populated only for the "Grade1" and "Grade2" observations.
It's perhaps unfortunate that both statements contain the word IF. They have entirely different meanings.
Good luck.
Thanks Guys,
But what is the practical use of 2nd if statment (IF/THEN) as it is returning 7 observation in current case? In what situation you will go for 2nd if condition.
Typical use of the IF/THEN statement would be where:
For example, how would you read in data that looked like this:
cards;
Grade1 abcdef 500
Grade3 No Data Available
Grade1 xyz 510
Grade2 abc123 430
Grade2 eeeee 450
Grade4 No Data Available
Grade1 jjjjjj 505
;
You woud want to read in NAME and TOTAL_MARKS, but that should only apply when GRADE is "Grade1" or "Grade2".
Thanks everybody for reply.
In short if we want to exclude few observation then we have to use if with semicolan, but if we do not want to exclude any observation and take certain action basis certain condition than we use IF/THEN.
In this instance think of:
if <condition>;
as shorthand for:
if <condition> then output;
To be honest I would recommend the full text as it makes it clearer what it is doing, but thats just my preference.
I think your example would only be true if
IF <condition> THEN OUTPUT;
is the very last statement in the dataset as subsetting IF has implied RETURN.
I think it would be more accurate to say
IF <condition>;
is similar in function to
IF not(<condition>) then delete;
Both DELETE and the subsetting IF have the implied RETURN.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.