DATA Step, Macro, Functions and more

Problem stacking with IF-statement (DATA step)

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Problem stacking with IF-statement (DATA step)

[ Edited ]

Hello,

I'm a SAS beginner and I have the following situation:

 

 

data a;
  student=1; age=27; output;
  student=2; age=22; output;
  student=3; age=25; output;
run;

data b;
  student=4; agegrp=20; output;
  student=5; agegrp=40; output;
  student=6; agegrp=30; output;
  student=7; agegrp=50; output;
run;

data c;
  teacher=9;
run;

/******* data set creation *******/ data all1; if _n_=1 then set c; set a b; if age NE . then agegrp=age; run; data all2; if _n_=1 then set c; set a b ; if agegrp =. then agegrp=age; run;

 

I want to append the values in the column "age" to the ones in column "agegrp", but I'm having different results depending on which if-statement I choose.

 

Could anyone please explain me the difference in interpretation from SAS that makes it to repeat the number "27" (age from student 1) in the first 3 rows of agegrp in "all2" and to not do it for the first data set (all1), doing it correctly?

 

Thank you!!


Accepted Solutions
Solution
‎05-30-2017 10:27 PM
Valued Guide
Posts: 765

Re: Problem stacking with IF-statement (DATA step)

[ Edited ]
Posted in reply to Sergio_15

Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...

 

data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;

 

teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7

 

NOTICE that 27.

 

Try this and you'll see the same thing happen with a RETAINED value in data set B ...

 

data b;
student=4; agegrp='20'; age = 99; output; 
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;

 

There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.

View solution in original post


All Replies
Super User
Posts: 19,768

Re: Problem stacking with IF-statement (DATA step)

Posted in reply to Sergio_15

I didn't run your code or even read it all but you're mixing types. 

You have character and numeric variables and are trying to use them interchangeably.  That likely won't work as you expect.

 

You say merge but you're appending so it's confusing. 

 

You need to add one more data set - what you want as output. 

New Contributor
Posts: 3

Re: Problem stacking with IF-statement (DATA step)

Yes, sorry for mixing names and types here (I fixed it in the question), but even with the same variable type this situation continues to happen.

Solution
‎05-30-2017 10:27 PM
Valued Guide
Posts: 765

Re: Problem stacking with IF-statement (DATA step)

[ Edited ]
Posted in reply to Sergio_15

Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...

 

data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;

 

teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7

 

NOTICE that 27.

 

Try this and you'll see the same thing happen with a RETAINED value in data set B ...

 

data b;
student=4; agegrp='20'; age = 99; output; 
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;

 

There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.

New Contributor
Posts: 3

Re: Problem stacking with IF-statement (DATA step)

I got it now!! This has solved the problem!

Thank you guys!
Super User
Posts: 5,495

Re: Problem stacking with IF-statement (DATA step)

Posted in reply to Sergio_15

The difference in type (AGE is numeric but AGEGRP is character) adds to the confusion.  But the issue you are encountering is this.

 

Any variable that comes from a SAS data set is automatically retained.  In both your DATA steps, the list of retained variables includes both AGE and AGEGRP.

 

Except for the type conversions, your first DATA step is working as expected.  But the second does not.  Here's what is happening when you create ALL2.

 

On the first observation, AGEGRP is missing.  Despite the type conversions, SAS figures this out.  So it gives AGEGRP a value of "27" (again, despite the type conversions).  On the second observation, because AGEGRP is retained, it is still "27".  Reading another observation from A doesn't change this, because A doesn't contain AGEGRP.  Therefore, when checking whether AGEGRP=. SAS finds the comparison is false and doesn't reassign AGEGRP.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 191 views
  • 2 likes
  • 4 in conversation