Hello,
I'm a SAS beginner and I have the following situation:
data a;
student=1; age=27; output;
student=2; age=22; output;
student=3; age=25; output;
run;
data b;
student=4; agegrp=20; output;
student=5; agegrp=40; output;
student=6; agegrp=30; output;
student=7; agegrp=50; output;
run;
data c;
teacher=9;
run;
/******* data set creation *******/
data all1;
if _n_=1 then set c;
set a b;
if age NE . then agegrp=age;
run;
data all2;
if _n_=1 then set c;
set a b ;
if agegrp =. then agegrp=age;
run;
I want to append the values in the column "age" to the ones in column "agegrp", but I'm having different results depending on which if-statement I choose.
Could anyone please explain me the difference in interpretation from SAS that makes it to repeat the number "27" (age from student 1) in the first 3 rows of agegrp in "all2" and to not do it for the first data set (all1), doing it correctly?
Thank you!!
Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...
data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;
teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7
NOTICE that 27.
Try this and you'll see the same thing happen with a RETAINED value in data set B ...
data b;
student=4; agegrp='20'; age = 99; output;
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;
There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.
I didn't run your code or even read it all but you're mixing types.
You have character and numeric variables and are trying to use them interchangeably. That likely won't work as you expect.
You say merge but you're appending so it's confusing.
You need to add one more data set - what you want as output.
Yes, sorry for mixing names and types here (I fixed it in the question), but even with the same variable type this situation continues to happen.
Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...
data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;
teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7
NOTICE that 27.
Try this and you'll see the same thing happen with a RETAINED value in data set B ...
data b;
student=4; agegrp='20'; age = 99; output;
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;
There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.
The difference in type (AGE is numeric but AGEGRP is character) adds to the confusion. But the issue you are encountering is this.
Any variable that comes from a SAS data set is automatically retained. In both your DATA steps, the list of retained variables includes both AGE and AGEGRP.
Except for the type conversions, your first DATA step is working as expected. But the second does not. Here's what is happening when you create ALL2.
On the first observation, AGEGRP is missing. Despite the type conversions, SAS figures this out. So it gives AGEGRP a value of "27" (again, despite the type conversions). On the second observation, because AGEGRP is retained, it is still "27". Reading another observation from A doesn't change this, because A doesn't contain AGEGRP. Therefore, when checking whether AGEGRP=. SAS finds the comparison is false and doesn't reassign AGEGRP.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.