BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sergio_15
Fluorite | Level 6

Hello,

I'm a SAS beginner and I have the following situation:

 

 

data a;
  student=1; age=27; output;
  student=2; age=22; output;
  student=3; age=25; output;
run;

data b;
  student=4; agegrp=20; output;
  student=5; agegrp=40; output;
  student=6; agegrp=30; output;
  student=7; agegrp=50; output;
run;

data c;
  teacher=9;
run;

/******* data set creation *******/ data all1; if _n_=1 then set c; set a b; if age NE . then agegrp=age; run; data all2; if _n_=1 then set c; set a b ; if agegrp =. then agegrp=age; run;

 

I want to append the values in the column "age" to the ones in column "agegrp", but I'm having different results depending on which if-statement I choose.

 

Could anyone please explain me the difference in interpretation from SAS that makes it to repeat the number "27" (age from student 1) in the first 3 rows of agegrp in "all2" and to not do it for the first data set (all1), doing it correctly?

 

Thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...

 

data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;

 

teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7

 

NOTICE that 27.

 

Try this and you'll see the same thing happen with a RETAINED value in data set B ...

 

data b;
student=4; agegrp='20'; age = 99; output; 
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;

 

There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.

View solution in original post

5 REPLIES 5
Reeza
Super User

I didn't run your code or even read it all but you're mixing types. 

You have character and numeric variables and are trying to use them interchangeably.  That likely won't work as you expect.

 

You say merge but you're appending so it's confusing. 

 

You need to add one more data set - what you want as output. 

Sergio_15
Fluorite | Level 6

Yes, sorry for mixing names and types here (I fixed it in the question), but even with the same variable type this situation continues to happen.

MikeZdeb
Rhodochrosite | Level 12

Hi. All variables that are read in a SET statement are automatically RETAINED. In the second data step, even though you are not reading any values for AGEGROUP as you read observations from data set A, the value you assigned to AGEGROUP when you read the first observation from data set A is RETAINED and that value is 27. When you read the second observation from data set A, AGEGROUP is NOT MISSING (you gave it a value in the previous pass through the data step), so its value is not replaced by AGE (same for the third observation in A). Look at the LOG (I added PUT _ALL_ to the data setp just after the SET A B: and prior to the IF staement) ...

 

data all2;
if _n_=1 then set c;
set a b ;
put _all_;
if agegrp =. then agegrp=age;
run;

 

teacher=9 student=1 age=27 agegrp= _ERROR_=0 _N_=1
teacher=9 student=2 age=22 agegrp=27 _ERROR_=0 _N_=2
teacher=9 student=3 age=25 agegrp=27 _ERROR_=0 _N_=3
teacher=9 student=4 age=. agegrp=20 _ERROR_=0 _N_=4
teacher=9 student=5 age=. agegrp=40 _ERROR_=0 _N_=5
teacher=9 student=6 age=. agegrp=30 _ERROR_=0 _N_=6
teacher=9 student=7 age=. agegrp=50 _ERROR_=0 _N_=7

 

NOTICE that 27.

 

Try this and you'll see the same thing happen with a RETAINED value in data set B ...

 

data b;
student=4; agegrp='20'; age = 99; output; 
student=5; agegrp='40'; output;
student=6; agegrp='30'; output;
student=7; agegrp='50'; output;
run;

 

There's also an issue in both data steps with NUMERIC (AGE) and CHARACTER (AGEGROUP) variables ... the notes you get in the LOG point that out.

Sergio_15
Fluorite | Level 6
I got it now!! This has solved the problem!

Thank you guys!
Astounding
PROC Star

The difference in type (AGE is numeric but AGEGRP is character) adds to the confusion.  But the issue you are encountering is this.

 

Any variable that comes from a SAS data set is automatically retained.  In both your DATA steps, the list of retained variables includes both AGE and AGEGRP.

 

Except for the type conversions, your first DATA step is working as expected.  But the second does not.  Here's what is happening when you create ALL2.

 

On the first observation, AGEGRP is missing.  Despite the type conversions, SAS figures this out.  So it gives AGEGRP a value of "27" (again, despite the type conversions).  On the second observation, because AGEGRP is retained, it is still "27".  Reading another observation from A doesn't change this, because A doesn't contain AGEGRP.  Therefore, when checking whether AGEGRP=. SAS finds the comparison is false and doesn't reassign AGEGRP.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1018 views
  • 2 likes
  • 4 in conversation