Hey!
I have a dataset that looks somewhat like this:
EMPLID CURRENT_COLLEGE
1 ABC
1 ABC
1 XYZ
1 XYZ
2 CED
2 VIN
2 VIN
I have to compare the values for each row in current_college and create a separate column called 'Change' that captures if the college for each ID has changed compared to the first observation. The output table should look something like this:
EMPLID CURRENT_COLLEGE CHANGE
1 ABC 0
1 ABC 0
1 XYZ 1
1 XYZ 1
2 CED 0
2 VIN 1
2 VIN 1
I have been scratching my head over this for hours and I have just started working with SAS! Are there any efficient ways of grouping observations in SAS (like 'group' in dplyr package in R)?
Thanks 🙂
data want;
set have;
by EMPLID;
retain CURRENT_COLLEGE_ ;
if first.EMPLID then do;
CURRENT_COLLEGE_=CURRENT_COLLEGE;
CHANGE=0;
end;
else do;
CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);
end;
drop CURRENT_COLLEGE_;
run;
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
1 kiy
1 kiy
2 CED
2 VIN
2 VIN
;
data want;
do _n_=0 by 1 until(last.emplid);
do until(last.CURRENT_COLLEGE);
set have;
by EMPLID CURRENT_COLLEGE;
if _n_ in (0,1) then change=_n_;
output;
end;
end;
run;
proc print noobs;run;
Hi @rj997 By the way, An ideal design should have incremental value for the change variable each time the change occurs. I have tweaked your sample with another set of group
1 kiy
1 kiy
The change should increment to 2 and so on and so forth.
If that makes sense, the above code is tweak to sum
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
1 kiy
1 kiy
2 CED
2 VIN
2 VIN
;
data want;
do _n_=0 by 1 until(last.emplid);
do until(last.CURRENT_COLLEGE);
set have;
by EMPLID CURRENT_COLLEGE;
change=_n_;;
output;
end;
end;
run;
Thanks for the help!
Although, I am just trying to capture if the college values have changed from the first observation and record that with binary values (of 0 or 1).
data want;
set have;
by EMPLID;
retain CURRENT_COLLEGE_ ;
if first.EMPLID then do;
CURRENT_COLLEGE_=CURRENT_COLLEGE;
CHANGE=0;
end;
else do;
CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);
end;
drop CURRENT_COLLEGE_;
run;
This is a situation in which you should consider taking advantage of a property of "conditional SET statements" - namely that any variable read by a SET statement is automatically retained until that SET (or an "overlapping" SET) is executed again:
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
2 CED
2 VIN
2 VIN
;
data want;
set have;
by emplid;
if first.emplid then set have (keep=current_college rename=(current_college=college1)) point=_n_;
change=(current_college^=college1);
run;
Whenever the beginning of an ID is encountered, that observation is re-read via a SET with a "POINT=" option. Because the current_college variable is renamed to college1, the college1 value is automatically retained until the beginning of the next ID.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.