Hey!
I have a dataset that looks somewhat like this:
EMPLID CURRENT_COLLEGE
1 ABC
1 ABC
1 XYZ
1 XYZ
2 CED
2 VIN
2 VIN
I have to compare the values for each row in current_college and create a separate column called 'Change' that captures if the college for each ID has changed compared to the first observation. The output table should look something like this:
EMPLID CURRENT_COLLEGE CHANGE
1 ABC 0
1 ABC 0
1 XYZ 1
1 XYZ 1
2 CED 0
2 VIN 1
2 VIN 1
I have been scratching my head over this for hours and I have just started working with SAS! Are there any efficient ways of grouping observations in SAS (like 'group' in dplyr package in R)?
Thanks 🙂
data want;
set have;
by EMPLID;
retain CURRENT_COLLEGE_ ;
if first.EMPLID then do;
CURRENT_COLLEGE_=CURRENT_COLLEGE;
CHANGE=0;
end;
else do;
CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);
end;
drop CURRENT_COLLEGE_;
run;
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
1 kiy
1 kiy
2 CED
2 VIN
2 VIN
;
data want;
do _n_=0 by 1 until(last.emplid);
do until(last.CURRENT_COLLEGE);
set have;
by EMPLID CURRENT_COLLEGE;
if _n_ in (0,1) then change=_n_;
output;
end;
end;
run;
proc print noobs;run;
Hi @rj997 By the way, An ideal design should have incremental value for the change variable each time the change occurs. I have tweaked your sample with another set of group
1 kiy
1 kiy
The change should increment to 2 and so on and so forth.
If that makes sense, the above code is tweak to sum
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
1 kiy
1 kiy
2 CED
2 VIN
2 VIN
;
data want;
do _n_=0 by 1 until(last.emplid);
do until(last.CURRENT_COLLEGE);
set have;
by EMPLID CURRENT_COLLEGE;
change=_n_;;
output;
end;
end;
run;
Thanks for the help!
Although, I am just trying to capture if the college values have changed from the first observation and record that with binary values (of 0 or 1).
data want;
set have;
by EMPLID;
retain CURRENT_COLLEGE_ ;
if first.EMPLID then do;
CURRENT_COLLEGE_=CURRENT_COLLEGE;
CHANGE=0;
end;
else do;
CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);
end;
drop CURRENT_COLLEGE_;
run;
This is a situation in which you should consider taking advantage of a property of "conditional SET statements" - namely that any variable read by a SET statement is automatically retained until that SET (or an "overlapping" SET) is executed again:
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1 ABC
1 ABC
1 XYZ
1 XYZ
2 CED
2 VIN
2 VIN
;
data want;
set have;
by emplid;
if first.emplid then set have (keep=current_college rename=(current_college=college1)) point=_n_;
change=(current_college^=college1);
run;
Whenever the beginning of an ID is encountered, that observation is re-read via a SET with a "POINT=" option. Because the current_college variable is renamed to college1, the college1 value is automatically retained until the beginning of the next ID.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.