Dear community,
I would like to ask for your help concerning the following issue.
The setting is as follows, with screenshots and sample data attached.
I have a list with a group id.
For every group id, I have multiple observations per date (monthly frequency).
For every group for every date, I have at least one staff id.
So e.g. for group id 1, on January 31st 1991, I observe three staff members with ids AA, BB and CC.
I want to chain them together, to have AABBCC. Afterward, I would just keep the row with only the chained IDs.
I was messing around with "set by group_id date" and trying to use the lag function with cat(group_id, lag(group_id)), but I have proven to be too stupid.
It would be great, if you could please help me.
Yours sincerely,
Sinistrum
have:
want:
data have;
input
group_ID date staff_ID $;
informat date date7.;
format date ddmmyy10.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
run;
data want;
input
group_ID date staff_ID $ staff_ID_cum $;
informat date date7.;
format date ddmmyy10.;
datalines;
1 31Jan91 AA AA
1 31Jan91 BB AABB
1 31Jan91 CC AABBCC
1 28Feb91 AA AA
1 28Feb91 CC AACC
1 28Feb91 DD AACCDD
2 31Jan91 GG GG
2 31Jan91 H GGH
2 31Jan91 FF GGHFF
2 28Feb91 TTT TTT
2 28Feb91 D TTTD
2 28Feb91 Y TTTDY
;
run;
Something like this?
data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
proc sort data=have;
by group_ID date;
run;
data want;
set have;
length staff_ID_cum $20;
by group_ID date;
if first.date then staff_ID_cum = staff_ID;
else staff_ID_cum = cats(staff_ID_cum, staff_ID);
if last.date then output;
retain staff_ID_cum;
run;
Something like this?
data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
proc sort data=have;
by group_ID date;
run;
data want;
set have;
length staff_ID_cum $20;
by group_ID date;
if first.date then staff_ID_cum = staff_ID;
else staff_ID_cum = cats(staff_ID_cum, staff_ID);
if last.date then output;
retain staff_ID_cum;
run;
Something like:
data want; set have; retain staff_id_cum i; by group_id; if first.group_id then do; i=1; staff_id_cum=staff_id; end; else do; if i=3 then do; output; i=1; staff_id_cum=staff_id; end; else do; i=i+1; staff_id_cum=cats(staff_id_cum,staff_id); end; end; run;
De facto, exactly like this - thank you very much
So quickly.
Your code, RW9, seems to delete too much observations as it leaves me with only two instead of the four (2 groups, 2 dates, implies 4 final observations to keep).
For date, it keeps the first one, though it posts the staff_id_cum for the first part, so there is a little mismatch, too.
Thank you either way.
Yours sincerely
Sinistrum
Yes, I just read the first part as being 3 records per group, did not even see the date as a secondary grouping variable.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.