Dear community,
I would like to ask for your help concerning the following issue.
The setting is as follows, with screenshots and sample data attached.
I have a list with a group id.
For every group id, I have multiple observations per date (monthly frequency).
For every group for every date, I have at least one staff id.
So e.g. for group id 1, on January 31st 1991, I observe three staff members with ids AA, BB and CC.
I want to chain them together, to have AABBCC. Afterward, I would just keep the row with only the chained IDs.
I was messing around with "set by group_id date" and trying to use the lag function with cat(group_id, lag(group_id)), but I have proven to be too stupid.
It would be great, if you could please help me.
Yours sincerely,
Sinistrum
have:
want:
data have;
input
group_ID date staff_ID $;
informat date date7.;
format date ddmmyy10.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
run;
data want;
input
group_ID date staff_ID $ staff_ID_cum $;
informat date date7.;
format date ddmmyy10.;
datalines;
1 31Jan91 AA AA
1 31Jan91 BB AABB
1 31Jan91 CC AABBCC
1 28Feb91 AA AA
1 28Feb91 CC AACC
1 28Feb91 DD AACCDD
2 31Jan91 GG GG
2 31Jan91 H GGH
2 31Jan91 FF GGHFF
2 28Feb91 TTT TTT
2 28Feb91 D TTTD
2 28Feb91 Y TTTDY
;
run;
Something like this?
data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
proc sort data=have;
by group_ID date;
run;
data want;
set have;
length staff_ID_cum $20;
by group_ID date;
if first.date then staff_ID_cum = staff_ID;
else staff_ID_cum = cats(staff_ID_cum, staff_ID);
if last.date then output;
retain staff_ID_cum;
run;
Something like this?
data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;
proc sort data=have;
by group_ID date;
run;
data want;
set have;
length staff_ID_cum $20;
by group_ID date;
if first.date then staff_ID_cum = staff_ID;
else staff_ID_cum = cats(staff_ID_cum, staff_ID);
if last.date then output;
retain staff_ID_cum;
run;
Something like:
data want; set have; retain staff_id_cum i; by group_id; if first.group_id then do; i=1; staff_id_cum=staff_id; end; else do; if i=3 then do; output; i=1; staff_id_cum=staff_id; end; else do; i=i+1; staff_id_cum=cats(staff_id_cum,staff_id); end; end; run;
De facto, exactly like this - thank you very much
So quickly.
Your code, RW9, seems to delete too much observations as it leaves me with only two instead of the four (2 groups, 2 dates, implies 4 final observations to keep).
For date, it keeps the first one, though it posts the staff_id_cum for the first part, so there is a little mismatch, too.
Thank you either way.
Yours sincerely
Sinistrum
Yes, I just read the first part as being 3 records per group, did not even see the date as a secondary grouping variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.