DATA Step, Macro, Functions and more

Chain lagged string variables by group-id and date

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Chain lagged string variables by group-id and date

Dear community,

 

I would like to ask for your help concerning the following issue.

 

The setting is as follows, with screenshots and sample data attached.

I have a list with a group id.

For every group id, I have multiple observations per date (monthly frequency).

For every group for every date, I have at least one staff id.

So e.g. for group id 1, on January 31st 1991, I observe three staff members with ids AA, BB and CC.

I want to chain them together, to have AABBCC. Afterward, I would just keep the row with only the chained IDs.

 

I was messing around with "set by group_id date" and trying to use the lag function with cat(group_id, lag(group_id)), but I have proven to be too stupid.

 

It would be great, if you could please help me.

 

Yours sincerely,

Sinistrum

 

have:

have_group_staff.PNG

 

want:

want_group_staff.PNG

 

data	have;
	input
		group_ID	date staff_ID $;
		informat date date7.;
		format date ddmmyy10.;
	datalines;
1	31Jan91	AA
1	31Jan91	BB
1	31Jan91	CC
1	28Feb91	AA
1	28Feb91	CC
1	28Feb91	DD
2	31Jan91	GG
2	31Jan91	H
2	31Jan91	FF
2	28Feb91	TTT
2	28Feb91	D
2	28Feb91	Y
	;
run;
data	want;
	input
		group_ID	date staff_ID $	staff_ID_cum $;
		informat date date7.;
		format date ddmmyy10.;
	datalines;
1	31Jan91	AA	AA
1	31Jan91	BB	AABB
1	31Jan91	CC	AABBCC
1	28Feb91	AA	AA
1	28Feb91	CC	AACC
1	28Feb91	DD	AACCDD
2	31Jan91	GG	GG
2	31Jan91	H	GGH
2	31Jan91	FF	GGHFF
2	28Feb91	TTT	TTT
2	28Feb91	D	TTTD
2	28Feb91	Y	TTTDY
	;
run;

 


Accepted Solutions
Solution
‎11-17-2017 09:55 AM
PROC Star
Posts: 1,400

Re: Chain lagged string variables by group-id and date

[ Edited ]
Posted in reply to Sinistrum

Something like this?

 

 

data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;

proc sort data=have;
	by group_ID date;
run;

data want;
	set have;
	length staff_ID_cum $20;
	by group_ID date;

	if first.date then staff_ID_cum = staff_ID;
	else staff_ID_cum = cats(staff_ID_cum, staff_ID);
	if last.date then output;

	retain staff_ID_cum;
run;

View solution in original post


All Replies
Solution
‎11-17-2017 09:55 AM
PROC Star
Posts: 1,400

Re: Chain lagged string variables by group-id and date

[ Edited ]
Posted in reply to Sinistrum

Something like this?

 

 

data have;
input group_ID date:date9. staff_ID$;
format date date9.;
datalines;
1 31Jan91 AA
1 31Jan91 BB
1 31Jan91 CC
1 28Feb91 AA
1 28Feb91 CC
1 28Feb91 DD
2 31Jan91 GG
2 31Jan91 H
2 31Jan91 FF
2 28Feb91 TTT
2 28Feb91 D
2 28Feb91 Y
;

proc sort data=have;
	by group_ID date;
run;

data want;
	set have;
	length staff_ID_cum $20;
	by group_ID date;

	if first.date then staff_ID_cum = staff_ID;
	else staff_ID_cum = cats(staff_ID_cum, staff_ID);
	if last.date then output;

	retain staff_ID_cum;
run;
Super User
Super User
Posts: 9,829

Re: Chain lagged string variables by group-id and date

[ Edited ]
Posted in reply to Sinistrum

Something like:

data want;
  set have;
  retain staff_id_cum i;
  by group_id;
  if first.group_id then do;
    i=1;
    staff_id_cum=staff_id;
  end;
  else do;
    if i=3 then do;
      output;
      i=1;
      staff_id_cum=staff_id;
    end;
    else do;
      i=i+1;
      staff_id_cum=cats(staff_id_cum,staff_id);
    end;
  end;
run;

 

Contributor
Posts: 50

Re: Chain lagged string variables by group-id and date

De facto, exactly like this - thank you very much

So quickly.

 

Your code, RW9, seems to delete too much observations as it leaves me with only two instead of the four (2 groups, 2 dates, implies 4 final observations to keep).

For date, it keeps the first one, though it posts the staff_id_cum for the first part, so there is a little mismatch, too.

Thank you either way.

 

Yours sincerely

Sinistrum

Super User
Super User
Posts: 9,829

Re: Chain lagged string variables by group-id and date

Posted in reply to Sinistrum

Yes, I just read the first part as being 3 records per group, did not even see the date as a secondary grouping variable.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 224 views
  • 3 likes
  • 3 in conversation