BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
d0816
Quartz | Level 8

Hi, I have this this dataset. For each "Time" category, for ID with multiple rows, I want to collapse it into one row per ID. Group category is seven types and I don't anticipate this max number to change. I also want to put the groups in one column and in alphabetical order separated by / and also want to count the number of groups for each ID in a separate column. How can I achieve this? Thank you for your help.

IDTimeGroupSalary_Q1Salary_Q2
12020-Q1GGGGGGG$295.00$88.00
22020-Q1AA$1,918.00$2,309.00
32020-Q1BB$0.00$0.00
42020-Q1CCC$0.00$0.00
52020-Q1EEE$2,900.00$2,466.00
62020-Q1DD$0.00$0.00
72020-Q1FF$0.00$0.00
82020-Q1EEE$0.00$0.00
82020-Q1AA$0.00$0.00
92020-Q1BB$0.00$0.00
92020-Q1CCC$0.00$0.00
102020-Q1CCC$0.00$0.00
102020-Q1AA$0.00$0.00
112020-Q1BB$3,964.00$2,886.00
112020-Q1DD$3,964.00$2,886.00
122020-Q1GGGGGGG$2,351.00$7,101.00
122020-Q1AA$2,351.00$7,101.00
132020-Q1GGGGGGG$0.00$550.00
132020-Q1EEE$0.00$550.00
132020-Q1AA$0.00$550.00
142020-Q1GGGGGGG$3,796.00$6,055.00
142020-Q1AA$3,796.00$6,055.00
142020-Q1BB$3,796.00$6,055.00
142020-Q1CCC$3,796.00$6,055.00
142020-Q1EEE$3,796.00$6,055.00
142020-Q1DD$3,796.00$6,055.00
142020-Q1FF$3,796.00$6,055.00
122020-Q2GGGGGGG$2,351.00$7,101.00
152020-Q2CCC$0.00$0.00
152020-Q2AA$0.00$0.00
132020-Q2BB$0.00$550.00
132020-Q2CCC$0.00$550.00
162020-Q2GGGGGGG$2,224.00$0.00
162020-Q2AA$2,224.00$0.00
162020-Q2BB$2,224.00$0.00
162020-Q2CCC$2,224.00$0.00
162020-Q2EEE$2,224.00$0.00
162020-Q2DD$2,224.00$0.00
162020-Q2FF$2,224.00$0.00

 

Dataset I want:

IDTimeGroupSalary_Q1Salary_Q2No.of Groups
12020-Q1GGGGGGG$295.00$88.001
22020-Q1AA$1,918.00$2,309.001
32020-Q1BB$0.00$0.001
42020-Q1CCC$0.00$0.001
52020-Q1EEE$2,900.00$2,466.001
62020-Q1DD$0.00$0.001
72020-Q1FF$0.00$0.001
82020-Q1AA/EEE$0.00$0.002
92020-Q1BB/CCC$0.00$0.002
102020-Q1AA/CCC$0.00$0.002
112020-Q1BB/DD$3,964.00$2,886.002
122020-Q1AA/GGGGGGG$2,351.00$7,101.002
132020-Q1AA/EEE/GGGGGGG$0.00$550.003
142020-Q1AA/BB/CCC/DD/EEE/FF/GGGGGGG$3,796.00$6,055.007
122020-Q2GGGGGGG$2,351.00$7,101.001
152020-Q2AA/CCC$0.00$0.002
132020-Q2BB/CCC$0.00$550.002
162020-Q2AA/BB/CCC/DD/EEE/FF/GGGGGGG$2,224.00$0.007
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;

data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 do until(last.id);
  set have;
  by id notsorted;
  _group=catx('/',_group,group);
  no_of_grps=sum(no_of_grps,1);
 end;
 drop group;
 rename _group=group;
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;

data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 do until(last.id);
  set have;
  by id notsorted;
  _group=catx('/',_group,group);
  no_of_grps=sum(no_of_grps,1);
 end;
 drop group;
 rename _group=group;
run;
novinosrin
Tourmaline | Level 20

Hi @d0816  My previous post didn't account for "in alphabetical order". My sincere apologies that I overlooked that part of your question. Here below I have modifed the code to handle the order-


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;


data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 array t(7) $100 _temporary_;
 call missing(of t(*));
 do _n_=1 by 1 until(last.id);
  set have;
  by id notsorted;
  t(_n_)=group;
  no_of_grps=sum(no_of_grps,1);
 end;
 call sortc(of t(*));
 _group=catx('/',of t(*));
 drop group;
 rename _group=group;
run;
d0816
Quartz | Level 8

@novinosrin  I hit the first one as solution when I intended to select this updated one as solution. Sorry about that.

d0816
Quartz | Level 8

@novinosrin Thank you. Appreciate your help. Code worked well.

r_behata
Barite | Level 11
data have;
	input ID 	Time $	Group	$ Salary_Q1	:dollar12.2 Salary_Q2 :dollar12.2;
	cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
 ;
run;

proc sort data=have;
	by id  Time group;
run;

data want(rename=(Group_=Group));
	length Group_ $50.;
	No_of_Groups=0;
	Group_='';

	do  until(last.Time);
		set have;
		by id  Time group;
		No_of_Groups=sum(No_of_Groups,1);
		Group_=catx('/',Group_,group);
	end;

	drop Group;
run;
d0816
Quartz | Level 8

@r_behata This code also worked. Thank you. Appreciate your help. 

mkeintz
PROC Star

 

data have;
	input ID 	Time $	Group	$ Salary_Q1	:dollar12.2 Salary_Q2 :dollar12.2;
	cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
run;

data want;
  set have;
  by id time notsorted;
  length groups $50;
  array grps {0:19} $8 _temporary_;
  grps{mod(_n_,20)}=group;
  if last.time;
  n_grps=_n_-coalesce(lag(_n_),0);
  call sortc(of grps{*});
  groups=catx('/',of grps{*});
  call missing(of grps{*});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
d0816
Quartz | Level 8

@mkeintz  Thank you. Appreciate your help. This code also works.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1821 views
  • 4 likes
  • 4 in conversation