BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
d0816
Quartz | Level 8

Hi, I have this this dataset. For each "Time" category, for ID with multiple rows, I want to collapse it into one row per ID. Group category is seven types and I don't anticipate this max number to change. I also want to put the groups in one column and in alphabetical order separated by / and also want to count the number of groups for each ID in a separate column. How can I achieve this? Thank you for your help.

IDTimeGroupSalary_Q1Salary_Q2
12020-Q1GGGGGGG$295.00$88.00
22020-Q1AA$1,918.00$2,309.00
32020-Q1BB$0.00$0.00
42020-Q1CCC$0.00$0.00
52020-Q1EEE$2,900.00$2,466.00
62020-Q1DD$0.00$0.00
72020-Q1FF$0.00$0.00
82020-Q1EEE$0.00$0.00
82020-Q1AA$0.00$0.00
92020-Q1BB$0.00$0.00
92020-Q1CCC$0.00$0.00
102020-Q1CCC$0.00$0.00
102020-Q1AA$0.00$0.00
112020-Q1BB$3,964.00$2,886.00
112020-Q1DD$3,964.00$2,886.00
122020-Q1GGGGGGG$2,351.00$7,101.00
122020-Q1AA$2,351.00$7,101.00
132020-Q1GGGGGGG$0.00$550.00
132020-Q1EEE$0.00$550.00
132020-Q1AA$0.00$550.00
142020-Q1GGGGGGG$3,796.00$6,055.00
142020-Q1AA$3,796.00$6,055.00
142020-Q1BB$3,796.00$6,055.00
142020-Q1CCC$3,796.00$6,055.00
142020-Q1EEE$3,796.00$6,055.00
142020-Q1DD$3,796.00$6,055.00
142020-Q1FF$3,796.00$6,055.00
122020-Q2GGGGGGG$2,351.00$7,101.00
152020-Q2CCC$0.00$0.00
152020-Q2AA$0.00$0.00
132020-Q2BB$0.00$550.00
132020-Q2CCC$0.00$550.00
162020-Q2GGGGGGG$2,224.00$0.00
162020-Q2AA$2,224.00$0.00
162020-Q2BB$2,224.00$0.00
162020-Q2CCC$2,224.00$0.00
162020-Q2EEE$2,224.00$0.00
162020-Q2DD$2,224.00$0.00
162020-Q2FF$2,224.00$0.00

 

Dataset I want:

IDTimeGroupSalary_Q1Salary_Q2No.of Groups
12020-Q1GGGGGGG$295.00$88.001
22020-Q1AA$1,918.00$2,309.001
32020-Q1BB$0.00$0.001
42020-Q1CCC$0.00$0.001
52020-Q1EEE$2,900.00$2,466.001
62020-Q1DD$0.00$0.001
72020-Q1FF$0.00$0.001
82020-Q1AA/EEE$0.00$0.002
92020-Q1BB/CCC$0.00$0.002
102020-Q1AA/CCC$0.00$0.002
112020-Q1BB/DD$3,964.00$2,886.002
122020-Q1AA/GGGGGGG$2,351.00$7,101.002
132020-Q1AA/EEE/GGGGGGG$0.00$550.003
142020-Q1AA/BB/CCC/DD/EEE/FF/GGGGGGG$3,796.00$6,055.007
122020-Q2GGGGGGG$2,351.00$7,101.001
152020-Q2AA/CCC$0.00$0.002
132020-Q2BB/CCC$0.00$550.002
162020-Q2AA/BB/CCC/DD/EEE/FF/GGGGGGG$2,224.00$0.007
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;

data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 do until(last.id);
  set have;
  by id notsorted;
  _group=catx('/',_group,group);
  no_of_grps=sum(no_of_grps,1);
 end;
 drop group;
 rename _group=group;
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;

data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 do until(last.id);
  set have;
  by id notsorted;
  _group=catx('/',_group,group);
  no_of_grps=sum(no_of_grps,1);
 end;
 drop group;
 rename _group=group;
run;
novinosrin
Tourmaline | Level 20

Hi @d0816  My previous post didn't account for "in alphabetical order". My sincere apologies that I overlooked that part of your question. Here below I have modifed the code to handle the order-


data have;
 input ID	(Time	Group	Salary_Q1	Salary_Q2) (:$50.);
 cards;
1	2020-Q1	GGGGGGG	$295.00	$88.00
2	2020-Q1	AA	$1,918.00	$2,309.00
3	2020-Q1	BB	$0.00	$0.00
4	2020-Q1	CCC	$0.00	$0.00
5	2020-Q1	EEE	$2,900.00	$2,466.00
6	2020-Q1	DD	$0.00	$0.00
7	2020-Q1	FF	$0.00	$0.00
8	2020-Q1	EEE	$0.00	$0.00
8	2020-Q1	AA	$0.00	$0.00
9	2020-Q1	BB	$0.00	$0.00
9	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	CCC	$0.00	$0.00
10	2020-Q1	AA	$0.00	$0.00
11	2020-Q1	BB	$3,964.00	$2,886.00
11	2020-Q1	DD	$3,964.00	$2,886.00
12	2020-Q1	GGGGGGG	$2,351.00	$7,101.00
12	2020-Q1	AA	$2,351.00	$7,101.00
13	2020-Q1	GGGGGGG	$0.00	$550.00
13	2020-Q1	EEE	$0.00	$550.00
13	2020-Q1	AA	$0.00	$550.00
14	2020-Q1	GGGGGGG	$3,796.00	$6,055.00
14	2020-Q1	AA	$3,796.00	$6,055.00
14	2020-Q1	BB	$3,796.00	$6,055.00
14	2020-Q1	CCC	$3,796.00	$6,055.00
14	2020-Q1	EEE	$3,796.00	$6,055.00
14	2020-Q1	DD	$3,796.00	$6,055.00
14	2020-Q1	FF	$3,796.00	$6,055.00
12	2020-Q2	GGGGGGG	$2,351.00	$7,101.00
15	2020-Q2	CCC	$0.00	$0.00
15	2020-Q2	AA	$0.00	$0.00
13	2020-Q2	BB	$0.00	$550.00
13	2020-Q2	CCC	$0.00	$550.00
16	2020-Q2	GGGGGGG	$2,224.00	$0.00
16	2020-Q2	AA	$2,224.00	$0.00
16	2020-Q2	BB	$2,224.00	$0.00
16	2020-Q2	CCC	$2,224.00	$0.00
16	2020-Q2	EEE	$2,224.00	$0.00
16	2020-Q2	DD	$2,224.00	$0.00
16	2020-Q2	FF	$2,224.00	$0.00
;


data want;
 if 0 then set have(keep=id time group);
 length _group $100;
 array t(7) $100 _temporary_;
 call missing(of t(*));
 do _n_=1 by 1 until(last.id);
  set have;
  by id notsorted;
  t(_n_)=group;
  no_of_grps=sum(no_of_grps,1);
 end;
 call sortc(of t(*));
 _group=catx('/',of t(*));
 drop group;
 rename _group=group;
run;
d0816
Quartz | Level 8

@novinosrin  I hit the first one as solution when I intended to select this updated one as solution. Sorry about that.

d0816
Quartz | Level 8

@novinosrin Thank you. Appreciate your help. Code worked well.

r_behata
Barite | Level 11
data have;
	input ID 	Time $	Group	$ Salary_Q1	:dollar12.2 Salary_Q2 :dollar12.2;
	cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
 ;
run;

proc sort data=have;
	by id  Time group;
run;

data want(rename=(Group_=Group));
	length Group_ $50.;
	No_of_Groups=0;
	Group_='';

	do  until(last.Time);
		set have;
		by id  Time group;
		No_of_Groups=sum(No_of_Groups,1);
		Group_=catx('/',Group_,group);
	end;

	drop Group;
run;
d0816
Quartz | Level 8

@r_behata This code also worked. Thank you. Appreciate your help. 

mkeintz
PROC Star

 

data have;
	input ID 	Time $	Group	$ Salary_Q1	:dollar12.2 Salary_Q2 :dollar12.2;
	cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
run;

data want;
  set have;
  by id time notsorted;
  length groups $50;
  array grps {0:19} $8 _temporary_;
  grps{mod(_n_,20)}=group;
  if last.time;
  n_grps=_n_-coalesce(lag(_n_),0);
  call sortc(of grps{*});
  groups=catx('/',of grps{*});
  call missing(of grps{*});
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
d0816
Quartz | Level 8

@mkeintz  Thank you. Appreciate your help. This code also works.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3010 views
  • 4 likes
  • 4 in conversation