Hi, I have this this dataset. For each "Time" category, for ID with multiple rows, I want to collapse it into one row per ID. Group category is seven types and I don't anticipate this max number to change. I also want to put the groups in one column and in alphabetical order separated by / and also want to count the number of groups for each ID in a separate column. How can I achieve this? Thank you for your help.
ID | Time | Group | Salary_Q1 | Salary_Q2 |
1 | 2020-Q1 | GGGGGGG | $295.00 | $88.00 |
2 | 2020-Q1 | AA | $1,918.00 | $2,309.00 |
3 | 2020-Q1 | BB | $0.00 | $0.00 |
4 | 2020-Q1 | CCC | $0.00 | $0.00 |
5 | 2020-Q1 | EEE | $2,900.00 | $2,466.00 |
6 | 2020-Q1 | DD | $0.00 | $0.00 |
7 | 2020-Q1 | FF | $0.00 | $0.00 |
8 | 2020-Q1 | EEE | $0.00 | $0.00 |
8 | 2020-Q1 | AA | $0.00 | $0.00 |
9 | 2020-Q1 | BB | $0.00 | $0.00 |
9 | 2020-Q1 | CCC | $0.00 | $0.00 |
10 | 2020-Q1 | CCC | $0.00 | $0.00 |
10 | 2020-Q1 | AA | $0.00 | $0.00 |
11 | 2020-Q1 | BB | $3,964.00 | $2,886.00 |
11 | 2020-Q1 | DD | $3,964.00 | $2,886.00 |
12 | 2020-Q1 | GGGGGGG | $2,351.00 | $7,101.00 |
12 | 2020-Q1 | AA | $2,351.00 | $7,101.00 |
13 | 2020-Q1 | GGGGGGG | $0.00 | $550.00 |
13 | 2020-Q1 | EEE | $0.00 | $550.00 |
13 | 2020-Q1 | AA | $0.00 | $550.00 |
14 | 2020-Q1 | GGGGGGG | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | AA | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | BB | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | CCC | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | EEE | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | DD | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | FF | $3,796.00 | $6,055.00 |
12 | 2020-Q2 | GGGGGGG | $2,351.00 | $7,101.00 |
15 | 2020-Q2 | CCC | $0.00 | $0.00 |
15 | 2020-Q2 | AA | $0.00 | $0.00 |
13 | 2020-Q2 | BB | $0.00 | $550.00 |
13 | 2020-Q2 | CCC | $0.00 | $550.00 |
16 | 2020-Q2 | GGGGGGG | $2,224.00 | $0.00 |
16 | 2020-Q2 | AA | $2,224.00 | $0.00 |
16 | 2020-Q2 | BB | $2,224.00 | $0.00 |
16 | 2020-Q2 | CCC | $2,224.00 | $0.00 |
16 | 2020-Q2 | EEE | $2,224.00 | $0.00 |
16 | 2020-Q2 | DD | $2,224.00 | $0.00 |
16 | 2020-Q2 | FF | $2,224.00 | $0.00 |
Dataset I want:
ID | Time | Group | Salary_Q1 | Salary_Q2 | No.of Groups |
1 | 2020-Q1 | GGGGGGG | $295.00 | $88.00 | 1 |
2 | 2020-Q1 | AA | $1,918.00 | $2,309.00 | 1 |
3 | 2020-Q1 | BB | $0.00 | $0.00 | 1 |
4 | 2020-Q1 | CCC | $0.00 | $0.00 | 1 |
5 | 2020-Q1 | EEE | $2,900.00 | $2,466.00 | 1 |
6 | 2020-Q1 | DD | $0.00 | $0.00 | 1 |
7 | 2020-Q1 | FF | $0.00 | $0.00 | 1 |
8 | 2020-Q1 | AA/EEE | $0.00 | $0.00 | 2 |
9 | 2020-Q1 | BB/CCC | $0.00 | $0.00 | 2 |
10 | 2020-Q1 | AA/CCC | $0.00 | $0.00 | 2 |
11 | 2020-Q1 | BB/DD | $3,964.00 | $2,886.00 | 2 |
12 | 2020-Q1 | AA/GGGGGGG | $2,351.00 | $7,101.00 | 2 |
13 | 2020-Q1 | AA/EEE/GGGGGGG | $0.00 | $550.00 | 3 |
14 | 2020-Q1 | AA/BB/CCC/DD/EEE/FF/GGGGGGG | $3,796.00 | $6,055.00 | 7 |
12 | 2020-Q2 | GGGGGGG | $2,351.00 | $7,101.00 | 1 |
15 | 2020-Q2 | AA/CCC | $0.00 | $0.00 | 2 |
13 | 2020-Q2 | BB/CCC | $0.00 | $550.00 | 2 |
16 | 2020-Q2 | AA/BB/CCC/DD/EEE/FF/GGGGGGG | $2,224.00 | $0.00 | 7 |
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
do until(last.id);
set have;
by id notsorted;
_group=catx('/',_group,group);
no_of_grps=sum(no_of_grps,1);
end;
drop group;
rename _group=group;
run;
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
do until(last.id);
set have;
by id notsorted;
_group=catx('/',_group,group);
no_of_grps=sum(no_of_grps,1);
end;
drop group;
rename _group=group;
run;
Hi @d0816 My previous post didn't account for "in alphabetical order". My sincere apologies that I overlooked that part of your question. Here below I have modifed the code to handle the order-
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
array t(7) $100 _temporary_;
call missing(of t(*));
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
t(_n_)=group;
no_of_grps=sum(no_of_grps,1);
end;
call sortc(of t(*));
_group=catx('/',of t(*));
drop group;
rename _group=group;
run;
@novinosrin I hit the first one as solution when I intended to select this updated one as solution. Sorry about that.
@novinosrin Thank you. Appreciate your help. Code worked well.
data have;
input ID Time $ Group $ Salary_Q1 :dollar12.2 Salary_Q2 :dollar12.2;
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
run;
proc sort data=have;
by id Time group;
run;
data want(rename=(Group_=Group));
length Group_ $50.;
No_of_Groups=0;
Group_='';
do until(last.Time);
set have;
by id Time group;
No_of_Groups=sum(No_of_Groups,1);
Group_=catx('/',Group_,group);
end;
drop Group;
run;
@r_behata This code also worked. Thank you. Appreciate your help.
data have;
input ID Time $ Group $ Salary_Q1 :dollar12.2 Salary_Q2 :dollar12.2;
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
run;
data want;
set have;
by id time notsorted;
length groups $50;
array grps {0:19} $8 _temporary_;
grps{mod(_n_,20)}=group;
if last.time;
n_grps=_n_-coalesce(lag(_n_),0);
call sortc(of grps{*});
groups=catx('/',of grps{*});
call missing(of grps{*});
run;
@mkeintz Thank you. Appreciate your help. This code also works.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.