Hi, I have this this dataset. For each "Time" category, for ID with multiple rows, I want to collapse it into one row per ID. Group category is seven types and I don't anticipate this max number to change. I also want to put the groups in one column and in alphabetical order separated by / and also want to count the number of groups for each ID in a separate column. How can I achieve this? Thank you for your help.
ID | Time | Group | Salary_Q1 | Salary_Q2 |
1 | 2020-Q1 | GGGGGGG | $295.00 | $88.00 |
2 | 2020-Q1 | AA | $1,918.00 | $2,309.00 |
3 | 2020-Q1 | BB | $0.00 | $0.00 |
4 | 2020-Q1 | CCC | $0.00 | $0.00 |
5 | 2020-Q1 | EEE | $2,900.00 | $2,466.00 |
6 | 2020-Q1 | DD | $0.00 | $0.00 |
7 | 2020-Q1 | FF | $0.00 | $0.00 |
8 | 2020-Q1 | EEE | $0.00 | $0.00 |
8 | 2020-Q1 | AA | $0.00 | $0.00 |
9 | 2020-Q1 | BB | $0.00 | $0.00 |
9 | 2020-Q1 | CCC | $0.00 | $0.00 |
10 | 2020-Q1 | CCC | $0.00 | $0.00 |
10 | 2020-Q1 | AA | $0.00 | $0.00 |
11 | 2020-Q1 | BB | $3,964.00 | $2,886.00 |
11 | 2020-Q1 | DD | $3,964.00 | $2,886.00 |
12 | 2020-Q1 | GGGGGGG | $2,351.00 | $7,101.00 |
12 | 2020-Q1 | AA | $2,351.00 | $7,101.00 |
13 | 2020-Q1 | GGGGGGG | $0.00 | $550.00 |
13 | 2020-Q1 | EEE | $0.00 | $550.00 |
13 | 2020-Q1 | AA | $0.00 | $550.00 |
14 | 2020-Q1 | GGGGGGG | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | AA | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | BB | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | CCC | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | EEE | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | DD | $3,796.00 | $6,055.00 |
14 | 2020-Q1 | FF | $3,796.00 | $6,055.00 |
12 | 2020-Q2 | GGGGGGG | $2,351.00 | $7,101.00 |
15 | 2020-Q2 | CCC | $0.00 | $0.00 |
15 | 2020-Q2 | AA | $0.00 | $0.00 |
13 | 2020-Q2 | BB | $0.00 | $550.00 |
13 | 2020-Q2 | CCC | $0.00 | $550.00 |
16 | 2020-Q2 | GGGGGGG | $2,224.00 | $0.00 |
16 | 2020-Q2 | AA | $2,224.00 | $0.00 |
16 | 2020-Q2 | BB | $2,224.00 | $0.00 |
16 | 2020-Q2 | CCC | $2,224.00 | $0.00 |
16 | 2020-Q2 | EEE | $2,224.00 | $0.00 |
16 | 2020-Q2 | DD | $2,224.00 | $0.00 |
16 | 2020-Q2 | FF | $2,224.00 | $0.00 |
Dataset I want:
ID | Time | Group | Salary_Q1 | Salary_Q2 | No.of Groups |
1 | 2020-Q1 | GGGGGGG | $295.00 | $88.00 | 1 |
2 | 2020-Q1 | AA | $1,918.00 | $2,309.00 | 1 |
3 | 2020-Q1 | BB | $0.00 | $0.00 | 1 |
4 | 2020-Q1 | CCC | $0.00 | $0.00 | 1 |
5 | 2020-Q1 | EEE | $2,900.00 | $2,466.00 | 1 |
6 | 2020-Q1 | DD | $0.00 | $0.00 | 1 |
7 | 2020-Q1 | FF | $0.00 | $0.00 | 1 |
8 | 2020-Q1 | AA/EEE | $0.00 | $0.00 | 2 |
9 | 2020-Q1 | BB/CCC | $0.00 | $0.00 | 2 |
10 | 2020-Q1 | AA/CCC | $0.00 | $0.00 | 2 |
11 | 2020-Q1 | BB/DD | $3,964.00 | $2,886.00 | 2 |
12 | 2020-Q1 | AA/GGGGGGG | $2,351.00 | $7,101.00 | 2 |
13 | 2020-Q1 | AA/EEE/GGGGGGG | $0.00 | $550.00 | 3 |
14 | 2020-Q1 | AA/BB/CCC/DD/EEE/FF/GGGGGGG | $3,796.00 | $6,055.00 | 7 |
12 | 2020-Q2 | GGGGGGG | $2,351.00 | $7,101.00 | 1 |
15 | 2020-Q2 | AA/CCC | $0.00 | $0.00 | 2 |
13 | 2020-Q2 | BB/CCC | $0.00 | $550.00 | 2 |
16 | 2020-Q2 | AA/BB/CCC/DD/EEE/FF/GGGGGGG | $2,224.00 | $0.00 | 7 |
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
do until(last.id);
set have;
by id notsorted;
_group=catx('/',_group,group);
no_of_grps=sum(no_of_grps,1);
end;
drop group;
rename _group=group;
run;
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
do until(last.id);
set have;
by id notsorted;
_group=catx('/',_group,group);
no_of_grps=sum(no_of_grps,1);
end;
drop group;
rename _group=group;
run;
Hi @d0816 My previous post didn't account for "in alphabetical order". My sincere apologies that I overlooked that part of your question. Here below I have modifed the code to handle the order-
data have;
input ID (Time Group Salary_Q1 Salary_Q2) (:$50.);
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
data want;
if 0 then set have(keep=id time group);
length _group $100;
array t(7) $100 _temporary_;
call missing(of t(*));
do _n_=1 by 1 until(last.id);
set have;
by id notsorted;
t(_n_)=group;
no_of_grps=sum(no_of_grps,1);
end;
call sortc(of t(*));
_group=catx('/',of t(*));
drop group;
rename _group=group;
run;
@novinosrin I hit the first one as solution when I intended to select this updated one as solution. Sorry about that.
@novinosrin Thank you. Appreciate your help. Code worked well.
data have;
input ID Time $ Group $ Salary_Q1 :dollar12.2 Salary_Q2 :dollar12.2;
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
;
run;
proc sort data=have;
by id Time group;
run;
data want(rename=(Group_=Group));
length Group_ $50.;
No_of_Groups=0;
Group_='';
do until(last.Time);
set have;
by id Time group;
No_of_Groups=sum(No_of_Groups,1);
Group_=catx('/',Group_,group);
end;
drop Group;
run;
@r_behata This code also worked. Thank you. Appreciate your help.
data have;
input ID Time $ Group $ Salary_Q1 :dollar12.2 Salary_Q2 :dollar12.2;
cards;
1 2020-Q1 GGGGGGG $295.00 $88.00
2 2020-Q1 AA $1,918.00 $2,309.00
3 2020-Q1 BB $0.00 $0.00
4 2020-Q1 CCC $0.00 $0.00
5 2020-Q1 EEE $2,900.00 $2,466.00
6 2020-Q1 DD $0.00 $0.00
7 2020-Q1 FF $0.00 $0.00
8 2020-Q1 EEE $0.00 $0.00
8 2020-Q1 AA $0.00 $0.00
9 2020-Q1 BB $0.00 $0.00
9 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 CCC $0.00 $0.00
10 2020-Q1 AA $0.00 $0.00
11 2020-Q1 BB $3,964.00 $2,886.00
11 2020-Q1 DD $3,964.00 $2,886.00
12 2020-Q1 GGGGGGG $2,351.00 $7,101.00
12 2020-Q1 AA $2,351.00 $7,101.00
13 2020-Q1 GGGGGGG $0.00 $550.00
13 2020-Q1 EEE $0.00 $550.00
13 2020-Q1 AA $0.00 $550.00
14 2020-Q1 GGGGGGG $3,796.00 $6,055.00
14 2020-Q1 AA $3,796.00 $6,055.00
14 2020-Q1 BB $3,796.00 $6,055.00
14 2020-Q1 CCC $3,796.00 $6,055.00
14 2020-Q1 EEE $3,796.00 $6,055.00
14 2020-Q1 DD $3,796.00 $6,055.00
14 2020-Q1 FF $3,796.00 $6,055.00
12 2020-Q2 GGGGGGG $2,351.00 $7,101.00
15 2020-Q2 CCC $0.00 $0.00
15 2020-Q2 AA $0.00 $0.00
13 2020-Q2 BB $0.00 $550.00
13 2020-Q2 CCC $0.00 $550.00
16 2020-Q2 GGGGGGG $2,224.00 $0.00
16 2020-Q2 AA $2,224.00 $0.00
16 2020-Q2 BB $2,224.00 $0.00
16 2020-Q2 CCC $2,224.00 $0.00
16 2020-Q2 EEE $2,224.00 $0.00
16 2020-Q2 DD $2,224.00 $0.00
16 2020-Q2 FF $2,224.00 $0.00
run;
data want;
set have;
by id time notsorted;
length groups $50;
array grps {0:19} $8 _temporary_;
grps{mod(_n_,20)}=group;
if last.time;
n_grps=_n_-coalesce(lag(_n_),0);
call sortc(of grps{*});
groups=catx('/',of grps{*});
call missing(of grps{*});
run;
@mkeintz Thank you. Appreciate your help. This code also works.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.