Hello!
I am hoping someone can help me. I have the following data set below (this is just a sample of the dataset) and need the second dataset shown below. Note that Count can range from 1 to 6 in my dataset. Essentially if I capture the rows where count is the maximum for each value of ID, I will have what I need. Any ideas on how to do that?
My SAS code:
data output;
set input;
if count ne 1 then AllDepts=catx(",",AllDepts,Dept);
else AllDepts=Dept;
retain AllDepts;
run;
Dataset I have: | |||
Count | ID | Name | Dept |
1 | 123 | Person 1 | Dept A |
2 | 123 | Person 1 | Dept C |
3 | 123 | Person 1 | Dept E |
1 | 456 | Person 2 | Dept B |
2 | 456 | Person 2 | Dept C |
3 | 456 | Person 2 | Dept F |
4 | 456 | Person 2 | Dept Q |
1 | 567 | Person 3 | Dept B |
Dataset I need: | |||
Count | ID | Name | AllDepts |
3 | 123 | Person 1 | Dept A, Dept C, Dept E |
4 | 456 | Person 2 | Dept B, Dept C, Dept F, Dept Q |
1 | 567 | Person 3 | Dept B |
Dataset I am getting: | |||
Count | ID | Name | AllDepts |
1 | 123 | Person 1 | Dept A |
2 | 123 | Person 1 | Dept A, Dept C |
3 | 123 | Person 1 | Dept A, Dept C, Dept E |
1 | 456 | Person 2 | Dept B |
2 | 456 | Person 2 | Dept B, Dept C |
3 | 456 | Person 2 | Dept B, Dept C, Dept F |
4 | 456 | Person 2 | Dept B, Dept C, Dept F, Dept Q |
1 | 567 | Person 3 | Dept B |
Thanks,
KelseyB
Isn't it necessary to declare AllDepts as a character variable (assuming that it is not contained in the input dataset)? If I run either of your data steps, my SAS complains about "invalid numeric data" and makes AllDepts a numeric variable.
My solution would be:
proc sort data=input;
by ID count;
run;
data output;
do until(last.ID);
set input;
length AllDepts $46; /* to be adapted depending on max. length of Dept */
by ID;
AllDepts=catx(', ', AllDepts, Dept);
end;
drop Dept;
run;
This applies the "DOW loop" technique (cf. http://www2.sas.com/proceedings/sugi28/099-28.pdf). The purpose of the unusual position of the (declarative) LENGTH statement is just to obtain the desired column order.
Use BY group processing.
proc sort data=input;
by ID count;
run;
data output;
set input;
BY ID;
retain AllDepts;
if not first.ID then AllDepts=catx(",",AllDepts,Dept);
else AllDepts=Dept;
if last.ID then output;
run;
Reeza,
Thank you for your help with this. Unfortunately, I got an empty dataset when I ran this. It may have been something I did because I dummied my variables to send in the code, however. I got a solution from another contributor, but I appreciate your quick response.
Thanks again,
KelseyB
Isn't it necessary to declare AllDepts as a character variable (assuming that it is not contained in the input dataset)? If I run either of your data steps, my SAS complains about "invalid numeric data" and makes AllDepts a numeric variable.
My solution would be:
proc sort data=input;
by ID count;
run;
data output;
do until(last.ID);
set input;
length AllDepts $46; /* to be adapted depending on max. length of Dept */
by ID;
AllDepts=catx(', ', AllDepts, Dept);
end;
drop Dept;
run;
This applies the "DOW loop" technique (cf. http://www2.sas.com/proceedings/sugi28/099-28.pdf). The purpose of the unusual position of the (declarative) LENGTH statement is just to obtain the desired column order.
FreelanceReinhard,
You are correct. I accidentally left out the length statement for AllDepts when I copied my code. It was actually in my program though so I didn't get an error.
This works perfectly! Thank you so much for your help.
Best regards,
KelseyB
Something like below should do:
data have;
infile datalines dlm=',' dsd truncover;
input (Count ID Name Dept) ($);
datalines;
1,123,Person 1,Dept A
2,123,Person 1,Dept C
3,123,Person 1,Dept E
1,456,Person 2,Dept B
2,456,Person 2,Dept C
3,456,Person 2,Dept F
4,456,Person 2,Dept Q
1,567,Person 3,Dept B
;
run;
data want(keep=Count ID Name AllDepts);
set have;
by id count;
length AllDepts $100;
retain AllDepts;
AllDepts=catx(', ',AllDepts,Dept);
if last.id then
do;
output;
call missing(AllDepts);
end;
run;
Thanks Patrick. This works perfectly. I had already accepted another respondents' answer, but I greatly appreciate this. Two ways to do something is better than none:)
Best Regards,
KelseyB
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.