I have a SAS dataset with 3 columns that could have one or more values separated by comma. I would have to add a new column with distinct values in 3 columns separated by comma in a new field "Final.
Have:
ID | Y2017 | Y2018 | Y2019 |
a | a2, a3 | a2,a3 | a4 |
b | a2 | a4 | |
c | a1, a2 | a2,a3 | a4,a3 |
Want:
ID | Y2017 | Y2018 | Y2019 | Final |
a | a2, a3 | a2,a3 | a4 | a2, a3, a4 |
b | a2 | a4 | a2, a4 | |
c | a1, a2 | a2,a3 | a4,a3 | a1, a2, a3, a4 |
Thanks!
data have;
infile cards expandtabs truncover;
input (ID Y2017 Y2018 Y2019) (:$10.);
cards;
a a2,a3 a2,a3 a4
b a2 . a4 .
c a1,a2 a2,a3 a4,a3
;
data want;
set have;
length t1 t2 final $1000;
array y Y2017 Y2018 Y2019;
t1=catx(',',of y(*));
do _n_=1 to countw(t1,', ');
t2=scan(t1,_n_,',');
if not index(final,strip(t2)) then final=catx(',',final,t2);
end;
drop t:;
run;
Did you do something to force the multiple values per cell?
If so you may find it easier to back before that step was done and start over.
Also, is the final order at all important? Will you be doing something where you need to treat a1,a2,a3 the same as a3,a1,a2?
data have;
infile cards expandtabs truncover;
input (ID Y2017 Y2018 Y2019) (:$10.);
cards;
a a2,a3 a2,a3 a4
b a2 . a4 .
c a1,a2 a2,a3 a4,a3
;
data want;
set have;
length t1 t2 final $1000;
array y Y2017 Y2018 Y2019;
t1=catx(',',of y(*));
do _n_=1 to countw(t1,', ');
t2=scan(t1,_n_,',');
if not index(final,strip(t2)) then final=catx(',',final,t2);
end;
drop t:;
run;
Hi AviS,
Here is a possible solution to your problem:
data A;
length ID $2 Y2017-Y2019 $10;
input ID 1-2 Y2017 4-10 Y2018 13-18 Y2019 20-25;
datalines;
a a2, a3 a2,a3 a4
b a2 a4
c a1, a2 a2,a3 a4,a3
;
data B (keep=ID VAL);
set A;
length ALL $200 VAL $10;
ALL = catx(',',Y2017,Y2018,Y2019);
do i=1 to countw(ALL,',');
VAL = strip(scan(ALL,i,','));
output;
end;
run;
proc sort data=B out=C nodupkey;
by ID VAL;
run;
data D (drop=VAL);
merge A C;
by ID;
length FINAL $200;
retain FINAL;
if first.ID then FINAL = '';
FINAL = catx(', ', FINAL,VAL);
if last.ID then output;
run;
Run this code to make sure it works for you.
Is that what you want?
Best,
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.