I have an unusual task where I am given 3 lists that I merge together, and I am supposed to only report a single list of all distinct values found in all lists. Below is some code that I made up to illustrate the idea.
proc sql;
create table test (ID char(3), LIST1 char(10), LIST2 char(10), LIST3 char(10));
insert into test (id, list1, list2, list3)
values ('001', 'A,B,C', 'A,B,C', 'A,B,C')
values ('002', 'A,C,D', 'C', 'A,C,D')
values ('003', 'A,B,C,D,E', '', '')
values ('004', 'A,B,C,D,E', 'A,B', 'C,D,F');
quit;
I am wondering if there is a simple way to end up with the following results:
ID | LIST |
001 | A,B,C |
002 | A,C,D |
003 | A,B,C,D,E |
004 | A,B,C,D,E,F |
The best I can think is to parse each item from each list into a separate variable, transpose from horizontal to vertical, remove any duplicate values, transpose from vertical to horizontal, then CATX everything again into a single list. Please tell me there is better way!
Hi @djbateman
I think your idea is worth following. If I understood your input correct, so my test data is something similar to what you have, this would do it:
data have;
infile datalines truncover;
input ID $3. @5 List $char10.;
datalines;
001 A,B,C
001 A,B,C
001 A,B,C
002 A,C,D
002 C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E
004 A,B
004 C,D,F
;
run;
data t1;
set have;
do i = 1 to countw(List,',');
Item = scan(List,i,',');
output;
end;
run;
proc sql;
create table t2 as
select distinct ID, Item
from t1
group by ID;
quit;
data want (keep = ID List);
set t2;
by ID;
length list $12.;
retain List;
if first.ID then call missing (List);
List = catx(',', List, Item);
if last.ID then output;
run;
Hi @djbateman
I think your idea is worth following. If I understood your input correct, so my test data is something similar to what you have, this would do it:
data have;
infile datalines truncover;
input ID $3. @5 List $char10.;
datalines;
001 A,B,C
001 A,B,C
001 A,B,C
002 A,C,D
002 C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E
004 A,B
004 C,D,F
;
run;
data t1;
set have;
do i = 1 to countw(List,',');
Item = scan(List,i,',');
output;
end;
run;
proc sql;
create table t2 as
select distinct ID, Item
from t1
group by ID;
quit;
data want (keep = ID List);
set t2;
by ID;
length list $12.;
retain List;
if first.ID then call missing (List);
List = catx(',', List, Item);
if last.ID then output;
run;
Hi,
An alternative method:
data want(keep = id list);
set test;
length
lists $ 100
list $ 100
;
lists = catx(',',list1,list2,list3);
do i = 1 to length(lists);
/* add a letter to a list if it is not found in the list */
list = ifc(find(list,scan(lists,i)), list, catx(',',list,scan(lists,i)));
end;
run;
Thanks & kind regards,
Amir.
Hi @djbateman,
If your "values" are single letters as in your sample data, you could select the distinct values from the collating sequence using the COMPRESS function and then, if needed, insert the commas using PRXCHANGE:
data want(keep=id list);
set test;
length list $30;
list=prxchange('s/(\w\B)/$1,/',-1,compress(collate(65),cats(of list:),'k'));
run;
Otherwise, @Amir's one-step approach could easily be modified to work also with longer words.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.