BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

I have an unusual task where I am given 3 lists that I merge together, and I am supposed to only report a single list of all distinct values found in all lists.  Below is some code that I made up to illustrate the idea.

 

proc sql;
	create table test (ID char(3), LIST1 char(10), LIST2 char(10), LIST3 char(10));
		insert into test (id, list1, list2, list3)
			values ('001', 'A,B,C', 'A,B,C', 'A,B,C')
			values ('002', 'A,C,D', 'C', 'A,C,D')
			values ('003', 'A,B,C,D,E', '', '')
			values ('004', 'A,B,C,D,E', 'A,B', 'C,D,F');
quit;

I am wondering if there is a simple way to end up with the following results:

ID LIST
001 A,B,C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E,F

 

The best I can think is to parse each item from each list into a separate variable, transpose from horizontal to vertical, remove any duplicate values, transpose from vertical to horizontal, then CATX everything again into a single list.  Please tell me there is better way!

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @djbateman 

I think your idea is worth following. If I understood your input correct, so my test data is something similar to what you have, this would do it:

 

data have;
  infile datalines truncover;
  input ID $3. @5 List $char10.;
  datalines;
001 A,B,C
001 A,B,C
001 A,B,C
002 A,C,D
002 C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E
004 A,B
004 C,D,F
;
run;

data t1; 
  set have;
  do i = 1 to countw(List,',');
    Item = scan(List,i,',');
    output;
  end;
run;

proc sql;
  create table t2 as
    select distinct ID, Item
    from t1
    group by ID;
quit;

data want (keep = ID List);
  set t2;
  by ID;
  length list $12.;
  retain List;
  if first.ID then call missing (List);
  List = catx(',', List, Item);
  if last.ID then output;
run;

 

View solution in original post

4 REPLIES 4
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @djbateman 

I think your idea is worth following. If I understood your input correct, so my test data is something similar to what you have, this would do it:

 

data have;
  infile datalines truncover;
  input ID $3. @5 List $char10.;
  datalines;
001 A,B,C
001 A,B,C
001 A,B,C
002 A,C,D
002 C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E
004 A,B
004 C,D,F
;
run;

data t1; 
  set have;
  do i = 1 to countw(List,',');
    Item = scan(List,i,',');
    output;
  end;
run;

proc sql;
  create table t2 as
    select distinct ID, Item
    from t1
    group by ID;
quit;

data want (keep = ID List);
  set t2;
  by ID;
  length list $12.;
  retain List;
  if first.ID then call missing (List);
  List = catx(',', List, Item);
  if last.ID then output;
run;

 

djbateman
Lapis Lazuli | Level 10
Thank you so much! I think this did the trick. It is basically the same process as I spelled out, but yours was a bit cleaner than mine. You did yours in just a few blocks while I used several small blocks.
Amir
PROC Star

Hi,

 

An alternative method:

 

data want(keep = id list);
  set test;
  
  length
    lists $ 100
    list  $ 100
  ;

  lists = catx(',',list1,list2,list3);
  
  do i = 1 to length(lists);
    /* add a letter to a list if it is not found in the list */
    list = ifc(find(list,scan(lists,i)), list, catx(',',list,scan(lists,i)));
  end;
run;

 

 

Thanks & kind regards,

Amir.

FreelanceReinh
Jade | Level 19

Hi @djbateman,

 

If your "values" are single letters as in your sample data, you could select the distinct values from the collating sequence using the COMPRESS function and then, if needed, insert the commas using PRXCHANGE:

data want(keep=id list);
set test;
length list $30;
list=prxchange('s/(\w\B)/$1,/',-1,compress(collate(65),cats(of list:),'k'));
run;

 

Otherwise, @Amir's one-step approach could easily be modified to work also with longer words.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 570 views
  • 1 like
  • 4 in conversation