Solved: Find Distinct Values in a Horizontal List

djbateman · Posted 10-12-2023 12:02 PM

I have an unusual task: I am given three lists that I merge together, and I am supposed to report a single list of the distinct values found across all three. Below is some code that I made up to illustrate the idea.

proc sql;
	create table test (ID char(3), LIST1 char(10), LIST2 char(10), LIST3 char(10));
		insert into test (id, list1, list2, list3)
			values ('001', 'A,B,C', 'A,B,C', 'A,B,C')
			values ('002', 'A,C,D', 'C', 'A,C,D')
			values ('003', 'A,B,C,D,E', '', '')
			values ('004', 'A,B,C,D,E', 'A,B', 'C,D,F');
quit;

I am wondering if there is a simple way to end up with the following results:

ID	LIST
001	A,B,C
002	A,C,D
003	A,B,C,D,E
004	A,B,C,D,E,F

The best I can think of is to parse each item from each list into a separate variable, transpose from horizontal to vertical, remove any duplicate values, transpose from vertical to horizontal, then CATX everything again into a single list. Please tell me there is a better way!

ErikLund_Jensen · Posted 10-12-2023 12:49 PM

Hi @djbateman

I think your idea is worth following. If I understood your input correctly and my test data is similar to what you have, this would do it:

data have;
  infile datalines truncover;
  input ID $3. @5 List $char10.;
  datalines;
001 A,B,C
001 A,B,C
001 A,B,C
002 A,C,D
002 C
002 A,C,D
003 A,B,C,D,E
004 A,B,C,D,E
004 A,B
004 C,D,F
;
run;

data t1; 
  set have;
  do i = 1 to countw(List,',');
    Item = scan(List,i,',');
    output;
  end;
run;

proc sql;
  create table t2 as
    select distinct ID, Item
    from t1
    order by ID, Item;  /* no summary function is used, so sort explicitly instead of GROUP BY */
quit;

data want (keep = ID List);
  set t2;
  by ID;
  length List $12;  /* long enough for the longest combined list in the sample data */
  retain List;
  if first.ID then call missing (List);
  List = catx(',', List, Item);
  if last.ID then output;
run;
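
The test data above is in a long layout (one list per row), whereas the TEST table in the question holds three lists per row. Below is a minimal sketch of how the splitting step might be adapted to that wide layout, assuming the variables are named LIST1-LIST3 as in the question. The array name and the loop counters are illustrative choices, not part of the original code. The PROC SQL step and the final DATA step should then run unchanged on the resulting T1.

```sas
/* Sketch only: assumes TEST is the wide table from the question */
/* with the variables ID, LIST1, LIST2 and LIST3.                */
data t1 (keep = ID Item);
  set test;
  array lists {*} list1-list3;   /* hypothetical array over the three list columns */
  length Item $10;
  do j = 1 to dim(lists);
    do i = 1 to countw(lists{j}, ',');
      Item = scan(lists{j}, i, ',');
      /* skip blanks so that empty lists (as for ID 003) add no rows */
      if not missing(Item) then output;
    end;
  end;
run;
```

The explicit check for a missing Item is a defensive choice: it keeps an empty LIST2 or LIST3 from adding a blank row to T1, regardless of how COUNTW counts a blank string.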

djbateman · Posted 10-12-2023 01:05 PM

Thank you so much! I think this did the trick. It is basically the same process as I spelled out, but yours was a bit cleaner than mine. You did yours in just a few blocks while I used several small blocks.

Amir · Posted 10-12-2023 02:08 PM

Hi,

An alternative method:

data want(keep = id list);
  set test;
  
  length
    lists $ 100
    list  $ 100
  ;

  lists = catx(',',list1,list2,list3);
  
  /* loop over the number of values rather than the number of characters */
  do i = 1 to countw(lists);
    /* append a value to LIST if it is not already there */
    list = ifc(find(list,scan(lists,i)), list, catx(',',list,scan(lists,i)));
  end;
run;

Thanks & kind regards,

Amir.

FreelanceReinh · Posted 10-12-2023 02:38 PM

Hi @djbateman,

If your "values" are single letters as in your sample data, you could select the distinct values from the collating sequence using the COMPRESS function and then, if needed, insert the commas using PRXCHANGE:

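data want(keep=id list);
  set test;
  length list $30;
  /* COLLATE(65) returns the characters from 'A' (ASCII 65) onward in collating order. */
  /* COMPRESS with the 'k' modifier keeps only those that occur in the LIST variables. */
  /* PRXCHANGE then inserts a comma after every letter that is followed by another.    */
  list=prxchange('s/(\w\B)/$1,/',-1,compress(collate(65),cats(of list:),'k'));
run;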

Otherwise, @Amir's one-step approach could easily be modified to work also with longer words.
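
One way that modification for longer words might look is sketched below. It assumes the values are separated only by commas and replaces the substring search (FIND) with a whole-word search (FINDW), so that a value such as AB is not treated as already present just because ABC is in the list. The variable ITEM and the lengths are illustrative assumptions, not part of the original code.

```sas
/* Sketch only: a variant of the one-step approach for values longer than one letter */
data want (keep = id list);
  set test;
  length lists list item $100;   /* assumed lengths; adjust them to the real data */

  lists = catx(',', list1, list2, list3);

  do i = 1 to countw(lists, ',');
    /* STRIP removes leading and trailing blanks around each value */
    item = strip(scan(lists, i, ','));
    /* FINDW matches whole comma-delimited values only;            */
    /* the 't' modifier trims trailing blanks from its arguments   */
    if not missing(item) and findw(list, item, ',', 't') = 0 then
      list = catx(',', list, item);
  end;
run;
```

Like the original one-step approach, this keeps the values in order of first appearance rather than sorting them.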
