Solved: How to de-dup/find unique values in the same cell?

ernie86 · Posted 09-24-2015 10:32 AM

I have Group and Cities columns in my dataset, and some of the cities are duplicates. I want to find the unique cities. How can I code that?

Group	Cities	Unique Cities
A	Houston, Houston, New York, Los Angeles, Los Angeles	Houstin, New York, Los Angeles
B	Chicago, Boston, Boston	Chicago, Boston
C	Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta	Miami, Worcester, Springfield, Atlanta
D	Oakland, Oakland, Oakland, Dallas, Dallas	Oakland, Dallas
E	Lancaster, Boston, Madison, Madison, Madison, Madison	Lancaster, Boston, Madison

ballardw · Posted 09-24-2015 12:17 PM

A single pass with a data step will work:

data want;
   set have;
   length unique $ 100;
   do i = 1 to countw(cities,',');
      if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
   end;
   drop i;
run;

The Unique variable should have the same length as the cities variable.

View solution in original post

Steelers_In_DC · Posted 09-24-2015 11:31 AM

Here's a rough version, for the final I would add some macros so you don't have to know how many cities their are, macro variables should be able to take care of that and shorten the code:

data have;
infile cards dsd dlm='*';
informat group $1. cities $100.;
input group$ cities$;
cards;
A*Houston, Houston, New York, Los Angeles, Los Angeles
B*Chicago, Boston, Boston
C*Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta
D*Oakland, Oakland, Oakland, Dallas, Dallas
E*Lancaster, Boston, Madison, Madison, Madison, Madison
;

data start;
set have;
comma=countc(cities,',')+1;
scan1=strip(scan(cities,comma-0,','));
scan2=strip(scan(cities,comma-1,','));
scan3=strip(scan(cities,comma-2,','));
scan4=strip(scan(cities,comma-3,','));
scan5=strip(scan(cities,comma-4,','));
scan6=strip(scan(cities,comma-5,','));
scan7=strip(scan(cities,comma-6,','));
run;

proc transpose data=start out=tran(drop=_NAME_);by group;var scan:;

proc sort data=tran nodup;by group col1;where not missing(col1);

proc transpose data=tran out=tran2(drop=_NAME_);by group;var col1;

data want(keep=group unique);
set tran2;
unique=catx(',',of col1-col4);
run;

ballardw · Posted 09-24-2015 12:17 PM

A single pass with a data step will work:

data want;
   set have;
   length unique $ 100;
   do i = 1 to countw(cities,',');
      if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
   end;
   drop i;
run;

The Unique variable should have the same length as the cities variable.

How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Catch up on SAS Innovate 2026

How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Re: How to de-dup/find unique values in the same cell?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away