Solved
Contributor
Posts: 26

# How to de-dup/find unique values in the same cell?

I have Group and Cities columns in my dataset, and some of the cities are duplicates. I want to find the unique cities. How can I code that?

 Group Cities Unique Cities A Houston, Houston, New York, Los Angeles, Los Angeles Houstin, New York, Los Angeles B Chicago, Boston, Boston Chicago, Boston C Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta Miami, Worcester, Springfield, Atlanta D Oakland, Oakland, Oakland, Dallas, Dallas Oakland, Dallas E Lancaster, Boston, Madison, Madison, Madison, Madison Lancaster, Boston, Madison

Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Super User
Posts: 13,583

## Re: How to de-dup/find unique values in the same cell?

A single pass with a data step will work:

``````data want;
set have;
length unique \$ 100;
do i = 1 to countw(cities,',');
if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
end;
drop i;
run;``````

The Unique variable should have the same length as the cities variable.

All Replies
Valued Guide
Posts: 864

## Re: How to de-dup/find unique values in the same cell?

Here's a rough version, for the final I would add some macros so you don't have to know how many cities their are, macro variables should be able to take care of that and shorten the code:

data have;
infile cards dsd dlm='*';
informat group \$1. cities \$100.;
input group\$ cities\$;
cards;
A*Houston, Houston, New York, Los Angeles, Los Angeles
B*Chicago, Boston, Boston
C*Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta
D*Oakland, Oakland, Oakland, Dallas, Dallas
;

data start;
set have;
comma=countc(cities,',')+1;
scan1=strip(scan(cities,comma-0,','));
scan2=strip(scan(cities,comma-1,','));
scan3=strip(scan(cities,comma-2,','));
scan4=strip(scan(cities,comma-3,','));
scan5=strip(scan(cities,comma-4,','));
scan6=strip(scan(cities,comma-5,','));
scan7=strip(scan(cities,comma-6,','));
run;

proc transpose data=start out=tran(drop=_NAME_);by group;var scan:;

proc sort data=tran nodup;by group col1;where not missing(col1);

proc transpose data=tran out=tran2(drop=_NAME_);by group;var col1;

data want(keep=group unique);
set tran2;
unique=catx(',',of col1-col4);
run;

Solution
‎09-25-2015 06:23 AM
Super User
Posts: 13,583

## Re: How to de-dup/find unique values in the same cell?

A single pass with a data step will work:

``````data want;
set have;
length unique \$ 100;
do i = 1 to countw(cities,',');
if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
end;
drop i;
run;``````

The Unique variable should have the same length as the cities variable.

🔒 This topic is solved and locked.