I have Group and Cities columns in my dataset, and some of the cities are duplicates. I want to find the unique cities. How can I code that?
Group | Cities | Unique Cities |
A | Houston, Houston, New York, Los Angeles, Los Angeles | Houstin, New York, Los Angeles |
B | Chicago, Boston, Boston | Chicago, Boston |
C | Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta | Miami, Worcester, Springfield, Atlanta |
D | Oakland, Oakland, Oakland, Dallas, Dallas | Oakland, Dallas |
E | Lancaster, Boston, Madison, Madison, Madison, Madison | Lancaster, Boston, Madison |
A single pass with a data step will work:
data want;
set have;
length unique $ 100;
do i = 1 to countw(cities,',');
if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
end;
drop i;
run;
The Unique variable should have the same length as the cities variable.
Here's a rough version, for the final I would add some macros so you don't have to know how many cities their are, macro variables should be able to take care of that and shorten the code:
data have;
infile cards dsd dlm='*';
informat group $1. cities $100.;
input group$ cities$;
cards;
A*Houston, Houston, New York, Los Angeles, Los Angeles
B*Chicago, Boston, Boston
C*Miami, Worcester, Worcester, Springfield, Springfield, Atlanta, Atlanta
D*Oakland, Oakland, Oakland, Dallas, Dallas
E*Lancaster, Boston, Madison, Madison, Madison, Madison
;
data start;
set have;
comma=countc(cities,',')+1;
scan1=strip(scan(cities,comma-0,','));
scan2=strip(scan(cities,comma-1,','));
scan3=strip(scan(cities,comma-2,','));
scan4=strip(scan(cities,comma-3,','));
scan5=strip(scan(cities,comma-4,','));
scan6=strip(scan(cities,comma-5,','));
scan7=strip(scan(cities,comma-6,','));
run;
proc transpose data=start out=tran(drop=_NAME_);by group;var scan:;
proc sort data=tran nodup;by group col1;where not missing(col1);
proc transpose data=tran out=tran2(drop=_NAME_);by group;var col1;
data want(keep=group unique);
set tran2;
unique=catx(',',of col1-col4);
run;
A single pass with a data step will work:
data want;
set have;
length unique $ 100;
do i = 1 to countw(cities,',');
if indexw(unique,scan(cities,i,','))=0 then unique=catx(', ',unique,scan(cities,i,','));
end;
drop i;
run;
The Unique variable should have the same length as the cities variable.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.