How to standardize U.S. cities variable in SAS

Reply
Occasional Contributor
Posts: 10

How to standardize U.S. cities variable in SAS

I have a list of city names in the U.S.  They are typed in by end-users.  A lot of them contains minor typos.  See below 


Is there an internal list of U.S. city names in the SAS that I can compare my city name list to the correct list (maybe by using COMPLEV???) in order to minimize distance and then

standardize city names.  Alternatively, is there a function that transform the first column into code or standard city names so that I can do proc geocode on them? 

I tried to manually correct these, almost killed myself... 

Thanks a lot, guys!

Chicago,Chicago
ChicaoChicago
ChichagoChicago
CincinattiCINCINNATI
CincinnatiCINCINNATI
CincinnattiCINCINNATI
CinncinatiCINCINNATI
Super User
Posts: 19,772

Re: How to standardize U.S. cities variable in SAS

There's the maps.uscity table that has City Names, and lat/long.

But city names aren't unique so you need to be careful,

proc freq data=maps.uscity noprint;

table city/out=city_names;

run;

proc sort data=city_names; by descending count ;

run;

Occasional Contributor
Posts: 10

Re: How to standardize U.S. cities variable in SAS

Thank you for your help -

Super User
Posts: 5,499

Re: How to standardize U.S. cities variable in SAS

Another consideration:  Many city names are similar.  For example, if you see the misspelling "Parris", there are many legitimate city names that could match it:  Paris, Perris, Parrish.  Would you like a computer to make the decision for you?

One viable approach would be to take all the corrections that you so studiously made already and automate them.  Apply the same changes you made to the next batch of data.  Then see which spellings still don't match a legitimate city and augment your list.

Good luck.

Super User
Posts: 11,343

Re: How to standardize U.S. cities variable in SAS

There should be a ZIP code data set SASHelp.Zipcode that has city and state information including some "alternate" city names associated. If by any chance your data also contains zip codes the function ZIPCity may be of interest to you.

Good luck.

Occasional Contributor
Posts: 10

Re: How to standardize U.S. cities variable in SAS

Unfortunately, the zip code variable is not available.  But I also have state variable I can use to triangulate.  Could I ask in this case, what would be a better solution?  Thank you, guys -

Super User
Posts: 11,343

Re: How to standardize U.S. cities variable in SAS

My general approach would be to find the matches by state on one or more of the city variables in the ZIP code data set first. For the ones that don't match I would start by examining the difference between the (distinct) cities in the data with ZIP code set looking for the minimum COMPGED or COMPLEV and/ or maximimum COMPARE function results.

Occasional Contributor
Posts: 10

Re: How to standardize U.S. cities variable in SAS

That is a good method also, thanks!

Ask a Question
Discussion stats
  • 7 replies
  • 443 views
  • 6 likes
  • 4 in conversation