06-05-2013 02:06 PM
I have a list of city names in the U.S. They are typed in by end-users. A lot of them contains minor typos. See below
Is there an internal list of U.S. city names in the SAS that I can compare my city name list to the correct list (maybe by using COMPLEV???) in order to minimize distance and then
standardize city names. Alternatively, is there a function that transform the first column into code or standard city names so that I can do proc geocode on them?
I tried to manually correct these, almost killed myself...
Thanks a lot, guys!
06-05-2013 02:50 PM
There's the maps.uscity table that has City Names, and lat/long.
But city names aren't unique so you need to be careful,
proc freq data=maps.uscity noprint;
proc sort data=city_names; by descending count ;
06-05-2013 03:35 PM
Another consideration: Many city names are similar. For example, if you see the misspelling "Parris", there are many legitimate city names that could match it: Paris, Perris, Parrish. Would you like a computer to make the decision for you?
One viable approach would be to take all the corrections that you so studiously made already and automate them. Apply the same changes you made to the next batch of data. Then see which spellings still don't match a legitimate city and augment your list.
06-05-2013 03:41 PM
There should be a ZIP code data set SASHelp.Zipcode that has city and state information including some "alternate" city names associated. If by any chance your data also contains zip codes the function ZIPCity may be of interest to you.
06-05-2013 05:11 PM
Unfortunately, the zip code variable is not available. But I also have state variable I can use to triangulate. Could I ask in this case, what would be a better solution? Thank you, guys -
06-05-2013 06:27 PM
My general approach would be to find the matches by state on one or more of the city variables in the ZIP code data set first. For the ones that don't match I would start by examining the difference between the (distinct) cities in the data with ZIP code set looking for the minimum COMPGED or COMPLEV and/ or maximimum COMPARE function results.