BookmarkSubscribeRSS Feed
sasjoker
Calcite | Level 5

I have a list of city names in the U.S.  They are typed in by end-users.  A lot of them contains minor typos.  See below 


Is there an internal list of U.S. city names in the SAS that I can compare my city name list to the correct list (maybe by using COMPLEV???) in order to minimize distance and then

standardize city names.  Alternatively, is there a function that transform the first column into code or standard city names so that I can do proc geocode on them? 

I tried to manually correct these, almost killed myself... 

Thanks a lot, guys!

Chicago,Chicago
ChicaoChicago
ChichagoChicago
CincinattiCINCINNATI
CincinnatiCINCINNATI
CincinnattiCINCINNATI
CinncinatiCINCINNATI
9 REPLIES 9
Reeza
Super User

There's the maps.uscity table that has City Names, and lat/long.

But city names aren't unique so you need to be careful,

proc freq data=maps.uscity noprint;

table city/out=city_names;

run;

proc sort data=city_names; by descending count ;

run;

sasjoker
Calcite | Level 5

Thank you for your help -

Astounding
PROC Star

Another consideration:  Many city names are similar.  For example, if you see the misspelling "Parris", there are many legitimate city names that could match it:  Paris, Perris, Parrish.  Would you like a computer to make the decision for you?

One viable approach would be to take all the corrections that you so studiously made already and automate them.  Apply the same changes you made to the next batch of data.  Then see which spellings still don't match a legitimate city and augment your list.

Good luck.

ballardw
Super User

There should be a ZIP code data set SASHelp.Zipcode that has city and state information including some "alternate" city names associated. If by any chance your data also contains zip codes the function ZIPCity may be of interest to you.

Good luck.

sasjoker
Calcite | Level 5

Unfortunately, the zip code variable is not available.  But I also have state variable I can use to triangulate.  Could I ask in this case, what would be a better solution?  Thank you, guys -

ballardw
Super User

My general approach would be to find the matches by state on one or more of the city variables in the ZIP code data set first. For the ones that don't match I would start by examining the difference between the (distinct) cities in the data with ZIP code set looking for the minimum COMPGED or COMPLEV and/ or maximimum COMPARE function results.

sasjoker
Calcite | Level 5

That is a good method also, thanks!

addicted4frinds
Calcite | Level 5
How can i have first name and last name in signle column without using functions. By using length and colon. can any1 help me with this please
Kurt_Bremser
Super User
  1. Start your own thread, and give it a meaningful subject line; adding a post to a thread that is 8(!) years old won't get you many responses
  2. Provide example data, so we know what you are talking about. Usable example data means posting a DATA step with DATALINES (or a simple assignment if one observation is sufficient)
  3. Show what you expect to get out of that
  4. Posting unreasonable conditions makes no sense; if a certain function is ideal for your task, use it

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1514 views
  • 6 likes
  • 6 in conversation