I am trying to clean address data by correcting different spellings of the same city (Ex. St. Lous, Saint Louis...). I am able to use PROC GEOCODE to turn street addresses into Y X cordinates. SAS suggests alternative addresses and states (M_ADDR and M_STATE). However the M_CITY variable field is blank for every observation.
I wanted the M_CITY variable because I thought it could help standardize city spellings. According to the SAS documentation,
Question:
Can anyone help me figure out why the M_CITY variable is blank?
Details:
I am using the SASHELP.USM dataset to look up street locations.
I am also open to other solutions to standadize city spellings.
Code:
PROC GEOCODE
METHOD=STREET
DATA=WORK.ADDRESS_TEST1
OUT=WORK.GEO_ADDR_TEST2
lookupSTREET=SASHELP.USM;
RUN;
Thanks,
TB
I did a little more research on M_CITY. It was only returned if we did a city match (not a street-level match or zip match, etc). It was suggested that having that information for a street match would be useful so we added the M_CITY information. That appears to be added in 9.4m1. So, 9.3 definitely does not have M_CITY.
It has been a while since I did my last geocoding, I can only speak from my impression. You may need to download the most current TIGER file from census site to do a thorough geocoding. The dataset in sashelp is just for demo, I believe. It is neither complete nor current. Please reference this paper for details:
http://support.sas.com/resources/papers/proceedings10/332-2010.pdf
I am using the 2014 TIGER file. I noticed they have 2015 available, but thought it might not be necessary for my purposes. I am comfortable assuming that the addresses in my file were around in 2014. Unless you can think of a new featurture in the 2015 TIGER data set that would help.
Thanks for your paper suggestion.
Understood what you are saying, but just to confirm, otherwise I will be utterly lost, is this the actual code you were running?
lookupSTREET=SASHELP.USM;
As this is refering to sashelp library, which by the way, can NOT be reassigned. So are you saying in your real code, you were using 2014 Tiger in a different library, meaning you real code will be using a different non-reserved libname to address that.
Other than that, the only thing I know is that if the city level match is failed , then the city will be left blank.
I did not download the TIGER dataset myself. However when I open the SASHELP.USM dataset, it says that the source is Tiger 2014.
Hmm. Something isn't quite right about your USM file. This is mine, mine is 2014 Tiger as well. Yours doesn't even have city columns.
So you may want to check your lookup table.
I am aware of you have a big data to tackle, so this is just FYI in case you need it in the future. The following link provides the most updated golden standard geocoding for financial industry:
https://geomap.ffiec.gov/FFIECGeocMap/GeocodeMap1.aspx
They don't provide API for automated search and web scrapping, but google will help.
You may want to look at http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html for additional datasets.
I believe the SASHELP.USM is a subset of data from TIGER.
FYI, SAS's Data Quality aka SAS Dataflux, with an address matching plug-in like Loqate provides an automated approach to your problem. It does geo-coding, address standardisation and cleansing, zip codes and then some. Its really an enterprise solution for fixing data quality issues but thought you would like to know about it.
There was a defect in Proc GECOCODE that M_CITY didn't display in the output data set. I know it works in 9.4M2. I do not know how far back the problem went.
I am using SAS Enterprise Guide Version 5.1 (5.100.0.13304 Hot fix 17)
Do you think this issue effects my version of SAS?
With EG 5.1 you are probably using SAS 9.3. You can check the SAS version by submitting:
%put &SYSVLONG;
Or by right-clicking on the server in your Server list, select Properties, then look at the Software tab in the window.
I am using SAS 9.3. So I guess it is possible that the defect is the reason that the M_CITY field is empty.
Oh, and it looks like there was a related defect (if not the one @Darrell_sas mentions) fixed at 9.4 Maint 1.
I did a little more research on M_CITY. It was only returned if we did a city match (not a street-level match or zip match, etc). It was suggested that having that information for a street match would be useful so we added the M_CITY information. That appears to be added in 9.4m1. So, 9.3 definitely does not have M_CITY.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.