I'm trying to use PROC GIS to geocode addresses for the entire US. However, the code below works only for Wake County, NC. How can I change the lookup table so that it covers the entire US? I'm using the following code, available at "http://support.sas.com/documentation/cdl/en/apdatgis/65034/HTML/default/viewer.htm#p05a52iwq7jb9cn1s..."
/*--- Copy the base map to the WORK library ---*/
proc gis;
   copy MAPS.WAKE.TRACT.GISMAP /        /* Map entry to copy */
      destlib = WORK                    /* Destination library */
      destcat = WORK.WAKE               /* Destination catalog */
      sel = (_all_)                     /* Copy all map components */
      blank                             /* Clear internal map path */
      replace;                          /* Overwrite existing entry */
quit;
/*--- Create the address data set to geocode ---*/
data WORK.ADDRESSES (label='Data set to geocode');
   input address  $  1-23   /* Street address */
         resident $ 24-48   /* Person at the location */
         zip      $ 49-53   /* 5-digit US postal code */
         city     $ 55-69   /* City name */
         state    $ 70-71;  /* US state name */
cards;
700 Madison Avenue     Patricia Smith           27513 Cary
506 Reedy Creek Road   Jean Francois Dumas      27513 Cary
1106 Medlin Drive      Michael Garriss          27511                NC
1150 Maynard Road      Kaspar Gutman            27511 Cary
138 Dry Ave.           Susan Lang               27511                NC
3112 Banks Road        Roy Hobbs                27603 Raleigh        NC
305 Mill Creek Drive   Alan Picard              27526 Fuquay-Varina  NC
1998 S. Main St.       Guillermo Ugarte               Wake Forest
7825 Old Middlesex Rd  Capt. Jeffrey Spaulding  27807 Bailey         NC
5550 Old Stage Road    Emily Joyner             27603 Raleigh        NC
3212 Avent Ferry Road  Fred C. Dobbs            27540                NC
1050 King Charles Rd.  Karin Schmidt            .     Raleigh        NC
6819 Buffaloe Road     Ferdinand Paulin         27604                NC
3211 Constant Circle   Gordon Miller            34121
6111 Old Faison Road   Alan Picard              27545 Knightdale
725 N. Raleigh Street  Evan Rudde               27501 Angier         NC
;
run;
/*--- Set up variables for the Batch Geocoding program ---*/
%gcbatch(
   glib  = WORK,             /* Geocoding library */
   geod  = WORK.ADDRESSES,   /* Address data to geocode */
   nv    = RESIDENT,         /* Who's at the address */
   av    = ADDRESS,          /* Address variable */
   cv    = CITY,             /* Place name */
   sv    = STATE,            /* State name */
   zv    = ZIP,              /* ZIP code (5-digit) */
   pv    = TRACT,            /* AREA value from map data */
   mname = WORK.WAKE.TRACT); /* Map data used for geocoding */
/*--- Run the Batch Geocoding program ---*/
dm 'af cat=SASHELP.GIS.GEOCODEB.SCL';
I'd appreciate your input, as I'm a novice with PROC GIS.
Generating the lookup data for the entire U.S. for use with the SAS/GIS geocoder takes quite a bit of effort. You first have to create a map of the entire U.S. using the interactive SAS/GIS desktop product. The batch geocoder then creates the lookup data from that map data.
Rather than using the SAS/GIS batch geocoder, you might consider using the Geocode procedure in SAS/GRAPH. The SAS/GIS geocoder has not been updated for many releases, whereas the Geocode procedure is still being maintained.
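For comparison, here is a minimal sketch of a similar job with PROC GEOCODE, assuming SAS/GRAPH is licensed and the Wake County sample lookup data SASHELP.GEOEXM is available. It relies on PROC GEOCODE's default input variable names (ADDRESS, CITY, STATE, ZIP), so no ADDRESS...VAR= options are needed. The data set names WORK.WAKE_ADDRESSES and WORK.WAKE_GEOCODED are arbitrary, and unlike the batch job above it does not attach a census TRACT value.

```sas
/* A few of the Wake County addresses from the batch example */
data work.wake_addresses;
   length address $40 city $20 state $2;
   address = '700 Madison Avenue';  city = 'Cary';    state = 'NC'; zip = 27513; output;
   address = '3112 Banks Road';     city = 'Raleigh'; state = 'NC'; zip = 27603; output;
   address = '5550 Old Stage Road'; city = 'Raleigh'; state = 'NC'; zip = 27603; output;
run;

proc geocode
   method       = street               /* Street-level geocoding */
   data         = work.wake_addresses  /* Addresses to geocode */
   out          = work.wake_geocoded   /* Output with X/Y and match info */
   lookupstreet = sashelp.geoexm;      /* Wake County sample lookup data */
run;

/* Inspect coordinates and match diagnostics */
proc print data=work.wake_geocoded noobs;
   var address m_addr x y _matched_ _status_ _notes_ _score_;
run;
```

Swapping SASHELP.GEOEXM for the nationwide lookup data is the only change needed to geocode addresses outside Wake County.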
Also, the nationwide lookup data used by the Geocode procedure for the entire U.S. can be downloaded from SAS MapsOnline. You do not have to create it. Just download the zip file with the prebuilt lookup data, unzip it, and follow the instructions in the ReadMe.txt file to install the lookup data on your machine.
A good overview of the Geocode procedure's capabilities with examples is in the SGF 2013 paper PROC GEOCODE: Finding Locations Outside the U.S. You can download a zip file with the paper's example programs from the SAS Support site.
Thanks a lot.
I found the following page, which I think gives an example of PROC GIS doing this task for the complete US: http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode_badzips.html
Do you think this will work?
The example in your link is still using the old SAS/GIS batch geocoder, not the newer Geocode procedure. I have attached the STREET_GEOCODE_US.sas example program from the 2013 SGF paper noted previously. It uses the Geocode procedure to locate school addresses.
The attached example uses the sample lookup data sets shipped in SASHELP, which cover only Wake County, NC. To geocode addresses in other areas, you will need the larger nationwide lookup data sets. Comments in that SAS example discuss downloading those street lookup data sets from the SAS MapsOnline site. Under the 'Street Geocoding' section of the MapsOnline page, download either StreetLookupData (9.4)-2015.zip or StreetLookupData (9.3)-2015.zip, depending on whether you are running SAS 9.4 or SAS 9.3.
Once the proper zip file is downloaded and unzipped, follow the instructions in its ReadMe.txt file to import the CSV files into SAS data sets. The import program assigns the libref LOOKUP to the location where you installed those data sets. You can then geocode your addresses as shown in the attached example program, but with LOOKUPSTREET=LOOKUP.USM instead of LOOKUPSTREET=SASHELP.GEOEXM in your PROC GEOCODE syntax. That will use the nationwide U.S. data sets you installed for your geocoding runs.
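Before switching LOOKUPSTREET= to LOOKUP.USM, a quick check can confirm which SAS release is running (the downloaded lookup data must match it) and that the imported data sets exist. This is a sketch that assumes the LOOKUP libref has already been assigned by the ReadMe.txt import program, and that the street lookup data follows the same M/S/P naming as the SASHELP sample (GEOEXM, GEOEXS, GEOEXP), i.e. USM, USS and USP.

```sas
/* The lookup zip file (9.3 or 9.4) must match this release */
%put NOTE: Running SAS release &sysver;

/* Assumes LOOKUP was assigned by the ReadMe.txt import program.
   If not, assign it with a LIBNAME statement pointing to your install folder. */
data _null_;
   length ds $32;
   do ds = 'LOOKUP.USM', 'LOOKUP.USS', 'LOOKUP.USP';
      if exist(strip(ds)) then put 'NOTE: Found ' ds;
      else put 'WARNING: Missing ' ds;
   end;
run;
```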
Thanks again for the help.
After running PROC GEOCODE, I see the following result for some records and would appreciate your help diagnosing it.
The street name matches (along with the ZIP code, city, and state), but the house number is outside the range of values in the lookup data for the matching street. However, _NOTES_ contains only NOCT ZC (score=10). I expected it to also contain the following notes:
1. AD (as street name also matches)
2. ENDNM (as house number is outside the range)
The acronyms are listed in the following link:
Is it possible to pass an option that tells the procedure to ignore the house number (similar to the NOZIP and NOCITY options)?
I used my house address with the number outside the range and got _NOTES_: AD ZC ENDNM TS.
What version of SAS are you using?
Can you send the entire record from the output of proc Geocode for the address that you say is giving a problem?
I'm using SAS 9.3
I have attached the log messages produced after adding DEBUG=4, for the last record (the problematic one, address observation 336).
Can you help me interpret the meanings of "TDS where", "MDS where", etc.?
For the problematic record,
address_match=860 MONTCLAIR RD STE 156
city_code_num=7000
zip=35213
state_code=1
I'm using the following code:
proc geocode debug=4                  /* Extra diagnostics in the log */
   method          = street           /* Street method */
   addressvar      = address_match    /* Street address variable */
   addresscityvar  = city_code_num    /* City variable */
   addresszipvar   = zip              /* ZIP code variable */
   addressstatevar = state_code       /* State variable */
   data            = y                /* Address data to geocode */
   out             = work.geocoded    /* Geocoded output data set */
   lookupstreet    = lookup.USM;      /* Street method lookup data */
run;
quit;
Output:
_MATCHED_=ZIP
_STATUS_=ZIP match
_NOTES_=NOCT ZC
_SCORE_=10
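I see three problems in your input data set work.y:
1. The city variable (city_code_num=7000) holds a numeric FIPS place code instead of the city name.
2. The state variable (state_code=1) holds a numeric FIPS state code instead of the state abbreviation.
3. The address includes a unit identifier (STE 156), and PROC GEOCODE does not currently remove unit identifiers from input addresses.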
If you contact SAS Technical Support and open a track with them for item number 3 above, it may be possible to add the ability to clean unit identifiers from input addresses in a future release.
If you change your input data and geocode syntax as below, you will get a street location for this address. Just be aware that your lookup data downloaded from SAS MapsOnline must match your SAS version which you indicated is 9.3.
data y;
address = '860 Montclair Rd'; /* Remove the 'Ste 156' */
city = 'Birmingham'; /* Use city name, not FIPS place code */
state = 'AL'; /* Use state abbreviation, not FIPS state code */
zip = 35213; /* ZIP code is correct */
run;
proc geocode
method = street /* Street method */
data = y /* Address data to geocode */
out = geocoded /* Geocoded output data set */
lookupstreet = lookup.USM; /* Street method lookup data for your SAS version */
run;
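If many records in work.y have the same three problems, the fixes could be applied in one DATA step instead of by hand. This sketch assumes the variable names from the question (address_match, city_code_num, zip, state_code). FIPSTATE and ZIPCITY are standard SAS functions; ZIPCITY reads SASHELP.ZIPCODE, so it is only as current as that data set, and the city name it returns for a ZIP code may not always be the one the lookup data expects. The regular expression for unit identifiers is an assumption that covers only a few common forms (STE, SUITE, APT, UNIT, #).

```sas
/* The problematic record from the question, as stored in work.y */
data work.y;
   length address_match $60;
   address_match = '860 MONTCLAIR RD STE 156';
   city_code_num = 7000;  /* FIPS place code */
   zip           = 35213;
   state_code    = 1;     /* FIPS state code */
run;

data work.y_clean;
   length address $60 city $40 state $2;
   set work.y;
   /* Remove a trailing unit identifier such as "STE 156", "APT 4B" or "#5" */
   address = prxchange('s/\s+(?:(?:STE|SUITE|APT|UNIT)\b\.?|#)\s*\S+\s*$//i',
                       1, strip(address_match));
   /* FIPS state code -> 2-letter postal abbreviation (1 -> AL) */
   state = fipstate(state_code);
   /* ZIPCITY returns "City, ST"; keep the part before the comma */
   city = scan(zipcity(zip), 1, ',');
run;

proc print data=work.y_clean noobs;
   var address city state zip;
run;
```

WORK.Y_CLEAN uses PROC GEOCODE's default variable names (ADDRESS, CITY, STATE, ZIP), so it can be passed to the PROC GEOCODE step above with DATA=work.y_clean.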
Thanks!
PROC GEOCODE produces an output data set in which M_ADDR is nonmissing for 95% of the records (because their addresses match the lookup table).
For the remaining 5% of records, I want to find out whether the lookup table contains a street with a similar name for the same ZIP code, city, and state.
For example, if there is a street "920 COMPASSION CIR", I want to check if there is a street with a slightly different spelling (say "COMPASION") in lookup table.
Of course I can do extensive programming in SAS to achieve this: for example, I can split the street name in my table into different words (in this case: 920, COMPASSION and CIR) and find out if any of these words exist in the lookup table for that zip-city-state.
Is there a more efficient way PROC GEOCODE (or any other procedure) can help me achieve this?
The Geocode procedure uses exact matching to find a street name within a specified ZIP code or city. However, there are several SAS functions that allow fuzzy text matching. I have not used them, but this article discusses several of those functions. The SAS documentation on those functions will also have examples.
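As a sketch of that approach, the SAS COMPGED function (generalized edit distance, where a lower cost means a closer match) can score each unmatched street against candidate streets in the same ZIP code. WORK.STREET_NAMES is a hypothetical stand-in for a per-ZIP list of valid street names that you would build yourself (for example, from your lookup data); the sample rows are made up, and the cost cutoff of 200 is an arbitrary starting point to tune.

```sas
/* Made-up sample of addresses that PROC GEOCODE could not match.
   In practice: data work.unmatched; set work.geocoded; where missing(m_addr); run; */
data work.unmatched;
   length zip $5 address street $40;
   input zip $ 1-5 address $ 7-46;
   /* Drop the leading house number: keep the text after the first blank */
   street = substr(address, index(address, ' ') + 1);
   datalines;
27513 920 COMPASION CIR
27540 3212 AVENT FERY RD
;
run;

/* Hypothetical per-ZIP list of valid street names (made-up rows) */
data work.street_names;
   length zip $5 street $40;
   input zip $ 1-5 street $ 7-46;
   datalines;
27513 COMPASSION CIR
27513 MADISON AVE
27540 AVENT FERRY RD
27540 OLD STAGE RD
;
run;

/* Compare each unmatched street only with streets in the same ZIP code */
proc sql;
   create table work.candidates as
   select u.zip,
          u.street as input_street,
          s.street as candidate_street,
          compged(strip(u.street), strip(s.street)) as ged_cost
   from work.unmatched as u
        inner join work.street_names as s
          on u.zip = s.zip
   where calculated ged_cost <= 200
   order by u.zip, input_street, ged_cost;
quit;

proc print data=work.candidates noobs;
run;
```

Restricting the comparison to the same ZIP code keeps the join small. Make sure ZIP has the same type (character or numeric) in both tables before joining. COMPLEV (Levenshtein distance) or SPEDIS could be swapped in for COMPGED if you prefer a different scoring scheme.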