Data visualization with SAS programming

PROC GIS need lookup table for entire US

Contributor
Posts: 35

I'm trying to use PROC GIS to geocode addresses across the entire US. However, the code below works only for Wake County, NC. How do I change the lookup table so that it covers the entire US? I'm using the following code, available at "http://support.sas.com/documentation/cdl/en/apdatgis/65034/HTML/default/viewer.htm#p05a52iwq7jb9cn1s..."

/*--- Copy the base map to the WORK library ---*/
proc gis;
   copy MAPS.WAKE.TRACT.GISMAP / /* Map entry to copy */
           destlib = WORK        /* Destination library */
           destcat = WORK.WAKE   /* Destination catalog */
           sel     = (_all_)     /* Copy all map components */
           blank                 /* Clear internal map path */
           replace;              /* Overwrite existing entry */
quit;

/*--- Create the address data set to geocode ---*/
data WORK.ADDRESSES (label='Data set to geocode');
   input address  $ 1-23    /* Street address */
         resident $ 24-48   /* Person at the location */
         zip      $ 49-53   /* 5-digit US postal code */
         city     $ 55-69   /* City name */
         state    $ 70-71;  /* US state name */
cards;
700  Madison Avenue    Patricia Smith           27513 Cary
506  Reedy Creek Road  Jean Francois Dumas      27513 Cary
1106 Medlin Drive      Michael Garriss          27511                NC
1150 Maynard Road      Kaspar Gutman            27511 Cary
138  Dry Ave.          Susan Lang               27511                NC
3112 Banks Road        Roy Hobbs                27603 Raleigh        NC
305  Mill Creek Drive  Alan Picard              27526 Fuquay-Varina  NC
1998 S. Main St.       Guillermo Ugarte               Wake Forest
7825 Old Middlesex Rd  Capt. Jeffrey Spaulding  27807 Bailey         NC
5550 Old Stage Road    Emily Joyner             27603 Raleigh        NC
3212 Avent Ferry Road  Fred C. Dobbs            27540                NC
1050 King Charles Rd.  Karin Schmidt                . Raleigh        NC
6819 Buffaloe Road     Ferdinand Paulin         27604                NC
3211 Constant Circle   Gordon Miller            34121
6111 Old Faison Road   Alan Picard              27545 Knightdale
725  N. Raleigh Street Evan Rudde               27501 Angier         NC
;
run;

/*--- Set up variables for the Batch Geocoding program ---*/
%gcbatch( glib  = WORK,             /* Geocoding library */
          geod  = WORK.ADDRESSES,   /* Address data to geocode */
          nv    = RESIDENT,         /* Who's at the address */
          av    = ADDRESS,          /* Address variable */
          cv    = CITY,             /* Place name */
          sv    = STATE,            /* State name */
          zv    = ZIP,              /* ZIP code (5-digit) */
          pv    = TRACT,            /* AREA value from map data */
          mname = WORK.WAKE.TRACT); /* Map data used for geocoding */

/*--- Run the Batch Geocoding program ---*/
dm 'af cat=SASHELP.GIS.GEOCODEB.SCL';

 

I'd appreciate your input, as I'm new to PROC GIS.


All Replies
SAS Employee
Posts: 23

Re: PROC GIS need lookup table for entire US

Generating the lookup data for the entire U.S. for use with the SAS/GIS geocoder takes quite a bit of effort. You first have to create a map of the entire U.S. using the interactive GIS desktop product. The batch geocoder then creates the lookup data from that map data.

 

Rather than using the SAS/GIS batch geocoder, you might consider using the Geocode procedure in SAS/GRAPH. The GIS geocoder has not been updated for many releases whereas the Geocode procedure is still being maintained.

 

Also, the nationwide lookup data used by the Geocode procedure for the entire U.S. can be downloaded from SAS MapsOnline. You do not have to create it. Just download the zip file with the prebuilt lookup data, unzip it, and follow the instructions in the ReadMe.txt file to install the lookup data on your machine.

 

A good overview of the Geocode procedure's capabilities with examples is in the SGF 2013 paper PROC GEOCODE: Finding Locations Outside the U.S. You can download a zip file with the paper's example programs from the SAS Support site. 

Contributor
Posts: 35

Re: PROC GIS need lookup table for entire US

Thanks a lot.

I found the following material which I think gives an example of PROC GIS doing the task (for complete US): http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode_badzips.html

Do you think this will work?

Solution
‎08-25-2016 02:13 AM
SAS Employee
Posts: 23

Re: PROC GIS need lookup table for entire US

The example in your link is still using the old SAS/GIS batch geocoder, not the newer Geocode procedure. I have attached the STREET_GEOCODE_US.sas example program from the 2013 SGF paper noted previously. It uses the Geocode procedure to locate school addresses.

 

The attached example uses the sample lookup data sets shipped in SASHELP, which cover only Wake County, NC. To geocode addresses in other areas, you will need the larger nationwide lookup data sets. Comments in that SAS example discuss downloading those street lookup data sets from the SAS MapsOnline site. Under the 'Street Geocoding' section of the MapsOnline page, download the StreetLookupData (9.4)-2015.zip or StreetLookupData (9.3)-2015.zip file, depending on whether you are running SAS 9.4 or SAS 9.3.

 

Once the proper zip file is downloaded and unzipped, follow the instructions in its ReadMe.txt file to import the CSV files into SAS data sets. The import program will assign the libref LOOKUP to the location where you have installed those. You can then geocode your addresses as shown in the attached example program but using LOOKUPSTREET=LOOKUP.USM instead of LOOKUPSTREET=SASHELP.GEOEXM in your PROC GEOCODE syntax. That will use the nationwide U.S. data sets you installed for your geocoding runs.
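
As a rough sketch of that substitution (not the attached STREET_GEOCODE_US.sas program), the call might look like the following. The LIBNAME path is a made-up placeholder, and WORK.ADDRESSES with its ADDRESS, CITY, STATE, and ZIP variables is borrowed from the original question purely for illustration.

```sas
/* Assumption: the nationwide lookup data sets were imported into this folder. */
/* The path is hypothetical; the ReadMe import program normally assigns LOOKUP. */
libname lookup 'C:\geocode\lookup';

proc geocode
     method          = street            /* Street-level geocoding */
     data            = work.addresses    /* Address data to geocode (from the question) */
     out             = work.geocoded     /* Geocoded output data set */
     addressvar      = address           /* Street address variable */
     addresscityvar  = city              /* City name variable */
     addressstatevar = state             /* State name or abbreviation variable */
     addresszipvar   = zip               /* ZIP code variable */
     lookupstreet    = lookup.usm;       /* Nationwide lookup data instead of SASHELP.GEOEXM */
run;
```

If the LIBNAME statement from the import program is already in effect, the one in this sketch is unnecessary.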

 

 

Contributor
Posts: 35

Re: PROC GIS need lookup table for entire US

Thanks again for the help. 

After running PROC GEOCODE, for some records I have the following observation and would request your help to diagnose.

The street name matches (along with ZIP, city, and state), but the house number is outside the range of values in the lookup data for the matching street. However, _NOTES_ contains only NOCT ZC (score=10). I expected it to also include the following notes:

1. AD (as street name also matches)

2. ENDNM (as house number is outside the range)

 

The acronyms are listed in the following link:

http://support.sas.com/documentation/cdl/en/graphref/67881/HTML/default/viewer.htm#p0volumqexbvwcn1k...

 

Is there an option that tells the procedure to ignore the house number (similar to the NOZIP and NOCITY options)?

 

 

SAS Employee
Posts: 170

Re: PROC GIS need lookup table for entire US

I used my house address with the number outside the range and got _NOTES_: AD ZC ENDNM TS.

 

What version of SAS are you using?

Can you send the entire record from the output of proc Geocode for the address that you say is giving a problem?

 

SAS Employee
Posts: 23

Re: PROC GIS need lookup table for entire US

Also add DEBUG=4 to your PROC GEOCODE syntax. It prints trace information to the SAS log. Enable this option only when running a small number of addresses, or the log will fill up.
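
One way to keep the log manageable is to restrict the input with a data set option while DEBUG=4 is on. This is only a sketch: WORK.ADDRESSES and the output data set name are placeholders, and the limit of 5 observations is arbitrary.

```sas
/* Sketch only: WORK.ADDRESSES and WORK.GEOCODED_DEBUG are placeholder names. */
proc geocode
     debug        = 4                       /* Trace information goes to the SAS log */
     method       = street                  /* Street method */
     data         = work.addresses (obs=5)  /* Read only the first 5 addresses */
     out          = work.geocoded_debug     /* Separate output for the debug run */
     lookupstreet = lookup.usm;             /* Street method lookup data */
run;
```

The FIRSTOBS= and OBS= data set options can be combined to isolate a single record further down the data set.
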
Contributor
Posts: 35

Re: PROC GIS need lookup table for entire US

I'm using SAS 9.3

I have attached the log message after adding debug=4 for the last record (the problematic one), address obs: 336

Can you help me in interpreting the meanings of "TDS where", "MDS where" etc.?

 

For the problematic record,

address_match=860 MONTCLAIR RD STE 156

city_code_num=7000

zip=35213

state_code=1

 

I'm using the following code:

proc geocode debug=4
     method = street               /* Street method */
     addressvar = address_match
     ADDRESSCITYVAR=city_code_num
     ADDRESSZIPVAR=Zip
     ADDRESSSTATEVAR=state_code
     data = y                      /* Address data to geocode */
     out = work.geocoded           /* Geocoded output data set */
     lookupstreet = lookup.USM;    /* Street method lookup data */
run;
quit;

 

Output:

_MATCHED_=ZIP

_STATUS_=ZIP match

_NOTES_=NOCT ZC

_SCORE_=10

SAS Employee
Posts: 23

Re: PROC GIS need lookup table for entire US

I see three problems in your input data set work.y:

  1. Your variable state_code is numeric and contains the value 1, which is a state FIPS code. Proc Geocode expects the state variable to be character and contain the state name or postal abbreviation. In this case that would be either 'Alabama' or 'AL'.
  2. Variable city_code_num is numeric and contains 7000, which is the city place FIPS code. It should be a character variable and contain the city name 'Birmingham'.
  3. The street address variable address_match is '860 MONTCLAIR RD STE 156'. The Geocode Procedure does not recognize 'STE 156' as a unit identifier. It treats it as part of the street name. The street address should be entered as '860 MONTCLAIR RD'.

If you contact SAS Technical Support and open a track with them for item number 3 above, it may be possible to add the ability to clean unit identifiers from input addresses in a future release.

 

If you change your input data and geocode syntax as below, you will get a street location for this address. Just be aware that your lookup data downloaded from SAS MapsOnline must match your SAS version which you indicated is 9.3.

 

data y;
  address = '860 Montclair Rd'; /* Remove the 'Ste 156' */
  city = 'Birmingham'; /* Use city name, not FIPS place code */
  state = 'AL';  /* Use state abbreviation, not FIPS state code */
  zip = 35213; /* ZIP code is correct */
run;

proc geocode
     method = street /* Street method */
     data = y /* Address data to geocode */
     out = geocoded /* Geocoded output data set */
     lookupstreet = lookup.USM; /* Street method lookup data for your SAS version */
run;
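
For a whole input data set rather than one hand-typed record, fixes 1 and 3 from the list above could be applied in a DATA step along these lines. This is a sketch under assumptions: it uses the data set and variable names quoted in the thread (work.y, state_code, address_match), the output name work.y_clean is invented, and the regular expression is an illustrative pattern that covers only a few trailing unit designators. Converting the FIPS place code to a city name needs a separate lookup table and is not shown.

```sas
/* Sketch only: cleans the state and street address variables for PROC GEOCODE. */
data work.y_clean;                    /* Hypothetical output data set name */
   set work.y;                        /* Input data set named in the thread */
   length address $60 state $2;

   /* FIPSTATE converts a numeric state FIPS code to the two-letter postal code */
   state = fipstate(state_code);

   /* Remove a trailing unit designator such as 'STE 156' (illustrative pattern) */
   address = prxchange('s/\s+(STE|SUITE|APT|UNIT)\s+\S+\s*$//i', 1, address_match);
run;
```

The new variables are named ADDRESS and STATE so that they match the variable names used in the PROC GEOCODE call above.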

 

Contributor
Posts: 35

Re: PROC GIS need lookup table for entire US

Thanks!

PROC GEOCODE gives an output data set where M_ADDR is not null for 95% of records (because the address matches the lookup table).

For the remaining 5% of records, I want to find out whether the lookup table contains a record with a similar street name for that ZIP, city, and state.

For example, if there is a street "920 COMPASSION CIR", I want to check whether there is a street with a slightly different spelling (say "COMPASION") in the lookup table.

Of course, I can do extensive programming in SAS to achieve this: for example, I can split the street name in my table into separate words (in this case: 920, COMPASSION, and CIR) and find out whether any of these words exist in the lookup table for that ZIP, city, and state.

Is there a more efficient way for PROC GEOCODE (or any other procedure) to help me achieve this?

SAS Employee
Posts: 23

Re: PROC GIS need lookup table for entire US

The Geocode Procedure uses exact matching to find a street name within a specified ZIP code or city. However there are several SAS functions which allow fuzzy text matching. I have not used them, but this article discusses several of those functions. The SAS documentation on those functions will also have examples.
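
As an illustration of what such a comparison could look like, here is a minimal sketch using COMPLEV, which returns the Levenshtein edit distance between two strings (SPEDIS, COMPGED, and SOUNDEX are other functions in the same family). Both tables contain made-up rows, the table and variable names are hypothetical, and the threshold of 2 is arbitrary. Extracting candidate street names from the lookup data is left out because its layout is not described in this thread.

```sas
/* Made-up street names that failed to match */
data work.unmatched;
   infile datalines dlm=',';
   length zip $5 street $40;
   input zip $ street $;
   datalines;
35213,COMPASSION
;
run;

/* Made-up candidate street names standing in for the lookup data */
data work.candidates;
   infile datalines dlm=',';
   length zip $5 name $40;
   input zip $ name $;
   datalines;
35213,COMPASION
35213,MONTCLAIR
;
run;

/* Pair names within the same ZIP code and keep those within 2 edits */
proc sql;
   create table work.fuzzy as
   select u.zip, u.street, c.name,
          complev(strip(u.street), strip(c.name)) as lev_dist
   from work.unmatched as u
        inner join work.candidates as c
        on u.zip = c.zip
   where calculated lev_dist <= 2
   order by u.street, lev_dist;
quit;
```

Here COMPASION is one deletion away from COMPASSION, so that pair is kept, while MONTCLAIR falls outside the threshold. Joining on ZIP first keeps the number of pairwise comparisons small.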
