Hello,
I'm geocoding <100 addresses using PROC GEOCODE with the downloaded 2024 Street Lookup Data for SAS 9.4.
The output showed good street matches, but many of the coordinates (X and Y) are way off. Here are two output examples (the matched addresses are not exactly the input addresses):
Matched address: "3003 Sevierville Rd, Maryville, TN 37804" with X=-92.68680269 and Y=34.458537714
(These coordinates fall in Arkansas.)
Matched address: "206 Debbie Ann Dr, Leander, TX 78641" with X=-80.1951071 and Y=39.556943449
(These coordinates fall in West Virginia.)
I wonder if anyone has used these lookup data and run into similar problems.
Thanks!
Hi @nianhui
Your USP lookup data set has fewer observations than it should.
The ReadMe.txt file in the downloaded .zip file states that USP should have 323,712,810 observations. Your PROC CONTENTS output shows the USP data set containing 288,425,236 observations.
Did you receive any errors when running the ImportCSVfiles.sas program to create the lookup data?
Try running the ImportCSVfiles.sas program again to recreate the lookup data and verify that the USP data set has the expected number of observations then try running PROC GEOCODE again.
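One way to check the counts without opening each data set is to query DICTIONARY.TABLES. This is only a minimal sketch: it assumes a libref named STREETS points at the folder holding the lookup data, and that the street lookup data sets are named USM, USS, and USP. Only the USP count (323,712,810) is quoted from the ReadMe here, so compare the other counts against the ReadMe yourself.

```sas
/* Assumption: the STREETS libref is already assigned to the lookup data folder. */
proc sql;
  /* DICTIONARY.TABLES stores libref and member names in uppercase. */
  select memname,
         nobs format=comma15.
    from dictionary.tables
   where libname = 'STREETS'
     and memname in ('USM', 'USS', 'USP');
quit;
```

If the NOBS value for USP is not 323,712,810, the import did not complete cleanly.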
I hope that helps.
Regards,
Marcia
Can you post the output of Proc Contents on the street lookup data set used and the Proc Geocode syntax you used?
If you used a SAS map data set of some flavor, the X and Y coordinates returned may be map coordinates and not latitude and longitude. Alternatively, you may be getting latitude and longitude values that are treated as X, Y map display pairs, which would tend to display incorrectly on a map.
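A quick way to tell which case applies is to check whether the returned values fall inside the valid ranges for longitude and latitude. The sketch below assumes the geocoded output is in a data set named OUTGEOCODE with the numeric variables X and Y that PROC GEOCODE writes; the data set name COORD_CHECK and the flag variable are made up for illustration.

```sas
/* Assumption: OUTGEOCODE is the OUT= data set from PROC GEOCODE. */
data coord_check;
  set outgeocode;
  /* 1 when X is a plausible longitude and Y a plausible latitude, else 0. */
  /* Missing X or Y compares lower than any number, so the flag is 0.      */
  looks_like_lonlat = (-180 <= x <= 180) and (-90 <= y <= 90);
run;

proc freq data=coord_check;
  tables looks_like_lonlat;
run;
```

Values that pass this check are at least in degree units. It does not confirm that they point to the right place.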
Thanks!
Please see attached for the output of proc contents on the street lookup data sets.
Below is proc geocode syntax I used.
Libname streets '//i110filesmb.hs.it.vumc.io/SASUSER/sasuser/nianh/margaret/geocodedata_2024_StreetLookupData_94';
proc geocode
method=STREET
data=forgeocode
out=outgeocode
lookupstreet=streets.usm
attribute_var=(BLKGRP);
run;
I think the issue may be the name of the state variable in the lookup data. Look at the help for the LOOKUPSTATEVAR option. If the option is not set, the default is to use a STATECODE variable. It looks like the variable in the USM data set is MapIDNameAbrv, so try adding this to the options in the PROC GEOCODE syntax:
Lookupstatevar = Mapidnameabrv
So you may be getting results for similar City and Address values in different states.
The Geocode procedure defaults to a lot of variable names so it is worth checking if anything goes wrong.
I never had actual street lookup needs so I'm not sure how the procedure might complain if expected variable names aren't met. Did the LOG show anything that might be interpreted as expected variable not present?
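To see which variable names the lookup data sets actually contain before setting any of the variable-name options, you can list them from DICTIONARY.COLUMNS. This is a sketch under assumptions: it uses the STREETS libref from the LIBNAME statement above and assumes the lookup data sets are named USM, USS, and USP.

```sas
/* Assumption: STREETS is assigned as in the LIBNAME statement above. */
proc sql;
  /* List every variable in the street lookup data sets, in stored order. */
  select memname, varnum, name, type, length
    from dictionary.columns
   where libname = 'STREETS'
     and memname in ('USM', 'USS', 'USP')
   order by memname, varnum;
quit;
```

Comparing this list against the defaults described in the PROC GEOCODE documentation shows whether any option needs to be set explicitly.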
Thanks! I think I used the correct variable names in the input data set: "address", "city", "state" and "zip". At first, I used "postal", and the log showed an error about the variable name.
I tried adding
Lookupstatevar = Mapidnameabrv
but got an error message
ERROR: Variable MAPIDNAMEABRV not found in MAPSGFK.USCITY_ALL data set.
You did not provide the details about the variables in your INPUT datasets, just the map datasets.
Please see attached for the contents of input address data.
Thanks!
Thank you very much! Problem solved!
The log file didn't flag any errors, but I looked again and something was wrong. Please see the attached log_ImportCSVfiles (1~3). The number of observations in the USP file is 288425236.
Then I started from scratch and re-downloaded everything (attached log_ImportCSVfiles_new and contents_usp_new). Now the observation number is 323712820.
All the coordinates are correct now!
Thanks again!!
Your first log does have an indication of a problem.
It looks like some part of at least one of the files was replaced with binary zeros instead of normal lines of text.
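If you want to check a downloaded CSV file for that kind of damage before importing it, a DATA step can scan each record for null bytes. This is only a rough sketch: the file path is a placeholder to replace with one of your downloaded CSV files, and records longer than the LRECL value are only checked up to that length.

```sas
/* Assumption: the path below is hypothetical; point it at a real CSV file. */
data _null_;
  infile "/path/to/lookup_file.csv" lrecl=32767 truncover end=done;
  input;                               /* load the raw record into _INFILE_ */
  if index(_infile_, '00'x) > 0 then do;
    bad + 1;                           /* count records containing a null byte */
    if bad <= 10 then put "Null byte found on line " _n_;
  end;
  if done then put "Lines with null bytes: " bad;
run;
```

A nonzero count suggests the download or unzip step corrupted the file, and downloading it again is the safer fix.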