I used PROC GEOCODE two years ago with SAS 9.4 (TS1M2, I think). I'm running the same addresses again with the same street data files I used two years ago, but when I compare the new table with the one generated two years ago, I get different results. The only difference is that I'm now using TS1M3. I'm using the exact same address file and SAS street data as last time.
The reason for running it again is to see whether the geocode procedure has changed. It seems to have changed, and from what I can tell the results are now not as good as last time. This is based on reviewing a few random addresses.
I am unable to post any of the addresses because they are restricted. This is what my procedure looks like:
proc geocode method = street
   addressvar = raddr
   addresscityvar = rcity
   addressstatevar = rstate
   addresszipvar = rzip5
   data = raw
   out = raw_geocoded
   attribute_var = (COUNTYFP, TRACT, BLKGRP, BLOCK)
   lookupstreet = USM ;
run ;
How different are the results? Do addresses that previously matched now fail to match? Are the lat/long values different, and by how much? Is the difference practically significant? For example, if the results vary by only a few yards, do they fall in a different tract?
As for "not as good as last time," what exactly makes the results look worse? How few is few? What percentage of the address results changed? And are you sure the previous result was the "correct" result?
Yes, it is discomforting when results are different, but you say the software changed. Were there any other changes, such as a different computer or server?
I will say that the use of a one-level name for the reference data set makes me a tad suspicious about the "exact same street data," as a one-level name typically refers to the WORK library, and I would have expected that temporary library to be replaced many times in two years.
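A minimal sketch of how those questions could be put into numbers. It assumes the result from two years ago was saved as a data set (hypothetically OLD.RAW_GEOCODED), that both runs share a unique key variable (hypothetically ADDR_ID), and that both outputs carry the PROC GEOCODE variables X (longitude), Y (latitude), and _MATCHED_. None of those names come from the original post except _MATCHED_ and RAW_GEOCODED.

```sas
/* Assumptions: OLD.RAW_GEOCODED is the saved result from two years ago,   */
/* WORK.RAW_GEOCODED is the new result, and ADDR_ID is a hypothetical key. */
proc sort data=old.raw_geocoded out=old_run; by addr_id; run;
proc sort data=raw_geocoded     out=new_run; by addr_id; run;

data geocode_diff;
   merge old_run (keep=addr_id x y _matched_
                  rename=(x=old_x y=old_y _matched_=old_matched) in=in_old)
         new_run (keep=addr_id x y _matched_
                  rename=(x=new_x y=new_y _matched_=new_matched) in=in_new);
   by addr_id;
   if in_old and in_new;

   /* GEODIST takes latitude first, then longitude; 'M' returns miles */
   if nmiss(old_x, old_y, new_x, new_y) = 0 then
      miles_apart = geodist(old_y, old_x, new_y, new_x, 'M');

   /* 1 when the _MATCHED_ value differs between the two runs */
   match_changed = (old_matched ne new_matched);
run;

/* Share of addresses whose _MATCHED_ value changed */
proc freq data=geocode_diff;
   tables match_changed / missing;
run;

/* How far the coordinates moved */
proc means data=geocode_diff n nmiss mean median p95 max;
   var miles_apart;
run;
```

The PROC FREQ table answers "what percentage changed," and the PROC MEANS output shows whether the moves are a few yards or much more.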
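One way to remove that doubt is to point LOOKUPSTREET= at a two-level name in a permanent, read-only library. The sketch below is only an illustration: the libref LOOKUP and the path are placeholders, not the poster's actual setup.

```sas
/* The libref and path are placeholders, not a real location.          */
/* ACCESS=READONLY guards the lookup data against accidental changes.  */
libname lookup 'C:\geocode\lookup2015' access=readonly;

proc geocode method = street
   data = raw
   out = raw_geocoded
   addressvar = raddr
   addresscityvar = rcity
   addressstatevar = rstate
   addresszipvar = rzip5
   attribute_var = (COUNTYFP, TRACT, BLKGRP, BLOCK)
   lookupstreet = lookup.USM ;
run ;

/* Inspect the creation date and label of the lookup data actually used */
proc contents data=lookup.usm;
run;
```

With an explicit libref, the log and the PROC CONTENTS output document exactly which copy of the street lookup data each run used.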
The results are considerably different. I get different lat/lon values, and they are not just feet apart; the difference is much larger. The _matched_ column shows different values as well. For instance, one address from when I ran the process 2 years ago has _matched_ = 12345 S Skyview Cyn. The same address run with the same map files and the same raw address file now gives me _matched_ = 78945 S Skyview Way.
My raw address file is saved from when I ran it last time. I bring it into my work directory to run the geocode process.
SAS is running on the same exact computer it was running on 2 years ago.
I wonder if updating to TS1M3 updated the ZIPCODE, PLFIPS, and any other tables that PROC GEOCODE uses.
The SASHELP.ZIPCODE data set gets updated regularly, so that may be one factor.
You may need to contact Tech support with exact example records showing the result differences.
One more question: when you say the "raw address file is saved," is that a SAS data set in a permanent library or a text file that you read?
The raw address file is a sas data set in a permanent library.
I uninstalled TS1M3 and am installing TS1M2. I will re-run my process, and the results should match exactly. If they do, I will use TS1M2 to do this year's addresses.
If that doesn't work then I will contact tech support.
Thank you again for all your help Ballardw. I will reply back with my results.
The SASHELP.ZIPCODE data set is updated for each maintenance release. The other data sets in SASHELP used by PROC GEOCODE do not change that often as there is rarely the need. For example, the same versions of GCTYPE (July 2013) and PLFIPS (July 2010) are in both 9.40M2 and 9.40M3.
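For anyone who wants to confirm which versions of these data sets a given installation has, a short query against DICTIONARY.TABLES is one option. This is a sketch, not an official check; run it under each release and compare the labels and dates.

```sas
/* List label, creation date, and modification date for the SASHELP */
/* data sets that PROC GEOCODE uses                                  */
proc sql;
   select memname,
          memlabel,
          crdate format=datetime20.,
          modate format=datetime20.
   from dictionary.tables
   where libname = 'SASHELP'
     and memname in ('ZIPCODE', 'GCTYPE', 'PLFIPS');
quit;
```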
New versions of the primary street lookup data sets (USM, USS and USP) are posted to SAS MapsOnline annually with each new TIGER/Line file release by the Census Bureau. The newer versions do not replace the older ones but are merely added to the MapsOnline site.
Are you using the nationwide street lookup data downloaded from MapsOnline? Which version (year)? Did you change lookup data sets between your older and newer geocoding runs?
Now there was a change in PROC GEOCODE between 9.40M2 and 9.40M3, but without a specific address to run, it is not possible to see if that is the reason for the difference you are seeing. There were also code changes between 9.40M3 and 9.40M4. It would be interesting to see how one of the problematic addresses is handled in that latest release.
I wish I could be of more help.
Thank you Ed,
I am using the 2016 street data file from SAS MapsOnline for my latest attempt, and it was the 2015 file when we did it 2 years ago.
Yes I did change the location of the maps when I was running my process. I actually used the exact code I did last time but this time it was on 9.4 TS1M3. I'm installing 9.4 TS1M2 now and then I'm going to try it again to see if that will give me the same information. Once I have that then I know I can use TS1M2. That is unless TS1M4 is more accurate.
Changing from the 2015 to the 2016 primary street lookup data should not have caused a major difference. Normally a new TIGER release includes mainly new streets, although the Census Bureau does insert corrections in the TIGER/Line files on occasion.
Regarding 9.40M4 being more accurate than previous releases, that is always our goal. Unfortunately making a change to better handle some addresses can sometimes introduce undesirable effects in others. We do try to prevent that, but are not always 100% successful. We'd really like to determine why you are seeing those changes.
After I installed 9.4 TS1M2 I ran the same piece of code using my old address file and old maps and I geocoded them and did a compare to the file I created 2 years ago and they matched. So there is something going on with 9.4 TS1M3.
I was unable to speak with my supervisor today. I will try and come up with some addresses that show this issue.
Do you have any advice on speeding up PROC GINSIDE? For addresses that get ZIP matches, we join them to the SASHELP.ZIPCODE table to get the county, and then we run those state by state using the latest Census shapefiles. Even for one address, PROC GINSIDE has taken almost 30 minutes and still isn't finished. We tried subsetting by county, but there was an issue where the county in the SASHELP.ZIPCODE table didn't match the county in the shapefile, so that row never got the tract or block we were trying to get.
So far I have run almost 300 addresses that I have from 2015 through PROC GINSIDE using the latest Census block shapefile data. When I compare them to the same 300 rows run through PROC GINSIDE with the 2016 shapefiles, they are identical. Does this sound right? I was thinking that something might have changed over those 2 years in the shapefiles for tract or block.
I'm running this all on 9.4 TS1M2.
Per your side question about wanting PROC GINSIDE to run quicker: if your map is very large, you might consider reducing it a bit. You could run PROC GREDUCE on it, and then get rid of observations with a density value higher than 3 (or thereabouts; you might have to plot the resulting map to see whether it still has enough border definition and detail to suit your purposes). While you're at it, you might also drop any other variables in the data set that you're not using (basically all you need are the variables you're using for your ID, the X and Y, and SEGMENT).
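A rough sketch of that suggestion follows. The names are hypothetical: MAPLIB.BLOCKS stands for the imported Census polygon data set, GEOID for its ID variable, and POINTS for the data set holding the geocoded X/Y coordinates. The density cutoff of 3 is only a starting point to experiment with.

```sas
/* Hypothetical names: MAPLIB.BLOCKS is the imported Census polygon data set, */
/* GEOID is its ID variable, and POINTS holds the geocoded X/Y coordinates.   */

/* Step 1: add a DENSITY variable to every boundary point */
proc greduce data=maplib.blocks out=blocks_dens;
   id geoid;
run;

/* Step 2: keep only the more essential points and the needed variables */
data blocks_small;
   set blocks_dens (keep=geoid x y segment density);
   where density <= 3;   /* lower DENSITY values mark the more essential points */
   drop density;
run;

/* Step 3: run the point-in-polygon test against the smaller map */
proc ginside data=points map=blocks_small out=points_with_block;
   id geoid;
run;
```

One thing to keep in mind: thinning the borders shifts them slightly, so a point that sits very close to a boundary could be assigned to a neighboring polygon. Spot-checking a sample against the unreduced map would show whether the cutoff is too aggressive.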