jklein271
Calcite | Level 5

The street method proc geocode functionality was highly recommended to me by someone else in the industry.  I've read through all of the documentation and believe I've done all of the correct prep before actually executing this method against a dataset.  I've done the following:

  • Loaded the most recent Tiger Data files
  • Loaded the most recent SAS zipcode file
  • Feeding geocode only US-based addresses that have been standardized and delivery-point validated against USPS data using third-party address cleansing / verification software
  • Providing all address fields that geocode will accept at the street method level (address1, city, state, zip5) and converting zip5 from char to num (to equal the sashelp.zipcode format)
  • Loading sashelp.usm and sashelp.zipcode into memory using SASFILE to reduce the number of I/O operations as much as possible
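
For reference, the setup described in the list above would look roughly like the sketch below. The input table name WORK.ADDRESSES and the output table name WORK.GEOCODED are placeholders, not names from my actual job, and the variable names are assumed to match the address fields listed above.

```sas
/* Sketch only: WORK.ADDRESSES and WORK.GEOCODED are assumed placeholder names. */

/* Hold the lookup tables in memory to reduce I/O. */
sasfile sashelp.usm load;
sasfile sashelp.zipcode load;

proc geocode
   method=street                    /* street-level matching */
   data=work.addresses (obs=1000)   /* subset size; raising this to 5000 is where the hang appears */
   out=work.geocoded
   lookupstreet=sashelp.usm         /* street lookup data built from the Tiger files */
   lookup=sashelp.zipcode           /* ZIP code lookup table */
   addressvar=address1
   addresscityvar=city
   addressstatevar=state
   addresszipvar=zip5;              /* zip5 already converted to numeric */
run;

/* Release the in-memory copies once geocoding is done. */
sasfile sashelp.usm close;
sasfile sashelp.zipcode close;
```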

To confirm that the Tiger files, zipcode table, and SAS install are working properly: I have seen this proc execute successfully and return the desired results.  If I limit the address subset described above to obs=1000, proc geocode finishes in under 45 seconds.  However, if I increase the obs to 5000, proc geocode just hangs and never completes.  The odd part is that SAS certainly appears to be doing something the entire time the process is "hung."  Memory usage is a flat line at ~2.18 GB, which I would expect since I've loaded what I can (and what I think is used the most) into memory.  CPU usage is steady at 49/50 percent, with one CPU pegged and the other idle. Again, I'd expect this, as I don't think proc geocode uses multi-threading in any way and it appears to be a CPU-expensive proc.

I guess the short version of my question is the following.  Has anyone had any success pushing a large number of obs through proc geocode?  I honestly can't think of anything else to try, and it's very frustrating to see the proc work like it should with a small number of obs and then consistently fail as I increase the number of obs being pushed through.  I have seen this happen repeatedly with different datasets, and the fact that it succeeds with obs=1000 should rule out a corrupt dataset or bad data in general.  To me, it almost looks like geocode is getting "stuck" attempting to match an address at the street level and not falling back to a zip or city match.  To support this point, if I increase my obs to 5000 or even 500000 but specify the zip method instead, it completes with no issue.

Thanks for your time in reading this. I appreciate any feedback, even if it's just confirmation of larger datasets successfully being pushed through this relatively new method.

Cynthia_sas
SAS Super FREQ

Hi:

  To me, this sounds like an issue that would be cause for a track with Tech Support. Personally, I can't verify on your OBS/size issue. I have talked to folks who are pumping datasets larger than 1000 through GEOCODE, but I haven't done it myself.

   But if you are seeing a performance hit using PROC GEOCODE, I believe that is something that Tech Support would be interested in. And, if there is something that needs the developer's attention, then Tech Support will be able to involve the developer, if necessary, in the resolution of the issue.

  This would also be one of those situations where someone needs to look at your exact GEOCODE program, and possibly at your data, to see if they can replicate your process "hang".

To open a track with Tech Support, fill out the form at this link:

http://support.sas.com/ctx/supportform/createForm

cynthia

EdO_sas
SAS Employee

There is a recently discovered problem with the street geocoding method when attempting to process more than 1000 addresses. It is a hit-or-miss, memory-related issue that only a handful of users have encountered. We've had users geocode over one million addresses, while a few others have hit the problem with much smaller address data sets.

A fix is shipping this month (August 2012) in the 9.3M2 maintenance release. If you can install that, the problem which is hanging SAS should go away. I do apologize for the inconvenience.
