jerry898969
Pyrite | Level 9

Hello,

I'm using proc geocode to geocode about 100,000 addresses.  Why are the tract and block left blank when the result is a ZIP or city match?  Since the match is a centroid, wouldn't it return the tract and block of the centroid position, that is, of the X and Y that are returned?

County is also blank but I'm using proc ginside to get many of them populated.

Is there another way I can do this?  Thank you

23 REPLIES 23
ballardw
Super User

Basically when the street level address can't be coded the procedure isn't going to assume anything about the tracts and blocks since a Zip or City may be associated with multiple tracts or blocks. Not always the case in lower population densities but I can well sympathize with not trying to program in the literally thousands of exceptions.

If the data set you geocode against doesn't have county, then it will be blank. You can get it by merging your data with SASHELP.ZIPCODE on ZIP code.
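A minimal sketch of that merge, assuming the geocoded output is a data set named WORK.GEOCODED with a numeric ZIP variable (both names are hypothetical, not from the original post). SASHELP.ZIPCODE carries one county per ZIP code, so a ZIP that straddles a county line will still get a single county:

```sas
/* Sketch only: WORK.GEOCODED and its numeric ZIP variable are assumed names. */
proc sql;
   create table geocoded_cnty as
   select a.*,
          z.county   as county_fips,   /* numeric county FIPS code from SASHELP.ZIPCODE */
          z.countynm as county_name    /* county name from SASHELP.ZIPCODE */
   from geocoded as a
        left join sashelp.zipcode as z
        on a.zip = z.zip;
quit;
```

The left join keeps every address row, so rows whose ZIP is not in SASHELP.ZIPCODE come back with missing county values rather than being dropped.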

jerry898969
Pyrite | Level 9

Hi Ballardw,

Thank you for the reply.

I'm able to get county using proc ginside.  But why doesn't it return the tract and block for the X Y coordinates that are returned?

Thank you,

Jerry

ballardw
Super User

You may need to share how you are using GINSIDE and some details of the specific data sets, such as the coordinates of some of the city or ZIP geocoded values.

Darrell_sas
SAS Employee

For GINSIDE, you provide a map data set, and the procedure finds which polygon each point is inside of. GINSIDE only finds one kind of polygon (like county) per execution.  I assume you used a county map and that is what is returned.  I assume it doesn't have Tract and Block.  Tract and Block don't necessarily line up with county.  If you have a map data set (from the Census) that has the Tract, then you can run GINSIDE a second time with that map and add the Tract to the County.
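A rough sketch of that second pass, under stated assumptions: WORK.PTS_COUNTY is the output of the earlier county run and still holds the X and Y variables, WORK.TRACT_MAP is a Census tract shapefile already imported with PROC MAPIMPORT, and GEOID10 is the tract identifier in that file. All three names are assumptions for illustration; check the variables in your own import.

```sas
/* Sketch only: PTS_COUNTY, TRACT_MAP and GEOID10 are assumed names.      */
/* The points and the map must use the same coordinate system and units. */
proc ginside data=pts_county map=tract_map out=pts_tract;
   id geoid10;   /* identifies each tract polygon; added to every matched point */
run;
```

Points that fall outside every tract polygon stay in the output with a missing ID value, which makes unmatched addresses easy to find afterwards.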

jerry898969
Pyrite | Level 9

Darrell,

Thank you so much.  I will check with the Census to see if I can get some type of file that will allow me to do the same thing I did for county.  I was able to get most of the counties to match up with a previous process I used a few years ago, so I know that county works.  I hope I can do the same for tract and block.

Thank you and I will post my results.

Jerry

jerry898969
Pyrite | Level 9

Hi Darrell,

Thank you so much for your help.  I'm currently in the process of getting all the shape files from the census website and running it against my data.  My test process worked so I really appreciate your help.

Thank you,

Jerry

jerry898969
Pyrite | Level 9

Hi Darrell,

I have downloaded the 51 state block shape files, extracted them, and imported them using proc mapimport.  Now I have a table with 645,065,406 rows.  When I try to run proc ginside for just 1 address row using the block table, it takes a very long time.  It's at about 5 minutes now and still nothing is returned.  Are there options that can speed this process up?  Do you know of a better approach?  The block files have state, county, tract and block, so all the info I need is there.

Any help would be greatly appreciated.

Darrell_sas
SAS Employee

Can you subset your Address data and your block data by the state?  So do NC addresses with NC block data.  And SC and every state the same way. 

What version are you using?  We discovered some performance issues a few releases back.  Big files ran much slower.

jerry898969
Pyrite | Level 9

Hi Darrell,

That is exactly what I'm doing now.  I'm doing Alabama first.  I have 330 addresses and around 16 million rows in the block file for Alabama.

Both tables are within my work library so it doesn't need to go get the data.   I'm using SAS 9.4 64-bit TS1M2

This is how my code looks:

al_addr = 330 obs
block_AL = 16 million obs

proc ginside data=al_addr map=block_AL out=AL_blk;
   id statefp10 countyfp10 tractce10 blockce10;
run;

Thank you so much for your help.  This is a big project and it's been one roadblock after another.

Darrell_sas
SAS Employee

Sorry, it took us several days to research this.  It looks like the problem is the number of Blocks you have in Alabama, but mostly the number of points in each Block.  As you said, there are around 16 million observations.  I think there are around 6000 Blocks, so each Block (polygon) has 2000 to 3000 points. The I/O seems to be a big factor, and so does searching the polygons.

You have the county in both the Block map and your points, so you could search by county. A Block isn't the same as a county, but the Block map lists the county for each Block, so you can use it to narrow the search.   For example:

libname foo 'C:\';

proc mapimport out=foo.blocks datafile="c:\Public\user_data\tl_2014_01_tabblock10.shp" contents;
   id blockce10;
run;

/*
proc gmap data=foo.blocks map=foo.blocks; id blockce10; choro blockce10; run;
*/

data points;
   /* OUTPUT after each assignment so both points are written, not only the last one */
   x= -86.8025; y=33.5205556; city="Birmingham"; county="Jefferson"; fips="073"; output;
   x= -86.8024; y=33.5205556; city="Birmingham"; county="Jefferson"; fips="073"; output;
run;

proc sort data=points; by county; run;

/* Subset the block map to the county the points are in */
data blocks; set foo.blocks(where=(countyfp10='073')); run;

proc ginside data=points map=blocks out=out; id blockce10; run;

jerry898969
Pyrite | Level 9

Hi Darrell,


Thank you so much for your reply.  Are you saying that I should subset the block file that I import so that it only keeps rows for the counties within my data?  I am unable to subset the data.  The lead wants to leave the file as is.  He is afraid we may compromise the quality of the returned data, and we have to send it out to many people.

Within the mapimport I was doing the following:

proc mapimport datafile="c:\temp\state.shp" out=block_temp ;

            select statefp10 countyfp10 tractce10 blockce10 ;

run;

Will not using the id statement with blockce10 cause a problem or slow down the process?  Can I leave it like this?

I have already used this import for all 51 states and started to run GINSIDE on some of the states.  If the id is an issue, I will have to recreate the map files and re-run GINSIDE, which will cause me even more delay.  The main thing I have to know is whether the output will be incorrect.

Thank you so much.  I really need to figure this issue out so I can complete this project.

Darrell_sas
SAS Employee

Regarding MAPIMPORT, sorry, but the ID is important.  The Census frequently does not sort the data by what you need (blockce10, in this case).  In fact, you need to sort the data with 'ID countyfp10 blockce10;'. See my example below.


I don't understand the problem with subsetting the data.  I am not subsetting the data from MAPIMPORT itself; I am subsetting a copy into a smaller data set.  Otherwise, you will continue to run very slowly.  This example does one state at a time, and within the state only the counties that you have in your "point" data.

libname save 'C:\';
filename fipsout 'c:\blocks\myfips.sas';
%let state=01;

/* Alabama block data */
proc mapimport out=save.blocks2 datafile="c:\Public\user_data\tl_2014_01_tabblock10.shp" contents;
   id countyfp10 blockce10;
run;

/* The data that I'm matching to the blocks */
data save.points;
   length city county $20;  /* avoids truncating "Montgomery" to the length of "Jefferson" */
   x= -86.8025; y=33.5205556; city="Birmingham"; county="Jefferson"; fips="073"; output;
   x= -86.8024; y=33.5205556; city="Birmingham"; county="Jefferson"; fips="073"; output;
   x= -86.3;    y=32.3666667; city="Montgomery"; county="Montgomery"; fips="101"; output;
run;

proc sort data=save.points; by fips; run;

%macro ginside_cnty(state, fip);
   data points; set save.points(where=(fips="&fip")); run;         /* subset points by county */
   data blocks; set save.blocks2(where=(countyfp10="&fip")); run;  /* subset map by county */
   proc ginside data=points map=blocks out=out&state&fip; id blockce10; run;
   data save.blks&state; set save.blks&state out&state&fip; run;   /* concat the blocks in the state */
%mend;

data save.blks&state; stop; run;  /* empty results data set; STOP keeps it at zero observations */

/* Write one %ginside_cnty call per distinct county FIPS code to the fipsout file */
data out_fips;
   file fipsout; set save.points; length lastfips $3; retain lastfips '';
   if (lastfips ne fips) then do; out='%ginside_cnty('||"&state"||','||fips||');'; put out; end;
   lastfips=fips;
run;

%inc fipsout;  /* Run the file created above; results are in save.blks01 */

jerry898969
Pyrite | Level 9

Hi Darrell,

Thank you for the reply.

I was using select to pick out the variables I need.  The map data does seem sorted.  Will leaving out the ID give me bad data, or just slow down the process?

I will speak to the lead tomorrow to see if we can use this approach.  My worry is that I created 51 tables using proc mapimport without the id statement.  We have run some long processes against them.  I want to make sure the returned data isn't compromised.

Thanks again for your help.  I appreciate the information as well.

Darrell_sas
SAS Employee

It will give you bad data. 

You might try just importing Alabama and experimenting with that state with the ID statement before importing all the states.

You can use Proc GMAP to see the Census map, but I would look at one county at a time, because otherwise it will look like an all-black mess due to the many Census Blocks.  I used county "101".
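A sketch of what that one-county plot could look like, assuming the blocks were imported into SAVE.BLOCKS2 with 'id countyfp10 blockce10;' as in the earlier example. The data set name COUNTY101 is made up for illustration, and the GMAP call simply mirrors the commented-out one earlier in the thread:

```sas
/* Sketch only: COUNTY101 is an assumed name; SAVE.BLOCKS2 comes from the MAPIMPORT example. */
data county101;
   set save.blocks2(where=(countyfp10='101'));  /* keep only the blocks in county 101 */
run;

proc gmap data=county101 map=county101;
   id blockce10;
   choro blockce10 / nolegend;  /* legend suppressed: there are too many blocks to list */
run;
quit;
```

If the block outlines look scrambled or have lines crossing the county, that is a hint the map was imported without a suitable ID statement.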

Discussion stats
  • 23 replies
  • 2300 views
  • 0 likes
  • 3 in conversation