topic Re: Seeking Help With An Unusual Merge in SAS Programming

Seeking Help With An Unusual Merge

Adam_Coutts — Wed, 26 Jul 2017 23:13:24 GMT

I am seeking help with an unusual merge/join.

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37 -120.7
2    37.2 -123.1
3    38 -122.2
4    36.4 -119.8
5    35.9 -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon CensusBlk
36.3 -121.7   C2
35.5 -120.9   A8
38.2 -122     A3
37.9 -122.4   C7
36.3 -121.9   B6
37 -120.3 A1
36.6 -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

create table CensusBlkCombo as

(select A.SP_ID,

(select CENSUS_BLK_GRP from

(select top 1 CENSUS_BLK_GRP

from CensusBlkExist B

order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

as CENSUS_BLK_GRP)

from CensusBlkMiss A));

That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

Re: Seeking Help With An Unusual Merge

Reeza — Wed, 26 Jul 2017 23:51:46 GMT

What country are you in? US? If so, look at PROC GINSIDE.

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmuaar16f0.htm&locale=en

@Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37 -120.7
2    37.2 -123.1
3    38 -122.2
4    36.4 -119.8
5    35.9 -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon CensusBlk
36.3 -121.7   C2
35.5 -120.9   A8
38.2 -122     A3
37.9 -122.4   C7
36.3 -121.9   B6
37 -120.3 A1
36.6 -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));

That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

Re: Seeking Help With An Unusual Merge

ChrisBrooks — Thu, 27 Jul 2017 08:58:23 GMT

You might be able to do something with the geodist function although I've never used it and with such a large number of records to process i don't think any method is likely to be quick. The documentation for geodist is here

Re: Seeking Help With An Unusual Merge

Adam_Coutts — Wed, 09 Aug 2017 06:52:41 GMT

ChrisBrooks - thank you for your reply.

It looks to me like the geodist function would replace me ad hoc distance calculation (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2)) but is not set up to translate lat and long to census block.

Reeza - thank you for the reply as well. PROC GINSIDE is indeed able to generate the census blocks that I was seeking. And I was able to find census block maps for California at ftp://ftp2.census.gov/geo/tiger/TIGER2010BLKPOPHU/.