Solved: Re: Seeking Help With An Unusual Merge

Adam_Coutts · Posted 07-26-2017 07:13 PM

I am seeking help with an unusual merge/join.

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37 -120.7
2    37.2 -123.1
3    38 -122.2
4    36.4 -119.8
5    35.9 -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon CensusBlk
36.3 -121.7   C2
35.5 -120.9   A8
38.2 -122     A3
37.9 -122.4   C7
36.3 -121.9   B6
37 -120.3 A1
36.6 -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

create table CensusBlkCombo as

(select A.SP_ID,

(select CENSUS_BLK_GRP from

(select top 1 CENSUS_BLK_GRP

from CensusBlkExist B

order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

as CENSUS_BLK_GRP)

from CensusBlkMiss A));

That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

Reeza · Posted 07-26-2017 07:51 PM

What country are you in? US? If so, look at PROC GINSIDE.

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

@Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37 -120.7
2    37.2 -123.1
3    38 -122.2
4    36.4 -119.8
5    35.9 -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon CensusBlk
36.3 -121.7   C2
35.5 -120.9   A8
38.2 -122     A3
37.9 -122.4   C7
36.3 -121.9   B6
37 -120.3 A1
36.6 -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));

That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

View solution in original post

Reeza · Posted 07-26-2017 07:51 PM

What country are you in? US? If so, look at PROC GINSIDE.

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

@Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37 -120.7
2    37.2 -123.1
3    38 -122.2
4    36.4 -119.8
5    35.9 -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon CensusBlk
36.3 -121.7   C2
35.5 -120.9   A8
38.2 -122     A3
37.9 -122.4   C7
36.3 -121.9   B6
37 -120.3 A1
36.6 -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));

That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

ChrisBrooks · Posted 07-27-2017 04:58 AM

You might be able to do something with the geodist function although I've never used it and with such a large number of records to process i don't think any method is likely to be quick. The documentation for geodist is here

Adam_Coutts · Posted 08-09-2017 02:52 AM

ChrisBrooks - thank you for your reply.

It looks to me like the geodist function would replace me ad hoc distance calculation (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2)) but is not set up to translate lat and long to census block.

Reeza - thank you for the reply as well. PROC GINSIDE is indeed able to generate the census blocks that I was seeking. And I was able to find census block maps for California at ftp://ftp2.census.gov/geo/tiger/TIGER2010BLKPOPHU/.

Seeking Help With An Unusual Merge

Re: Seeking Help With An Unusual Merge

Re: Seeking Help With An Unusual Merge

Re: Seeking Help With An Unusual Merge

Re: Seeking Help With An Unusual Merge

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!