Seeking Help With An Unusual Merge

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Seeking Help With An Unusual Merge

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 17,829

Re: Seeking Help With An Unusual Merge

[ Edited ]

What country are you in? US? If so, look at PROC GINSIDE. 

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

 

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

 


Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.


 

View solution in original post


All Replies
Solution
2 weeks ago
Super User
Posts: 17,829

Re: Seeking Help With An Unusual Merge

[ Edited ]

What country are you in? US? If so, look at PROC GINSIDE. 

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

 

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

 


Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.


 

Regular Contributor
Posts: 184

Re: Seeking Help With An Unusual Merge

You might be able to do something with the geodist function although I've never used it and with such a large number of records to process i don't think any method is likely to be quick. The documentation for geodist is here

New Contributor
Posts: 2

Re: Seeking Help With An Unusual Merge

ChrisBrooks - thank you for your reply.

It looks to me like the geodist function would replace me ad hoc distance calculation (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2)) but is not set up to translate lat and long to census block.

 

Reeza - thank you for the reply as well.  PROC GINSIDE is indeed able to generate the census blocks that I was seeking.  And I was able to find census block maps for California at ftp://ftp2.census.gov/geo/tiger/TIGER2010BLKPOPHU/.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 140 views
  • 1 like
  • 3 in conversation