BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Adam_Coutts
Calcite | Level 5

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

What country are you in? US? If so, look at PROC GINSIDE. 

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

 

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

 


@Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.


 

View solution in original post

3 REPLIES 3
Reeza
Super User

What country are you in? US? If so, look at PROC GINSIDE. 

The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.

 

http://documentation.sas.com/?docsetId=grmapref&docsetVersion=9.4&docsetTarget=p0qjcc8hugcjb2n1x3bmu...

 


@Adam_Coutts wrote:

I am seeking help with an unusual merge/join.

 

I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.   

 

One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:

ID   Lat   Lon
1    37    -120.7
2    37.2  -123.1
3    38    -122.2
4    36.4  -119.8
5    35.9  -122.1

The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:

Lat   Lon      CensusBlk
36.3  -121.7   C2
35.5  -120.9   A8
38.2  -122     A3
37.9  -122.4   C7
36.3  -121.9   B6
37    -120.3   A1
36.6  -122.1   B6

What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)

ID   CensusBlk
1    A1
2    B6
3    C7
4    A1
5    B6

 

("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)

I have not been able to think of a way to do this merge using normal SAS code.

The SQL code that I have tried has been something like this

proc sql

  create table CensusBlkCombo as

    (select A.SP_ID,

      (select CENSUS_BLK_GRP from

        (select top 1 CENSUS_BLK_GRP

         from CensusBlkExist B

         order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))

       as CENSUS_BLK_GRP)

     from CensusBlkMiss A));


That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery". 

 

I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.


 

ChrisBrooks
Ammonite | Level 13

You might be able to do something with the geodist function although I've never used it and with such a large number of records to process i don't think any method is likely to be quick. The documentation for geodist is here

Adam_Coutts
Calcite | Level 5

ChrisBrooks - thank you for your reply.

It looks to me like the geodist function would replace me ad hoc distance calculation (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2)) but is not set up to translate lat and long to census block.

 

Reeza - thank you for the reply as well.  PROC GINSIDE is indeed able to generate the census blocks that I was seeking.  And I was able to find census block maps for California at ftp://ftp2.census.gov/geo/tiger/TIGER2010BLKPOPHU/.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 473 views
  • 1 like
  • 3 in conversation