I am seeking help with an unusual merge/join.
I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.
One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:
ID Lat Lon
1 37 -120.7
2 37.2 -123.1
3 38 -122.2
4 36.4 -119.8
5 35.9 -122.1
The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:
Lat Lon CensusBlk
36.3 -121.7 C2
35.5 -120.9 A8
38.2 -122 A3
37.9 -122.4 C7
36.3 -121.9 B6
37 -120.3 A1
36.6 -122.1 B6
What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)
ID CensusBlk
1 A1
2 B6
3 C7
4 A1
5 B6
("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)
I have not been able to think of a way to do this merge using normal SAS code.
The SQL code that I have tried has been something like this
proc sql
create table CensusBlkCombo as
(select A.SP_ID,
(select CENSUS_BLK_GRP from
(select top 1 CENSUS_BLK_GRP
from CensusBlkExist B
order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))
as CENSUS_BLK_GRP)
from CensusBlkMiss A));
That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".
I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.
What country are you in? US? If so, look at PROC GINSIDE.
The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.
@Adam_Coutts wrote:
I am seeking help with an unusual merge/join.
I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.
One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:
ID Lat Lon
1 37 -120.7
2 37.2 -123.1
3 38 -122.2
4 36.4 -119.8
5 35.9 -122.1
The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:
Lat Lon CensusBlk
36.3 -121.7 C2
35.5 -120.9 A8
38.2 -122 A3
37.9 -122.4 C7
36.3 -121.9 B6
37 -120.3 A1
36.6 -122.1 B6
What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)
ID CensusBlk
1 A1
2 B6
3 C7
4 A1
5 B6
("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)
I have not been able to think of a way to do this merge using normal SAS code.
The SQL code that I have tried has been something like thisproc sql
create table CensusBlkCombo as
(select A.SP_ID,
(select CENSUS_BLK_GRP from
(select top 1 CENSUS_BLK_GRP
from CensusBlkExist B
order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))
as CENSUS_BLK_GRP)
from CensusBlkMiss A));
That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".
I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.
What country are you in? US? If so, look at PROC GINSIDE.
The example maps to county/state but if you have the shapefiles for the blocks you can do a similar project. SAS may already have the US census shapes in the MAPS folder.
@Adam_Coutts wrote:
I am seeking help with an unusual merge/join.
I have two datasets/tables in a Teradata environment, which I am communicating with using EG 7.11.
One table (let's call it CensusBlkMiss) contains about 2m records with location ID numbers, lat values, lon values, and is missing census block values:
ID Lat Lon
1 37 -120.7
2 37.2 -123.1
3 38 -122.2
4 36.4 -119.8
5 35.9 -122.1
The other table (let's call it CensusBlkExist) contains about 10m records with lat values, lon values, and census block values:
Lat Lon CensusBlk
36.3 -121.7 C2
35.5 -120.9 A8
38.2 -122 A3
37.9 -122.4 C7
36.3 -121.9 B6
37 -120.3 A1
36.6 -122.1 B6
What I would like to do is assign census block values to the IDs in the first table based on the closest lat/long location from the second dataset, and end up with something like (let's call it CensusBlkCombo)
ID CensusBlk
1 A1
2 B6
3 C7
4 A1
5 B6
("A1" is selected for ID #1 because 37/-120.3 is the closest location with a census block to 37/-120.7, the location of ID #1)
I have not been able to think of a way to do this merge using normal SAS code.
The SQL code that I have tried has been something like thisproc sql
create table CensusBlkCombo as
(select A.SP_ID,
(select CENSUS_BLK_GRP from
(select top 1 CENSUS_BLK_GRP
from CensusBlkExist B
order by (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2))
as CENSUS_BLK_GRP)
from CensusBlkMiss A));
That particular code gives me an error message, "TOP N Syntax error: Top N option is not supported in subquery".
I am not wedded to that setup, though - any suggestions are welcome, including a radically different code structure welcome too.
You might be able to do something with the geodist function although I've never used it and with such a large number of records to process i don't think any method is likely to be quick. The documentation for geodist is here
ChrisBrooks - thank you for your reply.
It looks to me like the geodist function would replace me ad hoc distance calculation (abs(B.sp_lat - A.sp_lat2) + abs(B.sp_long - A.sp_long2)) but is not set up to translate lat and long to census block.
Reeza - thank you for the reply as well. PROC GINSIDE is indeed able to generate the census blocks that I was seeking. And I was able to find census block maps for California at ftp://ftp2.census.gov/geo/tiger/TIGER2010BLKPOPHU/.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.