Teketo
Calcite | Level 5

Hello,

I have got three spatial datasets.

  1. Polygon shape file of a country x (including all administrative regions)
  2. Demographic and Health Survey (DHS) data (population characteristics and cluster coordinates)
  3. Health facility data (health facility characteristics and facility coordinates)

I have been trying to merge these datasets in SAS, but I am having trouble merging the population data (DHS) with the health facility data. Both the health facility sample and the population sample were collected across the whole country.

 

The DHS data has 622 clusters (each with a single coordinate pair), and an average of 23 individuals were interviewed per cluster (14,300 people in total). The second dataset covers 1,020 surveyed health facilities, each with its geographic coordinates.

 

The merge should use both the geographic coordinates (matching each of the 622 DHS clusters to the nearest of the 1,020 health facilities) and the region (both datasets were collected in the same 11 regions).

 

The merge I want is not based only on the minimum distance between clusters and health facilities; it must also respect regional administrative boundaries. In other words, no nearest-distance match may cross a regional boundary.

 

During the merge, how do I handle the multiple observations per cluster in the population dataset (14,300 people in 622 clusters)? There are also more health facility locations (1,020) than clusters (622).

 

How can I merge the DHS data with the health facility data? How do I handle the attribute data (on average 23 individuals' records per cluster) when combining it with the data for a single health facility?

 

Here are sample elements of the two data sets.

Data x; *Health facility dataset;

Set a;

Keep LAT LONG REGION FACTYPE Q102_04 GR1 GR2 FA1 FA2 FR1 FR2 FR3;

Run;

 

Data y; *Population dataset;

Set b;

Keep V001 LAT_DHS LONG_DHS V002 V012 REGION V190 V218 M14_1 V501 V313M;

Run;

 

Kind regards

Teketo

1 REPLY
Reeza
Super User

This isn't really a merge; it's a nearest-neighbor search, which is a different type of analysis.

 

GEODIST will calculate the distances, but those are straight-line (geodetic) distances, not driving distances. SAS VA can work with driving distances, and you may also want to use ArcGIS or QGIS (free) to find the nearest location by driving distance.
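
If straight-line distance is good enough, here is a minimal sketch of one way to do it with GEODIST and PROC SQL. It rests on several assumptions that are not confirmed in the post: REGION is coded identically in both datasets, all coordinates are in decimal degrees, and the REGION variable (rather than a point-in-polygon test against the shapefile) is an acceptable way to enforce the boundary rule. The names fac, fac_id, clusters, pairs, nearest, merged and dist_km are made up for illustration. The idea is to reduce the population data to one row per cluster, find the nearest same-region facility for each cluster, and then join the result back to all individuals by V001.

```sas
/* Give each facility an ID. fac_id is a hypothetical name; */
/* dataset x as posted has no facility identifier.          */
data fac;
  set x;
  fac_id = _n_;
run;

/* One row per DHS cluster (622 rows instead of 14,300).    */
proc sort data=y(keep=V001 REGION LAT_DHS LONG_DHS)
          out=clusters nodupkey;
  by V001;
run;

/* Pair each cluster only with facilities in the SAME       */
/* region and compute the distance. 'DK' = input in         */
/* degrees, output in kilometers.                           */
proc sql;
  create table pairs as
  select c.V001,
         f.fac_id,
         geodist(c.LAT_DHS, c.LONG_DHS, f.LAT, f.LONG, 'DK') as dist_km
  from clusters as c
       inner join fac as f
       on c.REGION = f.REGION
  where calculated dist_km is not missing
  order by c.V001, dist_km;
quit;

/* Keep the closest facility for each cluster.              */
data nearest;
  set pairs;
  by V001;
  if first.V001;
run;

/* Attach the nearest facility's attributes to every        */
/* individual in the cluster.                               */
proc sql;
  create table merged as
  select p.*,
         n.fac_id,
         n.dist_km,
         f.FACTYPE, f.Q102_04, f.GR1, f.GR2,
         f.FA1, f.FA2, f.FR1, f.FR2, f.FR3
  from y as p
       left join nearest as n on p.V001 = n.V001
       left join fac as f on n.fac_id = f.fac_id;
quit;
```

Every individual in a cluster gets the same facility record, so merged keeps one row per person. Several clusters can share one nearest facility, and some of the 1,020 facilities may not be matched to any cluster.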

 




 
