Dear Reeza, Many thanks for your invaluable support. The cross join works well; however, when I use the BY group and FIRST, I am losing most of the respondents (R_ID) and more than half of the clusters. For example, respondents (R_ID) 8, 9, and from 11 to 23 and clusters from 5 to 10 are not included in the final dataset. I used the following dataset and code to do the cross join and pick the appropriate health facilities, respondents and clusters along with the other variables. *To calculate the distance; Data clusters; Input R_ID Cluster Lat1 Lon1 region age FP; datalines; 1 1 14.0 38.7 1 19 0 2 1 14.0 38.7 1 21 1 3 2 11.6 41.5 2 30 0 4 2 11.6 41.5 2 48 0 5 2 11.6 41.5 2 25 1 6 3 12.7 37.5 3 32 1 7 3 12.7 37.5 3 21 1 8 4 7.5 39.1 4 40 0 9 4 7.5 39.1 4 33 1 10 4 7.5 39.1 4 29 0 11 5 12.7 37.6 1 40 0 12 5 12.7 37.6 1 31 1 13 6 10.4 40.1 2 27 1 14 6 10.4 40.1 2 29 0 15 7 12.1 40.0 2 26 0 16 7 12.1 40.0 2 24 1 17 8 12.2 37.0 3 32 0 18 8 12.2 37.0 3 37 1 19 9 9.9 39.9 3 42 0 20 9 9.9 39.9 3 37 1 21 10 9.1 40.7 4 33 0 22 10 9.1 40.7 4 35 1 23 10 9.1 40.7 4 36 0 ; *Health facility data; Data health_facility; Input ID Lat2 Lon2 region HFMA FPRS; Datalines; 1 13.6 39.1 1 1 0 2 13.3 38.9 1 2 1 3 10.7 40.1 2 1 1 4 10.3 40.2 2 1 1 5 11.4 40.7 2 1 1 6 10.5 39.5 3 2 0 7 11.6 39.6 3 1 1 8 11.3 37.9 3 2 1 9 9.1 35.6 4 2 0 10 9.5 34.8 4 1 1 11 14.3 37.8 1 2 0 12 13.5 39.4 1 1 1 13 10.4 40.2 2 2 0 14 9.2 40.3 2 1 1 15 9.9 39.8 3 2 1 16 11.0 37.1 3 1 1 17 10.6 39.4 3 2 0 18 8.5 34.5 4 2 1 19 7.2 38.2 4 1 1 20 8.8 35.8 4 2 1 ; run; *Proc sql: Using cross join to merge to the two datasets; proc sql; create table dist_table as select h.id, h.lat2 as lat_facility, h.lon2 as long_facility, h.region, c.r_id, c.cluster, c.lat1 as lat_cluster, c.lon1 as long_cluster, geodist(h.lat2, h.lon2, c.lat1, c.lon1, 'k' ) as dist_km from clusters as c cross join health_facility as h where c.region=h.region order by h.id, c.cluster, dist_km; quit; *To pick a single minimum distance for each health facility having in mind the following criteria One health facility should be linked to one cluster in a region One cluster can be linked to one or more health facilities in a region; *BY group and FIRST; Data merged; Set dist_table; By id cluster dist_km; If first.id; Run; The following one also doesn’t give the appropriate result. proc sort data = dist_table; by id dist_km; run; Data merged; Set dist_table; By id dist_km; If first.id; Run; Kind regards Teketo
... View more