BookmarkSubscribeRSS Feed
SAS93
Quartz | Level 8

I'm working on a geocoding project to geocode resident and hospital addresses given patient hospital records. 

 

I want to first estimate the distance between the home and hospital addresses (pretty straightforward), then also assign a new Y/N indicator variable to each patient so I can identify them as having visited the hospital that is closest to their home address. There are approximately 5,000 rows, one for each patient, and about 200 hospitals.

 

For each row, I basically want SAS to scan through the hospitals, select the one that is closest in distance to the home address, compare that hospitals ID number to the one on the patient's record, and return a yes/no based on if those hospital IDs match.

 

How do I do this efficiently?  I currently have variable names X and Y for home address coordinates, X_Hosp and Y_Hosp for the hospital addresses; this information is all in the same dataset and already assigned to each blinded patient ID. But as I've worked, I've also kept copies of the datasets (PTS and HOSP) as they were before I merged them, so there is still a dataset with only the 200 hospitals that I can work with.

2 REPLIES 2
PaigeMiller
Diamond | Level 26

It sounds like you have to compute the distance between a patients home and all 200 hospitals to determine which one is closest. I imagine you would create an array to loop over all 200 hospitals and you can use the GEODIST function to compute the distance. Then assign a 1 if the patient actually visited the closest hospital and a 0 if the patient did not visit the closest hospital.

--
Paige Miller
quickbluefish
Barite | Level 11

Pretty simplistic, but you could do this.  I wouldn't recommend this with anything much larger than what you have -- in your data, the CROSS JOIN will produce a table with 1M rows (5000*200), which is fine.

 

data pt;
do pt=1 to 20;
	visited_hosp=rand('integer',1,10);
	X=rand('erlang',2)*5;
	Y=rand('erlang',2)*10;
	output;
end;
run;

data hosp;
do hospid=1 to 10;
	hosp_X=rand('erlang',2)*5;
	hosp_Y=rand('erlang',2)*10;
	output;
end;
run;

proc sql;
create table pthosp as
select a.*, b.hospid, 
geodist(a.Y, a.X, b.hosp_Y, b.hosp_X) as dist
from
	pt A
	cross join 
	hosp B
order by a.pt, dist;
quit;

data want;
set pthosp;
by pt;
if first.pt;
hosp_match='N';
if visited_hosp=hospid then hosp_match='Y';
run;

proc print data=want; run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 367 views
  • 1 like
  • 3 in conversation