Hello. I have code below for calculating the shortest distance between a patient's zip code and their nearest specialty medical provider. The problem is that while I have patient zip codes, I only have city and state for a long list of medical providers. I'd be grateful for help with finding the shortest distance between a patient's zip code and the city of their nearest medical provider. I also figure that this code is probably not ideal, as I have city/state for over 400 doctors. Any suggestions for how to do this more efficiently than this?
data PatientZipDoctorCityState;
set PatientZipDoctorCityState;
if not missing(PatientZip) then PatientZipCode=put(PatientZip,$z5.);
Doctor1 =put(00000,$z5.);
Doctor2 =put(00000,$z5.);
Doctor3 =put(00000,$z5.);
Doctor4 =put(00000,$z5.);
Doctor5 =put(00000,$z5.);
Doctor6 =put(00000,$z5.);
Doctor7 =put(00000,$z5.);
Doctor8 =put(00000,$z5.);
Doctor9 =put(00000,$z5.);
if not missing(PatientZipCode) then distance=zipcitydistance(PatientZipCode,Doctor1);
array calc[9] Doctor1-Doctor9;
do i=1 to 9;
if not missing(PatientZip)
then distance=min(distance, zipcitydistance(PatientZip,calc(i))) ;
end;
drop Doctor1-Doctor9 i;
run;
Depending upon what (if anything) you want to output when you can't calculate distance, proc sql may be easier code to maintain.
e.g., the following will only output a record if there is a non-missing minimum distance:
data PatientZip;
do zip=15101,15100,15116,15123,15102,15122,15123,15102,15110,15112,15116,15120;
Patient+1;
output;
end;
run;
data DoctorCityState;
input doctor zip;
cards;
1 15101
2 15102
3 15104
4 15106
5 15108
6 15210
7 15212
8 15216
9 15220
;
proc sql;
create table PatientZipDoctorCityState as
select a.patient, a.zip,
zipcitydistance(a.Zip,b.zip) as distance
from PatientZip a, DoctorCityState b
group by patient
having calculated distance eq min(calculated distance)
and not missing(calculated distance)
;
quit;
Of course it can also be done using an array, but I wouldn't do it the way you did. Absolutely no reason to keep all of the doctor ids/names and zips for each patient. However, I won't suggest a modification if you decide to go with the sql method instead.
Art, CEO, AnalystFinder.com
This provides exactly what I need, which is the distance between the patient's zip code and the nearest doctor. But do you know of a way to do this with city and state data instead of zip codes? I only have city and state for each doctor, but I have zip code for each patient.
For each city, either take a representative ZIP that works for you or find all zips allocated to that city and use an average of the latitude/longitude instead. You're going to have make assumptions or some rule, so just make it consistent and if you have GIS skills and/or are working with a small area make a map so you understand what those assumptions may mean.
If you have lat/long for zips you can use GEODIST() instead.
SAS has several files with the lat/long for cities as well as the lat/long for zip codes.
They're in the SASMAPS and GFSK libraries these days.
Not sure how you would select a centralized zip for a large city. Here is a start, at least, for getting zip from city and state:
/* create DoctorCityState for testing purposes; */
data need(drop=wantzip);
input doctor wantzip;
city=scan(zipcity(wantzip),1,',');
state=zipstate(wantzip);
cards;
1 15101
2 15102
3 15104
4 15106
5 15108
6 15210
7 15212
8 15216
9 15220
;
proc sql;
create table DoctorCityState as
select a.doctor,a.city,a.state,b.zip
from DoctorCityState a,
sashelp.zipcode b
group by a.doctor,b.city,b.state
having upcase(a.city) eq upcase(b.city)
and upcase(a.state) eq upcase(b.statecode)
and b.zip eq min(b.zip)
;
quit;
/* END: create DoctorCityState for testing purposes; */
Art, CEO, AnalystFinder.com
See if you have access to SAS/GIS with a road map set. This is really a GIS problem as a short distance may equal a long drive time. I can show you some 15 mile drives that are much longer to drive than 30 or 40 miles of freeway.
Alternatively this might be a linear programming program if you have a database of from/to coordinates with travel time/distance information. If you have, or can find such information, that could be a SAS/OR problem.
Do you have the street address of the clinic, if not the ZIP? If so, SAS has a geocoding option that will geocode addresses.
If you search LexJansen.com there's some papers on using Google to get driving distances instead of distances 'as the crow flies'.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.