Hi!
I have a list of zip codes and whether or not they have a provider (1 for yes, 0 for no). If providerinzip = 0, then I need to find the closest zip code that has a provider. I am unsure if nearest neighbor would work because I'm not looking for the nearest zip but one where providerinzip=1.
Then I need to calculate the distance, but I know how to do that.
Attached csv file of data.
So this is the SAS code I ended up using. Excel file "one" is zip (all), lat and long. Excel file "two" is zip (that have providers), lat and long.
proc import
datafile ="one.xlsx"
dbms = xlsx
out = one
replace;
run;
proc import
datafile ="two.xlsx"
dbms = xlsx
out = two
replace;
run;
proc sql;
create table mixed as
select one.*,two.zip as zip2, geodist(one.lata, one.longa, two.latb, two.longb, 'M') as dist
from one as one
cross join two as two
order by one.zip;
quit;
data cross;
set mixed;
drop lata longa;
run;
proc sort data=cross;
by zip dist;
Run;
data crosskeep;
set cross;
by zip;
if first.zip;
run;
proc export
data=crosskeep
outfile="cross.xlsx"
dbms = xlsx
replace;
run;
This gave me a table with actual zip, closest zip, distance to closest zip. Would there have been an easier way?
I would:
1. Download the latitude/longitude of each ZIP code from a reputable source
2. Use the geodist function to calculate the distance between ZIP codes
Right, I know how to do that. My issue is that there's 420some odd zip codes. That will take hours of manual work to find the closest zip with a provider for each zip that doesn't have one. I was looking for a way for SAS to calculate the closest zip.
geodist does that calculation. Keep the lowest distance.
What's missing?
Reading the documentation on GEODIST should help your understanding:
So this is the SAS code I ended up using. Excel file "one" is zip (all), lat and long. Excel file "two" is zip (that have providers), lat and long.
proc import
datafile ="one.xlsx"
dbms = xlsx
out = one
replace;
run;
proc import
datafile ="two.xlsx"
dbms = xlsx
out = two
replace;
run;
proc sql;
create table mixed as
select one.*,two.zip as zip2, geodist(one.lata, one.longa, two.latb, two.longb, 'M') as dist
from one as one
cross join two as two
order by one.zip;
quit;
data cross;
set mixed;
drop lata longa;
run;
proc sort data=cross;
by zip dist;
Run;
data crosskeep;
set cross;
by zip;
if first.zip;
run;
proc export
data=crosskeep
outfile="cross.xlsx"
dbms = xlsx
replace;
run;
This gave me a table with actual zip, closest zip, distance to closest zip. Would there have been an easier way?
@ChrisNZ wrote:
I would:
1. Download the latitude/longitude of each ZIP code from a reputable source
2. Use the geodist function to calculate the distance between ZIP codes
SASHELP.ZIPCODE has lat/long for centroid of zip codes though you may want to download the latest version from SAS support.
This set is very likely in the SAS install used.
The function ZIPCITYDISTANCE may be easier than GEODIST unless using an older version of SAS without the function.
@ballardw Too easy! So many US-specific data sets and functions! 🙂
@ChrisNZ wrote:
@ballardw Too easy! So many US-specific data and functions! 🙂
It's good to live in the country that designs the software...
IIRC, when the SASHELP.ZIPCODE set is not available several of the Zip code related functions don't work.
A sneaky implication might be that replacing SASHELP.ZIPCODE with a custom set might allow some of the functionality elsewhere.
I know it is possible to replace the data set because there are updates on the SAS tech support site with instructions...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.