I'm trying to find the distance between 3,600 pairs of zipcodes, patient and their provider. I attached 50 pairs in a CSV file. Patient zipcode is patz, and provider zipcode is provz.
I run SAS 9.4 on Windows 10 in a VM using Parallels, so I'm used to things taking a little bit longer than they would when I run it on a Windows computer. Except I left it to run overnight last night, and it wasn't done seven hours later.
Is there a problem with my code?
libname ob "\\Mac\Home\Desktop\SAS";
proc import file="\\Mac\Home\Desktop\zip.csv"
dbms=csv
out=zip
replace;
run;
data _null_;
call symputx('nzips',obs);
stop;
set zip nobs=obs;
run;
%macro distance_time (ds=, z1=, z2=, out=);
%do j=1 %to &nzips;
DATA _null_;
nrec = &j;
set zip point=nrec;
call symputx('patz',&z1);
call symputx('provz',&z2);
RUN;
filename x url "https://www.google.com/maps/dir/&z1/&z2/?force=lite";
filename z temp;
data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;
data temp;
keep z1 z2 distance units time;
z1="&z1";
z2="&z2";
infile z recfm=f lrecl=&filesize. eof=done;
input @ 'miles' +(-15) @ '"' distance :comma12. text $30.;
units = scan(text,1,'"');
text = scan(text,3,'"');
select;
when (find(text,'d') ne 0) time = sum(86400*input(scan(text,1,' '),best.),
3600*input(scan(text,3,' '),best.));
when (find(text,'h') ne 0) time = sum(3600*input(scan(text,1,' '),best.),
60*input(scan(text,3,' '),best.));
otherwise time = 60*input(scan(text,1,' '),best.);
end;
output;
keep z1 z2 distance units time;
done:
output;
run;
filename x clear;
filename z clear;
proc append base=distance_time data=temp force;
run;
%end;
%mend;
%distance_time (ds=zip, z1=patz, z2=provz, out=dist_time);
data ob.distance_time;
set dist_time;
run;
proc export data=ob.distance_time
dbms=xlsx
outfile="\\Mac\Home\Desktop\distance.xlsx"
replace;
run;
Thanks!!!
The main thing wrong with your code is the structure. The program uses a macro loop to loop over pairs and computes the distance between pairs and then uses a formula to compute the distance and uses PROC APPEND to append the results.
Just create a data set that contains the pairs that you want to analyze and then process them all by using a single DATA step.
PS> If you want to know why the macro loop + PROC APPEND method is so slow, see the article "Simulation in SAS: The slow way or the BY way", which analyzes macro loops in the context of simulation studies.
Take the macro out of the problem. Does it run for one pair of zip codes?
Do you really expect the TEMP data step to generate more than one observation? If not then insert a STOP statement after the OUTPUT statement.
And after you get this to work for few pairs you might want to look into the SAS ZIPCITYDISTANCE function and see if the results differ by enough to be a problem.
data example; input patz provz; distance = zipcitydistance(patz,provz); datalines; 28110 29403 28110 29440 28110 29520 ;
The distance returned is in miles.
The ZIPCITYDISTANCE function returns the geodetic distance in miles between two ZIP code locations. The centroid of each ZIP code is used in the calculation.The Sashelp.Zipcode data set must be present when you use this function. If you remove the data set, then ZIPCITYDISTANCE returns unexpected results.
Note that Google is probably not happy with so many hits on its service from the same location. It might well be slowing down its response or ignoring you all together.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.