BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Quentin
Super User

I would do this in three steps:

  1. Hard code that works to calculate one distance
  2. A macro that works to calculate one distance
  3. A data-driven driver program (macro or call execute) to call the macro once per patient-hospital

Since you've got hard code that works, I would write the macro next.  I took your hard code and wrote a macro.  It has inputs for patientID, HospitalID, and the latitude and longitude of each.  When you call the macro, it calculates the distance, and then appends one record to an output dataset.  The macro is:

%macro distance
  (patientID=
  ,patientLatitude=
  ,patientLongitude=
  ,hospitalID=
  ,hospitalLatitude=
  ,hospitalLongitude=
  ,out= /*name of output dataset to append to*/
  ) ;

filename x url "https://www.google.com/maps/dir/?api=1%nrstr(&)origin=&patientLatitude%2C&patientLongitude%nrstr(&)destination=&hospitalLatitude%2C&hospitalLongitude%nrstr(&)travelmode=driving";
filename z temp;

%local filesize ;
 
data _null_; 
  infile x recfm=f lrecl=1 end=eof; 
  file z recfm=f lrecl=1;
  input @1 x $char1.; 
  put @1 x $char1.;
  if eof;
  call symputx('filesize',_n_);
run;

* driving distance as a numeric variable;
data __distance(keep=PatientID HospitalID Distance PatientLatitude PatientLongitude HospitalLatitude HospitalLongitude) ;
  length PatientID $8 HospitalID $8 ;
  PatientID="&PatientID" ;
HospitalID="&HospitalID" ; PatientLatitude=&PatientLatitude ; PatientLongitude=&PatientLongitude ; HospitalLatitude=&HospitalLatitude ; HospitalLongitude=&hospitalLongitude ; infile z recfm=f lrecl=&filesize. eof=done; input @ 'miles' +(-15) @ '"' Distance :comma12. text $30.; units = scan(text,1,'"'); text = scan(text,3,'"'); output; put PatientID= HospitalID= Distance=; stop; done: output; run; proc append base=&out data=__distance ; run ; proc delete data=__distance ; run ; filename x clear ; filename z clear ; %mend distance ;

Call like:

%distance
  (patientID= foo1
  ,patientLatitude= 25.79585457901094
  ,patientLongitude= -80.2799268941304
  ,hospitalID= bar
  ,hospitalLatitude= 26.07420546642483
  ,hospitalLongitude= -80.15058074645368
  ,out=mydistances
  )

The nice thing about having a simple macro like this, is that it's easy to test so you can try stuff like:

%distance
  (patientID=foo1
  ,patientLatitude=25.79585457901094
  ,patientLongitude=-80.2799268941304
  ,hospitalID=bar1
  ,hospitalLatitude=26.07420546642483
  ,hospitalLongitude=-80.15058074645368
  ,out=mydistances
  )

%distance
  (patientID=foo2
  ,patientLatitude=25.71
  ,patientLongitude=-80.27
  ,hospitalID=bar2
  ,hospitalLatitude=26.07
  ,hospitalLongitude=-80.15
  ,out=mydistances
  )

%distance
  (patientID=foo3
  ,patientLatitude=25.72
  ,patientLongitude=-80.27
  ,hospitalID=bar3
  ,hospitalLatitude=26.07
  ,hospitalLongitude=-80.15
  ,out=mydistances
  )

proc print data=mydistances ;
run ;

and you'll learn stuff during testing.  For example, it's slow (Mike mentions in his paper that google maps is slow, they could be throttling this intentionally).  Sometimes you can a null value for distance (Mike has error-handling code in the paper to detect that sort of stuff).

I suggest you play with that macro a bit, and see if you like it.  Feel free to update it and post a version you like better.  Then once you have a macro you like, happy to help you with step #3, writing the driver code to call this macro once per patient-hospital distance.

 

 

 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
stats_auca
Obsidian | Level 7

Wow, thank you! This is working perfectly (macro below). I would love your assistance with step #3, writing the driver code to call this macro once per patient-hospital distance. Also, I have ~9000 total combinations  so I am wondering how to best set up the data. I am new to Macros so I wasn't sure if I should set up multiple datasets? I really just need to compare the drive time of each ID (~900) and hospital (10) and then create a list of distance of only the nearest hospital for each ID. Any ideas are very much welcome! I cannot thank you enough.

%macro distance
  (patientID=
  ,patientLatitude=
  ,patientLongitude=
  ,hospitalID=
  ,hospitalLatitude=
  ,hospitalLongitude=
  ,out= /*name of output dataset to append to*/
  ) ;

filename x url "https://www.google.com/maps/dir/?api=1%nrstr(&)origin=&patientLatitude%2C&patientLongitude%nrstr(&)destination=&hospitalLatitude%2C&hospitalLongitude%nrstr(&)travelmode=driving";
filename z temp;

%local filesize ;
 
data _null_; 
  infile x recfm=f lrecl=1 end=eof; 
  file z recfm=f lrecl=1;
  input @1 x $char1.; 
  put @1 x $char1.;
  if eof;
  call symputx('filesize',_n_);
run;

* driving distance as a numeric variable;
data __distance(keep=PatientID HospitalID Distance PatientLatitude PatientLongitude HospitalLatitude HospitalLongitude) ;
  length PatientID $8 HospitalID $8 ;
  PatientID="&PatientID" ;  HospitalID="&HospitalID" ;
  PatientLatitude=&PatientLatitude ;
  PatientLongitude=&PatientLongitude ;
  HospitalLatitude=&HospitalLatitude ;
  HospitalLongitude=&hospitalLongitude ;
  infile z recfm=f lrecl=&filesize. eof=done;
  input @ 'miles' +(-15) @ '"' Distance :comma12. text $30.;
  units    = scan(text,1,'"');
  text     = scan(text,3,'"');
  output; 

  put PatientID= HospitalID= Distance=;
  stop;
  done:
  output;
run;

proc append base=&out data=__distance ;
run ;

proc delete data=__distance ;
run ;

filename x clear ;
filename z clear ;

%mend distance ;

%distance
  (patientID= foo1
  ,patientLatitude= 25.79585457901094
  ,patientLongitude= -80.2799268941304
  ,hospitalID= bar
  ,hospitalLatitude= 26.07420546642483
  ,hospitalLongitude= -80.15058074645368
  ,out=mydistances
  )
proc print data=mydistances ;
run ;

 

Quentin
Super User

Glad the macro works for you. 

 

I made a small change to the macro when I was testing step 3.  I realized that in your example data you have a LOT of precision for latitude and longitude, more than SAS can maintain in a numeric variable, so I made these variable character.  

 

The updated macro:

 

 

%macro distance
  (patientID=
  ,patientLatitude=
  ,patientLongitude=
  ,hospitalID=
  ,hospitalLatitude=
  ,hospitalLongitude=
  ,out= /*name of output dataset to append to*/
  ) ;

filename x url "https://www.google.com/maps/dir/?api=1%nrstr(&)origin=&patientLatitude%2C&patientLongitude%nrstr(&)destination=&hospitalLatitude%2C&hospitalLongitude%nrstr(&)travelmode=driving";
filename z temp;

%local filesize ;
 
data _null_; 
  infile x recfm=f lrecl=1 end=eof; 
  file z recfm=f lrecl=1;
  input @1 x $char1.; 
  put @1 x $char1.;
  if eof;
  call symputx('filesize',_n_);
run;

* driving distance as a numeric variable;
data __distance(keep=PatientID HospitalID Distance PatientLatitude PatientLongitude HospitalLatitude HospitalLongitude) ;
  length PatientID $8 HospitalID $8 PatientLatitude PatientLongitude HospitalLatitude HospitalLongitude $20;
  PatientID="&PatientID" ;
  HospitalID="&HospitalID" ;
  PatientLatitude="&PatientLatitude" ;
  PatientLongitude="&PatientLongitude" ;
  HospitalLatitude="&HospitalLatitude" ;
  HospitalLongitude="&hospitalLongitude" ;

  infile z recfm=f lrecl=&filesize. eof=done;
  input @ 'miles' +(-15) @ '"' Distance :comma12. text $30.;
  units    = scan(text,1,'"');
  text     = scan(text,3,'"');
  output; 

  put PatientID= HospitalID= Distance=;
  stop;
  done:
  output;
run;

proc append base=&out data=__distance ;
run ;

proc delete data=__distance ;
run ;

filename x clear ;
filename z clear ;

%mend distance ;

The first part of step 3 is to create the data.  Suppose you start with a dataset of hospitals with the latitude and longitude of each hospital, and a dataset of patients:

data Hospitals ;
  input ID $1 Latitude :$20. Longitude:$20. ;
  cards ;
A 26.07420546642483  -80.15058074645368
B 26.08 -80.15
C 26.09 -80.15
;
run ;

data Patients ;
    input ID $1 Latitude :$20. Longitude:$20. ;
  cards ;
1 25.79585457901094 -80.2799268941304
2 25.71 -80.27
3 25.72 -80.27
;
run ;

The goal is to create a dataset of trips, which has all possible trips (Patient 1 to Hospital A, Patient 1 to Hospital B, etc.). You can use PROC SQL to make a dataset which is the cartesian product of patients and hospitals.  So 3 patients * 3 hospitals=9 trips:

proc sql ;
  create table trips as
  select 
    Patients.ID as PatientID
   ,Patients.Latitude as PatientLatitude
   ,Patients.Longitude as PatientLongitude
   ,Hospitals.ID as HospitalID
   ,Hospitals.Latitude as HospitalLatitude
   ,Hospitals.Longitude as HospitalLongitude
  from Patients,Hospitals
 ;
quit ;

So that data looks like:

Patient                                        Hospital
  ID     PatientLatitude    PatientLongitude      ID     HospitalLatitude   HospitalLongitude

   1     25.79585457901094  -80.2799268941304     A      26.07420546642483  -80.15058074645368
   1     25.79585457901094  -80.2799268941304     B      26.08              -80.15
   1     25.79585457901094  -80.2799268941304     C      26.09              -80.15
   2     25.71              -80.27                A      26.07420546642483  -80.15058074645368
   2     25.71              -80.27                B      26.08              -80.15
   2     25.71              -80.27                C      26.09              -80.15
   3     25.72              -80.27                A      26.07420546642483  -80.15058074645368
   3     25.72              -80.27                B      26.08              -80.15
   3     25.72              -80.27                C      26.09              -80.15

Then you can use that dataset to call the macro.  Basically you want to call the %DISTANCE macro nine times, and each time you call it, you pass in the data for one trip.  One way to do that is with CALL EXECUTE.  With a DATA step, you can read the TRIPS data, and use CALL EXECUTE to generate a macro call from each record.  The code is a bit ugly, and CALL EXECUTE is an advanced macro topic.  But it's a handy method:

data _null_;
  set trips;
  call execute 
    (
    '%nrstr('
        ||  '%distance'
        ||    '('
        ||    ' patientID='         || trim(patientID)
        ||    ',patientLatitude='   || trim(patientLatitude)
        ||    ',patientLongitude='  || trim(patientLongitude)
        ||    ',HospitalID='        || trim(HospitalID)
        ||    ',HospitalLatitude='  || trim(HospitalLatitude)
        ||    ',HospitalLongitude=' || trim(HospitalLongitude)
        ||    ',out=mydistances'
        ||    ' )'
        || ')'          
    );
run;

When that step runs, it will generate nine macro calls.  The same as if you had typed nine macro calls yourself.  If you look in the log, it will show the macro calls, e.g.:

NOTE: CALL EXECUTE generated line.
1   + %distance(
patientID=1,patientLatitude=25.79585457901094,patientLongitude=-80.2799268941304,HospitalID=A,Hospital
Latitude=26.07420546642483,HospitalLongitude=-80.15058074645368,out=mydistances )

After the macro calls have executed, work.mydistances will have nine records:

Patient  Hospital
ID ID PatientLatitude PatientLongitude HospitalLatitude HospitalLongitude Distance 1 A 25.79585457901094 -80.2799268941304 26.07420546642483 -80.15058074645368 26.1 1 B 25.79585457901094 -80.2799268941304 26.08 -80.15 28.6 1 C 25.79585457901094 -80.2799268941304 26.09 -80.15 27.6 2 A 25.71 -80.27 26.07420546642483 -80.15058074645368 30.5 2 B 25.71 -80.27 26.08 -80.15 33.0 2 C 25.71 -80.27 26.09 -80.15 31.9 3 A 25.72 -80.27 26.07420546642483 -80.15058074645368 29.5 3 B 25.72 -80.27 26.08 -80.15 32.1 3 C 25.72 -80.27 26.09 -80.15 31.0

That has all possible trips.  To select the shortest trip for each patient, you could sort by Patient and Distance, then select the first record for each patient.

 

As a side note, each time you call the macro, it will append to work.mydistances.  So if you want to start fresh, you should delete it before calling the macro.  You could put this code before the data _null_ step that runs CALL EXECUTE:

%if %sysfunc(exist(mydistances)) %then %do ;
  proc delete data=mydistances ;
  run ;
%end ;

 

 

 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
stats_auca
Obsidian | Level 7

Thank you! I ran the whole dataset this last weekend (6000 PatientIDs) but unfortunately, SAS stopped responding when approximately 90% complete on Monday. It took so long because I kept getting a notification that 'windows is full and must be cleared'.  The second time, I used a proc printto statement to output the log to my c drive but I forgot to sort my dataset by PatientID. There was an error after about 250 correct distances and then SAS incorrectly listed all remaining distance as '100'. I confirmed all lat/long are correct and I am now running for a 3rd time using the procprintto statement and with my data correctly sorted. It has been running for approximately 21 hours but I have not yet opened the log because I was unsure if that will create any issues. I was hoping to get some advice if it I do end up finding issues this time.

 

I am open to running smaller subsets of my data (could this be accomplished with a %DO statement?) or just running the unique Patient zip codes (800ish vs. 6000). However, I concerned that I may have issues then correctly transferring the distances back to my larger dataset. I also read that I can use 'Options no notes' to suppress log messages during the simulation - I'm not sure if this would be more helpful than outputting the log. I appreciate any advice!

Quentin
Super User

Hi,

 

Glad that is (almost) working.  : )

 

Using PROC PRINTTO to write the log to a file is a great idea.  I never use options NONOTES, because I like the log to have all the information I expect.  In this case, I don't think it's the log writing that is slowing things downs, it's the calls to google maps that are slow.  In my experience it's safe to open the .log file while the code is running, using notepad, notepad++, or whatever text editor.

 

Yes, it would be possible to run this in chunks, but if you've got only 800 unique zip codes for patients, changing to run this by zip code will be a huge time savings.  Great idea!   One approach to try would be to make a new "Patients" datasets, where the ID is the zipcode.  So you have 800 records like:

data Patients ;
  input ID $5. Latitude :$20. Longitude:$20. ;
  cards ;
90210 25.79585457901094 -80.2799268941304
27513 25.71 -80.27
02138 25.72 -80.27
;
run ;

Then you could run that dataset through the macros, and  select the closest hospital per zip code, and you'll get the output dataset with 800 records that match each zip code to the closest hospital.  The you could join that data onto your dataset with 6000 patient records, using MERGE or SQL, joining them by zip code.

 

I think that should work, and processing 800 zip codes should run in <15% of the time it takes to process 6000 patients.

 

 

 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
stats_auca
Obsidian | Level 7

Thank you again for all of your help! The whole dataset ended up working great. There were a few hundred missing distances so I just reran the macro for those missing which worked for most missing distances. Interesting, there was one distance that did not run and was the only distance under 1 mile. I calculated that distance manually and then added that distance to those specific lat/longitude pairs using a datastep (if then do).  I then used proc means minimum to find the shortest distance for each ID. I could not have completed this without your help. I am so appreciative!

Quentin
Super User

@MMMIIIIKKKKEEEE Raithel presented a webinar for BASUG on calling google maps from SAS. 

 

The paper https://www.basug.org/_files/ugd/6fce8c_964ee9ecca2d4def95d523ccef5b9465.pdf  and recording https://www.basug.org/videos?wix-vod-video-id=4ffd50aeee284b1ca2a7c3ef5a91790e&wix-vod-comp-id=comp-... are available in our archives.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
stats_auca
Obsidian | Level 7

@Quentin @MMMIIIIKKKKEEEE 

Thank you! I both watched the webinar and read the paper - both excellent. Unfortunately, I only have GPS coordinates (latitude and longitude). Thanks for all including your references - I am taking the time to look through those papers now. Also, there many BASUG webinars I would like to watch so thank you for making me aware of the webinars.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1920 views
  • 14 likes
  • 4 in conversation