Hello, I'm trying to do some distance analysis. Basically, I want to determine the share of patients who are with 30 miles of certain medical providers. I grouped patients by zip code and I have a separate list of zip codes where 1 or more of the providers has a practice. For each patient zip code, loop through an array of provider zips and measure the distance for each provider zip code using the ZIPCITYDISTANCE function. I then would take the minimum value of all the distances and that would be distance value for that patient zip code.
I ran into trouble building the array of provider zip codes because I bumped into the size limit for array values. Can anyone suggest work around? I'm wondering if a hash table might do the same thing without the limit, but am not sure of the syntax.
200 PROC SQL NOPRINT;
201 /*CREATE AN ARRY OF PROVIDER ZIP CODES*/
202 SELECT QUOTE(PROV_ZIP,"'") INTO :ZIPLIST SEPARATED BY ' '
203 FROM VALID_PROVZ
204 WHERE VALID=1;
ERROR: THE LENGTH OF THE VALUE OF THE MACRO VARIABLE ZIPLIST (65540) EXCEEDS THE MAXIMUM LENGTH (65534). THE VALUE HAS BEEN
TRUNCATED TO 65534 CHARACTERS.
205 %LET ZIPCNT=&SQLOBS;
206 %PUT &ZIPCNT;
8194
207
NOTE: THE SAS SYSTEM STOPPED PROCESSING THIS STEP BECAUSE OF ERRORS.
NOTE: THE PROCEDURE SQL USED 0.05 CPU SECONDS AND 24308K.
NOTE: THE ADDRESS SPACE HAS USED A MAXIMUM OF 1940K BELOW THE LINE AND 650284K ABOVE THE LINE.
208 DATA CALC_DIST(KEEP=ZIP_CD VALID MINDIST);
209 SET VALID_BENEZ;
210 IF ZIP_CD NE '';
211 ARRAY PROVZIPS{&ZIPCNT} $ (&ZIPLIST);    /*PROVIDER ZIP CODE ARRAY*/
212 ARRAY PROVDIST{&ZIPCNT};                         /*DISTANCE ARRAY*/
213 IF VALID=1 THEN DO I=1 TO &ZIPCNT;
214 IF PROVZIPS(I) NE '' THEN PROVDIST(I)=
NOTE: THE QUOTED STRING CURRENTLY BEING PROCESSED HAS BECOME MORE THAN 262 CHARACTERS LONG. YOU MIGHT HAVE UNBALANCED QUOTATION
MARKS.
215 ZIPCITYDISTANCE(ZIP_CD,PROVZIPS(I));
216 END;
217 MINDIST=MIN(OF PROVDIST(*));
218 RUN;
Why do you need an array?
You have one dataset with patient ZIP codes and a second dataset with provider ZIP code.
Just join them.
proc sql ;
create table patzip as select ZIP_CD,count(distinct PATIENT) as n_patients from VALID_BENEZ ;
create table provzip as select distinct PROV_ZIP from VALID_PROVZ
create table want as
  select ZIP_CD  as pat_zip
       , PROV_ZIP 
       , n_patients
       , ZIPCITYDISTANCE(ZIP_CD,PROVZIPS) as distance
  from patzip, provzip  
  where calculated distance < 30
;
quit;If you want to summarize by provider zip code you can add an additional step.
create table summary as 
  select prov_zip,sum(n_patients) as n_patients_in_30miles
  from want;
  group by prov_zip
;You could then link this back to individual providers by their zip code.
Thanks, that did work although it took a long time and I also got this message "NOTE: THE EXECUTION OF THIS QUERY INVOLVES PERFORMING ONE OR MORE CARTESIAN PRODUCT JOINS THAT CAN NOT BE OPTIMIZED."
Do you have any suggested hash code, I'm wondering it might be more efficient.
It is doing a cartesian product on purpose since you want to check the distance for each pair of zip code.
Make sure to only keep the valid provider zip codes.
To improve performance you probably will want keep only the zip codes in the provider subset. Then you can remerge with the list of actual providers based on provider zip code once you have the patient counts for that zip code to see the results by provider.
Not sure hash will improve performance a lot. You might be able to do it in one step with hash and that might improve performance a little by eliminating some of the IO. But you still need to compare every possible pair of zipcodes.
You do not have an array problem, you have a macro variable problem.
This happens because you abuse the macro processor for something that it's not meant for: processing data.
Use a join or a hash object.
Understood. Do you have any suggested hash code? I've mainly used hash tables as a lookup in the past and this is a little different.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
