The solution (Shortest Distance Match) that was provided earlier this week worked fine, however; I have been asked to modify the criteria as below. I tried modifying the data step but unable to get the expected output. I can take dividual Miles_Rank one at a time in a separate data step but in our actual data file, we have 10 unique Miles_Ranks (1 to10) for each ZIP. I believe, there must be a way to have the Miles Ranks logic build in one or two data steps. I am attaching revised raw data (have1 and have2) and also included expected output sheet. Sorry, but I am posting this again. Once again thank you so much!
1. Match on zip, city and state (have1 and have2)
2. Want1 all records from have1 with ClientNo where there is a match based on the shortest distance (Miles_Rank)
3. Also, Want2 a list of records from have1 where there is no match
your states and zip data appear be corrupted
have1
ZIP | ClientNo | City | State |
14201 | A95572 | Cedar Grove | TA |
14204 | A91374 | Cedar Grove | NH |
14206 | A71649 | Bristol | KA |
14207 | A63397 | Buffalo | MI |
14208 | A06957 | Cedar Grove | NJ |
14209 | A58062 | Cedar Grove | ID |
14210 | A43865 | Bristol | IL |
14211 | A06391 | Bristol | HA |
14212 | A54498 | Cedar Grove | NC |
14213 | A09533 | Buffalo | UT |
14213 | A09534 | Buffalo | UT |
14214 | A31165 | Buffalo | DC |
14215 | A52498 | Buffalo | NY |
14216 | A77384 | Buffalo | CL |
14220 | A73552 | Buffalo | OH |
14220 | A73553 | Buffalo | OH |
14220 | A73554 | Buffalo | OH |
14222 | A66949 | Bristol | SC |
14222 | A66950 | Bristol | SC |
14222 | A66951 | Bristol | SC |
14222 | A66952 | Bristol | SC |
14222 | A66953 | Bristol | SC |
14222 | A66954 | Bristol | SC |
14222 | A66955 | Bristol | SC |
have2
ZIP | Miles_Rank | Distance_Miles | City | State |
14201 | 1 | 1.1 | Cedar Grove | TA |
14201 | 2 | 5.3 | Sloan | NY |
14201 | 3 | 9.3 | Amherst | CL |
14201 | 4 | 15.5 | Buffalo | MI |
14201 | 5 | 30.1 | Mo Town | NJ |
14204 | 1 | 0.3 | Cedar Grove | HA |
14204 | 1 | 0.3 | Blasdell | CL |
14204 | 2 | 1.5 | Cedar Grove | HA |
14204 | 4 | 12 | Blasdell | TA |
14204 | 5 | 15 | Cedar Grove | HA |
14211 | 1 | 1.1 | Buffalo | MI |
14211 | 2 | 5.3 | Mo Town | NJ |
14211 | 3 | 9.3 | Bristol | HA |
14211 | 4 | 15.5 | Amherst | CL |
14211 | 5 | 30.1 | Cedar Grove | HA |
14213 | 1 | 0.4 | Buffalo | MI |
14213 | 2 | 1.5 | Buffalo | UT |
14213 | 3 | 8 | Blasdell | CL |
14213 | 4 | 12 | Buffalo | NJ |
14213 | 5 | 15 | Blasdell | TA |
14214 | 1 | 0.2 | Buffalo | DC |
14214 | 2 | 0.4 | Buffalo | DC |
14214 | 3 | 1.5 | Buffalo | DC |
14214 | 4 | 8 | Buffalo | DC |
14214 | 5 | 15 | Buffalo | DC |
14215 | 1 | 1.2 | Sloan | CO |
14215 | 2 | 2 | Amherst | AL |
14215 | 3 | 2.5 | Buffalo | NJ |
14215 | 4 | 6 | Buffalo | NY |
14215 | 5 | 10 | Kenmore | DE |
14216 | 1 | 2 | Cheektowaga | AR |
14216 | 2 | 7.2 | Great Hill | GE |
14216 | 3 | 8.8 | Buffalo | NJ |
14216 | 4 | 14.2 | Fort Erie | XX |
14216 | 5 | 21.6 | Buffalo | CL |
14220 | 1 | 3 | Buffalo | WY |
14220 | 2 | 4.6 | Milton | NE |
14220 | 3 | 8 | Milton | WA |
14220 | 4 | 14.9 | Buffalo | OH |
14220 | 5 | 26.2 | Milton | TE |
14222 | 1 | 3 | Bristol | SC |
14222 | 2 | 4.6 | Bristol | SC |
14222 | 3 | 8 | Bristol | SC |
My remark is two fold, that, for one, the data does not conform with USPS State Abbreviations. Secondly, The zip codes should map many zips to one State, which is not true for your data. I'm afraid, Code developed on this example data may be misleadingly incorrect when applied to validated address data.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.