BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chrishull
Obsidian | Level 7

Hi All,

 

I'm looking for help matching postal codes. I have a list of postal codes in one table and a list of postal codes and latitude and longitude in another. I need to match the postal codes by either all 6 or the first instance where 5 (or 4 or 3) characters match. 

 

example:

list of codes
L1G3H8
L1G3J0
L1G3J1
L1G3J2
L1G3J3
list of codes with lat and long
L1G3J1	43.920899	-78.874794
L1G3J2	43.920732	-78.874717
L1G3J3	43.920354	-78.876291

desired result

L1G3H8	43.920899	-78.874794
L1G3J0	43.920899	-78.874794
L1G3J1	43.920899	-78.874794
L1G3J2	43.920732	-78.874717
L1G3J3	43.920354	-78.876291

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Create copies of your Postal codes with 3/4 chars each and for those take the average of the latitude/longitude.

So L1G3H8 becomes L1G3H* and the latitude and longitude are the average of all the 9 postal codes in that list.
Repeat for 4 characters.
For 3 characters, that is a Forward Sortation Area (FSA, assuming Canada) and use the FSA centroid list or do the same as above as well.

data codes345;
set fullCodes;
output; * 6 characters;
PC = substr(PC, 1, 5);
output; *5 characters;
PC = substr(PC, 1, 4);
output; *4 characters;
PC = substr(PC, 1, 3);
output; *3 characters;
run;

proc means data=codes345 noprint mean NWAY;
class PC;
var latitude longitude;
ods output summary=PC_Lookup;
run;

Now use the PC_LOOKUP table to merge with your data.

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

What do you mean by


the first instance where 5 (or 4 or 3) characters match. 

Why "first" match?  Shouldn't all matching 5 character codes have an equal chance of being randomly selected?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SASKiwi
PROC Star

Start with an exact match on all 6 characters, then with the remaining rows a 5 character match. The problem is that with any matches with less than 6 characters you are not necessarily going to get a unique match. What post code will you choose when there are multiple choices? The first or last sorted value or some other strategy?

Reeza
Super User
Create copies of your Postal codes with 3/4 chars each and for those take the average of the latitude/longitude.

So L1G3H8 becomes L1G3H* and the latitude and longitude are the average of all the 9 postal codes in that list.
Repeat for 4 characters.
For 3 characters, that is a Forward Sortation Area (FSA, assuming Canada) and use the FSA centroid list or do the same as above as well.

data codes345;
set fullCodes;
output; * 6 characters;
PC = substr(PC, 1, 5);
output; *5 characters;
PC = substr(PC, 1, 4);
output; *4 characters;
PC = substr(PC, 1, 3);
output; *3 characters;
run;

proc means data=codes345 noprint mean NWAY;
class PC;
var latitude longitude;
ods output summary=PC_Lookup;
run;

Now use the PC_LOOKUP table to merge with your data.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1045 views
  • 0 likes
  • 4 in conversation