Hi All.
I have a master table with a column string as shown below :
STRING |
49 Laurel Court Mount Vernon NY 10550 |
9925 W. Greystone Lane Hamdn, CT 06514,HMn |
83 South Aldeood Street Middletown, CT 06457 |
518 Goldfield Drive Bakersfield, CA 93306 |
8991 Fairground Lane Nutley, NJ 07110 |
63 Bay Court Elkridge, MD 21075 |
83 South Alderwood Street Midletown, CT 06457 Bakersfield |
8991 Fairground Lane Nutley, NJ 07110 W. Graystoe Lane |
Now There is another table which contain lookup values .
I need to use the lookup values to find the probable matches in the string column of master table.
LOOKUP VALUES |
Hamden |
Middletown |
Nutley |
W. Graystone Lane |
South Alderwood Street |
So , I need my final table as below
STRING | Probable1 | Probable2 |
49 Laurel Court Mount Vernon NY 10550 | ||
9925 W. Greystone Lane Hamdn, CT 06514,HMn | HAMDEN | |
83 South Aldeood Street Middletown, CT 06457 | Middletown | |
518 Goldfield Drive Bakersfield, CA 93306 | ||
8991 Fairground Lane Nutley, NJ 07110 | Nutley | |
63 Bay Court Elkridge, MD 21075 | ||
83 South Alderwood Street Midletown, CT 06457 Bakersfield | Middletown | South Alderwood Street |
8991 Fairground Lane Nutley, NJ 07110 W. Graystoe Lane | Nutley | W. Graystone Lane |
So we have hamden in lookup table available and the most near match can be hamdn in 2nd record so I poulate
Hamden in Probable1 column for the record.
Like wise in the last record I can see Nutley as well as W. Graystone Lane having aprobable match in it
so I need to populate Nutley in probable1 and W. Graystone Lane(not exact match) in probable2 column.
I can do if there is a direct match in the string column but in case of variations things do not workout as expected.
So basically I also need to pull the lookup value if a variation of that is available in string column .
Any help on above will be highly appreciated.
Regards,
There are two aspects to this
1. how do you find similarity of two strings (spedis, soundex functions etc. are some viable options ). Prx (perl regular expressions ) but they do not do a fuzzy match and very memory intensive.
2. Get to the final layout.
a. use a hash table and iterate over all the elements you want to find.
b. create a cartesian join and then do a transpose
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.