I'm wondering if there is any text mining technique that can be used to match the address fields based on phyid.
Phyid      adr1                       phyadr1
010011829  501 MED CENT DR            501 MEDICAL CENTER DR BOX 30114
010011829  501 MED CENTER DR STE:300  501 MEDICAL CENTER DR BOX 30114
adr1 and phyadr1 refer to the same address, but this is how the data is provided.
How can I match adr1 and phyadr1 in such cases?
Does your site license the SAS Data Quality Server? This gives you procedures and data step functions that allow fuzzy matching for text data such as names and addresses.
If you don't have a license for the data quality tools, you would need a way to parse the addresses into components (house number, street name, box number, suite number, etc.), standardize these elements, and then order them to create appropriate groups that represent the same location.
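As a rough illustration of the parse-and-standardize approach, here is a minimal sketch in Python; the same logic could be written in a DATA step, for example with PRXCHANGE. The abbreviation table, the list of unit keywords, and the function name `standardize` are all made up for this example and would need to be extended for real data. It also assumes that suite and box numbers can be dropped before comparing, which is only appropriate if you want to match on the street address alone.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# real data will need a much longer list.
ABBREVIATIONS = {
    "MED": "MEDICAL",
    "CENT": "CENTER",
    "CTR": "CENTER",
    "DRIVE": "DR",
}

# Secondary units (suite, box, apartment) followed by their number.
# Assumption: these are ignored when deciding whether two addresses match.
UNIT_PATTERN = re.compile(r"\b(?:PO\s+)?(?:BOX|STE|SUITE|APT|UNIT)\s+\w+")


def standardize(address):
    """Return a normalized street address suitable for exact comparison."""
    text = address.upper()
    # Replace punctuation with spaces so "STE:300" becomes "STE 300".
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    # Remove suite/box components.
    text = UNIT_PATTERN.sub(" ", text)
    # Expand abbreviations token by token and collapse extra whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


rows = [
    ("010011829", "501 MED CENT DR", "501 MEDICAL CENTER DR BOX 30114"),
    ("010011829", "501 MED CENTER DR STE:300", "501 MEDICAL CENTER DR BOX 30114"),
]

for phyid, adr1, phyadr1 in rows:
    is_match = standardize(adr1) == standardize(phyadr1)
    print(phyid, standardize(adr1), "|", standardize(phyadr1), "|", is_match)
```

With the two sample rows from the question, both adr1 and phyadr1 reduce to the same standardized string, so a plain equality test is enough. Addresses that still differ after standardization (misspellings, for instance) would need a fuzzy comparison on top of this.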