Hi SAS Community,
Like others before me, I am faced with cleaning addresses without the likes of SAS Data Quality. I've already applied prxparse to my dataset to the extent I could, and I've encountered a few additional situations that I'd like to clean but am unsure how to handle (if they can be handled at all). Fortunately for me (I suppose?), I have historical addresses covering several years of data. I'd like to apply two more steps to my data, but I'm unsure how to go about it or whether it would open me up to additional problems that I'm just not seeing:
1. I want to harmonize my data across years; for example, I have the same address listed differently from year to year (example below).
They share the same city, state, and zip code.
They appear to share the same street address too, but it is stated differently.
2. If records share the same street number from year to year, I assume they are the same address. In the cases I've encountered so far, this holds (example below).
Data NewData;
INFILE DATALINES DSD;
input id $ name $ addy ~$25. city $ state $ zip $ cleaned_addy ~$25. year $;
DATALINES;
01,ABCSTORE,123 MAIN HIGHWAY 75,MOBILE,AL,36619,123 MAIN HIGHWAY,2011
01,ABCSTORE,123 MAIN HIGHWAY 75,MOBILE,AL,36619,123 MAIN HIGHWAY,2012
01,ABCSTORE,123 MAIN STREET,MOBILE,AL,36619,123 MAIN STREET,2013
01,ABCSTORE,123 MAIN ST,MOBILE,AL,36619,123 MAIN STREET,2014
01,ABCSTORE,123 MAIN,MOBILE,AL,36619,123 MAIN,2015
;
run;
In the desired output below, records that start with the same street number for the same id are treated as the same address. The cleaned address is also harmonized across years, accounting for differences in how the address is stated.
Data WANT;
INFILE DATALINES DSD;
input id $ name $ addy ~$25. city $ state $ zip $ cleaned_addy ~$25. year $;
DATALINES;
01,ABCSTORE,123 MAIN HIGHWAY 75,MOBILE,AL,36619,123 MAIN STREET,2011
01,ABCSTORE,123 MAIN HIGHWAY 75,MOBILE,AL,36619,123 MAIN STREET,2012
01,ABCSTORE,123 MAIN STREET,MOBILE,AL,36619,123 MAIN STREET,2013
01,ABCSTORE,123 MAIN ST,MOBILE,AL,36619,123 MAIN STREET,2014
01,ABCSTORE,123 MAIN,MOBILE,AL,36619,123 MAIN STREET,2015
;
run;
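To make the two rules concrete, here is a minimal sketch, not a tested solution. It assumes the street number is always the first space-delimited token of cleaned_addy, and that the "right" spelling is the one used in the most years, with ties broken by the most recent year. That tie-break is my own assumption. The names keyed, spellings, canonical, harmonized, street_no, n_years, last_year, and harmonized_addy are made up for illustration.

```sas
/* Step 1: pull the leading street number out of the cleaned address. */
/* Assumption: the first space-delimited token is the street number.  */
data keyed;
    set NewData;
    length street_no $10;
    street_no = scan(cleaned_addy, 1, ' ');
run;

/* Step 2: count how many years each spelling appears per            */
/* id / city / state / zip / street number, and note its latest year. */
proc sql;
    create table spellings as
    select id, city, state, zip, street_no, cleaned_addy,
           count(*)  as n_years,
           max(year) as last_year
    from keyed
    group by id, city, state, zip, street_no, cleaned_addy
    order by id, city, state, zip, street_no,
             n_years desc, last_year desc;
quit;

/* Step 3: keep one spelling per group: most frequent first,         */
/* then most recent (assumed tie-break).                              */
data canonical;
    set spellings;
    by id city state zip street_no;
    if first.street_no;
    keep id city state zip street_no cleaned_addy;
    rename cleaned_addy = harmonized_addy;
run;

/* Step 4: attach the chosen spelling back to every yearly record. */
proc sql;
    create table harmonized as
    select k.id, k.name, k.addy, k.city, k.state, k.zip,
           c.harmonized_addy as cleaned_addy, k.year
    from keyed as k
    left join canonical as c
      on  k.id        = c.id
      and k.city      = c.city
      and k.state     = c.state
      and k.zip       = c.zip
      and k.street_no = c.street_no
    order by k.id, k.year;
quit;
```

On the sample data, 123 MAIN HIGHWAY and 123 MAIN STREET each appear in two years, so the tie-break on the latest year is what selects 123 MAIN STREET. One potential issue with the rule itself: an id that legitimately has two different streets with the same number in the same zip code (say, 123 MAIN and 123 OAK) would be collapsed into a single address.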
Any guidance on steps I can apply to achieve this or potential issues would be much appreciated!