Hi everyone,
I am having some trouble cleaning these addresses. At first it was easy with a few IF statements, but as my data gets bigger, that approach is no longer efficient. I would love to hear your thoughts on how to clean this field.
I have:
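data have;
/* TRUNCOVER keeps short lines from flowing into the next record */
infile datalines truncover;
/* $200. (with the period) reads the whole line, embedded blanks included */
input address $200.;
datalines;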
100 ABC ST RM S02A
102 ABC STREET FLOOR 1
103 ABC ST APT 3 HOMELESS
1035 CD AVENUE FLOOR 2
108 SOMETHING ST # 2FL
115 VISA VISTA DR APT 212 APT 212
1155 LOOK AVENUE APT 205
12 BORED AVE APT 2
1214 TIRED STREET APT 428
127 HAPPY STREET FLOOR 2
1397 SOMEWHERE STREET FIRST FLOOR
142 SOMETHING ST APT 3
200 RAINBOW AVE UNIT 202
;
I don't want any unit, floor, or apartment number in the clean address, so I want the address field to look like this:
100 ABC ST
102 ABC STREET
103 ABC ST
1035 CD AVENUE
108 SOMETHING ST
115 VISA VISTA DR
1155 LOOK AVENUE
12 BORED AVE
1214 TIRED STREET
127 HAPPY STREET
1397 SOMEWHERE STREET
142 SOMETHING ST
200 RAINBOW AVE
Thank you so much!
What is your actual use case? If your address cleaning needs to scale up to enterprise-wide customer volumes, you would be better off using a tool built for this task, such as SAS Data Quality. On the other hand, if you are cleaning a few hundred addresses with a few transformation rules like the ones in your post, you are probably better off persevering with your current approach.
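If you stay with hand-written rules, one way to avoid a long chain of IF statements is to cut each address at the first unit keyword with a single PRXCHANGE call. This is only a rough sketch: the keyword list (APT, UNIT, RM, FLOOR, #, and an ordinal followed by FLOOR) is an assumption taken from the sample rows in your post, and the names want and clean are just illustrative.

```sas
data want;
  set have;
  length clean $200;
  /* Assumed keyword list: extend it to match your own data.          */
  /* Everything from the first keyword to the end of the string is    */
  /* removed. The ordinal alternative comes into play for values like */
  /* FIRST FLOOR, so the ordinal word is removed along with FLOOR.    */
  clean = prxchange(
    's/\s+(?:#|(?:(?:FIRST|SECOND|THIRD)\s+FLOOR|APT|UNIT|RM|FLOOR)\b).*$//i',
    1, strip(address));
run;
```

A pattern like this will also truncate any street name that happens to contain one of the keywords as a separate word, so it is worth checking the rows where clean differs from address before trusting it on a larger file.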
Ah, thank you. I thought SAS might have a function for this that I didn't know about 😄