I have addresses in one field. I want to break it apart as door number, street, city and zip. The addresses are not uniform; some have just spaces between the various parts and some have commas. Can somebody give me an idea of how to go about it efficiently?
Depending on how many addresses you have, their variability and the countries involved this could be a complex and long exercise. We use SAS Data Quality (also known as DataFlux) to do exactly this. It contains a parser to evaluate every address and split it into its component parts - no coding is required. It also caters for country-specific variations and you can fine-tune the process by tweaking the product's Quality Knowledge Base. The software also standardizes the address layouts and you can create address match keys to join to other address sources. You can even use it for geocoding the location of the addresses.
Personally I wouldn't even attempt this since I have access to an automated, superior alternative but I hope this demonstrates that a DIY approach will take a lot of work.
Like @SASKiwi explained, it's a tedious task to work through. There is a previous thread that addresses this and even made the same suggestion as @SASKiwi: https://communities.sas.com/t5/SAS-Procedures/Address-cleaning/td-p/521206. The article references a paper found here that may be useful: https://analytics.ncsu.edu/sesug/2008/CC-028.pdf
strip and prxchange functions are going to be your friend.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.