BookmarkSubscribeRSS Feed
alagiejatta
Calcite | Level 5

I have an address column in my dataset that I wanted to parse to put city and zip code in their separate columns. Positions differ in many of the addresses. Examples of address types in the data are:

 

1. Address: 321 E 12th St Des Moines Iowa, 50321phone Number: (515) 515-5151

2. Address: 321 Iowa st Boone Iowa, 50321Phone Number: (515) 515-5151

3. Address: 321 Des Moines st Suite 140 Iowa City Iowa, 50321Phone Number: (515) 515-5151

 

Thanks

10 REPLIES 10
PaigeMiller
Diamond | Level 26

It's nearly impossible for us to create a set of rules based on your three examples that will likely work on a wide variety of real-life examples.

 

Could you state the rules to find city and Zip Code that you think will work given the real data that you have?

--
Paige Miller
alagiejatta
Calcite | Level 5

I do not really have any rules in mind. And these examples are real data.

 

 

PaigeMiller
Diamond | Level 26

Yes, but you can come up with rules by looking at the real data (which we can't do) and then tell us what those rules are that you think will work, and then we can help you program them. 

--
Paige Miller
alagiejatta
Calcite | Level 5

So all I really needed from this address column is the City name and Zip code. 

1. The city name can be one or two words, followed by State name and then a comma ","

2. Zip codes are only 5 numbers

3. Some address are missing everything except the state name

4. There are addresses from other states, but only interested in those that has Iowa as the state

alagiejatta
Calcite | Level 5

Also, If zip codes is not blank, it is right after a comma

AMSAS
SAS Super FREQ

@alagiejatta 
Parsing addresses is difficult, there's a good reply in a thread on Stack Overflow:
https://stackoverflow.com/questions/11160192/how-to-parse-freeform-street-postal-address-out-of-text...

At the end of the day, you will need to come up with some rules for your situation/data.

As @PaigeMiller states, you need to review your data and come up with rules that work for your data.

Looking at your 3 sample lines, you can strip out the phone number, and probably get the zip, as long as nobody has long zip format (50321-1234). Getting the state/city that's going to be problematic as per the thread linked above.

alagiejatta
Calcite | Level 5

Sorry. City can actually be up to 3 words.

Thanks

PaigeMiller
Diamond | Level 26

Is Iowa always spelled out, all four letters, or is it sometimes abbreviated as IA?


Are there every 9 digit zip codes?

 

Are there separators between address and city name other than a blank? 

 

Again, there are lots of possibilities and you have addressed a few, but I'm sure there are many complications that you haven't addressed yet.

--
Paige Miller
alagiejatta
Calcite | Level 5

Iowa is always spelled out in full (including other states)

 

There are no 9 digit zip. All are 5 digits

 

There are only blank separators between address and city name and between most of the elements. The only comma separator is after the state name. Column (:) separators are after the text " Address" and after the text "Phone Number"

 

Thank you

PaigeMiller
Diamond | Level 26

I hope you get some idea of how difficult this is in general (and I hope you read the link from @AMSAS )

 

In your first record, there seems to be no way to decide if the city name is "Moines" or "Des Moines" or "St Des Moines".

 

Some people use a list of place names to help them decide, especially if you have a zip code. That isn't specifically pulling apart the text string, as extra information is used. I don't have skills with such a procedure, but I'm sure someone here does.

--
Paige Miller

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2113 views
  • 0 likes
  • 3 in conversation