DATA Step, Macro, Functions and more

Splitting address by separating apt/suit number

Reply
Occasional Contributor
Posts: 7

Splitting address by separating apt/suit number

Hi,

I have address filed like

 

12345  BROADWAY  APT. B

or 

12345  BROADWAY  SUITE. B

or

12345  BROADWAY  #B

 

I would like separte this filed into two variables, which are the address variable (12345  BROADWAY), and apt/suit number variable(APT. B, etc). I understand that I can use scan function to separte by '#' but how I can separte by word 'APT' or 'SUIT'? Any idea would be helpful!

 

var1=scan(addr, 1, '#');
var2=scan(addr, 2, '#');
Contributor
Posts: 62

Re: Splitting address by separating apt/suit number

Use a regular expression. Find a valid pattern here:

 

http://www.regexlib.com/Search.aspx?k=street+address&c=-1&m=-1&ps=20

 

That is the simplest and most powerful way of handling it.Split the address into groups and then just assign the groups to the correct variable.

 

 

PROC Star
Posts: 1,570

Re: Splitting address by separating apt/suit number

I agree with @AlanC.

Regular expressions are a must for the kind of work you are doing. The learning curve is very steep though.

Here is stg to get you started:

data HAVE;
  ADD=' 2345  BROADWAY  APT. B   ';  output;
  ADD='12345  BROADWAY  SUITE. B ';  output;
  ADD='12345  BROADWAY  #B       ';  output;
run;
data WANT;
  set HAVE;
  ADD1=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$1/i',1,ADD);
  ADD2=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$3/i',1,ADD);
run;

 

ADD ADD1 ADD2
2345 BROADWAY APT. B 2345 BROADWAY B
12345 BROADWAY SUITE. B 12345 BROADWAY B
12345 BROADWAY #B 12345 BROADWAY B

Explanation of the expression:

s/       we are going to substitute (rather than just match)

(.*?)   first group. we match any characters (? means lazily: we limit the extend of the match as we don't want to prevent other matches)

(APT|SUITE|#)  second group: match any of these strings

\.?    optionally match a dot

\s*   optionally match any number of spaces

(\w+)  third group: match at least one word character (that's the unit number)

\s+     match remaining spaces

/$1     replace all we've matched with the first group (i.e. the address proper)

/i    make the logic case insensitive

 

I know looks daunting, but once you get there, you will be rewarded!

Contributor
Posts: 62

Re: Splitting address by separating apt/suit number

The best way, IMO, to learn regex is to use RegexBuddy.

 

https://www.regexbuddy.com/

 

Worth every penny and then some.

Ask a Question
Discussion stats
  • 3 replies
  • 123 views
  • 1 like
  • 3 in conversation