BookmarkSubscribeRSS Feed
chenyiwen1717
Fluorite | Level 6

Hi,

I have address filed like

 

12345  BROADWAY  APT. B

or 

12345  BROADWAY  SUITE. B

or

12345  BROADWAY  #B

 

I would like separte this filed into two variables, which are the address variable (12345  BROADWAY), and apt/suit number variable(APT. B, etc). I understand that I can use scan function to separte by '#' but how I can separte by word 'APT' or 'SUIT'? Any idea would be helpful!

 

var1=scan(addr, 1, '#');
var2=scan(addr, 2, '#');
3 REPLIES 3
AlanC
Barite | Level 11

Use a regular expression. Find a valid pattern here:

 

http://www.regexlib.com/Search.aspx?k=street+address&c=-1&m=-1&ps=20

 

That is the simplest and most powerful way of handling it.Split the address into groups and then just assign the groups to the correct variable.

 

 

https://github.com/savian-net
ChrisNZ
Tourmaline | Level 20

I agree with @AlanC.

Regular expressions are a must for the kind of work you are doing. The learning curve is very steep though.

Here is stg to get you started:

data HAVE;
  ADD=' 2345  BROADWAY  APT. B   ';  output;
  ADD='12345  BROADWAY  SUITE. B ';  output;
  ADD='12345  BROADWAY  #B       ';  output;
run;
data WANT;
  set HAVE;
  ADD1=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$1/i',1,ADD);
  ADD2=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$3/i',1,ADD);
run;

 

ADD ADD1 ADD2
2345 BROADWAY APT. B 2345 BROADWAY B
12345 BROADWAY SUITE. B 12345 BROADWAY B
12345 BROADWAY #B 12345 BROADWAY B

Explanation of the expression:

s/       we are going to substitute (rather than just match)

(.*?)   first group. we match any characters (? means lazily: we limit the extend of the match as we don't want to prevent other matches)

(APT|SUITE|#)  second group: match any of these strings

\.?    optionally match a dot

\s*   optionally match any number of spaces

(\w+)  third group: match at least one word character (that's the unit number)

\s+     match remaining spaces

/$1     replace all we've matched with the first group (i.e. the address proper)

/i    make the logic case insensitive

 

I know looks daunting, but once you get there, you will be rewarded!

AlanC
Barite | Level 11

The best way, IMO, to learn regex is to use RegexBuddy.

 

https://www.regexbuddy.com/

 

Worth every penny and then some.

https://github.com/savian-net

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2867 views
  • 1 like
  • 3 in conversation