08-01-2017 06:57 PM
I have address filed like
12345 BROADWAY APT. B
12345 BROADWAY SUITE. B
12345 BROADWAY #B
I would like separte this filed into two variables, which are the address variable (12345 BROADWAY), and apt/suit number variable(APT. B, etc). I understand that I can use scan function to separte by '#' but how I can separte by word 'APT' or 'SUIT'? Any idea would be helpful!
var1=scan(addr, 1, '#'); var2=scan(addr, 2, '#');
08-01-2017 07:06 PM
Use a regular expression. Find a valid pattern here:
That is the simplest and most powerful way of handling it.Split the address into groups and then just assign the groups to the correct variable.
08-01-2017 07:32 PM
I agree with @AlanC.
Regular expressions are a must for the kind of work you are doing. The learning curve is very steep though.
Here is stg to get you started:
data HAVE; ADD=' 2345 BROADWAY APT. B '; output; ADD='12345 BROADWAY SUITE. B '; output; ADD='12345 BROADWAY #B '; output; run; data WANT; set HAVE; ADD1=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$1/i',1,ADD); ADD2=prxchange('s/(.*?)(APT|SUITE|#)\.?\s*(\w+)\s+/$3/i',1,ADD); run;
|2345 BROADWAY APT. B||2345 BROADWAY||B|
|12345 BROADWAY SUITE. B||12345 BROADWAY||B|
|12345 BROADWAY #B||12345 BROADWAY||B|
Explanation of the expression:
s/ we are going to substitute (rather than just match)
(.*?) first group. we match any characters (? means lazily: we limit the extend of the match as we don't want to prevent other matches)
(APT|SUITE|#) second group: match any of these strings
\.? optionally match a dot
\s* optionally match any number of spaces
(\w+) third group: match at least one word character (that's the unit number)
\s+ match remaining spaces
/$1 replace all we've matched with the first group (i.e. the address proper)
/i make the logic case insensitive
I know looks daunting, but once you get there, you will be rewarded!