My data set has the street number, prefix, street name, and suffix all together in a single variable. I am trying to separate the street number from the rest of the address. (For example, I want 3991 3rd ST to read 3991 in one variable and 3rd ST in the next variable.) I know I have to do some sort of formatting language at the first space in the column but am not sure how to do that.
You can start with SCAN() but I'd strongly consider doing some searching on LexJansen for some developed code and working off that. This is a non-trivial problem (sadly) and usually PRX is the best approach and the papers will cover topics you didn't mention such as Street vs St vs St. and other things.
Yes, as @Reeza said, it can get quite complex. However, if you're only wanting to separate the number from the rest, you could use something like:
data have; informat address $80.; input address &; cards; 3991 3rd ST ; data want (drop=_:); set have; call scan(address,1,_pos,_len); street_number=input(substr(address,_pos,_len),8.); street=left(substr(address,_len+1)); run;
Art, CEO, AnalystFinder.com
I used your code and I was able to separate the street number for the first address in my list. But how would I do it for the entire variable? I have about 30,000 observations.
data palmbeachsales16;
informat own_addr1 $80.;
input own_addr1 &;
cards;
50257 CORKSCREW BLVD
;
data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
data palmbeachaddr16 (drop=_:);
set palmbeachsales16; <- change this to be YOUR data set name;
You're using the demo data rather than your own. Change the SET statement to refer to your data set.
Like this? But how do I generalize it to all the addresses in the set?
data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
@andrewfau wrote:
Like this? But how do I generalize it to all the addresses in the set?
What do you mean by that? Once you refer to your own address it will extract all the numbers. What are you expecting and what are you getting?
The example code had the example address in it. I have a dataset with about 30,000 addresses and I am trying to separate the streetnumber from the street name in each to make them separate variables. Is there a way to apply that code to all the addresses in that variable rather than just one?
Did you run the new code which refers to your data set and your address? It will do that automatically, that's how SAS works by default, it processes all rows within the data step.
If it's not working, post your code, log and explain what's not working in detail. Not working isn't informative.
I figured it out. I was making the simple mistake of not bringing in my whole dataset. Sorry. I'm still new at this. It is working. Thank you.
You could try regular expressions
data foo;
infile datalines delimiter=',';
informat raw $varying1024.;
input raw $;
datalines;
3991 3rd ST
;
run;
proc sql;
select
prxchange('s/(\d+)(?: )(\w+)/$1/', -1, raw) as house_number
,prxchange('s/(\d+)(?: )(\w+)/$2/', -1, raw) as street
from foo;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.