BookmarkSubscribeRSS Feed
andrewfau
Fluorite | Level 6

My data set has the street number, prefix, street name, and suffix all together in a single variable. I am trying to separate the street number from the rest of the address. (For example, I want 3991 3rd ST to read 3991 in one variable and 3rd ST in the next variable.) I know I have to do some sort of formatting language at the first space in the column but am not sure how to do that. 

10 REPLIES 10
Reeza
Super User

You can start with SCAN() but I'd strongly consider doing some searching on LexJansen for some developed code and working off that. This is a non-trivial problem (sadly) and usually PRX is the best approach and the papers will cover topics you didn't mention such as Street vs St vs St. and other things. 

art297
Opal | Level 21

Yes, as @Reeza said, it can get quite complex. However, if you're only wanting to separate the number from the rest, you could use something like:

data have;
  informat address $80.;
  input address &;
  cards;
3991 3rd ST
;

data want (drop=_:);
  set have;
  call scan(address,1,_pos,_len);
  street_number=input(substr(address,_pos,_len),8.);
  street=left(substr(address,_len+1));
run;

Art, CEO, AnalystFinder.com

 

andrewfau
Fluorite | Level 6

I used your code and I was able to separate the street number for the first address in my list. But how would I do it for the entire variable? I have about 30,000 observations.

data palmbeachsales16;
informat own_addr1 $80.;
input own_addr1 &;
cards;
50257 CORKSCREW BLVD
;

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
Reeza
Super User
data palmbeachaddr16 (drop=_:); 
set palmbeachsales16; <- change this to be YOUR data set name;

 

You're using the demo data rather than your own. Change the SET statement to refer to your data set. 

andrewfau
Fluorite | Level 6

Like this? But how do I generalize it to all the addresses in the set?

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
Reeza
Super User

@andrewfau wrote:

Like this? But how do I generalize it to all the addresses in the set?


What do you mean by that? Once you refer to your own address it will extract all the numbers. What are you expecting and what are you getting?

andrewfau
Fluorite | Level 6

The example code had the example address in it. I have a dataset with about 30,000 addresses and I am trying to separate the streetnumber from the street name in each to make them separate variables. Is there a way to apply that code to all the addresses in that variable rather than just one?

Reeza
Super User

Did you run the new code which refers to your data set and your address? It will do that automatically, that's how SAS works by default, it processes all rows within the data step. 

If it's not working, post your code, log and explain what's not working in detail. Not working isn't informative.

 

 

andrewfau
Fluorite | Level 6

I figured it out. I was making the simple mistake of not bringing in my whole dataset. Sorry. I'm still new at this. It is working. Thank you.

tomcmacdonald
Quartz | Level 8

You could try regular expressions

 

data foo;
	infile datalines delimiter=',';
	informat raw $varying1024.;
	input raw $;
	datalines;
3991 3rd ST
;
run;


proc sql;
	select
		prxchange('s/(\d+)(?: )(\w+)/$1/', -1, raw) as house_number
		,prxchange('s/(\d+)(?: )(\w+)/$2/', -1, raw) as street
	from foo;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3448 views
  • 0 likes
  • 4 in conversation