DATA Step, Macro, Functions and more

Working with Addresses

Reply
Contributor
Posts: 32

Working with Addresses

My data set has the street number, prefix, street name, and suffix all together in a single variable. I am trying to separate the street number from the rest of the address. (For example, I want 3991 3rd ST to read 3991 in one variable and 3rd ST in the next variable.) I know I have to do some sort of formatting language at the first space in the column but am not sure how to do that. 

Super User
Posts: 24,026

Re: Working with Addresses

Posted in reply to andrewfau

You can start with SCAN() but I'd strongly consider doing some searching on LexJansen for some developed code and working off that. This is a non-trivial problem (sadly) and usually PRX is the best approach and the papers will cover topics you didn't mention such as Street vs St vs St. and other things. 

Super User
Posts: 8,220

Re: Working with Addresses

Posted in reply to andrewfau

Yes, as @Reeza said, it can get quite complex. However, if you're only wanting to separate the number from the rest, you could use something like:

data have;
  informat address $80.;
  input address &;
  cards;
3991 3rd ST
;

data want (drop=_:);
  set have;
  call scan(address,1,_pos,_len);
  street_number=input(substr(address,_pos,_len),8.);
  street=left(substr(address,_len+1));
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 32

Re: Working with Addresses

I used your code and I was able to separate the street number for the first address in my list. But how would I do it for the entire variable? I have about 30,000 observations.

data palmbeachsales16;
informat own_addr1 $80.;
input own_addr1 &;
cards;
50257 CORKSCREW BLVD
;

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
Super User
Posts: 24,026

Re: Working with Addresses

Posted in reply to andrewfau
data palmbeachaddr16 (drop=_:); 
set palmbeachsales16; <- change this to be YOUR data set name;

 

You're using the demo data rather than your own. Change the SET statement to refer to your data set. 

Contributor
Posts: 32

Re: Working with Addresses

Like this? But how do I generalize it to all the addresses in the set?

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;
Super User
Posts: 24,026

Re: Working with Addresses

Posted in reply to andrewfau

andrewfau wrote:

Like this? But how do I generalize it to all the addresses in the set?


What do you mean by that? Once you refer to your own address it will extract all the numbers. What are you expecting and what are you getting?

Contributor
Posts: 32

Re: Working with Addresses

The example code had the example address in it. I have a dataset with about 30,000 addresses and I am trying to separate the streetnumber from the street name in each to make them separate variables. Is there a way to apply that code to all the addresses in that variable rather than just one?

Super User
Posts: 24,026

Re: Working with Addresses

Posted in reply to andrewfau

Did you run the new code which refers to your data set and your address? It will do that automatically, that's how SAS works by default, it processes all rows within the data step. 

If it's not working, post your code, log and explain what's not working in detail. Not working isn't informative.

 

 

Contributor
Posts: 32

Re: Working with Addresses

I figured it out. I was making the simple mistake of not bringing in my whole dataset. Sorry. I'm still new at this. It is working. Thank you.

Frequent Contributor
Posts: 107

Re: Working with Addresses

[ Edited ]
Posted in reply to andrewfau

You could try regular expressions

 

data foo;
	infile datalines delimiter=',';
	informat raw $varying1024.;
	input raw $;
	datalines;
3991 3rd ST
;
run;


proc sql;
	select
		prxchange('s/(\d+)(?: )(\w+)/$1/', -1, raw) as house_number
		,prxchange('s/(\d+)(?: )(\w+)/$2/', -1, raw) as street
	from foo;
quit;
Ask a Question
Discussion stats
  • 10 replies
  • 131 views
  • 0 likes
  • 4 in conversation