Re: Working with Addresses

andrewfau · Posted 01-29-2018 04:04 PM

My data set has the street number, prefix, street name, and suffix all together in a single variable. I am trying to separate the street number from the rest of the address. (For example, I want 3991 3rd ST to read 3991 in one variable and 3rd ST in the next variable.) I know I have to do some sort of formatting language at the first space in the column but am not sure how to do that.

Reeza · Posted 01-29-2018 04:25 PM

You can start with SCAN() but I'd strongly consider doing some searching on LexJansen for some developed code and working off that. This is a non-trivial problem (sadly) and usually PRX is the best approach and the papers will cover topics you didn't mention such as Street vs St vs St. and other things.

art297 · Posted 01-29-2018 04:29 PM

Yes, as @Reeza said, it can get quite complex. However, if you're only wanting to separate the number from the rest, you could use something like:

data have;
  informat address $80.;
  input address &;
  cards;
3991 3rd ST
;

data want (drop=_:);
  set have;
  call scan(address,1,_pos,_len);
  street_number=input(substr(address,_pos,_len),8.);
  street=left(substr(address,_len+1));
run;

Art, CEO, AnalystFinder.com

andrewfau · Posted 01-30-2018 11:08 AM

I used your code and I was able to separate the street number for the first address in my list. But how would I do it for the entire variable? I have about 30,000 observations.

data palmbeachsales16;
informat own_addr1 $80.;
input own_addr1 &;
cards;
50257 CORKSCREW BLVD
;

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;

Reeza · Posted 01-30-2018 11:12 AM

data palmbeachaddr16 (drop=_:); 
set palmbeachsales16; <- change this to be YOUR data set name;

You're using the demo data rather than your own. Change the SET statement to refer to your data set.

andrewfau · Posted 01-30-2018 11:40 AM

Like this? But how do I generalize it to all the addresses in the set?

data palmbeachaddr16 (drop=_:);
set palmbeachsales16;
call scan(own_addr1,1,_pos,_len);
street_number=input(substr(own_addr1,_pos,_len),8.);
street=left(substr(own_addr1,_len+1));
run;

Reeza · Posted 01-30-2018 11:42 AM

@andrewfau wrote:

Like this? But how do I generalize it to all the addresses in the set?

What do you mean by that? Once you refer to your own address it will extract all the numbers. What are you expecting and what are you getting?

andrewfau · Posted 01-30-2018 12:00 PM

The example code had the example address in it. I have a dataset with about 30,000 addresses and I am trying to separate the streetnumber from the street name in each to make them separate variables. Is there a way to apply that code to all the addresses in that variable rather than just one?

Reeza · Posted 01-30-2018 12:03 PM

Did you run the new code which refers to your data set and your address? It will do that automatically, that's how SAS works by default, it processes all rows within the data step.

If it's not working, post your code, log and explain what's not working in detail. Not working isn't informative.

andrewfau · Posted 01-30-2018 12:29 PM

I figured it out. I was making the simple mistake of not bringing in my whole dataset. Sorry. I'm still new at this. It is working. Thank you.

tomcmacdonald · Posted 01-30-2018 01:15 PM

You could try regular expressions

data foo;
	infile datalines delimiter=',';
	informat raw $varying1024.;
	input raw $;
	datalines;
3991 3rd ST
;
run;


proc sql;
	select
		prxchange('s/(\d+)(?: )(\w+)/$1/', -1, raw) as house_number
		,prxchange('s/(\d+)(?: )(\w+)/$2/', -1, raw) as street
	from foo;
quit;

Registration is open

SAS Training: Just a Click Away