DATA Step, Macro, Functions and more

extracting an address from text string

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

extracting an address from text string

Hello I have an text string that has the address with city, state and zip code.  I can extract the state, zip code, city from the end of the string, but I can not figure out how to get the address from the beginning of the string.  I can not get the tranwrd function to work using a concatenation of the string from what I have extracted from the string so far.  Is there an elegant way to extract the first part of the string that contains the address.   

here is a sample of what the string looks like:

address_string

46 Siwash Creek Rd Tonasket Wa  98855

Po Box 124 Ford Wa  99013

Po Box 292 Ford Wa  99013

Po Box 124 Welpinit Wa  99040

Po Box 47 Wellpinit Wa  99040

Po Box 267 Wellpinit Wa  99040

Po Box 276 Wellpinit Wa  99040

5443 Ford Wellpinit Rd Wellpinit Wa  99040

Po Box 188 Wellpinit Wa  99040

Po Box 416 Wellpinit Wa  99040

5962a Martha Boardman Rd Wellpinit Wa  99040

Po Box 35 Wellpinit Wa  99040

2915 Number One Canyon Rd Wenatchee Wa  98801-2462

785 Skidmore Rd Colville Wa  99114


Accepted Solutions
Solution
‎10-15-2013 07:38 PM
PROC Star
Posts: 7,363

Re: extracting an address from text string

You might be able to use something like:

data have;

  format string $char70.;

  input;

  string=_infile_;

  cards;

46 Siwash Creek Rd Tonasket Wa  98855

Po Box 124 Ford Wa  99013

Po Box 292 Ford Wa  99013

Po Box 124 Welpinit Wa  99040

Po Box 47 Wellpinit Wa  99040

Po Box 267 Wellpinit Wa  99040

Po Box 276 Wellpinit Wa  99040

5443 Ford Wellpinit Rd Wellpinit Wa  99040

Po Box 188 Wellpinit Wa  99040

Po Box 416 Wellpinit Wa  99040

5962a Martha Boardman Rd Wellpinit Wa  99040

Po Box 35 Wellpinit Wa  99040

2915 Number One Canyon Rd Wenatchee Wa  98801-2462

785 Skidmore Rd Colville Wa  99114

;

data want (keep=address zip state);

  set have;

  zip=scan(string,-1,' ');

  state=scan(string,-2,' ');

  call scan(string,-2,x,y,' ');

  address=substrn(string,1,x-1);

run;

View solution in original post


All Replies
Solution
‎10-15-2013 07:38 PM
PROC Star
Posts: 7,363

Re: extracting an address from text string

You might be able to use something like:

data have;

  format string $char70.;

  input;

  string=_infile_;

  cards;

46 Siwash Creek Rd Tonasket Wa  98855

Po Box 124 Ford Wa  99013

Po Box 292 Ford Wa  99013

Po Box 124 Welpinit Wa  99040

Po Box 47 Wellpinit Wa  99040

Po Box 267 Wellpinit Wa  99040

Po Box 276 Wellpinit Wa  99040

5443 Ford Wellpinit Rd Wellpinit Wa  99040

Po Box 188 Wellpinit Wa  99040

Po Box 416 Wellpinit Wa  99040

5962a Martha Boardman Rd Wellpinit Wa  99040

Po Box 35 Wellpinit Wa  99040

2915 Number One Canyon Rd Wenatchee Wa  98801-2462

785 Skidmore Rd Colville Wa  99114

;

data want (keep=address zip state);

  set have;

  zip=scan(string,-1,' ');

  state=scan(string,-2,' ');

  call scan(string,-2,x,y,' ');

  address=substrn(string,1,x-1);

run;

Occasional Contributor
Posts: 9

Re: extracting an address from text string

Thanks arthur, this is really slick. I had to modify a little for cities with two words, but this is exactly what I wanted to do.

Now I need to figure out why it works and how it works. 

Frequent Contributor
Frequent Contributor
Posts: 83

Re: extracting an address from text string

The following code  separates all the words (depending on the length of the address you might have to change the code, perhaps a macro will help you).

Here is the code

data want1;
set have;

j=index(string," ");
add1=substr(string,1,j-1);
restadd=substr(string,j+1);


j = index(restadd," ");
add2=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);


j = index(restadd," ");
add3=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);

j = index(restadd," ");
add4=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);

j = index(restadd," ");
add5=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);

j = index(restadd," ");
add6=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);

j = index(restadd," ");
add7=substr(restadd,1,j-1);
restadd=substr(restadd,j+1);
drop j restadd;
run;

output


string


add1


add2


add3


add4


add5


add6


add7


46 Siwash Creek Rd Tonasket Wa  98855


46


Siwash


Creek


Rd


Tonasket


Wa


98855


Po Box 124 Ford Wa  99013


Po


Box


124


Ford


Wa


99013


99013


Po Box 292 Ford Wa  99013


Po


Box


292


Ford


Wa

99013


99013


Po Box 124 Welpinit Wa  99040


Po


Box


124


Welpinit


Wa


99040


99040


Po Box 47 Wellpinit Wa  99040


Po


Box


47


Wellpinit


Wa


99040


99040


Po Box 267 Wellpinit Wa  99040


Po


Box


267


Wellpinit


Wa


99040


99040


Po Box 276 Wellpinit Wa  99040


Po


Box


276


Wellpinit


Wa


99040


99040


5443 Ford Wellpinit Rd Wellpinit Wa  99040


5443


Ford


Wellpinit


Rd


Wellpinit


Wa


99040


Po Box 188 Wellpinit Wa  99040


Po


Box


188


Wellpinit


Wa


99040


99040


Po Box 416 Wellpinit Wa  99040


Po


Box


416


Wellpinit


Wa


99040


99040


5962a Martha Boardman Rd Wellpinit Wa  99040


5962a


Martha


Boardman


Rd


Wellpinit


Wa


99040


Po Box 35 Wellpinit Wa  99040


Po


Box


35


Wellpinit


Wa


99040


99040


2915 Number One Canyon Rd Wenatchee Wa  98801-2462


2915


Number


One


Canyon


Rd


Wenatchee


Wa


785 Skidmore Rd Colville Wa  99114


785


Skidmore


Rd


Colville


Wa


99114


99114

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 362 views
  • 0 likes
  • 3 in conversation