BookmarkSubscribeRSS Feed
BhargavDesai
Calcite | Level 5

Hello,

 

Can some one please help as how to parse the address string into street, city, state, zip and Country?  The challenge is the string has different delimiters.  

 

Here is the sample string: (Note : some data line has street 2 and some don't.)

 

Street 1: ABC PL, HSE No: 100, Street 2: ATTN  xxxx / City: Loredo, State: TX, Zip: 77702-2600, / Country: US

 

Street 1: Cochte Rd, HSE No: 770, / City: FRAMI, State: MI, Zip: 01221-4666, / Country: US

 

Street 1: S 6th St,  Street 2: Mail Stop , / City: Manglore, State: MN, Zip: 66402-3601, / Country: US

 

Street 1: THOMAS PL, / City: EVERMAN, State: TX, Zip: 76140-4610, / Country: US

 

Appreciate any help.

 

Thanks in advance.

 

 

-------------------------------------------

 

5 REPLIES 5
andreas_lds
Jade | Level 19

What do you expect as result?

 

It seems that this problem could be solved by using regular expressions, maybe just one with multiple groups. But all depends on the structure you want as result.

BhargavDesai
Calcite | Level 5

Thanks.

 

What I am looking for is string parsed in following fields

 

STREET1    Street2   City State Zip Country.

 

Does this make sense?

 

Thanks

Reeza
Super User

Replace  your colons with equal sign and use named input. 

 

You can use TRANSLATE to replace the colon with equal sign.

 

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=n1w749t788cgi2n1txpuccsuqtro.htm&docsetVe...

 

You can also just use SCAN() and a loop. 

 

1. use countc to count the number of colons. 

2. Use the result from 1 to loop your data. 

3. Use SCAN() to filter variable and value

4. Transpose data

 

 

 

 

Ksharp
Super User

How about this one ?

 

data have;
infile cards truncover;
input have $200.;
cards;
Street 1: ABC PL, HSE No: 100, Street 2: ATTN  xxxx / City: Loredo, State: TX, Zip: 77702-2600, / Country: US
Street 1: Cochte Rd, HSE No: 770, / City: FRAMI, State: MI, Zip: 01221-4666, / Country: US
Street 1: S 6th St,  Street 2: Mail Stop , / City: Manglore, State: MN, Zip: 66402-3601, / Country: US
Street 1: THOMAS PL, / City: EVERMAN, State: TX, Zip: 76140-4610, / Country: US
;
data temp;
 set have;
 pid=prxparse('/Street 1:|Street 2:|City:|State:|Zip:|Country:/i');
 s=1;e=length(have);
 id+1;
 call prxnext(pid,s,e,have,p,l);
 do while(p>0);
  name=substr(have,p,l); output;
  call prxnext(pid,s,e,have,p,l);
 end;
run;
data want;
 merge temp temp(firstobs=2 keep=id p rename=(id=_id p=_p));
 if id=_id then value=substr(have,p+l,_p-p-l);
  else value=substr(have,p+l,e-p-l+1);
 keep id name value;
run;
proc transpose data=want out=final_want;
by id;
id name;
var value;
run;
ballardw
Super User

Something like this might be able to read the file properly to begin with.

I may have included more checks for existing values but once you have one set of inconsistent values such as Street2 and the House number everything becomes questionable in my book.

 

If you don't want the house number the drop it. You may need to set the length of some variables longer.

data work.want;
   infile datalines dlm=',/' truncover;
   input @"Street 1: " Street1 :$15. @;
   if findw(_infile_,"HSE No:")> 0 then input  @"HSE No: "   Housenum :$15.  @;
   if findw(_infile_,"Street 2:")> 0 then input           @"Street 2: " Street2 :$15. @;
   if findw(_infile_,"City:")> 0 then input           @"City: "     City :$15. @;
   if findw(_infile_,"State:")> 0 then input           @"State: "    State :$15. @;
   if findw(_infile_,"Zip:")> 0 then input           @"Zip: "      Zip :$15. @;
   if findw(_infile_,"Country:")> 0 then input           @"Country:"   Country :$15. @;
   input;
   ;
datalines;
Street 1: ABC PL, HSE No: 100, Street 2: ATTN  xxxx / City: Loredo, State: TX, Zip: 77702-2600,/ Country: US
Street 1: Cochte Rd, HSE No: 770,/City: FRAMI, State: MI, Zip: 01221-4666,/Country: US
Street 1: S 6th St,  Street 2: Mail Stop ,/City: Manglore, State: MN, Zip: 66402-3601,/Country: US
Street 1: THOMAS PL,/City: EVERMAN, State: TX, Zip: 76140-4610,/Country: US
;

Please post code and data into a code box opened using the forum's {I} icon. The forum main message windows will reformat text to some extent and what we see may not be what you actually have.

Plus extra line feeds aren't helpful.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 794 views
  • 0 likes
  • 5 in conversation