DATA Step, Macro, Functions and more

Need help on substr and scan function

Accepted Solution Solved
Reply
Regular Contributor
Posts: 168
Accepted Solution

Need help on substr and scan function

I'm importing .csv file to SAS and it holds thousands of records as below.

name                Address Line 1        Address Line 2
VALERIE C SANDMIRE              1302 ELLENWOOD DR        MADISON WI 53714-1006
JONAH Z HAUGLEY              884 HUNTERS TRL        SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE             700 DOUGLAS TRL        MADISON WI 53716-2125

I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.

name                Address Line 1        Address Line 2       State         zip
VALERIE C SANDMIRE             1302 ELLENWOOD DR        MADISON                 WI          53714
JONAH Z HAUGLEY             884 HUNTERS TRL        SUN PRAIRIE          WI          53590
DOROTHY A GENSKE            700 DOUGLAS TRL        MADISON                WI           53716

  

I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.

Any help would be appreciated.


Accepted Solutions
Solution
‎10-28-2014 07:27 AM
Respected Advisor
Posts: 3,777

Re: Need help on substr and scan function

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png

View solution in original post


All Replies
Super Contributor
Posts: 336

Re: Need help on substr and scan function

Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:

Data Have;
  Input Address_Line_2 $50.;
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
Run;

Data Want;
  Set Have;
  Pos_of_first_digit=PRXMatch("/\d/",Address_line_2);
  Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
  State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
  Address_line_2=Substr(Address_line_2,1,Pos_of_first_digit-4); * drop zip and state;
Run;

Occasional Contributor
Posts: 16

Re: Need help on substr and scan function

data abc;

input kk $30. ;

datalines;

MADISON WI 53714-1006       

SUN PRAIRIE WI 53590-2585

MADISON WI 53716-2125

;

run;

data sub;

set abc;

Address_Line_2 = scan(kk,1);

state= scan(kk,2);

zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */

run;

Super Contributor
Posts: 254

Re: Need help on substr and scan function

data have;

length Name $18  ALine1 $18  ALine2 $25;

input Name @24 ALine1 $18. @44 ALine2 $25.;

datalines;

VALERIE C SANDMIRE  1302 ELLENWOOD DR MADISON WI 53714-1006

JONAH Z HAUGLEY     884 HUNTERS TRL    SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE     700 DOUGLAS TRL    MADISON WI 53716-2125

;

run;

This program depends on:

[1] count the number of words in ALine2.

[2]  Gets the Zip at the last word adjusting for '-'.

[3] State is last but one word.

[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   num = countw(ALine2, ' ');

   Zip = scan(ALine2, num, ' -');

  

   State = scan(ALine2, num - 1, ' ');

   loc = index(ALine2, State);

   Addr2 = substr(ALine2,1, loc - 1);

run;

Super User
Posts: 6,932

Re: Need help on substr and scan function

Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 543

Re: Need help on substr and scan function

How about this?;

data have;

input name     $ 1- 22                Address1 $ 23 -48            Address2 $ 49 - 74;

datalines;

VALERIE C SANDMIRE      1302 ELLENWOOD DR            MADISON WI 53714-1006   

JONAH Z HAUGLEY          884 HUNTERS TRL            SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE      700 DOUGLAS TRL            MADISON WI 53716-2125   

;

run;

data want;

    set have;

    zipcode = substr(scan(address2,-1," "),1,5);

    state = scan(address2,-2, " ");

run;

I hope it helps,

good luck!

Anca.

Valued Guide
Posts: 3,208

Re: Need help on substr and scan function

no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality

Use those negative values for that. (-1   en -2 )   

the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   Zip = scan(Addr2, -2,' -');

   Call Scan(Addr2,-2,statpos,statlen) ; 

   State = substr(Addr2,statpos,statlen);

   Addr2 = substr(ALine2,1, statpos - 1);

drop statpos statlen;

run;

---->-- ja karman --<-----
Solution
‎10-28-2014 07:27 AM
Respected Advisor
Posts: 3,777

Re: Need help on substr and scan function

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 269 views
  • 6 likes
  • 8 in conversation