BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RamKumar
Fluorite | Level 6

I'm importing .csv file to SAS and it holds thousands of records as below.

name                Address Line 1        Address Line 2
VALERIE C SANDMIRE              1302 ELLENWOOD DR        MADISON WI 53714-1006
JONAH Z HAUGLEY              884 HUNTERS TRL        SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE             700 DOUGLAS TRL        MADISON WI 53716-2125

I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.

name                Address Line 1        Address Line 2       State         zip
VALERIE C SANDMIRE             1302 ELLENWOOD DR        MADISON                 WI          53714
JONAH Z HAUGLEY             884 HUNTERS TRL        SUN PRAIRIE          WI          53590
DOROTHY A GENSKE            700 DOUGLAS TRL        MADISON                WI           53716

  

I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.

Any help would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png

View solution in original post

7 REPLIES 7
user24feb
Barite | Level 11

Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:

Data Have;
  Input Address_Line_2 $50.;
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
Run;

Data Want;
  Set Have;
  Pos_of_first_digit=PRXMatch("/\d/",Address_line_2);
  Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
  State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
  Address_line_2=Substr(Address_line_2,1,Pos_of_first_digit-4); * drop zip and state;
Run;

Paari
Calcite | Level 5

data abc;

input kk $30. ;

datalines;

MADISON WI 53714-1006       

SUN PRAIRIE WI 53590-2585

MADISON WI 53716-2125

;

run;

data sub;

set abc;

Address_Line_2 = scan(kk,1);

state= scan(kk,2);

zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */

run;

KachiM
Rhodochrosite | Level 12

data have;

length Name $18  ALine1 $18  ALine2 $25;

input Name @24 ALine1 $18. @44 ALine2 $25.;

datalines;

VALERIE C SANDMIRE  1302 ELLENWOOD DR MADISON WI 53714-1006

JONAH Z HAUGLEY     884 HUNTERS TRL    SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE     700 DOUGLAS TRL    MADISON WI 53716-2125

;

run;

This program depends on:

[1] count the number of words in ALine2.

[2]  Gets the Zip at the last word adjusting for '-'.

[3] State is last but one word.

[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   num = countw(ALine2, ' ');

   Zip = scan(ALine2, num, ' -');

  

   State = scan(ALine2, num - 1, ' ');

   loc = index(ALine2, State);

   Addr2 = substr(ALine2,1, loc - 1);

run;

Kurt_Bremser
Super User

Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.

AncaTilea
Pyrite | Level 9

How about this?;

data have;

input name     $ 1- 22                Address1 $ 23 -48            Address2 $ 49 - 74;

datalines;

VALERIE C SANDMIRE      1302 ELLENWOOD DR            MADISON WI 53714-1006   

JONAH Z HAUGLEY          884 HUNTERS TRL            SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE      700 DOUGLAS TRL            MADISON WI 53716-2125   

;

run;

data want;

    set have;

    zipcode = substr(scan(address2,-1," "),1,5);

    state = scan(address2,-2, " ");

run;

I hope it helps,

good luck!

Anca.

jakarman
Barite | Level 11

no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality

Use those negative values for that. (-1   en -2 )   

the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   Zip = scan(Addr2, -2,' -');

   Call Scan(Addr2,-2,statpos,statlen) ; 

   State = substr(Addr2,statpos,statlen);

   Addr2 = substr(ALine2,1, statpos - 1);

drop statpos statlen;

run;

---->-- ja karman --<-----
data_null__
Jade | Level 19

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2341 views
  • 6 likes
  • 8 in conversation