BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RamKumar
Fluorite | Level 6

I'm importing .csv file to SAS and it holds thousands of records as below.

name                Address Line 1        Address Line 2
VALERIE C SANDMIRE              1302 ELLENWOOD DR        MADISON WI 53714-1006
JONAH Z HAUGLEY              884 HUNTERS TRL        SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE             700 DOUGLAS TRL        MADISON WI 53716-2125

I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.

name                Address Line 1        Address Line 2       State         zip
VALERIE C SANDMIRE             1302 ELLENWOOD DR        MADISON                 WI          53714
JONAH Z HAUGLEY             884 HUNTERS TRL        SUN PRAIRIE          WI          53590
DOROTHY A GENSKE            700 DOUGLAS TRL        MADISON                WI           53716

  

I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.

Any help would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png

View solution in original post

7 REPLIES 7
user24feb
Barite | Level 11

Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:

Data Have;
  Input Address_Line_2 $50.;
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
Run;

Data Want;
  Set Have;
  Pos_of_first_digit=PRXMatch("/\d/",Address_line_2);
  Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
  State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
  Address_line_2=Substr(Address_line_2,1,Pos_of_first_digit-4); * drop zip and state;
Run;

Paari
Calcite | Level 5

data abc;

input kk $30. ;

datalines;

MADISON WI 53714-1006       

SUN PRAIRIE WI 53590-2585

MADISON WI 53716-2125

;

run;

data sub;

set abc;

Address_Line_2 = scan(kk,1);

state= scan(kk,2);

zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */

run;

KachiM
Rhodochrosite | Level 12

data have;

length Name $18  ALine1 $18  ALine2 $25;

input Name @24 ALine1 $18. @44 ALine2 $25.;

datalines;

VALERIE C SANDMIRE  1302 ELLENWOOD DR MADISON WI 53714-1006

JONAH Z HAUGLEY     884 HUNTERS TRL    SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE     700 DOUGLAS TRL    MADISON WI 53716-2125

;

run;

This program depends on:

[1] count the number of words in ALine2.

[2]  Gets the Zip at the last word adjusting for '-'.

[3] State is last but one word.

[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   num = countw(ALine2, ' ');

   Zip = scan(ALine2, num, ' -');

  

   State = scan(ALine2, num - 1, ' ');

   loc = index(ALine2, State);

   Addr2 = substr(ALine2,1, loc - 1);

run;

Kurt_Bremser
Super User

Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.

AncaTilea
Pyrite | Level 9

How about this?;

data have;

input name     $ 1- 22                Address1 $ 23 -48            Address2 $ 49 - 74;

datalines;

VALERIE C SANDMIRE      1302 ELLENWOOD DR            MADISON WI 53714-1006   

JONAH Z HAUGLEY          884 HUNTERS TRL            SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE      700 DOUGLAS TRL            MADISON WI 53716-2125   

;

run;

data want;

    set have;

    zipcode = substr(scan(address2,-1," "),1,5);

    state = scan(address2,-2, " ");

run;

I hope it helps,

good luck!

Anca.

jakarman
Barite | Level 11

no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality

Use those negative values for that. (-1   en -2 )   

the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State $2 zip $5 Addr2 $12;

   set have;

   Zip = scan(Addr2, -2,' -');

   Call Scan(Addr2,-2,statpos,statlen) ; 

   State = substr(Addr2,statpos,statlen);

   Addr2 = substr(ALine2,1, statpos - 1);

drop statpos statlen;

run;

---->-- ja karman --<-----
data_null__
Jade | Level 19

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
  Input A2 $50.;
 
call scan(a2,-2,p,l,' ');
  length city $32 state $2 zip5 $5;
  city  = substrn(a2,
1,p-1);
  state = substrn(a2,p,l);
  zip5  = scan(left(substrn(a2,p+l)),
1,'-');
  Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
SUN PRAIRIE    WI    53590-2585
;;;;
   run;
proc print;
  
run;

10-28-2014 6-26-02 AM.png

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1650 views
  • 6 likes
  • 8 in conversation