I'm importing .csv file to SAS and it holds thousands of records as below.
name | Address Line 1 | Address Line 2 |
VALERIE C SANDMIRE | 1302 ELLENWOOD DR | MADISON WI 53714-1006 |
JONAH Z HAUGLEY | 884 HUNTERS TRL | SUN PRAIRIE WI 53590-2585 |
DOROTHY A GENSKE | 700 DOUGLAS TRL | MADISON WI 53716-2125 |
I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.
name | Address Line 1 | Address Line 2 State zip |
VALERIE C SANDMIRE | 1302 ELLENWOOD DR | MADISON WI 53714 |
JONAH Z HAUGLEY | 884 HUNTERS TRL | SUN PRAIRIE WI 53590 |
DOROTHY A GENSKE | 700 DOUGLAS TRL | MADISON WI 53716 |
I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.
Any help would be appreciated.
If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.
Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:
Data Have;
Input Address_Line_2 $50.;
Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
Run;
Data Want;
Set Have;
Pos_of_first_digit=PRXMatch("/\d/",Address_line_2);
Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
Address_line_2=Substr(Address_line_2,1,Pos_of_first_digit-4); * drop zip and state;
Run;
data abc;
input kk $30. ;
datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
run;
data sub;
set abc;
Address_Line_2 = scan(kk,1);
state= scan(kk,2);
zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */
run;
data have;
length Name $18 ALine1 $18 ALine2 $25;
input Name @24 ALine1 $18. @44 ALine2 $25.;
datalines;
VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON WI 53714-1006
JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE 700 DOUGLAS TRL MADISON WI 53716-2125
;
run;
This program depends on:
[1] count the number of words in ALine2.
[2] Gets the Zip at the last word adjusting for '-'.
[3] State is last but one word.
[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.
data want;
format Name ALine1 Addr2 State Zip;
keep Name ALine1 Addr2 State Zip;
length State $2 zip $5 Addr2 $12;
set have;
num = countw(ALine2, ' ');
Zip = scan(ALine2, num, ' -');
State = scan(ALine2, num - 1, ' ');
loc = index(ALine2, State);
Addr2 = substr(ALine2,1, loc - 1);
run;
Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.
How about this?;
data have;
input name $ 1- 22 Address1 $ 23 -48 Address2 $ 49 - 74;
datalines;
VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON WI 53714-1006
JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE 700 DOUGLAS TRL MADISON WI 53716-2125
;
run;
data want;
set have;
zipcode = substr(scan(address2,-1," "),1,5);
state = scan(address2,-2, " ");
run;
I hope it helps,
good luck!
Anca.
no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality
Use those negative values for that. (-1 en -2 )
the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.
data want;
format Name ALine1 Addr2 State Zip;
keep Name ALine1 Addr2 State Zip;
length State $2 zip $5 Addr2 $12;
set have;
Zip = scan(Addr2, -2,' -');
Call Scan(Addr2,-2,statpos,statlen) ;
State = substr(Addr2,statpos,statlen);
Addr2 = substr(ALine2,1, statpos - 1);
drop statpos statlen;
run;
If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.