I'm importing .csv file to SAS and it holds thousands of records as below.
name | Address Line 1 | Address Line 2 |
VALERIE C SANDMIRE | 1302 ELLENWOOD DR | MADISON WI 53714-1006 |
JONAH Z HAUGLEY | 884 HUNTERS TRL | SUN PRAIRIE WI 53590-2585 |
DOROTHY A GENSKE | 700 DOUGLAS TRL | MADISON WI 53716-2125 |
I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.
name | Address Line 1 | Address Line 2 State zip |
VALERIE C SANDMIRE | 1302 ELLENWOOD DR | MADISON WI 53714 |
JONAH Z HAUGLEY | 884 HUNTERS TRL | SUN PRAIRIE WI 53590 |
DOROTHY A GENSKE | 700 DOUGLAS TRL | MADISON WI 53716 |
I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.
Any help would be appreciated.
If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.
Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:
Data Have;
Input Address_Line_2 $50.;
Datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
Run;
Data Want;
Set Have;
Pos_of_first_digit=PRXMatch("/\d/",Address_line_2);
Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
Address_line_2=Substr(Address_line_2,1,Pos_of_first_digit-4); * drop zip and state;
Run;
data abc;
input kk $30. ;
datalines;
MADISON WI 53714-1006
SUN PRAIRIE WI 53590-2585
MADISON WI 53716-2125
;
run;
data sub;
set abc;
Address_Line_2 = scan(kk,1);
state= scan(kk,2);
zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */
run;
data have;
length Name $18 ALine1 $18 ALine2 $25;
input Name @24 ALine1 $18. @44 ALine2 $25.;
datalines;
VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON WI 53714-1006
JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE 700 DOUGLAS TRL MADISON WI 53716-2125
;
run;
This program depends on:
[1] count the number of words in ALine2.
[2] Gets the Zip at the last word adjusting for '-'.
[3] State is last but one word.
[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.
data want;
format Name ALine1 Addr2 State Zip;
keep Name ALine1 Addr2 State Zip;
length State $2 zip $5 Addr2 $12;
set have;
num = countw(ALine2, ' ');
Zip = scan(ALine2, num, ' -');
State = scan(ALine2, num - 1, ' ');
loc = index(ALine2, State);
Addr2 = substr(ALine2,1, loc - 1);
run;
Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.
How about this?;
data have;
input name $ 1- 22 Address1 $ 23 -48 Address2 $ 49 - 74;
datalines;
VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON WI 53714-1006
JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE WI 53590-2585
DOROTHY A GENSKE 700 DOUGLAS TRL MADISON WI 53716-2125
;
run;
data want;
set have;
zipcode = substr(scan(address2,-1," "),1,5);
state = scan(address2,-2, " ");
run;
I hope it helps,
good luck!
Anca.
no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality
Use those negative values for that. (-1 en -2 )
the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.
data want;
format Name ALine1 Addr2 State Zip;
keep Name ALine1 Addr2 State Zip;
length State $2 zip $5 Addr2 $12;
set have;
Zip = scan(Addr2, -2,' -');
Call Scan(Addr2,-2,statpos,statlen) ;
State = substr(Addr2,statpos,statlen);
Addr2 = substr(ALine2,1, statpos - 1);
drop statpos statlen;
run;
If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.