## Need help on substr and scan function

Solved
Regular Contributor
Posts: 168

# Need help on substr and scan function

I'm importing .csv file to SAS and it holds thousands of records as below.

 name Address Line 1 Address Line 2 VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON WI 53714-1006 JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE WI 53590-2585 DOROTHY A GENSKE 700 DOUGLAS TRL MADISON WI 53716-2125

I need to convert the SAS dataset as below after import. It means I need to derive 'state' and 'zip' variable from the variable 'address line 2'.

 name Address Line 1 Address Line 2       State         zip VALERIE C SANDMIRE 1302 ELLENWOOD DR MADISON                 WI          53714 JONAH Z HAUGLEY 884 HUNTERS TRL SUN PRAIRIE          WI          53590 DOROTHY A GENSKE 700 DOUGLAS TRL MADISON                WI           53716

I tried to implement this via substr (tried from right to left to extract only numbers substr(Address Line 2,-1,-10)) and scan function but could not succeed. Please advise.

Any help would be appreciated.

Accepted Solutions
Solution
‎10-28-2014 07:27 AM
Posts: 3,852

## Re: Need help on substr and scan function

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
Input A2 \$50.;

call scan(a2,-2,p,l,' ');
length city \$32 state \$2 zip5 \$5;
city  = substrn(a2,
1,p-1);
state = substrn(a2,p,l);
zip5  = scan(left(substrn(a2,p+l)),
1,'-');
Datalines;
SUN PRAIRIE WI 53590-2585
SUN PRAIRIE    WI    53590-2585
;;;;
run;
proc print;

run;

All Replies
Super Contributor
Posts: 355

## Re: Need help on substr and scan function

Probably the key is, to find the position of the first digit/non-character. Perl-functions should be useful. This program probably needs some house-keeping, but in principle should work:

Data Have;
Datalines;
SUN PRAIRIE WI 53590-2585
;
Run;

Data Want;
Set Have;
Zip=Substr(Address_line_2,Pos_of_first_digit); * find Position of zip;
State=(Substr(Address_line_2,Pos_of_first_digit-3,2)); * assumption: state is 2 characters long;
Run;

Occasional Contributor
Posts: 16

## Re: Need help on substr and scan function

data abc;

input kk \$30. ;

datalines;

SUN PRAIRIE WI 53590-2585

;

run;

data sub;

set abc;

state= scan(kk,2);

zip= substr(compress(kk, , 'kd'),1,5);/* to extract the numeric part */

run;

Super Contributor
Posts: 324

## Re: Need help on substr and scan function

data have;

length Name \$18  ALine1 \$18  ALine2 \$25;

input Name @24 ALine1 \$18. @44 ALine2 \$25.;

datalines;

VALERIE C SANDMIRE  1302 ELLENWOOD DR MADISON WI 53714-1006

JONAH Z HAUGLEY     884 HUNTERS TRL    SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE     700 DOUGLAS TRL    MADISON WI 53716-2125

;

run;

This program depends on:

[1] count the number of words in ALine2.

[2]  Gets the Zip at the last word adjusting for '-'.

[3] State is last but one word.

[4] Taking the beginning location of State, Addr2 is found in the first location - 1 positions.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State \$2 zip \$5 Addr2 \$12;

set have;

num = countw(ALine2, ' ');

Zip = scan(ALine2, num, ' -');

State = scan(ALine2, num - 1, ' ');

loc = index(ALine2, State);

Addr2 = substr(ALine2,1, loc - 1);

run;

Super User
Posts: 10,239

## Re: Need help on substr and scan function

Since the format for the zip code and state code is given, I recommend to roll up the "address line 2" string from the back, starting with length()-x, where x is sufficent to place you at the start of zip/state. What's left is then the municipality name.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super Contributor
Posts: 543

## Re: Need help on substr and scan function

data have;

input name     \$ 1- 22                Address1 \$ 23 -48            Address2 \$ 49 - 74;

datalines;

VALERIE C SANDMIRE      1302 ELLENWOOD DR            MADISON WI 53714-1006

JONAH Z HAUGLEY          884 HUNTERS TRL            SUN PRAIRIE WI 53590-2585

DOROTHY A GENSKE      700 DOUGLAS TRL            MADISON WI 53716-2125

;

run;

data want;

set have;

run;

I hope it helps,

good luck!

Anca.

Posts: 3,215

## Re: Need help on substr and scan function

no need for difficult constructions. Scan SAS(R) 9.3 Functions and CALL Routines: Reference is having the left to right functionality

Use those negative values for that. (-1   en -2 )

the call scan SAS(R) 9.3 Functions and CALL Routines: Reference is giving the position in the string for that. An improvement could be made on datasp code with that one.

data want;

format Name ALine1 Addr2 State Zip;

keep Name ALine1 Addr2 State Zip;

length State \$2 zip \$5 Addr2 \$12;

set have;

Addr2 = substr(ALine2,1, statpos - 1);

drop statpos statlen;

run;

---->-- ja karman --<-----
Solution
‎10-28-2014 07:27 AM
Posts: 3,852

## Re: Need help on substr and scan function

If state is consistently the -2 blank delimited word then the position and length of that word is really all you need to know.

Data Have;
Input A2 \$50.;

call scan(a2,-2,p,l,' ');
length city \$32 state \$2 zip5 \$5;
city  = substrn(a2,
1,p-1);
state = substrn(a2,p,l);
zip5  = scan(left(substrn(a2,p+l)),
1,'-');
Datalines;
SUN PRAIRIE WI 53590-2585
SUN PRAIRIE    WI    53590-2585
;;;;
run;
proc print;

run;

🔒 This topic is solved and locked.