DATA Step, Macro, Functions and more

Can an array of columns be used in FINDW and TRANWRD?

Accepted Solution Solved
Reply
Occasional Contributor mc
Occasional Contributor
Posts: 6
Accepted Solution

Can an array of columns be used in FINDW and TRANWRD?

[ Edited ]

Hi all! I am in the process in completing physical address cleanup. One of the issues I want to clean is removing the city name in the address line. I have created mock data, via datalines, to illustrate example data in a table called "Address_Mock". I have a table called "ZipCodes_Clean" which contains all U.S. states and territories zip codes with all the possible cities attributed to those zip codes in columns named City1 to City130. The following is an example from the table:


Zipcode State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10
33172 FL US NA MIAMI DORAL DORAL BRANCH SWEETWATER WEST MIAMI

 

I am using two SAS functions (FINDW and TRANWRD) in a conditional IF-THEN statement to locate if the city is in the address line, create new column containing the address line with the city replaced by blank space. However, the final result has been blank result in the new column.

 

Below is the mock code of how I am currently trying to resolve the issue:

 

data Address_Mock;
infile datalines dsd;
input Name: $8. Address_Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;
run;

proc sql;
create table Address_Join as
select t1.*, t2.*
from Address_Mock as t1 left join ZipCodes_Clean as t2 on t1.ZipCode=t2.ZipCode;
quit;


data Address_Loop;
set Address_Join;

array cols {*} city1-city130 ;

do i=1 to 130;
if FINDW(Address_Line,cols{i}) then New_Address_Line=TRANWRD(Address_Line,cols{i},"");
end;

run;

 

 I'll be very appreciative if anyone sees the error in the code or I'm open to trying alternative code. Thanks in advance!


Accepted Solutions
Solution
‎02-08-2018 09:31 AM
Esteemed Advisor
Posts: 5,394

Re: Can an array of columns be used in FINDW and TRANWRD?

I would propose a few changes:

 

/* City names must be ordered by decreasing length (i.e. WEST MIAMI before MIAMI) */
data zip;
infile datalines truncover;
input Zipcode (State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10) (&:$20.);
datalines;
33172 FL  US  NA  WEST MIAMI  MIAMI  DORAL BRANCH  DORAL  SWEETWATER 
;

data Addr;
infile datalines dsd;
input Name: $8. Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;

proc sql;
create table A as
select *
from Addr as t1 left join Zip as t2 on t1.ZipCode=t2.ZipCode;
quit;

/* Remove city names at the end of address lines */
data B;
set A;

array cols {*} city1-city130 ;

do i=1 to dim(cols) while(not missing(cols{i}));
    pos = length(line)-length(cols{i})+1;
    if pos > 2 then
        if notalpha(substr(line,pos-1,1)) then
            if substr(line, pos) = cols{i} then do;
                line = substr(line,1,pos-2);
                leave;
                end;
    end;
drop pos city: i;
run;
PG

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,394

Re: Can an array of columns be used in FINDW and TRANWRD?

You might speed things up a bit with

 

do i = 1 to 130 while(not missing(cols{i}));
PG
Solution
‎02-08-2018 09:31 AM
Esteemed Advisor
Posts: 5,394

Re: Can an array of columns be used in FINDW and TRANWRD?

I would propose a few changes:

 

/* City names must be ordered by decreasing length (i.e. WEST MIAMI before MIAMI) */
data zip;
infile datalines truncover;
input Zipcode (State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10) (&:$20.);
datalines;
33172 FL  US  NA  WEST MIAMI  MIAMI  DORAL BRANCH  DORAL  SWEETWATER 
;

data Addr;
infile datalines dsd;
input Name: $8. Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;

proc sql;
create table A as
select *
from Addr as t1 left join Zip as t2 on t1.ZipCode=t2.ZipCode;
quit;

/* Remove city names at the end of address lines */
data B;
set A;

array cols {*} city1-city130 ;

do i=1 to dim(cols) while(not missing(cols{i}));
    pos = length(line)-length(cols{i})+1;
    if pos > 2 then
        if notalpha(substr(line,pos-1,1)) then
            if substr(line, pos) = cols{i} then do;
                line = substr(line,1,pos-2);
                leave;
                end;
    end;
drop pos city: i;
run;
PG
Occasional Contributor mc
Occasional Contributor
Posts: 6

Re: Can an array of columns be used in FINDW and TRANWRD?

Great modifications to the code! I was able to get the desired outcome. Thanks for noting to organize the city names by descending character length; it is definitely required in order to use the positional approach to remove the city names.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 102 views
  • 4 likes
  • 2 in conversation