BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mc
Fluorite | Level 6 mc
Fluorite | Level 6

Hi all! I am in the process in completing physical address cleanup. One of the issues I want to clean is removing the city name in the address line. I have created mock data, via datalines, to illustrate example data in a table called "Address_Mock". I have a table called "ZipCodes_Clean" which contains all U.S. states and territories zip codes with all the possible cities attributed to those zip codes in columns named City1 to City130. The following is an example from the table:


Zipcode State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10
33172 FL US NA MIAMI DORAL DORAL BRANCH SWEETWATER WEST MIAMI

 

I am using two SAS functions (FINDW and TRANWRD) in a conditional IF-THEN statement to locate if the city is in the address line, create new column containing the address line with the city replaced by blank space. However, the final result has been blank result in the new column.

 

Below is the mock code of how I am currently trying to resolve the issue:

 

data Address_Mock;
infile datalines dsd;
input Name: $8. Address_Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;
run;

proc sql;
create table Address_Join as
select t1.*, t2.*
from Address_Mock as t1 left join ZipCodes_Clean as t2 on t1.ZipCode=t2.ZipCode;
quit;


data Address_Loop;
set Address_Join;

array cols {*} city1-city130 ;

do i=1 to 130;
if FINDW(Address_Line,cols{i}) then New_Address_Line=TRANWRD(Address_Line,cols{i},"");
end;

run;

 

 I'll be very appreciative if anyone sees the error in the code or I'm open to trying alternative code. Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I would propose a few changes:

 

/* City names must be ordered by decreasing length (i.e. WEST MIAMI before MIAMI) */
data zip;
infile datalines truncover;
input Zipcode (State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10) (&:$20.);
datalines;
33172 FL  US  NA  WEST MIAMI  MIAMI  DORAL BRANCH  DORAL  SWEETWATER 
;

data Addr;
infile datalines dsd;
input Name: $8. Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;

proc sql;
create table A as
select *
from Addr as t1 left join Zip as t2 on t1.ZipCode=t2.ZipCode;
quit;

/* Remove city names at the end of address lines */
data B;
set A;

array cols {*} city1-city130 ;

do i=1 to dim(cols) while(not missing(cols{i}));
    pos = length(line)-length(cols{i})+1;
    if pos > 2 then
        if notalpha(substr(line,pos-1,1)) then
            if substr(line, pos) = cols{i} then do;
                line = substr(line,1,pos-2);
                leave;
                end;
    end;
drop pos city: i;
run;
PG

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

You might speed things up a bit with

 

do i = 1 to 130 while(not missing(cols{i}));
PG
PGStats
Opal | Level 21

I would propose a few changes:

 

/* City names must be ordered by decreasing length (i.e. WEST MIAMI before MIAMI) */
data zip;
infile datalines truncover;
input Zipcode (State Country WorldRegion City1 City2 City3 City4 City5 City6 City7 City8 City9 City10) (&:$20.);
datalines;
33172 FL  US  NA  WEST MIAMI  MIAMI  DORAL BRANCH  DORAL  SWEETWATER 
;

data Addr;
infile datalines dsd;
input Name: $8. Line: $26. ZipCode ;
datalines;
MICHELLE,"10597 NW 7 TER MIAMI",33172
NICOLE,"10597 NW 7 TER DORAL",33172
MARIA,"10597 NW 7 TER WEST MIAMI",33172
ANTONIO,"10597 NW 7 TER SWEETWATER",33172
;

proc sql;
create table A as
select *
from Addr as t1 left join Zip as t2 on t1.ZipCode=t2.ZipCode;
quit;

/* Remove city names at the end of address lines */
data B;
set A;

array cols {*} city1-city130 ;

do i=1 to dim(cols) while(not missing(cols{i}));
    pos = length(line)-length(cols{i})+1;
    if pos > 2 then
        if notalpha(substr(line,pos-1,1)) then
            if substr(line, pos) = cols{i} then do;
                line = substr(line,1,pos-2);
                leave;
                end;
    end;
drop pos city: i;
run;
PG
mc
Fluorite | Level 6 mc
Fluorite | Level 6

Great modifications to the code! I was able to get the desired outcome. Thanks for noting to organize the city names by descending character length; it is definitely required in order to use the positional approach to remove the city names.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 915 views
  • 4 likes
  • 2 in conversation