DATA Step, Macro, Functions and more

Country Code Format Lookup

Accepted Solution Solved
Reply
Contributor
Posts: 64
Accepted Solution

Country Code Format Lookup

I have two datasets. One with all the countries and respective postal code formats as follows (note: N means a number, A means an alphabet):

 

dataset1:

 

Country        Format

Albania         NNNN

Argentina      ANNNNAAA

Canada         ANA NAN

United States NNNNN

........and so on

 

Other dataset with some counties and postal codes as follows:

dataset2:

 

Country                Zip_Code

United States       55344

United States      TW18 4BQ

Canada               V6C 2X8

France                 31360

 

What I need to accomplish is I need to lookup countries and zip codes in dataset 2 to their respective format in dataset 1 and flag if it is in improper format. Desired output as below:

Country                Zip_Code   Improper_Flag

United States       55344           N

United States      TW18 4BQ    Y

Canada               V6C 2X8       N

France                 31360          N

 

Any help is greatly appreciated. Thank you

 

 


Accepted Solutions
Solution
‎02-17-2016 02:05 AM
Super Contributor
Posts: 490

Re: Country Code Format Lookup

Posted in reply to sasmaverick
data have;
infile datalines dlm=",";
length country zip format $20;
input country $ zip $;
retain Rx;
if _N_=1 then Rx=rxparse("$U to A,$D to N"); 
format= rxchange(rx, 1000, zip); 
drop Rx;
cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;run;

data codes;
  infile cards dlm=",";
  input country : $20. code : $20. ;
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data want;
  merge have codes;
  by country;
 if trim(format)= trim(code) then Improper_Flag='N';
 else Improper_Flag='Y';
run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,976

Re: Country Code Format Lookup

Posted in reply to sasmaverick

Why is the format so important here?  It would seem logical to me to ensure the post codes are within the list of Postcodes given rather than format, if they are within that list then they are clean and in the correct format, if they are not within that list they are incorrect.  Checking the fomat doesn't check that the post code is valid?

If you have to then a simple routine to create a coded variable and merge that:  (Do please note for the future how I have created test data in the form of a datastep):

data codes;
  infile datalines dlm=",";
  length country format $20;
  input country $ format $;
datalines;
Albania,NNNN
Argentina,ANNNNAAA
Canada,ANA NAN
United States,NNNNN
;
run;

data have (drop=i chr);
  infile datalines dlm=",";
  length country zipcode format $20;
  input country $ zipcode $;
  do i=1 to 20;
    chr=char(zipcode,i);
    if chr=" " then substr(format,i,1)=chr;
    if chr ne " " and lengthn(compress(chr," ","a"))=0 then substr(format,i,1)="A";
    if chr ne " " and lengthn(compress(chr," ","d"))=0 then substr(format,i,1)="N"; 
  end;    
datalines;
United States,55344
United States,TW18 4BQ
Canada,V6C 2X8
France,31360
;
run;

proc sql;
  create table WANT as
  select  A.*,
          case when not exists(select distinct COUNTRY from CODES where COUNTRY=A.COUNTRY and FORMAT=A.FORMAT) then "N"
               else "Y" end as IMPROPER_FLAG
  from    HAVE A;
quit;
Super User
Posts: 5,432

Re: Country Code Format Lookup

I assume that the OP doens't have access to compete lists of valid zip codes, hence this half way validation.

If there are many other validations in your process, I'm thinking a data equality product, like DataFlux, could be of interest.

Another way that may work (untested) is to use translate() to convert the zip codes to the pattern, and then do the lookup (join/merge) with your pattern data set, and the test for equality.

Data never sleeps
SAS Super FREQ
Posts: 708

Re: Country Code Format Lookup

Posted in reply to sasmaverick

Here is an example using regular expressions, the code translates the A and N into the corresponding expression and then uses PRXMATCH to test

 

data ctry_format;
  infile cards dlm=",";
  input
    country : $32.
    zip_format : $32.
  ;
  length prx_expression $ 128;
  prx_expression = "/";

  drop i;
  do i = 1 to length(zip_format);
    select (char(zip_format, i));
      when ("N") do;
        prx_expression = cats(prx_expression, "\d");
      end;

      when ("A") do;
        prx_expression = cats(prx_expression, "[A-Z]");
      end;

      when (" ") do;
        prx_expression = cats(prx_expression, "\x20");
      end;
    end;
  end;

  prx_expression = cats(prx_expression, "/");
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;

data ctry_have;
  infile cards dlm=",";
  input
    country : $32.
    zip : $32.
  ;
  cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;

data want;
  merge ctry_format ctry_have;
  by country;
  prxmatch = ( prxmatch(prx_expression, trim(zip)) > 0 );
run;

Bruno

Solution
‎02-17-2016 02:05 AM
Super Contributor
Posts: 490

Re: Country Code Format Lookup

Posted in reply to sasmaverick
data have;
infile datalines dlm=",";
length country zip format $20;
input country $ zip $;
retain Rx;
if _N_=1 then Rx=rxparse("$U to A,$D to N"); 
format= rxchange(rx, 1000, zip); 
drop Rx;
cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;run;

data codes;
  infile cards dlm=",";
  input country : $20. code : $20. ;
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data want;
  merge have codes;
  by country;
 if trim(format)= trim(code) then Improper_Flag='N';
 else Improper_Flag='Y';
run;
Contributor
Posts: 64

Re: Country Code Format Lookup

Posted in reply to mohamed_zaki
Thanks a lot! Works for me
Super User
Posts: 10,041

Re: Country Code Format Lookup

Posted in reply to sasmaverick

It looks like Perl Regular Expression thing.

 

 

data ctry_format;
  infile cards dlm=",";
  input
    country : $32.
    zip_format : $32.
  ;
  length prx $ 200;
  prx =zip_format ;
  prx =tranwrd(strip(prx ),'A','[a-z]');
  prx =tranwrd(strip(prx ),'N','\d');
  prx =tranwrd(strip(prx ),' ','\s');
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data ctry_have;
  infile cards dlm=",";
  input
    country : $32.
    zip : $32.
  ;
  cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;

data want;
  merge ctry_format ctry_have;
  by country;
if not missing(prx) then do;
  if prxmatch(cats('/^',prx,'$/i'),strip(zip)) then Improper_Flag='N';
   else Improper_Flag='Y';
end;
run;
Contributor
Posts: 64

Re: Country Code Format Lookup

Thanks a lot Xia! Your approach works too!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 361 views
  • 2 likes
  • 6 in conversation