Solved: Country Code Format Lookup

sasmaverick · Posted 02-16-2016 03:54 AM

I have two datasets. One with all the countries and respective postal code formats as follows (note: N means a number, A means an alphabet):

dataset1:

Country Format

Albania NNNN

Argentina ANNNNAAA

Canada ANA NAN

United States NNNNN

........and so on

Other dataset with some counties and postal codes as follows:

dataset2:

Country Zip_Code

United States 55344

United States TW18 4BQ

Canada V6C 2X8

France 31360

What I need to accomplish is I need to lookup countries and zip codes in dataset 2 to their respective format in dataset 1 and flag if it is in improper format. Desired output as below:

Country Zip_Code Improper_Flag

United States 55344 N

United States TW18 4BQ Y

Canada V6C 2X8 N

France 31360 N

Any help is greatly appreciated. Thank you

mohamed_zaki · Posted 02-16-2016 05:47 AM

data have;
infile datalines dlm=",";
length country zip format $20;
input country $ zip $;
retain Rx;
if _N_=1 then Rx=rxparse("$U to A,$D to N"); 
format= rxchange(rx, 1000, zip); 
drop Rx;
cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;run;

data codes;
  infile cards dlm=",";
  input country : $20. code : $20. ;
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data want;
  merge have codes;
  by country;
 if trim(format)= trim(code) then Improper_Flag='N';
 else Improper_Flag='Y';
run;

View solution in original post

RW9 · Posted 02-16-2016 04:21 AM

Why is the format so important here? It would seem logical to me to ensure the post codes are within the list of Postcodes given rather than format, if they are within that list then they are clean and in the correct format, if they are not within that list they are incorrect. Checking the fomat doesn't check that the post code is valid?

If you have to then a simple routine to create a coded variable and merge that: (Do please note for the future how I have created test data in the form of a datastep):

data codes;
  infile datalines dlm=",";
  length country format $20;
  input country $ format $;
datalines;
Albania,NNNN
Argentina,ANNNNAAA
Canada,ANA NAN
United States,NNNNN
;
run;

data have (drop=i chr);
  infile datalines dlm=",";
  length country zipcode format $20;
  input country $ zipcode $;
  do i=1 to 20;
    chr=char(zipcode,i);
    if chr=" " then substr(format,i,1)=chr;
    if chr ne " " and lengthn(compress(chr," ","a"))=0 then substr(format,i,1)="A";
    if chr ne " " and lengthn(compress(chr," ","d"))=0 then substr(format,i,1)="N"; 
  end;    
datalines;
United States,55344
United States,TW18 4BQ
Canada,V6C 2X8
France,31360
;
run;

proc sql;
  create table WANT as
  select  A.*,
          case when not exists(select distinct COUNTRY from CODES where COUNTRY=A.COUNTRY and FORMAT=A.FORMAT) then "N"
               else "Y" end as IMPROPER_FLAG
  from    HAVE A;
quit;

LinusH · Posted 02-16-2016 04:26 AM

I assume that the OP doens't have access to compete lists of valid zip codes, hence this half way validation.

If there are many other validations in your process, I'm thinking a data equality product, like DataFlux, could be of interest.

Another way that may work (untested) is to use translate() to convert the zip codes to the pattern, and then do the lookup (join/merge) with your pattern data set, and the test for equality.

Data never sleeps

BrunoMueller · Posted 02-16-2016 04:50 AM

Here is an example using regular expressions, the code translates the A and N into the corresponding expression and then uses PRXMATCH to test

data ctry_format;
  infile cards dlm=",";
  input
    country : $32.
    zip_format : $32.
  ;
  length prx_expression $ 128;
  prx_expression = "/";

  drop i;
  do i = 1 to length(zip_format);
    select (char(zip_format, i));
      when ("N") do;
        prx_expression = cats(prx_expression, "\d");
      end;

      when ("A") do;
        prx_expression = cats(prx_expression, "[A-Z]");
      end;

      when (" ") do;
        prx_expression = cats(prx_expression, "\x20");
      end;
    end;
  end;

  prx_expression = cats(prx_expression, "/");
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;

data ctry_have;
  infile cards dlm=",";
  input
    country : $32.
    zip : $32.
  ;
  cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;

data want;
  merge ctry_format ctry_have;
  by country;
  prxmatch = ( prxmatch(prx_expression, trim(zip)) > 0 );
run;

Bruno

mohamed_zaki · Posted 02-16-2016 05:47 AM

data have;
infile datalines dlm=",";
length country zip format $20;
input country $ zip $;
retain Rx;
if _N_=1 then Rx=rxparse("$U to A,$D to N"); 
format= rxchange(rx, 1000, zip); 
drop Rx;
cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;run;

data codes;
  infile cards dlm=",";
  input country : $20. code : $20. ;
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data want;
  merge have codes;
  by country;
 if trim(format)= trim(code) then Improper_Flag='N';
 else Improper_Flag='Y';
run;

sasmaverick · Posted 02-17-2016 02:05 AM

Thanks a lot! Works for me

Ksharp · Posted 02-16-2016 09:57 PM

It looks like Perl Regular Expression thing.

data ctry_format;
  infile cards dlm=",";
  input
    country : $32.
    zip_format : $32.
  ;
  length prx $ 200;
  prx =zip_format ;
  prx =tranwrd(strip(prx ),'A','[a-z]');
  prx =tranwrd(strip(prx ),'N','\d');
  prx =tranwrd(strip(prx ),' ','\s');
  cards;
Canada,ANA NAN
France,NNNNN
Switzerland,NNNN
United States,NNNNN
;
run;
data ctry_have;
  infile cards dlm=",";
  input
    country : $32.
    zip : $32.
  ;
  cards;
Canada,A6C 2X8
Canada,A6C-2X8
France,31360
Switzerland,123
United States,55344
United States,TW18 4BQ
;

data want;
  merge ctry_format ctry_have;
  by country;
if not missing(prx) then do;
  if prxmatch(cats('/^',prx,'$/i'),strip(zip)) then Improper_Flag='N';
   else Improper_Flag='Y';
end;
run;

sasmaverick · Posted 02-17-2016 02:06 AM

Thanks a lot Xia! Your approach works too!

Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Re: Country Code Format Lookup

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away