Zip Code validation help?

Reply
Contributor
Posts: 46

Zip Code validation help?

I have a Zip code column, and I need validate using the following:

Zip code: Zip code is considered as invalid if first 5 digits falls between these,

between '00001' and '00099'

between '00200' and '00399' 

between '00545' and '00599'

between '09900' and '09999'

between '26900' and '26999'

between '34300' and '34399'

between '34800' and '34899'

between '42800' and '42999'

between '52900' and '52999' 

between '56800' and '56999' 

between '57800' and '57999'    

between '58900' and '58999'   

between '65900' and '65999'   

between '69400' and '69999'   

between '71500' and '71599'

The challenge is long the list of conditions is long like above, how to code this? A Proc format would suffice or some other solution?

Grand Advisor
Posts: 17,358

Re: Zip Code validation help?

Isn't there a table of ZIP codes in the SASHELP library that you could validate against instead?

SASHELP.ZIPCODE

Quarterly Updates of ZIP Code Data Set Available from SAS Maps Online

Obviously this is only valid if you're in the US Smiley Happy

SAS Employee
Posts: 170

Re: Zip Code validation help?

You can also use Proc GEOCODE in SAS/GRAPH.  It will validate all your ZIP Codes with the data set SASHELP.ZIPCODE. 

proc geocode

   method     = zip                        /* Geocoding method      */

   data          = work.zipcode        /* Input address data    */

   out            = work.geocoded;     /* Output data set       */

run;

For more info, see: http://support.sas.com/rnd/papers/#2013 and the paper PROC GEOCODE: Finding Locations Outside the U.S.

Note, this will work with any country that you have Postal Codes for.  We make available Australia and Great Britain, which are free.

Grand Advisor
Posts: 9,578

Re: Zip Code validation help?

Yeah. proc format is a good idea.  Make a dataset to contain these range ,then proc format + cntlin= .

Alternative way is Hash Table.

Xia Keshan

Contributor
Posts: 46

Re: Zip Code validation help?

Hi, Boss I have no clue of hashes as I am very novice to SAS programming, i might try proc format unless if you can gift me a hash solution if you have time Smiley Happy

Grand Advisor
Posts: 9,578

Re: Zip Code validation help?

Hash Table almost like proc format . assuming they are all numeric value.

proc format;

value fmt

1-99,200-399='N' ;

Hash Table just hold all these discrete value in a array, and check it when running data step .

I proc format is good enough , if you are not familiar with Hash Table.

Xia Keshan

Contributor
Posts: 46

Re: Zip Code validation help?

Yes, I'm afraid they are all character values, that adds to complexity in defining the range

Frequent Contributor
Posts: 115

Re: Zip Code validation help?

Does this help?

data have;
do zipcode=00001 to 00099;
  zipcode1=put(zipcode, z5.);
  output;
end;

do zipcode1='12345','54321','45101';
  output;
end;

drop zipcode;
run;


data want;
set have;

if '00001' <= zipcode1 <= '00099' then
  flag='invalid';
else flag='valid';
run;

/*check the last 3 records in the result WANT dataset*/

Esteemed Advisor
Esteemed Advisor
Posts: 7,217

Re: Zip Code validation help?

Do you mean the data is stored as character, or actually contains characters which are non-numeric (i.e. not 0-9).  If they only contain the characters 0-9 then they can be converted to numeric per my mail.  If however they contain non-numeric characters, e.g. a-z (A-Z), then numeric ranges will not work. 

However looking at Darrel@sas's response, that would seem to be the most logical way to proceed.

Grand Advisor
Posts: 9,578

Re: Zip Code validation help?

But you can transform it into number by INPUT , right ?

Esteemed Advisor
Esteemed Advisor
Posts: 7,217

Re: Zip Code validation help?

I personally would create a dataset with your conditions, though put them in a numeric it will save you space and be quicker:

Data Conditions:

From          To

1                 99

200            399

...

Now you can generate a datastep from this dataset - the below adds a flag if the value is within these ranges:

data _null_;

     set conditions end=last;

     if _n_=1 then call execute('data want; set have;');

     call execute(' if input(substr(zip_code,1,5),best.) between '||put(from,best.)||' and '||put(to,best.)||' then flag=1;');

     if last then call execute('run;');

run;

Grand Advisor
Posts: 9,578

Re: Zip Code validation help?

Yeah. proc format is a good idea.  Make a dataset to contain these range ,then proc format + cntlin= .

Alternative way is Hash Table.

Xia Keshan

Respected Advisor
Posts: 4,977

Re: Zip Code validation help?

Formats (as well as BETWEEN) can be tricky when dealing with character strings.  For example, does "002AB" fall between "00200" and "00399"?  Actually, the answer is that it varies depending on whether you are working on an ASCII-based system vs. an EBCDIC-based system.

So ... if you can validate using a pre-built SAS procedure, by all means do it that way!

Good luck.

Ask a Question
Discussion stats
  • 12 replies
  • 525 views
  • 2 likes
  • 7 in conversation