06-22-2015 10:27 PM
I have a Zip code column, and I need validate using the following:
Zip code: Zip code is considered as invalid if first 5 digits falls between these,
between '00001' and '00099'
between '00200' and '00399'
between '00545' and '00599'
between '09900' and '09999'
between '26900' and '26999'
between '34300' and '34399'
between '34800' and '34899'
between '42800' and '42999'
between '52900' and '52999'
between '56800' and '56999'
between '57800' and '57999'
between '58900' and '58999'
between '65900' and '65999'
between '69400' and '69999'
between '71500' and '71599'
The challenge is long the list of conditions is long like above, how to code this? A Proc format would suffice or some other solution?
06-23-2015 12:36 AM
Isn't there a table of ZIP codes in the SASHELP library that you could validate against instead?
Obviously this is only valid if you're in the US
06-23-2015 08:31 AM
You can also use Proc GEOCODE in SAS/GRAPH. It will validate all your ZIP Codes with the data set SASHELP.ZIPCODE.
method = zip /* Geocoding method */
data = work.zipcode /* Input address data */
out = work.geocoded; /* Output data set */
For more info, see: http://support.sas.com/rnd/papers/#2013 and the paper PROC GEOCODE: Finding Locations Outside the U.S.
Note, this will work with any country that you have Postal Codes for. We make available Australia and Great Britain, which are free.
06-23-2015 10:07 AM
Hash Table almost like proc format . assuming they are all numeric value.
Hash Table just hold all these discrete value in a array, and check it when running data step .
I proc format is good enough , if you are not familiar with Hash Table.
06-24-2015 05:46 AM
Does this help?
do zipcode=00001 to 00099;
if '00001' <= zipcode1 <= '00099' then
/*check the last 3 records in the result WANT dataset*/
06-24-2015 06:07 AM
Do you mean the data is stored as character, or actually contains characters which are non-numeric (i.e. not 0-9). If they only contain the characters 0-9 then they can be converted to numeric per my mail. If however they contain non-numeric characters, e.g. a-z (A-Z), then numeric ranges will not work.
However looking at Darrel@sas's response, that would seem to be the most logical way to proceed.
06-23-2015 08:23 AM
I personally would create a dataset with your conditions, though put them in a numeric it will save you space and be quicker:
Now you can generate a datastep from this dataset - the below adds a flag if the value is within these ranges:
set conditions end=last;
if _n_=1 then call execute('data want; set have;');
call execute(' if input(substr(zip_code,1,5),best.) between '||put(from,best.)||' and '||put(to,best.)||' then flag=1;');
if last then call execute('run;');
06-23-2015 09:11 AM
Formats (as well as BETWEEN) can be tricky when dealing with character strings. For example, does "002AB" fall between "00200" and "00399"? Actually, the answer is that it varies depending on whether you are working on an ASCII-based system vs. an EBCDIC-based system.
So ... if you can validate using a pre-built SAS procedure, by all means do it that way!