Help using Base SAS procedures

Find A Postal Code In My Data

Reply
Super Contributor
Posts: 358

Find A Postal Code In My Data

Hi All:

I need to be able to generate a valid mailing address from our customer master file.  Naturally, the data is a mess with no 2 records being formatted the same (since they are manually entered).

What I would like to do is parse the address fields and extract out the postal codes.  It looks like this one field seems to have been entered correctly, that is, it appears to have the correct format.

Is there a way I can parse the data looking for a postal code?    I don't have a copy of the PCCF so I can;t SQL it unfortunately.

Thanks in advance and don't forget to tip your waitress.

Frequent Contributor
Frequent Contributor
Posts: 94

Re: Find A Postal Code In My Data

It might help to have a sample of what your data looks like (with real people's information obscured of course!).

Generally though, post codes in your country will likely follow a specific pattern.  As such locating them is probably done best using Regular Experssions.  The SAS online help example 3 for the function PRXMATCH (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002296115.htm) attempts to do just that.

I imagine the SAS example is for US zip codes, so if you're not in the US then you'll need to find the pattern suitable for your own country.

Wikipedia provides some examples for UK post codes, although I'd imagine the regex strings would need modified slightly to work with SAS.  Postcodes in the United Kingdom - Wikipedia, the free encyclopedia

Hope that helps!

DF.

Super Contributor
Posts: 358

Re: Find A Postal Code In My Data

Thanks DF - the link was helpful.

Way up here in Canadakistan, our postal codes are formatted  as A#A #A# .  It's a shame SAS won't handle our postal codes the same as it does for ZIP codes.

I would need to do the pattern matching on at least 2 (maybe 3) address fields in order to insure that I got the postal code - as I said the data is a mess and the postal could have been entered just about anywhere.   Do you know how to code the PRXMATCH to find this pattern? 

Anyone?   Bueller? 

Frequent Contributor
Frequent Contributor
Posts: 94

Re: Find A Postal Code In My Data

No worries.  I'm no expert on this - I largely put it together from the SAS examples (and the regex from US and Canada zip code validation RegEx), but:

data ZipCodes;

    infile datalines dsd dlm=',';

   input name: $16. zip:$40.;

   datalines;

Johnathan,A2A 2B2 a39js 93jaaj39

Seth,AAA B2B 3C3apples

Kim,39204B2B3C3B2B3C3

Samuel,93849-3843

;

data ZipCodes2;

set ZipCodes;

regex = prxparse('/[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}/');

if prxmatch(regex, zip) then do;

    postcode = prxposn(regex,0,zip);

end;

drop regex;

run;

For the examples given, this will pick up the first text string that matches the regex string.  This is pretty much as you described, although it excludes some characters in the first block (i.e. DFIOQUZ) - if this isn't correct you could just add them back in, or change it to A-Z.

If your data really has junk in it, then you'll need to be mindful of any false positives too.

Super Contributor
Posts: 358

Re: Find A Postal Code In My Data

Thanks again DF.

I wound up doing something like this after  I  "RTFM" .....

a  = prxmatch('/\w\d\w\s\d\w\d/',name_address_line_4); 

b  = prxmatch('/\w\d\w\s\d\w\d/',name_address_line_5); 

if (a+b)>0 then output;  else delete;

and there was about a 99% success rate (the others were not postal codes that matched the pattern).

Ask a Question
Discussion stats
  • 4 replies
  • 1457 views
  • 3 likes
  • 2 in conversation