Help using Base SAS procedures

Assigning values for postal codes

Accepted Solution Solved
Reply
Super Contributor
Posts: 338
Accepted Solution

Assigning values for postal codes

Hi SAS Forum,

/*Hi SAS Forum,

I have this data set. It shows the postal codes of each account.

But the postal_code field has a mix of values. Some are correct, some are strange, some are wrong.

*/

data have;

input account_number postal_code $ 6-12;

cards;

1111 T2H 1B5

2222 T9E 6C8

3333 01000

4444 1097C7

5555 123

7777 A0A 1K0

9999 J0N 1G3

8888 11654-2011

1234 T9W 3H2

;

run;

/*

I wanted to create a new variable called "Validation" with following rules.

  • If a postal code start with T, and it has 6 chractoers, and first charactoer is a letter, second is a numeric, third is a letter, fourth is a space,

fifth is a numeric, sixth is a letter, 7th is a numeric

then validation = 'correct'

  • If a postal code starts with a letter which is not T, and all the other conditions above are satisfied, then validation = 'wrong'

  • Else validation ='strange'

*/

/*For clarity, I have indicated below the values of variable "Validation" in front of each record*/

data have;

input account_number postal_code $ 6-12 validation;

cards;

1111 T2H 1B5      /*correct*/

2222 T9E 6C8      /*correct*/

3333 01000        /*strange*/

4444 1097C7       /*strange*/

5555 123          /*strange*/

7777 A0A 1K0      /*wrong*/

9999 J0N 1G3      /*wrong*/

8888 T1654-2011   /*strange*/

1234 T9W 3H2      /*correct*/  

;

run;

Q: Could someone help me how to do this.

Thanks for your time.

Miris


Accepted Solutions
Solution
‎08-22-2013 02:17 AM
Contributor
Posts: 45

Re: Assigning values for postal codes

Some conditional logic should help here, with the use of the substring function.

I don't have SAS o my home computer so I havnt been able to check for errors.

/* define lists of numbers and letters, although I suspect SAS may have a shortcut for this. */

%let nums = ("1", "2", "3", "4", "5", "6", "7", "8", "9", "0");

%let letters = ("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z");

data want;

set have;

if

upcase(substr(postal_code,1,1)). = "T"

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "correct";

/*if the mength of your field is more than 7 characters you may want to check if the remaining ones are blank,

in your example the field post_code has 7 characters so no check needed */

else if

upcase(substr(postal_code,1,1)) in &letters

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "wrong  "; /*extra spaces needed to avoid truncation */

else validation="strange";

View solution in original post


All Replies
Solution
‎08-22-2013 02:17 AM
Contributor
Posts: 45

Re: Assigning values for postal codes

Some conditional logic should help here, with the use of the substring function.

I don't have SAS o my home computer so I havnt been able to check for errors.

/* define lists of numbers and letters, although I suspect SAS may have a shortcut for this. */

%let nums = ("1", "2", "3", "4", "5", "6", "7", "8", "9", "0");

%let letters = ("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z");

data want;

set have;

if

upcase(substr(postal_code,1,1)). = "T"

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "correct";

/*if the mength of your field is more than 7 characters you may want to check if the remaining ones are blank,

in your example the field post_code has 7 characters so no check needed */

else if

upcase(substr(postal_code,1,1)) in &letters

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "wrong  "; /*extra spaces needed to avoid truncation */

else validation="strange";

Super User
Posts: 5,429

Re: Assigning values for postal codes

Posted in reply to Murray_Court

This code could be simplified by:

  • applying the position 1 test at the end - make the need for repeating the conditions for all other positions twice
  • using anydigit() and anyalpha() functions - no need to use hard coded macro variables.
Data never sleeps
Super Contributor
Posts: 345

Re: Assigning values for postal codes

Concept:

use a regular expression catching correct and wrong postal codes - strange codes do not match the expression. Use first-function to verify that the code starts with "T".

RegEx: /^[A-Z]\d[A-Z] \d[A-Z]\d$/

Super User
Posts: 5,429

Re: Assigning values for postal codes

Postal codes a generally public information. Couldn't you try to builds a lookup table, and testing using that (i.e. SAS formats). That will give you correct at least.

Since I assume this is some kind of data quality applications, how big is this work? Many rows, how often etc? You might want to look a DQ tool that can automate a great part of this process.

Data never sleeps
Super User
Posts: 19,792

Re: Assigning values for postal codes

In Canada, postal codes are owned by Canada Post and not generally public :smileyconfused:

However, even if a postal code is in the correct format it may not be a valid postal code, for example, X are rarely used in postal codes.

Depending on who you work for, you may have access to a list of valid postal codes.

A list of Canadian postal codes, crowd sourced (quality not assured) is here:

geocoder.ca: geocoding for North America - USA and Canada

Super Contributor
Posts: 338

Re: Assigning values for postal codes

Hi Murray_Court,

Thank you very much for this code, it worked very well. And I learned a lot how to handle
this kind of coding tasks.


Hi andreas_lds
This is the first time I heard about "Regular expressions". I serached in the google
and now I have some understanding. Thanks a lot for introducing this.


Hi LinusH
I should try your short cut and compare with Murray_Court's resutls. Thanks a lot.

Hi Reeza
I learned the term "crowd sourced", and many thanks for the postal code site as well.

Regards

Mirisage

Respected Advisor
Posts: 3,156

Re: Assigning values for postal codes

data have;

input account_number postal_code $ 6-12;

length flag $ 7;

if prxmatch('/^T\d[A-Z] \d[A-Z]\d$/',postal_code) then flag='correct';

else if prxmatch('/^[^T]\d[A-Z] \d[A-Z]\d$/',postal_code) then flag='wrong';

else flag='strange';

cards;

1111 T2H 1B5

2222 T9E 6C8

3333 01000

4444 1097C7

5555 123

7777 A0A 1K0

9999 J0N 1G3

8888 11654-2011

1234 T9W 3H2

;

run;

Super Contributor
Posts: 338

Re: Assigning values for postal codes

Hi

proc compare base=have compare=want; run;

Both code produced identical resutls. Thank you so much.

First, I will read some literature to understand what is happening in your magical code and then come back to you if still I cannot understand this code logic.

Regards

Mirisage

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 369 views
  • 6 likes
  • 6 in conversation