BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

Hi SAS Forum,

/*Hi SAS Forum,

I have this data set. It shows the postal codes of each account.

But the postal_code field has a mix of values. Some are correct, some are strange, some are wrong.

*/

data have;

input account_number postal_code $ 6-12;

cards;

1111 T2H 1B5

2222 T9E 6C8

3333 01000

4444 1097C7

5555 123

7777 A0A 1K0

9999 J0N 1G3

8888 11654-2011

1234 T9W 3H2

;

run;

/*

I wanted to create a new variable called "Validation" with following rules.

  • If a postal code start with T, and it has 6 chractoers, and first charactoer is a letter, second is a numeric, third is a letter, fourth is a space,

fifth is a numeric, sixth is a letter, 7th is a numeric

then validation = 'correct'

  • If a postal code starts with a letter which is not T, and all the other conditions above are satisfied, then validation = 'wrong'

  • Else validation ='strange'

*/

/*For clarity, I have indicated below the values of variable "Validation" in front of each record*/

data have;

input account_number postal_code $ 6-12 validation;

cards;

1111 T2H 1B5      /*correct*/

2222 T9E 6C8      /*correct*/

3333 01000        /*strange*/

4444 1097C7       /*strange*/

5555 123          /*strange*/

7777 A0A 1K0      /*wrong*/

9999 J0N 1G3      /*wrong*/

8888 T1654-2011   /*strange*/

1234 T9W 3H2      /*correct*/  

;

run;

Q: Could someone help me how to do this.

Thanks for your time.

Miris

1 ACCEPTED SOLUTION

Accepted Solutions
Murray_Court
Quartz | Level 8

Some conditional logic should help here, with the use of the substring function.

I don't have SAS o my home computer so I havnt been able to check for errors.

/* define lists of numbers and letters, although I suspect SAS may have a shortcut for this. */

%let nums = ("1", "2", "3", "4", "5", "6", "7", "8", "9", "0");

%let letters = ("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z");

data want;

set have;

if

upcase(substr(postal_code,1,1)). = "T"

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "correct";

/*if the mength of your field is more than 7 characters you may want to check if the remaining ones are blank,

in your example the field post_code has 7 characters so no check needed */

else if

upcase(substr(postal_code,1,1)) in &letters

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "wrong  "; /*extra spaces needed to avoid truncation */

else validation="strange";

View solution in original post

8 REPLIES 8
Murray_Court
Quartz | Level 8

Some conditional logic should help here, with the use of the substring function.

I don't have SAS o my home computer so I havnt been able to check for errors.

/* define lists of numbers and letters, although I suspect SAS may have a shortcut for this. */

%let nums = ("1", "2", "3", "4", "5", "6", "7", "8", "9", "0");

%let letters = ("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z");

data want;

set have;

if

upcase(substr(postal_code,1,1)). = "T"

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "correct";

/*if the mength of your field is more than 7 characters you may want to check if the remaining ones are blank,

in your example the field post_code has 7 characters so no check needed */

else if

upcase(substr(postal_code,1,1)) in &letters

AND

substr(postal_code,2,1) in &nums.

AND

upcase(substr(postal_code,3,1)) in &letters.

AND

substr(postal_code,4,1) = " "

AND

substr(postal_code,5,1) in &nums.

AND

upcase(substr(postal_code,6,1)) in &letters.

AND

substr(postal_code,7,1) in &nums.

then validation = "wrong  "; /*extra spaces needed to avoid truncation */

else validation="strange";

LinusH
Tourmaline | Level 20

This code could be simplified by:

  • applying the position 1 test at the end - make the need for repeating the conditions for all other positions twice
  • using anydigit() and anyalpha() functions - no need to use hard coded macro variables.
Data never sleeps
andreas_lds
Jade | Level 19

Concept:

use a regular expression catching correct and wrong postal codes - strange codes do not match the expression. Use first-function to verify that the code starts with "T".

RegEx: /^[A-Z]\d[A-Z] \d[A-Z]\d$/

LinusH
Tourmaline | Level 20

Postal codes a generally public information. Couldn't you try to builds a lookup table, and testing using that (i.e. SAS formats). That will give you correct at least.

Since I assume this is some kind of data quality applications, how big is this work? Many rows, how often etc? You might want to look a DQ tool that can automate a great part of this process.

Data never sleeps
Reeza
Super User

In Canada, postal codes are owned by Canada Post and not generally public :smileyconfused:

However, even if a postal code is in the correct format it may not be a valid postal code, for example, X are rarely used in postal codes.

Depending on who you work for, you may have access to a list of valid postal codes.

A list of Canadian postal codes, crowd sourced (quality not assured) is here:

geocoder.ca: geocoding for North America - USA and Canada

Mirisage
Obsidian | Level 7

Hi Murray_Court,

Thank you very much for this code, it worked very well. And I learned a lot how to handle
this kind of coding tasks.


Hi andreas_lds
This is the first time I heard about "Regular expressions". I serached in the google
and now I have some understanding. Thanks a lot for introducing this.


Hi LinusH
I should try your short cut and compare with Murray_Court's resutls. Thanks a lot.

Hi Reeza
I learned the term "crowd sourced", and many thanks for the postal code site as well.

Regards

Mirisage

Haikuo
Onyx | Level 15

data have;

input account_number postal_code $ 6-12;

length flag $ 7;

if prxmatch('/^T\d[A-Z] \d[A-Z]\d$/',postal_code) then flag='correct';

else if prxmatch('/^[^T]\d[A-Z] \d[A-Z]\d$/',postal_code) then flag='wrong';

else flag='strange';

cards;

1111 T2H 1B5

2222 T9E 6C8

3333 01000

4444 1097C7

5555 123

7777 A0A 1K0

9999 J0N 1G3

8888 11654-2011

1234 T9W 3H2

;

run;

Mirisage
Obsidian | Level 7

Hi

proc compare base=have compare=want; run;

Both code produced identical resutls. Thank you so much.

First, I will read some literature to understand what is happening in your magical code and then come back to you if still I cannot understand this code logic.

Regards

Mirisage

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3017 views
  • 6 likes
  • 6 in conversation