## Delete observations with wrong zip code format

Solved
Frequent Contributor
Posts: 107

# Delete observations with wrong zip code format

[ Edited ]

Hello,

I have a large dataset with client information. Zip codes have the following format: 1234 ZT. That is 4 digits, followed by a space and 2 letters.

I would like to extract the first four digits AND delete any observations that have the wrong format.

I have this so far:

data work.clients;

set work.clients;

PC4_=substr(POSTCODE,1,4);

PC4=.;

PC4=input(PC4_,8.);

run;

This works well, but how do I delete the observations with the wrong format?

Thanks,

Accepted Solutions
Solution
‎07-19-2017 05:17 AM
Super User
Posts: 10,255

## Re: Delete observations with wrong zip code format

``````data want;
set have;
if
length(postcode) = 7 and
notdigit(substr(postcode,1,4)) = 0 and
substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;``````

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

All Replies
Solution
‎07-19-2017 05:17 AM
Super User
Posts: 10,255

## Re: Delete observations with wrong zip code format

``````data want;
set have;
if
length(postcode) = 7 and
notdigit(substr(postcode,1,4)) = 0 and
substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;``````

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Valued Guide
Posts: 593

## Re: Delete observations with wrong zip code format

A regular expression should do the trick here

``````data have;
infile datalines dlm=",";
input code \$;
datalines;
1234 ZT
5678 AB
Q123 AB
2345 4R
;
run;

data want(drop=pattern);
set have;
if _n_=1 then pattern=prxparse("/\d{4}\s[A-Z]{2}/");
if not prxmatch(pattern, code) then delete;
pc4=substr(code,1,4);
retain pattern;
run;``````
☑ This topic is solved.