Solved: Delete observations with wrong zip code format

GKati · Posted 07-19-2017 03:53 AM

Hello,

I have a large dataset with client information. Zip codes have the following format: 1234 ZT. That is 4 digits, followed by a space and 2 letters.

I would like to extract the first four digits AND delete any observations that have the wrong format.

I have this so far:

data work.clients;

set work.clients;

PC4_=substr(POSTCODE,1,4);

PC4=.;

PC4=input(PC4_,8.);

run;

This works well, but how do I delete the observations with the wrong format?

Thanks,

Kurt_Bremser · Posted 07-19-2017 04:11 AM

data want;
set have;
if
  length(postcode) = 7 and
  notdigit(substr(postcode,1,4)) = 0 and
  substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 07-19-2017 04:11 AM

data want;
set have;
if
  length(postcode) = 7 and
  notdigit(substr(postcode,1,4)) = 0 and
  substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ChrisBrooks · Posted 07-19-2017 04:52 AM

A regular expression should do the trick here

data have;
	infile datalines dlm=",";
	input code $;
	datalines;
1234 ZT
5678 AB
Q123 AB
2345 4R
;
run;

data want(drop=pattern);
	set have;
	if _n_=1 then pattern=prxparse("/\d{4}\s[A-Z]{2}/");
	if not prxmatch(pattern, code) then delete;
	pc4=substr(code,1,4);
	retain pattern;
run;

Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Catch up on SAS Innovate 2026

Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Re: Delete observations with wrong zip code format

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away