Help using Base SAS procedures

Delete observations with wrong zip code format

Accepted Solution Solved
Reply
Contributor
Posts: 47
Accepted Solution

Delete observations with wrong zip code format

[ Edited ]

Hello, 

 

I have a large dataset with client information. Zip codes have the following format: 1234 ZT. That is 4 digits, followed by a space and 2 letters. 

 

I would like to extract the first four digits AND delete any observations that have the wrong format. 

 

I have this so far:

 

data work.clients;

set work.clients;

PC4_=substr(POSTCODE,1,4);

PC4=.;

PC4=input(PC4_,8.);

run;

 

This works well, but how do I delete the observations with the wrong format?

 

Thanks,  


Accepted Solutions
Solution
‎07-19-2017 05:17 AM
Super User
Posts: 6,963

Re: Delete observations with wrong zip code format

data want;
set have;
if
  length(postcode) = 7 and
  notdigit(substr(postcode,1,4)) = 0 and
  substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎07-19-2017 05:17 AM
Super User
Posts: 6,963

Re: Delete observations with wrong zip code format

data want;
set have;
if
  length(postcode) = 7 and
  notdigit(substr(postcode,1,4)) = 0 and
  substr(postcode,5,1) = ' '
;
PC4 = input(substr(postcode,1,4),4.);
run;

You might add code in the subsetting if that checks for allowed letters in substr(postcode,6,2).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 187

Re: Delete observations with wrong zip code format

A regular expression should do the trick here

 

data have;
	infile datalines dlm=",";
	input code $;
	datalines;
1234 ZT
5678 AB
Q123 AB
2345 4R
;
run;

data want(drop=pattern);
	set have;
	if _n_=1 then pattern=prxparse("/\d{4}\s[A-Z]{2}/");
	if not prxmatch(pattern, code) then delete;
	pc4=substr(code,1,4);
	retain pattern;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 108 views
  • 1 like
  • 3 in conversation