Hi all
One variable in my data set is zip code. It should be a 5 digit number, but of course, for some records it's text, for example, "000NA","BMT ","BROOK","DR000","ENUE","FLR","L2J1P","P1470","ST000","UNKNO". Also some records just have “9” or some other single digit number. So in the original program, I read it as character.
Is there a way to look through all the records and any time zip codes is not a 5 digit number, change the value to missing? Then I can treat it as a number.
Thanks
Gene
If you have 5 character alphabetic characters, can't you use the following?
if (anyalpha(ZIP)=0 and length(ZIP) = 5) then ZipNum=input(Zip,5.);
Since you have already read the data then create a new variable
Data want;
set have;
if length(ZIP) = 5 then ZipNum=input(Zip,5.);
run;
May do what you want.
One issue I would consider would be if the data source treated zips such as 00009 as numeric and exported them as fewer than 5 digits.
Well, two questions. I tried that and got
"NOTE: Invalid numeric data, ZIP='L2J1P' , at line 361 column 4." So it seems to be having problems in that Zip is not numeric to begin with.
But you also wrote "since you already read the data". I can go back to the original data, reading it in with the input code,
input
@133 zip $char5.
Are you suggesting a way to modify that?
Thanks
If you have 5 character alphabetic characters, can't you use the following?
if (anyalpha(ZIP)=0 and length(ZIP) = 5) then ZipNum=input(Zip,5.);
Yes, of course, it has characters.
But you wish to do this operation anyway right?
So you can suppress the NOTE by using the colon modifier in the input statement.
Are you absolutely sure that chars are invalid data? If yes, you should report the erroneous records to the data source to have them fix them.
Why you need to treat it like a number? It is a zip code, you are not doing any arithmetic operation on it, so keep it Char. It will make your life a lot easier when dealing zip codes start with 0s, such as those from NJ.
data test; infile cards truncover; input zip $10.; new_zip=ifc(prxmatch('/^\d{5}$/',strip(zip))>0, zip, ''); cards; asdhg 78659 0987689 89ui 26543 u . ;
Just variations to what @Haikuo posted.
data test;
infile cards truncover;
input zip $10.;
length new_zip1 new_zip2 $5;
new_zip=ifc(prxmatch('/^\d{5}$/',strip(zip))>0, zip, '');
if lengthn(zip)=5 and notdigit(strip(zip))=0 then new_zip1=zip;
if prxmatch('/^\d{5} *$/',zip)>0 then new_zip2=zip;
cards;
asdhg
78659
0987689
89ui
26543
u
.
;
Well, first, I want to delete any zip code that isn't 5 characters long. All zip codes must be 5 characters long (which can include leading 0). If it's character, that's fine, for this part, and this part looks easy to do.
Second, I want to delete any values that are not all numbers. All zip codes must be numerical, and cannot include any non numerical characters. This is the part that seems the most difficult. I don't want to just supress notes or messages about errors.
Thanks.
Gene
Then I believe what Haikuo or I posted returns what you've asked for - and it leaves Zip codes character as it should be.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.