Hello,
I am using this piece of code to convert a character variable to numeric:
data want;
set have;
Zip_Code=input(Zip_Code, 10.);
run;
but the variable still comes out character.
Any ideas?
You can't change a variable from character to numeric. Zip_Code is initially character, no matter what you do to it, it will be character. (I suspect there are errors or warnings in the LOG as well)
You need to create a NEW numeric variable (with a different name) as follows
Zip_Code1=input(Zip_Code, 10.);
Once a variable is defined as character you cannot change that. You would have to create a new variable such as:
data want; set have; Zip_num=input(Zip_Code, 10.); run;
Now, why do you need a numeric value? Arithmetic with Zip codes is pretty meaningless.
Plus your use of a 10. informat makes me suspect that you may have some +4 Zips like 12345-6789 in your data. Those will not convert to numeric at all without some serious manipulation.
@ballardw has a good point. If you convert 99999-9999 to numeric, you'll get a missing value and in essence lose your data.
If you really, truly need numeric, you could do this:
data have;
LENGTH Zip_Code $10.;
INFILE Datalines;
INPUT Zip_Code $ &;
Datalines;
91024-4444
97805
91024-
99999-9999
91111
92222 2222
;
run;
data want;
set have;
Zip_num=input(COMPRESS(Zip_Code,,'kd'), 10.);
run;
The COMPRESS will take care of dashes or blanks.
Results:
Jim
SAS variables are of fixed type. They cannot be changed. You must create a new variable.
Try this approach:
data want(drop = Zip_Code_str); set have(rename = (Zip_Code = Zip_Code_str); Zip_Code=input(Zip_Code_str, 10.); run;
A ZIP code is a code, not a number used for calculations. Keep it as character.
@Kurt_Bremser wrote:
A ZIP code is a code, not a number used for calculations. Keep it as character.
Unless you want to save 37 bytes per row.
We don't know the defined length of the variable, but if it was 10, the saving is only 2 per observation.
Empty space in character variables can be better dealt with with the COMPRESS option.
And how do you deal with codes that have leading zeroes in a consistent, failsafe way?
My mistake on the math. But ZIP codes only need numeric length 3, and they also only need 5 characters. So you do only save 2 bytes, but for different reasons.
But why would you need to worry about leading zeros except when outputing results. They are just integers so they are just as unique without the leading zero.
@CurtisMackWSIPP wrote:
My mistake on the math. But ZIP codes only need numeric length 3, and they also only need 5 characters. So you do only save 2 bytes, but for different reasons.
But why would you need to worry about leading zeros except when outputing results. They are just integers so they are just as unique without the leading zero.
Zip+4 are not "integers". the position of the required dash character means that you cannot just remove it and treat the remainder as "integer". if you have a zip of 00004 and another of 00004-1234 and attempt to treat them as integers then 4 and 41234 are not going to go to the same state for postal services.
And what about those extended zip codes @ballardw mentioned?
No, several hundred years of SAS experience here on the communities will tell you that codes are best kept as character.
The poster didn't mention ZIP+4 Codes, and in my experience they are typically stored in a separate column. My user account does not reflect my 30 years of SAS experience. I specializing in GIS have a lot of experience with ZIP Codes. I unusually store them as characters strings, but know that they can work just fine as integers. For some situations you need them as integers. Particularly when you are working across software platforms.
And then there is the case where you don't control the data source or are matching an output requirement.
@CurtisMackWSIPP wrote:
The poster didn't mention ZIP+4 Codes, and in my experience they are typically stored in a separate column. My user account does not reflect my 30 years of SAS experience. I specializing in GIS have a lot of experience with ZIP Codes. I unusually store them as characters strings, but know that they can work just fine as integers. For some situations you need them as integers. Particularly when you are working across software platforms.
And then there is the case where you don't control the data source or are matching an output requirement.
The reason I consider Zip+4 is the specific use of an F10. informat. If the values are only basic 5 digits then why pick an F10 informat? Might be habit or looking at the values and seeing the 10 columns a Zip+4 takes.
If your data providers a clean enough to have the +4 in a separate column. I have to deal with "data" that will be in Excel and have a column that should be Zip codes in currency or date formats. At which point I am not terribly confident that going back to a general format actual has the original value.
I did ask why. If we get a response to match some other requirement then see what the actual match use has to be. I can join data of different types sometimes if all the details are present.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.