New SAS User

marleeakerson · Posted 09-23-2020 12:26 PM

Hello,

I am using this piece of code to convert a character variable to numeric:

data want;
set have;
Zip_Code=input(Zip_Code, 10.);
run;

but the variable still comes out character.

Any ideas?

PaigeMiller · Posted 09-23-2020 12:31 PM

You can't change a variable from character to numeric. Zip_Code is initially character, no matter what you do to it, it will be character. (I suspect there are errors or warnings in the LOG as well)

You need to create a NEW numeric variable (with a different name) as follows

Zip_Code1=input(Zip_Code, 10.);

--
Paige Miller

ballardw · Posted 09-23-2020 12:33 PM

Once a variable is defined as character you cannot change that. You would have to create a new variable such as:

data want;
   set have;
   Zip_num=input(Zip_Code, 10.);
run;

Now, why do you need a numeric value? Arithmetic with Zip codes is pretty meaningless.

Plus your use of a 10. informat makes me suspect that you may have some +4 Zips like 12345-6789 in your data. Those will not convert to numeric at all without some serious manipulation.

jimbarbour · Posted 09-23-2020 01:06 PM

@ballardw has a good point. If you convert 99999-9999 to numeric, you'll get a missing value and in essence lose your data.

If you really, truly need numeric, you could do this:

data	have;
	LENGTH	Zip_Code	$10.;
	INFILE	Datalines;
	INPUT	Zip_Code	$	&;
Datalines;
91024-4444
97805
91024-
99999-9999
91111
92222 2222
;
run;

data want;
   set have;
   Zip_num=input(COMPRESS(Zip_Code,,'kd'), 10.);
run;

The COMPRESS will take care of dashes or blanks.

Results:

Jim

CurtisMackWSIPP · Posted 09-23-2020 12:34 PM

SAS variables are of fixed type. They cannot be changed. You must create a new variable.

Try this approach:

data want(drop = Zip_Code_str);
set have(rename = (Zip_Code = Zip_Code_str);
Zip_Code=input(Zip_Code_str, 10.);
run;

Kurt_Bremser · Posted 09-23-2020 02:30 PM

A ZIP code is a code, not a number used for calculations. Keep it as character.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

CurtisMackWSIPP · Posted 09-23-2020 03:50 PM

@Kurt_Bremser wrote:

A ZIP code is a code, not a number used for calculations. Keep it as character.

Unless you want to save 37 bytes per row.

Kurt_Bremser · Posted 09-23-2020 04:03 PM

We don't know the defined length of the variable, but if it was 10, the saving is only 2 per observation.

Empty space in character variables can be better dealt with with the COMPRESS option.

And how do you deal with codes that have leading zeroes in a consistent, failsafe way?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

CurtisMackWSIPP · Posted 09-23-2020 04:53 PM

My mistake on the math. But ZIP codes only need numeric length 3, and they also only need 5 characters. So you do only save 2 bytes, but for different reasons.

But why would you need to worry about leading zeros except when outputing results. They are just integers so they are just as unique without the leading zero.

ballardw · Posted 09-23-2020 06:05 PM

@CurtisMackWSIPP wrote:

My mistake on the math. But ZIP codes only need numeric length 3, and they also only need 5 characters. So you do only save 2 bytes, but for different reasons.

But why would you need to worry about leading zeros except when outputing results. They are just integers so they are just as unique without the leading zero.

Zip+4 are not "integers". the position of the required dash character means that you cannot just remove it and treat the remainder as "integer". if you have a zip of 00004 and another of 00004-1234 and attempt to treat them as integers then 4 and 41234 are not going to go to the same state for postal services.

Kurt_Bremser · Posted 09-23-2020 06:07 PM

And what about those extended zip codes @ballardw mentioned?

No, several hundred years of SAS experience here on the communities will tell you that codes are best kept as character.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

CurtisMackWSIPP · Posted 09-23-2020 06:19 PM

The poster didn't mention ZIP+4 Codes, and in my experience they are typically stored in a separate column. My user account does not reflect my 30 years of SAS experience. I specializing in GIS have a lot of experience with ZIP Codes. I unusually store them as characters strings, but know that they can work just fine as integers. For some situations you need them as integers. Particularly when you are working across software platforms.

And then there is the case where you don't control the data source or are matching an output requirement.

ballardw · Posted 09-24-2020 12:00 AM

@CurtisMackWSIPP wrote:

The poster didn't mention ZIP+4 Codes, and in my experience they are typically stored in a separate column. My user account does not reflect my 30 years of SAS experience. I specializing in GIS have a lot of experience with ZIP Codes. I unusually store them as characters strings, but know that they can work just fine as integers. For some situations you need them as integers. Particularly when you are working across software platforms.

And then there is the case where you don't control the data source or are matching an output requirement.

The reason I consider Zip+4 is the specific use of an F10. informat. If the values are only basic 5 digits then why pick an F10 informat? Might be habit or looking at the values and seeing the 10 columns a Zip+4 takes.

If your data providers a clean enough to have the +4 in a separate column. I have to deal with "data" that will be in Excel and have a column that should be Zip codes in currency or date formats. At which point I am not terribly confident that going back to a general format actual has the original value.

I did ask why. If we get a response to match some other requirement then see what the actual match use has to be. I can join data of different types sometimes if all the details are present.

New SAS User

Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Re: Character to Numeric

Follow Us

What is...

New SAS User

Our biggest data and AI event of the year.

Follow Us

What is...