Hi all! Sorry, I’ve already read through a few posts on converting char to num variables, and also documents on formats/informats. However, I can’t figure out why my code isn’t working. I’m hoping someone can help me solve this issue.
I want to convert a character variable to a numeric variable. The original variable is tctsize, and has a length of 3 and leading zeroes. I need to keep the zeroes. This variable measures the size of something in mm. The new numeric variable will be called tsize.
I am getting the following error: “Format $ z3 could not be loaded/found.” The variable’s format changes with my code to $3 but remains as a character variable. I need it to be a numeric variable so I can easily categorize them into groups.
My dataset:
PERSON_ID tctsize
1 004
2 158
3 016
...
3,000,000 016
My code
data libname.name; set libname.dataset;
tsize=input(tctsize,$3.);
format tsize z3.;
run;
Thank you!!
You use an INFORMAT to convert text to values and a FORMAT to convert values to text.
A character INFORMAT creates a character value. A numeric informat creates a numeric value. A character FORMAT displays character values. A numeric format displays numeric values.
Your program is creating a character variable because you are using a character informat. Use a numeric informat instead.
data libname.name;
set libname.dataset;
tsize=input(tctsize,3.);
format tsize z3.;
run;
Note that you can also use the INPUTN() and INPUTC() functions. Besides producing only the type of value their names imply (numeric or character), they allow (in fact require) you to pass the informat as a value rather than as code. So you can use them where you need to decide dynamically which informat to use.
tsize=inputn(tctsize,'3.');
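To illustrate the dynamic case, here is a minimal sketch in which the informat name is held in a character variable. The variable names text and infmt are made up for this example and are not from the original post.

```sas
data _null_;
  length text $8 infmt $8;
  text  = '004';
  infmt = '3.';                  /* informat chosen at run time (hypothetical variable) */
  tsize = inputn(text, infmt);   /* numeric result; the stored value is 4 */
  put tsize= z3.;                /* Z3. pads the display with zeros: tsize=004 */
run;
```

The INPUT function would not accept infmt here, because it needs the informat written literally in the code.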
@Tom :
A nice overview - and a good note on INPUTN and INPUTC, lest folks forget they exist. PUTN and PUTC can be added to the roster as well. At times, I find the foursome quite handy at applying informats and formats dynamically, and of course they are indispensable for using with %SYSFUNC. But for all these niceties, a few caveats to bear in mind:
Kind regards
Paul D.
Hi Tom,
Thank you for your response. I was under the impression that " $3. " is telling SAS that I am inputting a variable with a character informat, and the variable has meaningful leading zeroes (e.g. 004). And that " z3. " is telling SAS to format this variable as a numeric variable and keep the leading zeroes.
I tried to run the code you suggested, removing the dollar sign:
data libname.name;
set libname.dataset;
tsize=input(tctsize,3.);
format tsize z3.;
run;
and although the new variable tsize is now a numeric variable, all the zeroes in front of the values were dropped. So if tctsize=004 then tsize=4. Following this example, I'd want tsize to be a numeric variable that =004. Do you have any insight on why that might be the case?
A number does not have leading zeros (or alternatively it has an infinite number of leading zeros). There is no difference between any of these ways of expressing the value : 4, 04, 004, 0004, 4.00, 4E0, ...
If you are not seeing the leading zeros when you print the data then the method you are using is not using the format that you attached to the variable. Or you have viewed the output with a program like Excel that will suppress the leading zeros in the text because it thinks the value is a number.
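A small sketch of that point, using made-up variable names: the value read from '004' is equal to the numeric literal 4, and the zeros appear only when a format such as Z3. is used to display it.

```sas
data _null_;
  a = input('004', 3.);   /* read the text 004 as a number */
  b = 4;
  same = (a = b);         /* 1 when the stored values are identical */
  put a= b= same=;        /* default display: no leading zeros */
  put a= z3.;             /* with Z3. the same value displays as 004 */
run;
```

If the zeros are missing in a report or export, it is worth checking whether that tool honors the format attached to the variable.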
SAS has two types of variables. Fixed length character strings and floating point numbers. The FORMAT attached to a variable is just an attribute that tells SAS which format to use by default when displaying the value of the variable. A FORMAT is special instructions for how to convert the value into text. An INFORMAT is special instructions for how to convert a text string into a value. If you use $3 informat you are telling SAS to read the first three bytes and store them without conversion. (Except on an IBM mainframe where it will convert an EBCDIC string into its corresponding ASCII string.)
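The two directions can be sketched as a round trip. The variable names are illustrative only: INPUT applies an informat to turn text into a value, and PUT applies a format to turn the value back into text.

```sas
data _null_;
  text = '004';
  num  = input(text, 3.);   /* informat: text -> numeric value 4 */
  back = put(num, z3.);     /* format: numeric value 4 -> text 004 */
  put text= num= back=;
run;
```

Here num is numeric and can be compared or grouped by range, while back is a new character variable that carries the zeros as text.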
Hi Tom,
Thank you for the explanation. It was very valuable information. I'm happy to say that you're right and your suggestions ended up working. I was able to get the data I wanted by dropping the ' $ ' and converting the text string into a value. Cheers!
@Tom:
"If you use $3 informat you are telling SAS to read the first three bytes and store them without conversion."
"Without conversion" would be the case with $CHARw. By contrast, $w. does at least some conversion: (a) trims the leading blanks and left-aligns the value, (b) converts a single period to a blank if the field has only blanks and one single period, i.e. treats the period as a missing value.
"Except on an IBM mainframe where it will convert an EBCDIC string into its corresponding ASCII string."
That's interesting news to me. Whenever I worked on the big iron, $w. never changed the encoding, and EBCDIC always remained EBCDIC. The $ASCIIw. informat would convert ASCII data to EBCDIC, and the only thing that would convert EBCDIC to ASCII would be the $ASCIIw. format. If nowadays the $w. informat on the mainframes behaves as you say, I wonder when the change occurred ... and what for.
Kind regards
Paul D.
Been 30 years since I used IBM mainframes. But when reading from text files the $ informat/formats converted between the EBCDIC characters in the files and the ASCII characters in the SAS variables. Perhaps the method was similar to how transcoding between UTF-8 and WLATIN1 is done now. If you want to actually read/write ASCII code you needed to use the $ASCII informat or the $ASCII format.
So on a mainframe, writing with $EBCDIC is the same as writing with $. And on an ASCII-based system, writing with $ASCII is the same as writing with $.
@Tom:
My mainframe exposure is quite a bit more recent.
Frankly, I'm not sure how to interpret your expression "$ informat/formats converted between the EBCDIC characters in the files and the ASCII characters in the SAS variables". This is because in my experience, any character read from a text file under z/OS into a SAS character variable using $CHARw. (identical to $EBCDICw. under this OS) keeps its EBCDIC encoding. Same with $w., except for the left justification and period conversion. The only situation I can fancy when an informat converts a text field from EBCDIC to ASCII is when a mainframe text file is ported to an ASCII system as binary and then the field is read under ASCII using the $EBCDICw. informat.
Character values in SAS datasets are stored in ASCII, not EBCDIC. At least they were in SAS version 5. Use the $HEX format to check for yourself. It makes collating characters much easier, since in ASCII 'B' is one larger than 'A' and so on up to 'Z', whereas in EBCDIC the codes for letters are all over the place.
@Tom :
I'm not sure about V5, but can aver beyond a shadow of a doubt that currently, if under z/OS (MVS, OS/390, etc.) you read a character from a text file into a SAS data set character variable using $w., it will have the same hex representation in the SAS data set as it has in the text file. If it were auto-converted to ASCII, you'd see, for example, the character "9" in the text file as "F9"x (EBCDIC) and the same in the SAS variable as "39"x (ASCII). This is not the case, or at least not what I see looking at the same value using the ISPF hex mode and the $HEX2. format in SAS; instead, both appear as "F9" (and I highly doubt that $HEXw. on the mainframe, when applied to a SAS variable, converts back from ASCII to EBCDIC; if that were the case, I'd find it extremely bizarre).
Perhaps you're saying that character values are stored in SAS data sets in ASCII internally regardless of the system encoding? If so, I can neither deny nor confirm it (though if it indeed were true, I'd find it equally bizarre).
Perhaps the implementation of encoding support changed the behavior. If you check the SAS 9.2 documentation online, it definitely says that $EBCDIC and $CHAR are the same on mainframes and that $ASCII and $CHAR are the same on ASCII-based hosts.
Yes, the $ informat will left align the results. Essentially moving any leading spaces to the end. Whereas the $CHAR informat does not. It will "preserve" the leading spaces.
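A quick way to see the difference, as a sketch that assumes an ASCII-based host (20 = blank, 41 = "A"):

```sas
data _null_;
  c1 = input('  A', $3.);       /* $w.: value is left-aligned, blanks end up trailing */
  c2 = input('  A', $char3.);   /* $CHARw.: leading blanks are preserved */
  put (c1 c2) (=$hex6./);       /* expect c1=412020 and c2=202041 on an ASCII host */
run;
```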
@Tom:
Plus, if the field consists of a single period (the rest of the characters being blank), $w. will convert it to a space, while $CHARw. will not:
data _null_ ;
c1 = input (" . ", $3.) ;
c2 = input (" ..", $3.) ;
c3 = input (" . ", $char3.) ;
put (c:) (=$hex6./) ;
run ;
Result (20=blank, 2E=period):
c1=202020
c2=2E2E20
c3=202E20
Kind regards
Paul D.