TL93
Obsidian | Level 7

Hi all! Sorry, I’ve already read through a few posts on converting char to num variables, and also documents on formats/informats. However, I can’t figure out why my code isn’t working. I’m hoping someone can help me solve this issue.

 

I want to convert a character variable to a numeric variable. The original variable is tctsize, and has a length of 3 and leading zeroes. I need to keep the zeroes. This variable measures the size of something in mm. The new numeric variable will be called tsize.

 

I am getting the following error: “Format $ z3 could not be loaded/found.” The variable’s format changes with my code to $3 but remains as a character variable. I need it to be a numeric variable so I can easily categorize them into groups.

 

My dataset:

PERSON_ID    tctsize
1            004
2            158
3            016
...          ...
3,000,000    016

 

My code:

 

data libname.name; set libname.dataset;
tsize=input(tctsize,$3.);
format tsize z3.;
run;

 

Thank you!!

Accepted Solution
Tom
Super User

You use an INFORMAT to convert text to values and a FORMAT to convert values to text.

A character INFORMAT creates a character value.  A numeric informat creates a numeric value.  A character FORMAT displays character values. A numeric format displays numeric values.

 

Your program is creating a character variable because you are using a character informat. Use a numeric informat instead.

data libname.name;
  set libname.dataset;
  tsize=input(tctsize,3.);
  format tsize z3.;
run;

Note that you can also use the INPUTN() and INPUTC() functions. Besides generating only the type of value their names imply, they also allow (in fact, require) you to pass the informat as a value rather than as code. So you can use them when you need to decide dynamically which informat to use.

tsize=inputn(tctsize,'3.');
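A minimal sketch of that dynamic use, assuming you want the informat to vary row by row: the informat name is read from the data as a character value and passed to INPUTN. The data set WORK.DYNAMIC_DEMO and the variables RAW, INFMT and VALUE are hypothetical, invented for this illustration.

```sas
/* Hypothetical data: each row carries the text to convert and the
   name of the numeric informat that should be used to read it */
data work.dynamic_demo;
  length raw $9 infmt $12;
  input raw $ infmt $;
  /* INPUTN takes the informat as a character value, so it can differ per row */
  value = inputn(raw, infmt);
datalines;
004 3.
1,234 comma9.
1A hex2.
;
run;

proc print data=work.dynamic_demo;
run;
```

With INPUT() the informat has to be written into the program itself; INPUTN resolves it at run time, which is what lets each row use a different one.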

 


hashman
Ammonite | Level 13

@Tom :

A nice overview - and a good note on INPUTN and INPUTC, lest folks forget they exist. PUTN and PUTC can be added to the roster as well. At times, I find the foursome quite handy for applying informats and formats dynamically, and of course they are indispensable for use with %SYSFUNC. But for all these niceties, there are a few caveats to bear in mind:

 

  • The "dynamic" functions run about 5-7 times slower than their fixed informat/format counterparts, INPUT and PUT.
  • R=PUTN(N,F) returns R as $200 if R is not sized beforehand.
  • If R in R=PUTC(C,F) is not pre-sized, its length is determined by that of C irrespective of the content of F.
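If you want to check the sizing caveats on your own installation, a sketch like the one below compares the compile-time lengths (reported by VLENGTH) of a pre-sized PUTN result, an un-sized PUTN result and an un-sized PUTC result. The variable names are made up for the example, and the lengths reported for the un-sized variables are the behavior described in the caveats above, not something this sketch guarantees.

```sas
/* Compare the lengths SAS assigns to PUTN/PUTC results */
data work.put_lengths;
  length r_sized $3;                /* target pre-sized with LENGTH */
  n = 4;
  c = 'abc';                        /* c gets length $3 from the literal */
  r_sized   = putn(n, 'z3.');       /* '004' stored in a $3 variable */
  r_default = putn(n, 'z3.');       /* not pre-sized: see the second caveat */
  r_putc    = putc(c, '$upcase8.'); /* not pre-sized: see the third caveat */
  len_sized   = vlength(r_sized);
  len_default = vlength(r_default);
  len_putc    = vlength(r_putc);
run;

proc print data=work.put_lengths;
  var len_sized len_default len_putc;
run;
```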

Kind regards

Paul D.  

 

  

TL93
Obsidian | Level 7

Hi Tom,

 

Thank you for your response. I was under the impression that " $3. " is telling SAS that I am inputting a variable with a character informat, and the variable has meaningful leading zeroes (e.g. 004). And that " z3. " is telling SAS to format this variable as a numeric variable and keep the leading zeroes.

 

I tried to run the code you suggested, removing the dollar sign:

data libname.name;
  set libname.dataset;
  tsize=input(tctsize,3.);
  format tsize z3.;
run;

and although the new variable tsize is now a numeric variable, all the zeroes in front of the values were dropped. So if tctsize=004 then tsize=4. Following this example, I'd want tsize to be a numeric variable that =004. Do you have any insight on why that might be the case?

Tom
Super User

@TL93 wrote:

... although the new variable tsize is now a numeric variable, all the zeroes in front of the values were dropped. So if tctsize=004 then tsize=4. Following this example, I'd want tsize to be a numeric variable that =004. Do you have any insight on why that might be the case?


A number does not have leading zeros (or, alternatively, it has an infinite number of leading zeros). There is no difference between any of these ways of expressing the value: 4, 04, 004, 0004, 4.00, 4E0, ...

 

If you are not seeing the leading zeros when you print the data, then whatever method you are using to display it is not applying the format that you attached to the variable. Or you have viewed the output with a program like Excel that suppresses the leading zeros in the text because it thinks the value is a number.
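A short sketch of the point, using a made-up data set WORK.ZERO_DEMO: the stored value is plain 4, and the leading zeros appear only when the Z3. format is applied, either explicitly with PUT() or by a procedure that honors the attached format.

```sas
data work.zero_demo;
  tsize   = input('004', 3.);   /* numeric informat: stores the number 4 */
  is_four = (tsize = 4);        /* 1: leading zeros are not part of the value */
  as_text = put(tsize, z3.);    /* '004': the zeros are added by the Z3. format */
  format tsize z3.;             /* default display format for TSIZE */
run;

/* PROC PRINT uses the attached format, so TSIZE displays as 004 */
proc print data=work.zero_demo;
run;
```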

Tom
Super User

@TL93 wrote:


Thank you for your response. I was under the impression that " $3. " is telling SAS that I am inputting a variable with a character informat, and the variable has meaningful leading zeroes (e.g. 004). And that " z3. " is telling SAS to format this variable as a numeric variable and keep the leading zeroes.

SAS has two types of variables: fixed-length character strings and floating-point numbers. The FORMAT attached to a variable is just an attribute that tells SAS which format to use by default when displaying the value of the variable. A FORMAT is a set of instructions for converting a value into text. An INFORMAT is a set of instructions for converting a text string into a value. If you use $3 informat you are telling SAS to read the first three bytes and store them without conversion. (Except on an IBM mainframe where it will convert an EBCDIC string into its corresponding ASCII string.)
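One way to see the informat rule in action is to read the same text once with a character informat and once with a numeric informat, then ask VTYPE for the resulting types. A minimal sketch; apart from TCTSIZE, the data set and variable names are hypothetical.

```sas
data work.type_demo;
  tctsize   = '004';
  as_char   = input(tctsize, $3.);  /* character informat -> character value */
  as_num    = input(tctsize, 3.);   /* numeric informat   -> numeric value   */
  type_char = vtype(as_char);       /* 'C' */
  type_num  = vtype(as_num);        /* 'N' */
run;
```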

TL93
Obsidian | Level 7

Hi Tom,

 

Thank you for the explanation. It was very valuable information. I'm happy to say that you're right and your suggestions ended up working. I was able to get the data I wanted by dropping the ' $ ' and converting the text string into a value. Cheers!

hashman
Ammonite | Level 13

@Tom:

"If you use $3 informat you are telling SAS to read the first three bytes and store them without conversion."

 

"Without conversion" would be the case with $CHARw. By contrast, $w. does at least some conversion: (a) trims the leading blanks and left-aligns the value, (b) converts a single period to a blank if the field has only blanks and one single period, i.e. treats the period as a missing value.   

 

"Except on an IBM mainframe where it will convert an EBCDIC string into its corresponding ASCII string."

 

That's interesting news to me. Whenever I worked on the big iron, $w. never changed the encoding, and EBCDIC always remained EBCDIC. The $ASCIIw. informat would convert ASCII data to EBCDIC, and the only thing that would convert EBCDIC to ASCII would be the $ASCIIw. format. If nowadays the $w. informat on the mainframes behaves as you say, I wonder when the change occurred ... and what for.

 

Kind regards

Paul D.

Tom
Super User

It's been 30 years since I used IBM mainframes. But when reading from text files, the $ informat/formats converted between the EBCDIC characters in the files and the ASCII characters in the SAS variables. Perhaps the method was similar to how transcoding between UTF-8 and WLATIN1 is done now. If you wanted to actually read/write ASCII codes, you needed to use the $ASCII informat or the $ASCII format.

 

So on a mainframe, writing with $EBCDIC is the same as writing with $, and on an ASCII-based system, writing with $ASCII is the same as writing with $.

hashman
Ammonite | Level 13

@Tom:

My mainframe exposure is quite a bit more recent.

Frankly, I'm not sure how to interpret your expression "$ informat/formats converted between the EBCDIC characters in the files and the ASCII characters in the SAS variables". This is because in my experience, any character read from a text file under z/OS into a SAS character variable using $CHARw. (identical to $EBCDICw. under this OS) keeps its EBCDIC encoding. Same with $w., except for the left justification and period conversion. The only situation I can fancy when an informat converts a text field from EBCDIC to ASCII is when a mainframe text file is ported to an ASCII system as binary and then the field is read under ASCII using the $EBCDICw. informat. 

Tom
Super User Tom
Super User

Character values in SAS datasets are stored in ASCII, not EBCDIC. At least they were in SAS version 5. Use the $HEX format to check for yourself. Storing them in ASCII makes collating characters much easier, since in ASCII 'B' is one larger than 'A', and so on through 'Z', whereas in EBCDIC the codes for letters are all over the place.
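Following the $HEX suggestion, here is a minimal check you could run on your own host. '39'x is the ASCII code for the digit 9 and 'F9'x is its EBCDIC code; which of the two you see is exactly the point being debated here, so the sketch does not assume either. The names are illustrative only.

```sas
data work.hex_demo;
  digit    = '9';
  hex_code = put(digit, $hex2.);  /* '39' if stored as ASCII, 'F9' if stored as EBCDIC */
run;

proc print data=work.hex_demo;
run;
```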

hashman
Ammonite | Level 13

@Tom :

I'm not sure about V5, but can aver beyond a shadow of a doubt that currently, if under z/OS (MVS, OS/390, etc.) you read a character from a text file into a SAS data set character variable using $w., it will have the same hex representation in the SAS data set as it has in the text file. If it were auto-converted to ASCII, you'd see, for example, the character "9" in the text file as "F9"x (EBCDIC) and the same in the SAS variable as "39"x (ASCII). This is not the case, or at least not what I see looking at the same value using the ISPF hex mode and the $HEX2. format in SAS; instead, both appear as "F9" (and I highly doubt that $HEXw. on the mainframe, when applied to a SAS variable, converts back from ASCII to EBCDIC; if it were the case, I'd find it extremely bizarre).

 

Perhaps you're saying that character values are stored in SAS data sets in ASCII internally regardless of the system encoding? If so, I can neither deny nor confirm it (though if it indeed were true, I'd find it equally bizarre).  

Tom
Super User Tom
Super User

Perhaps the implementation of encoding support changed the behavior. If you check the SAS 9.2 documentation online, it definitely says that $EBCDIC and $CHAR are the same on mainframes and that $ASCII and $CHAR are the same on ASCII-based hosts.

Tom
Super User Tom
Super User

Yes, the $ informat left-aligns the result, essentially moving any leading spaces to the end, whereas the $CHAR informat does not: it "preserves" the leading spaces.
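A companion sketch for the leading-blank behavior, in the same style as the period example further down the thread; the variable names are made up. The $HEX6. output shows where the blanks end up in each value.

```sas
data work.align_demo;
  c_dollar = input('  7', $3.);      /* $3.: leading blanks removed, value left-aligned */
  c_char   = input('  7', $char3.);  /* $CHAR3.: leading blanks kept */
  put (c_:) (=$hex6./);              /* write the stored bytes to the log */
run;
```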

hashman
Ammonite | Level 13

@Tom:

Plus, if the field consists of a single period (the rest of the characters being blank), $w. will convert it to a space, while $CHARw. will not:  

data _null_ ;                  
  c1 = input (" . ", $3.) ;    
  c2 = input (" ..", $3.) ;    
  c3 = input (" . ", $char3.) ;
  put (c:) (=$hex6./) ;         
run ;                          

Result (20=blank, 2E=period):

c1=202020
c2=2E2E20
c3=202E20

Kind regards

Paul D.

 


Discussion stats
  • 14 replies
  • 11409 views
  • 3 likes
  • 3 in conversation