A lot of people know this, but I think it's worth putting in a message:
Have you ever seen this table presenting information on the range and precision of numeric variables for varying storage lengths? This table has been in the SAS documentation for decades.
Significant Digits and Largest Integer by Length for SAS Variables under Windows

Length in Bytes | Largest Integer Represented Exactly | Exponential Notation | Significant Digits Retained
3 | 8,192 | 2**13 | 3
4 | 2,097,152 | 2**21 | 6
5 | 536,870,912 | 2**29 | 8
6 | 137,438,953,472 | 2**37 | 11
7 | 35,184,372,088,832 | 2**45 | 13
8 | 9,007,199,254,740,992 | 2**53 | 15

[Struck out as per @FreelanceReinh's comment: Some variation in operating systems but the negative values require more storage than the simple integers.]
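The "Largest Integer" column can be recomputed from the bit layout described further down in this thread: a variable of length L keeps 8*L bits, of which 1 is the sign bit and 11 are exponent bits. Here is a minimal sketch, assuming the IEEE layout used under Windows and Unix; the variable names are mine, not from the documentation.

```sas
data _null_;
  do len = 3 to 8;
    mantissa_bits = 8*len - 1 - 11;     /* stored bits minus sign bit and exponent bits */
    largest = 2**(mantissa_bits + 1);   /* the implied leading 1 bit adds one binary digit */
    put len= mantissa_bits= largest=comma25.;
  end;
run;
```

For length 3 this gives 12 mantissa bits and 2**13 = 8,192, matching the first row of the table.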
The column "Largest Integer Represented Exactly" is a bit misleading. The values displayed are actually the "Largest Consecutive Integer Represented Exactly". For instance, for a 3-byte variable you can precisely store many integers over the posted value of 8,192. They are:
So the number of integers above 8,192 approaches 8,192.
The same principle holds for all the other storage lengths.
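As a rough check of that claim for another length, the sketch below stores integers around 2,097,152 in a 4-byte variable and compares them after reading the data set back. The data set name T4 and the variable names are made up for illustration. Under Windows or Unix, I would expect only the odd values above 2,097,152 to come back changed.

```sas
data t4;
  length x4 4;
  do x8 = 2097150 to 2097156;
    x4 = x8;     /* still a full 8-byte value inside this step */
    output;      /* truncation to 4 bytes happens when the row is written */
  end;
run;

data _null_;
  set t4;
  exact = (x4 = x8);   /* 1 if the 4-byte copy survived unchanged */
  put (x8 x4) (=comma10.) exact=;
run;
```

The comparison is done in a second step on purpose: inside the first step X4 still holds the untruncated value.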
However, all the progressions of large integer values above stop at 1.797693E308, the largest double-precision (8-byte) number representable on the (Windows) system. That number is returned by the CONSTANT('BIG') function.
Of course, you can't print these large integers if they require more than 32 decimal digits.
2nd edit: In the other direction, for length 3 values:
Edited Additional Note. Also remember that the numeric integer limits above apply only after the variable has been stored in a SAS data step - that's when the double-precision (8 byte) value generated and used in the DATA step is actually truncated to the user-specified length. For example if you run this program, notice how 8,193 gets truncated to 8,192 only after being stored in, and later retrieved from a data set:
data t;
length x3 3 x8 8;
do x3=8190 to 8194;
x8=x3;
put (x:) (=comma5.0);
output;
end;
run;
data _null_;
set t;
put (x:) (=comma5.0);
run;
which produces this log:
9 data t;
10 length x3 3 x8 8;
11 do x3=8190 to 8194;
12 x8=x3;
13 put (x:) (=comma5.0);
14 output;
15 end;
16 run;
x3=8,190 x8=8,190
x3=8,191 x8=8,191
x3=8,192 x8=8,192
x3=8,193 x8=8,193
x3=8,194 x8=8,194
17 data _null_;
18 set t;
19 put (x:) (=comma5.0);
20 run;
x3=8,190 x8=8,190
x3=8,191 x8=8,191
x3=8,192 x8=8,192
x3=8,192 x8=8,193
x3=8,194 x8=8,194
Notice X8 keeps the original values, but odd values of X3 (length 3) above 8,192 are truncated to the next lower even number.
@mkeintz: Yes, I agree that the wording ("largest integer") which is often used in this context can be misleading. (However, the statement that "negative values require more storage ..." contradicts the fact that the "sign bit" is always included in the internal representation. I guess you copied this sentence from this recent post, not from official SAS documentation.)
Just to add the explanation (for readers who are curious) why numeric variables of any length can store certain larger integers than those supposedly "largest" ones without losing precision, in particular powers of 2 (up to 2**1023=8.988...E307 -- displaying such values is a different issue): The internal floating-point representation of numeric values uses separate sets of bits for the exponent (containing a number's "order of magnitude") and the mantissa (relevant for the "precision"). For both parts of the number it uses the binary system.
This means: If a number can be stored exactly, such as an integer with an absolute value below the upper bounds 8192 etc. mentioned in @mkeintz's post, then one can multiply this number by any power of 2, i.e. 2**k with a positive or negative integer k, and the result, be it an integer or not, can still be stored exactly -- except in the extreme cases where its absolute value exceeds CONSTANT('BIG') or falls below CONSTANT('SMALL'). This is because that multiplication will only change the exponent, but not the mantissa. It's analogous to scientific notation in the decimal system, e.g. 6.022E23, where a multiplication by any power of the base 10 affects only the exponent (23), but leaves the mantissa digits (6022) unchanged and in particular doesn't require more of them.
For example, a numeric variable of length 3 (bytes) on a Windows or Unix system will happily store exact numbers like 4321*2**44=76015835897921536 or 123*2**-17=0.00093841552734375, but will miserably lose precision when being assigned values such as 8765 or 0.1 (!), for which 12 mantissa bits (3*8 bits due to length 3, minus 1 bit for the sign minus 11 bits for the exponent) are insufficient:
data test;
length a1-a4 3;
/* mathematical binary representation: */
a1=4321*2**44; /* 100001110000100000000000000000000000000000000000000000000 */
a2=123*2**-17; /* 0.00000000001111011000000 */
a3=8765; /* 10001000111101 */
a4=0.1; /* 0.000110011001100110011001... (infinitely repeating "0011") */
run;
proc print data=test;
format a1 best32. a2 e20.;
run;
Results (under Windows):
a1                   a2                     a3      a4
76015835897921536    9.3841552734375E-04    8764    0.099991
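The power-of-2 argument can also be tried over a range of exponents. This is only a sketch with made-up names (SCALED, Y3, Z3 and so on): 4321 fits into 13 significant bits, so every 4321*2**k should survive storage in 3 bytes, while 8765 needs 14 bits, so no 8765*2**k should.

```sas
data scaled;
  length y3 z3 3;
  do k = -40 to 40 by 10;
    y8 = 4321 * 2**k;   /* 13 significant bits: expected to fit in a 3-byte variable */
    z8 = 8765 * 2**k;   /* 14 significant bits: expected to lose its last bit */
    y3 = y8;
    z3 = z8;
    output;
  end;
run;

data _null_;
  set scaled;
  y_exact = (y3 = y8);   /* compared after the round trip through the data set */
  z_exact = (z3 = z8);
  put k= y8=e20. y_exact= z_exact=;
run;
```

If the reasoning above is right, the log should show y_exact=1 and z_exact=0 on every line, whatever the value of k.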
More details can be found in Numerical Accuracy in SAS Software.
[Edit: Corrected minor typo in the text and improved wording in one place.]
Edit 2: Just noticed: I think your calculations underestimate the number of integers with an exact representation in a 3-byte variable: Each of the ranges [8192, 16384), [16384, 32768), ..., [2**1023, 2**1024) -- these are 1023-13+1=1011 ranges -- contributes 2**12=4096 integers. (In fact, there are 4096, not 2048, integers divisible by 4 in the range (16384, 32768] or [16384, 32768) for that matter, again 4096 divisible by 8 in [32768, 65536) and so on. Both the length of the range and the divisor are doubled in each step.) These correspond to the 2**12 different combinations that can be formed with the 12 mantissa bits and each range corresponds to one combination of the exponent bits. The pertinent values of the exponent (which is shifted by 1023 ["bias"]) are 1036, 1037, ..., 2046. So, together with the 8191 integers 1, 2, 3, ..., 8191 (whose exponents range from 1023 to 1035 while only 1+2+4+8+...+4096=8191 mantissas yield integers) I end up with n=1011*4096+8191=4,149,247 positive integers and hence a total of 2*n+1=8,298,495 integers including zero and negative integers.
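The count can be spot-checked by brute force on a small range. This sketch (data set and variable names are mine) stores every integer from 1 to 65,535 in a 3-byte variable and counts how many come back unchanged. Those integers cover 1 through 8191 plus the three ranges [8192, 16384), [16384, 32768) and [32768, 65536), so the reasoning above predicts 8191 + 3*4096 = 20,479.

```sas
data stored;
  length x3 3;
  do x8 = 1 to 65535;
    x3 = x8;
    output;   /* x3 is truncated to 3 bytes when the row is written */
  end;
run;

data _null_;
  set stored end=last;
  exact + (x3 = x8);   /* sum statement: counts rows where the 3-byte copy is unchanged */
  if last then put 'Exactly stored integers in [1, 65535]: ' exact comma8.;
run;
```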
Mark,
thanks a 1e6 for this very useful analysis! I will definitely refer to your post during my classes.
All the best
Bart
@FreelanceReinh Thanks for providing a deeper dive into numeric precision in SAS. I was hoping for someone to discuss the WHY that explains my description of the WHAT.
You're welcome. Numeric representation issues have been one of my "favorite" topics in SAS for many years. Yet I still "discover" new surprising oddities about them from time to time.
Sorry, I hadn't checked your calculations regarding the number of exactly representable integers in a 3-byte variable earlier. I have added my own calculations (resulting in larger values) in "Edit 2" of my post.