data testint;
x=1117.11;
y=x*100;
z=int(x*100);
run;
I would expect y and z to be the same, but they are not. I get 111711 and 111710 respectively.
Is INT doing something wrong, or am I doing something wrong?
If I change x from 1117.11 to 1117.01, it works as I would expect it to work - y and z are both 111701.
Thanks
Welcome to the world of BASE 2 numbers.
Some decimal fractions cannot be represented exactly as base 2 fractions.
11/100ths cannot be represented exactly in BASE 2.
Because of the impossibility of storing the 11/100ths exactly in a finite number of binary digits 1,117.11 times 100 is not exactly equal 111,711 .
1 data _null_; 2 x=1117.11; 3 y=x*100; 4 diff = 111711-y; 5 put x= / y= / diff=best32. ; 6 run; x=1117.11 y=111711 diff=1.4551915228366E-11
Welcome to the world of BASE 2 numbers.
Some decimal fractions cannot be represented exactly as base 2 fractions.
11/100ths cannot be represented exactly in BASE 2.
Because of the impossibility of storing the 11/100ths exactly in a finite number of binary digits 1,117.11 times 100 is not exactly equal 111,711 .
1 data _null_; 2 x=1117.11; 3 y=x*100; 4 diff = 111711-y; 5 put x= / y= / diff=best32. ; 6 run; x=1117.11 y=111711 diff=1.4551915228366E-11
Ok, but INT fuzzes the results to take that into account. INTZ does not, so I wouldn't be entirely surprised to see INTZ return the occasional weird answer. But not INT.
Thanks
@cklager44 wrote:
Ok, but INT fuzzes the results to take that into account. INTZ does not, so I wouldn't be entirely surprised to see INTZ return the occasional weird answer. But not INT.
Thanks
What FUZZ does it use? Best to add your own ROUND() to whatever level of precision you need.
1 data _null_; 2 x0=1117.11; 3 y0=111711; 4 z1=x0*100; 5 z2=int(x0*100); 6 z3=int(round(x0*100,1e-10)); 7 d1 = y0-z1; 8 d2 = y0-z2; 9 d3 = y0-z3; 10 put (_all_) (/=best32.); 11 put (_all_) (/=hex16.); 12 run; x0=1117.11 y0=111711 z1=111711 z2=111710 z3=111711 d1=1.4551915228366E-11 d2=1 d3=0 x0=40917470A3D70A3D y0=40FB45F000000000 z1=40FB45EFFFFFFFFF z2=40FB45E000000000 z3=40FB45F000000000 d1=3DB0000000000000 d2=3FF0000000000000 d3=0000000000000000
The difference appears be about one bit.
y0=40FB45F000000000
z1=40FB45EFFFFFFFFF
Using formats, we can see that 1117.11*100 is an irrational number in binary:
data test;
x=1117.11;
y=x*100;
z1=int(x*100);
z2=int(y);
run;
proc sql;
select x format=binary64.
,y format=binary64.
,z1 format=binary64.
,z2 format=binary64.
from test
;
quit;
Result:
Variable | Value | Result |
---|---|---|
y | 111711 | 0100000011111011010001011110111111111111111111111111111111111111 |
z1 | 111710 | 0100000011111011010001011110000000000000000000000000000000000000 |
z2 | 111710 | 0100000011111011010001011110000000000000000000000000000000000000 |
In all cases the ROUND function truncates the repeating 1 giving consistent, repeatable results. Check out the SAS Docs section titled "Troubleshooting Errors in Precision", under the main topic Numerical Accuracy in SAS Software.
@SASJedi wrote:
Using formats, we can see that 1117.11*100 is an irrational number in binary:
(...)
Variable Value Result y 111711 0100000011111011010001011110111111111111111111111111111111111111
Hi @SASJedi,
I like using the BINARY64. format in such situations. But I wouldn't call that number "irrational" because in mathematics this term has a special meaning (i.e., not being the ratio of two integers; cf. Wikipedia article Irrational number) which doesn't apply here. It's not even a repeating fraction (like 1/3=0.33333... in the decimal system), although the long string of 1s resembles 1/9=0.11111... in the decimal system. Since 64-bit floating-point numbers have only a finite number of binary digits (not more than 64 obviously) they are always rational.
The long string of 1s above is just the result of a rounding error: The decimal number 1117.11 written (mathematically) in the binary system is a (rational) repeating fraction:
10001011101.00011100001010001111010111000010100011110101110000101000111101...
The 20-digit-pattern highlighted in blue is actually repeated forever (only three blocks of these are shown above). However, in the BINARY64. representation of that number (shown below) the infinite sequence is rounded to fit into the 64 bits, more precisely: into the 52 mantissa bits plus the implied bit (highlighted in green above).
0100000010010001011101000111000010100011110101110000101000111101
Due to the space restriction in the mantissa, the 20-digit-pattern is repeated only once, the next block "011100..." is rounded off.
Multiplying this number by the decimal number 100 (binary: 1100100) amounts to summing three shifted copies of it (because of the three 1s in "1100100").
Written mathematically as multiples of 2**10=1024:
1000101.1101000111000010100011110101110000101000111101 + 100010.11101000111000010100011110101110000101000111101 + 100.01011101000111000010100011110101110000101000111101 ----------------------------------------------------------- 1101101.00010111101111111111111111111111111111111111110101
Now we see how the round-off error mentioned above affects the summation and leads to a result that is slightly too small. Moreover, it contains 57 binary digits, so it must be rounded again to 52+1=53 bits: The "0101" at the end is rounded down (because of the leading zero), leaving that long (but not infinite) string of 1s in the mantissa.
@cklager44: "If the argument is within 1E-12 of an integer, the INT function fuzzes the result to be equal to that integer." (INT function documentation)
Unfortunately, the difference between the result and the exact integer 111711 exceeds 1E-12, as Tom has shown already (diff=1.4551...E-11).
The situation with 1117.01*100 is actually similar, but in the last rounding step -- from 57 to 53 bits -- we are just lucky: Instead of "0101" (as above from 1117.11*100) a "1111" needs to be rounded and this is rounded up (because of the leading 1), which compensates the previous rounding error and leads to the proper internal binary representation of the integer 111701.
@FreelanceReinh An excellent, cogent, and detailed description of this situation. I learned something today - thank you for sharing it.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.