12-04-2015 05:25 AM
12-04-2015 05:26 AM
This is a general thing with decimals. I can't remember the exact terminology for it, but the underlying number is stored with a very tiny part of the number, so you may be comparing 0.3 to 0.3000000000000001, which is different. This is down to the memory storgae of the number and you will come across it a bit. I would always recommend, unless you need to leave a raw number, to round to a know decimal value to omit these kind of problems:
data have; x = 0.3; y = 0.1 + 0.1 + 0.1; if round(x,.1)=round(y,.1) then z="same"; else z="not"; run;
12-04-2015 06:21 AM - edited 12-04-2015 06:22 AM
You have encountered a numeric representation issue. Roughly speaking, numeric values are stored in the binary system in the computer's memory, not in the familiar decimal system.
In your example:
0.1 (decimal) = 0.00011001100110011... (binary)
Note that the binary value is a periodic fraction. For an exact representation the "0011" would need to be repeated infinitely many times, like the 3s in 1/3=0.3333... in the decimal system. Since there are only finitely many bits available to store that value (64 bits on Windows, provided that the numeric variable is stored with the default length of 8 Bytes), the computer has no other choice but to round or truncate the infinite sequence of binary digits. But this produces a small rounding error, a numeric representation error. For the decimal number 0.1 this error is 0.4*2^-56 (on Windows), which is approx. 5.55*10^-18. Very small, but, as you know, rounding errors can accumulate easily in calculations and thus lead to surprising results, like 0.1+0.1+0.1 ne 0.3.
If you want to see how your numeric values are stored internally, use the BINARY64. or (for a shorter, but equivalent output in hexacedimal notation) HEX16. format (and not something like 32.30 or BEST32. format):
data _null_; x=0.3; y=0.1+0.1+0.1; put x binary64.; put y binary64.; put x hex16.; put y hex16.; run;
0011111111010011001100110011001100110011001100110011001100110011 0011111111010011001100110011001100110011001100110011001100110100 3FD3333333333333 3FD3333333333334
There you see the difference in the least significant bit.
Indeed, as @RW9 has suggested, rounding is probably the most important strategy to avoid these errors. The appropriate rounding unit depends on the order of magnitude of the numbers to be rounded, but for most practical applications (i.e., if you're dealing with numbers from, say, 0.000001 to 1000000) I found 10^-9 (1E-9) to be suitable.
If you want to dig deeper into the technical details, you could read one of the two standard references on this subject:
Conference papers about this topic include:
I have given talks about this subject several times in one of my projects in the past and could tell you much more. So, please don't hesitate to ask further questions about it.
12-04-2015 09:10 AM
I've experienced similar issues and have solved with the rounding function. Working for the banking industry, financial calculations we perform requires the decimal rounding function even while doing proc compares to provide the tolerances to flag as differences when exceeded.
12-04-2015 09:58 PM
Besides of rounding you can also use the compfuzz() function for such situations.
Alternatively - and what I would do - you round the result of such operations from the very beginning. Something like:
y = round(0.1 + 0.1 + 0.1,0.00000001);