12-14-2015 09:16 PM
12-14-2015 09:30 PM - edited 12-14-2015 09:42 PM
this result is very weird..
tt=ceil(0.2553 * 100000);
Computer uses binary, when computer try to address numbers in decimal, many times it can only do approximation if the length is limited, so the outcome of 0.2553 * 100000 is actually slightly greater than 25530. Try the following:
data tt; equal=(0.2553*100000 = 25530); greater=(0.2553*100000 > 25530); tt=ceil(round(0.2553 * 100000)); a=25530; b=0.2553*100000; a_hex=put(a,hex16.); b_hex=put(b,hex16.); run;
as you can see, a_hex is less than b_hex, and one way to get the consistent outcome is to get rid of the possible decimals by applying ROUND(). There is nothing to worry about, it is just the way it is from the beginning of the computer era, and it has little to do with hardware and software.
12-15-2015 03:14 PM
Here is a more detailed explanation of why this is happening:
It is surprisingly difficult for the computer to store the decimal fraction 0.2553, although it has only four decimals. (And there is nothing specific to this particular number, see further below.) The reason is that numeric values are not stored using the decimal system, but using the binary (i.e. base 2) system, as @Haikuo has pointed out already. And, in fact, this is not SAS-specific.
Here's how 0.2553 looks like in the binary system:
(One can recognize that it is slightly larger than 0.01 (binary) = 2**(-2) = 1/4 = 0.25 (decimal).)
The issue is: The sequence of binary digits to the right of the binary point is infinite! We observe the same phenomenon in many decimal numbers such as 1/3=0.33333... where the 3s are, in theory, repeated endlessly. (In fact, similar to 1/3 in the decimal system, the above binary number is a recurring binary fraction: a complicated pattern of 500 (!) 0s and 1s, starting at the fifth binary place, is repeated endlessly.)
For a numeric SAS variable of (the default) length 8 bytes the internal so-called floating-point representation consists of 64 bits (i.e. binary digits). By using the BINARY64. format, we can see how the value is stored internally:
data _null_; x=0.2553; put x binary64.; run;
On Windows and Unix systems, the first 12 bits contain information on the sign and the order of magnitude of the number. The remaining 52 bits essentially hold binary digits from the actual binary fraction which we saw above:
Comparison: 0011111111010000010101101101010111001111101010101100110110011111 (this is the internal representation) 0.01000001010110110101011100111110101010110011011001111010000011111001... (the original binary fraction) ^
Not surprisingly, those 52 bits cannot accommodate the infinite sequence of 0s and 1s which would be necessary for an exact representation. Instead, the rightmost bit (indicated with a "^") is rounded. It is rounded up in this case because the next digit is 1 and not 0. Obviously, this causes a small rounding error. In our case this rounding error amounts to approx. 2.69E-17. In other words, the number 0.2553 is represented in the computer's memory by a slightly larger number the exact value of which is
The small deviation from the intended value 0.2553, called numeric representation error, harbors the risk of even larger deviations from intended results when this number is used in calculations.
Now, let's look at your specific calculation, 0.2553*100000. Given the above information, you might think that this (written in decimal notation) results in
But this is not the case, because the calculation is performed in the binary system. The result obtained by the computer happens to be even larger:
As you can see, the multiplication by the relatively large number 100000 (=1E05) has increased the deviation from the intended value from 2.69E-17 to almost 3.64E-12, i.e. by about 5 orders of magnitude.
(Side note: Due to a built-in rounding feature, standard numeric formats such as BEST32., E32. or 32.26 display this number the same way as the exact integer 25530, which can be very misleading to the unaware! Only special-purpose formats such as BINARY64. and HEX16. show the true internal representation.)
The CEIL function (unlike the CEILZ function) "fuzzes" the result to avoid unwanted results due to such rounding issues in many cases. (Same in SAS 9.1.3.) However, its tolerance is limited to deviations up to 1E-12 from integers. This is why it considers the deviation of 3.64E-12 from 25530 large enough to give the next larger integer: 25531.
Now you know more exactly why that is happening.
You may be wondering how likely it is to encounter a number, such as 0.2553, which has an infinite binary representation and hence an inherent rounding error (the numeric representation error). The answer is alarming:
Those numeric representation errors affect
and so on.
That is, even among numbers with only a few decimal places the vast majority are not represented exactly in the computer's memory!
The good news is that integers (up to 15 decimal digits) are not affected by numeric representation issues (provided that they are stored with length 8 bytes).
As has been suggested already, the ROUND function is one of the most important tools to combat rounding errors that have accumulated in calculations. The appropriate rounding unit (second argument of ROUND) depends on the order of magnitude of the numbers to be rounded, but for most practical applications (i.e., if you're dealing with numbers from, say, 0.000001 to 1000000) I found 1E-9 to be suitable. It must be larger than the small deviations to be flattened (3.64E-12 in our example), but small enough not to distort results.
Please note that the result of the ROUND function is still affected by the unavoidable numeric representation error (if any). But it is adjusted to have the same "standard" internal representation as the mathematically rounded value.
data _null_; if 0.1+0.7 ne 0.8 then put 'Unequal!'; if round(0.1+0.7, 1e-9)=0.8 then put 'Equal after rounding.'; run;
References: please see this earlier post of mine.