12-18-2015 03:08 AM
I have a very basic question. I see a multiplication error. Below code outputs 69999999999999991808.
but it should be 70000000000000000000.
data temp; format gg comma20.; gg = 0.7*(10**20); run;
Why does SAS output so?
12-18-2015 04:58 AM
Your number is quite long and probably hits the ceiling for big numbers, hence some kind of rounding or truncation is happening on the underlying binary representation of the number:
Why do you need numbers that big? You may have to look at other options if you want:
There is also the possibity of using types from other languages, but I can't find that post at the moment.
12-18-2015 05:11 AM
Numeric variable can store up to maximum of 16 digit value. but your case it is close to 20 digits. I guess this is the reason why you are seeing weired output .
12-18-2015 06:34 AM - edited 12-18-2015 03:42 PM
You have encountered a numeric representation issue.
Earlier this month, I wrote two detailed posts on this subject, which may help you to understand this and similar "strange" phenomena:
Your current issue involves two relevant aspects:
I will write a more detailed explanation later today.
12-18-2015 12:14 PM - edited 12-18-2015 12:20 PM
As promised, here's the more detailed explanation:
As pointed out in the posts earlier this month, every calculation has to be viewed with the binary system in mind and with the limited number of bits (64 with Windows and Unix SAS) which are available to store numeric values.
So, let's start with 10**20. This is already a calculation. In the binary system the result looks like this:
As you can see, it has 67 binary digits. Luckily, the last 20 digits are all 0 (due to the fact that 10**20 is obviously divisible by 2**20).
Here is the floating-point representation that SAS uses internally for this number (the BINARY64. and HEX16. formats reveal that):
%put %sysfunc(putn(10**20, binary64.)); /* Result: 0100010000010101101011110001110101111000101101011000110001000000 */
Comparing this to the "mathematical" binary representation above, we see that the internal representation accommodates all significant digits:
0100010000010101101011110001110101111000101101011000110001000000 /* internal floating-point representation */ 1010110101111000111010111100010110101100011000100000000000000000000 /* binary integer */
(In the first link provided by RW9 you can find a description of what the 64 bits of the internal representation stand for [sign, exponent, mantissa]. Thanks @RW9 for providing this link! This page has been improved considerably compared to the earlier SAS versions up to 9.3. Previously, this documentation was harder to find.)
This means that the internal representation of 10**20 is exact, although this number exceeds the order of magnitude where this is guaranteed: The largest integer up to which no accuracy problems occur is CONSTANT('EXACTINT'):
%put %sysfunc(constant(exactint)); /* Result: 9007199254740992=2**53, approx. 9E15 (Windows/Unix) */
Side note: For larger powers of 10 it can easily happen that 10**xx is not equal to 1Exx because they contain more than 53 significant binary digits and rounding errors occur in the calculation! Example:
data _null_; if 10**32 ne 1E32 then put 'Surprise!'; run;
The really problematic number in your calculation is 0.7! In the binary system it looks like this:
The 4-digit sequence 0110 is repeated endlessly, similar to the 3s in 1/3=0.33333... in the decimal system. So, there is no chance to store all of these infinitely many significant digits!
Let's look at the internal representation:
%put %sysfunc(putn(0.7, binary64.)); /* Result: 0011111111100110011001100110011001100110011001100110011001100110 */
You recognize the repeating 0110s. After 13 repetitions the 52 mantissa bits are exhausted. The rightmost, least significant bit is rounded down to 0, because the next digit would be 0 (and not 1), namely the first digit in the next "0110" block.
So, we've incurred a rounding error. The lost digits 01100110... amount to 0.4*2**(-53), approx. 4.44E-17.
In other words, when we say x=0.7; (in SAS) the number stored internally is already not exactly equal to 0.7, but has the decimal equivalent of 0.6999999999999999555910790149937383830547332763671875.
This inherent rounding error, called numeric representation error, tends to propagate (like any rounding error) as soon as we start doing calculations with the affected number. Multiplying by a large number -- as you did -- is prone to inflating the rounding error as well.
In the decimal system it's easy to multiply a number by 10**20, but the computer, of course, performs the calculation in the binary system. The less obvious internal result of this calculation is, again, revealed by using the BINARY64. format:
data _null_; gg = 0.7*(10**20); put gg binary64.; run; /* Result: 0100010000001110010110111000111110101000111111100010101010111111 */
The rightmost bit 1 here -- which is representing a place value of 2**13=8192! -- is rounded down. The bit sequence would actually continue with 011101...
Converted to the decimal system, this result equals exactly 69999999999999991808.
The difference of -8192 to the intended value 7E19 (which has an error-free internal representation, like 1E20) can be regarded as the sum of
@fuatengin: So, this is the answer to your question.
@pearsoninst: You were wondering why we don't get such surprising results when we replace 0.7 by 0.1 or 0.2. This is a reasonable question, because both 0.1 and 0.2, like 0.7, are repeating binary fractions with infinitely many digits. Their numeric representation errors are approx. 5.55E-18 and 1.11E-17, respectively (in both cases round-up errors, i.e. the internal values are slightly larger than 0.1 and 0.2, resp.).
Multiplied by 10**20, these are approx. 555 and 1110, resp. As it turns out, the rounding errors from the multiplication (with 10**20) are exactly the same, but with a negative sign (approx. -555 and -1110, resp.).
Hence, the two errors cancel each other out and we obtain correct results! This may sound like a rare coincidence, but it isn't. Please note that both 1E19 and 2E19 have exact internal representations. So, if a calculation with (mathematically) these results deviated from these values, it would most probably deviate by +1 or -1 in the least significant bit of the internal representation. As soon as the deviation is less than that, it must be exactly zero.
What can we do to avoid these rounding errors? As explained in my earlier posts, the ROUND function is an appropriate tool. When working with numbers as large as 7E19 we have to use a relatively large rounding unit. In your example even 10000 would be too small a rounding unit, but 100000 is sufficient:
data _null_; gg = round(0.7*(10**20),1e5); put gg binary64.; /* 0100010000001110010110111000111110101000111111100010101011000000 */ put gg comma20.; /* 70000000000000000000 */ run;
Since gg is much larger than CONSTANT('EXACTINT'), calculations involving numbers <100000, like gg+12345, would not yield sensible results anyway.
12-18-2015 12:26 PM - edited 12-18-2015 12:45 PM
Or as work around in your case:
data temp; format gg comma26.; gg =(10**20)*7/10; run;
If you stayed away from the infinite binary representation for decimal fraction .7. And as the Order of Evaluation in your case from L to R. So you do not have decimal point in your calculation, the above code will work.
But you really need to be careful and keep in your mind all the previous advices. For example if you increased the power, you will again get unexpected results.