Solved: Rounding and scientific notation

Zaghini · Posted 11-03-2021 04:01 AM

Hi, I have a problem with scientific notation.

My dataset prova has 9 rows and it is so composed:

Vincolo	bil22	bil23	bil24
BIL031 - BIL031	80147,35	80147,35	80147,35
BIL031 - BIL031	0	0	0
BIL031 - BIL031	0	0	0
BIL031 - BIL031	0	0	0
BIL031 - BIL031	0	0	0
BIL031 - BIL031	-23793,09	-23793,09	-23793,09
BIL031 - BIL031	-9750,34	-9750,34	-9750,34
BIL031 - BIL031	-22784,59	-22784,59	-22784,59
BIL031 - BIL031	-23819,33	-23819,33	-23819,33

Variables bil22, bil23 and bil24 are created with:

data prova;

set vincolati;

bil22=round(bilancio2022,.01);

bil23=round(bilancio2023,.01);

bil24=round(bilancio2024,.01);

run;

when I summarize

proc sql;
CREATE TABLE vinc_tot AS
SELECT
vincolo, sum(Bil22) as vincsum22, sum(Bil23) as vincsum23, sum(Bil24) as vincsum24
FROM prova

GROUP BY vincolo;
QUIT;

I expect a risult like this:

vincolo	vincsum22	vincsum23	vincsum24
BIL031 - BIL031	0	0	0

But I obtain this:

vincolo	vincsum22	vincsum23	vincsum24
BIL031 - BIL031	3.637979E-12	3.637979E-12	3.637979E-12

As if in Bil22, Bil23 and Bil24 (rounded .01 !!! ) there where several decimals.

Someone can help me?

Thank you

Massimo

FreelanceReinh · Posted 11-03-2021 10:09 AM

Hi @Zaghini and welcome to the SAS Support Communities!

I see that @Rick_SAS has already provided the explanation and solution. Let me just add a few details specific to your example.

You have encountered a numeric representation issue. Computers store numeric values in the binary system. Most of the numbers with one or more decimals (in the familiar decimal system) have infinitely (!) many digits when converted to the binary system. Similar to 1/3=0.333333... in the decimal system they are periodic fractions. SAS uses a 64-bit floating-point representation, which is rounded if the exact number would require more than 64 bits, let alone infinitely many, in that representation. Hence, most numbers with one or more decimals and in fact all numbers except 0 in your sample data contain a small rounding error that you don't see with standard SAS formats.

Example:

Your first value, 80147.35, written in the binary system, looks like this:

10011100100010011.01011001100110...

The pattern "0110" (highlighted in boldface) is repeated periodically.

Compare this to the internal 64-bit floating-point representation used by SAS (here: under Windows or Unix; see Numerical Accuracy in SAS Software for the documentation).

0100000011110011100100010011010110011001100110011001100110011010

You can recognize the blue digits of the integer part (the first "1" is the so-called "implied bit") and the red digits of the fractional part with eight copies of the repeating 4-digit pattern. (The first 12 bits of the internal representation store the sign and the order of magnitude of the number.) The last two digits, 10 (in purple italics), are the result of rounding 01100110... to only two digits. So the number is rounded up and thus a little greater than the exact value 80147.35. Indeed, converting the internal representation back to the decimal system would result in

80147.35000000000582076609134674072265625

which means an unavoidable rounding error of 1.6*2**-38 ≈ 5.82E-12.

Now, if you add several numbers with rounding errors (or do other calculations with them), it's likely that the result will be contaminated with a rounding error as well. In your example, the mathematically exact result happens to be 0, so that the rounding error, although as small as 2**-38 ≈ 3.637979E-12, becomes apparent.

Rounding the sum with a rounding unit small enough not to influence the results, but greater than the tiny rounding error described above, gives the desired result:

round(sum(Bil22),1e-9) as vincsum22

In your example, knowing that the result can have at most two decimals like all numbers involved, you could also use 0.01=1e-2 as the rounding unit to obtain the same result.

Edit:

The other numbers involved in your sum, converted back and forth as above, are:

-23793.09000000000014551915228366851806640625 = -23793.09 - 1.28*2**-43
- 9750.34000000000014551915228366851806640625 = - 9750.34 - 1.28*2**-43
-22784.59000000000014551915228366851806640625 = -22784.59 - 1.28*2**-43
-23819.330000000001746229827404022216796875   = -23819.33 - 1.92*2**-40

In this particular example the rounding error you obtained for the sum equals the sum of the rounding errors of the summands: 1.6*2**-38 - 3*1.28*2**-43 - 1.92*2**-40 = 2**-38. But this is not always the case. The exact rounding error depends on the addition of the internal representations and on the order of the summands, i.e., a+b+c in SAS (and other software) is not necessarily equal to b+c+a, which is particularly problematic in PROC SQL where "order of observations" is not stable in general. Indeed, I get six different sums ranging from -3.63798E-12 over 0 to 1.455192E-11 when I permute the five summands in all possible ways.

View solution in original post

Rick_SAS · Posted 11-03-2021 09:06 AM

Yes, it can be very frustrating when you realize that every decimal number cannot be represented exactly as a finite-precision binary number.

In base-10 (decimal), certain numbers such as 1/3 cannot be represented exactly by any finite decimal number. So an expression such as 1/3 + 1/3 + 1/3 becomes (in decimal) 0.33333 + 0.33333 + 0.33333 = 0.99999, when we know that in EXACT arithmetic the answer should be 1.0.

The same thing is happening here. Run this DATA step to see:

data Test;
s = sum(1.35, -0.09, -0.34, -0.59, -0.33);
run;

proc print data=Test;
run;

Even though in EXACT arithmetic these numbers sum to 0.0, when you represent the numbers in finite-precision binary, the sum differs from zero by a small amount.

One way to handle this issue is to round the FINAL SUM, not the individual elements.

By the way, in numerical analysis, this issue is sometimes called "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style). For another example and a discussion of defensive programming related to finite precision, see "Programming tip: Avoid testing floating-point values for equality".

Kurt_Bremser · Posted 11-03-2021 10:02 AM

Rule of thumb: always round after all calculations are done, and always before any comparison.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 11-03-2021 10:09 AM

Hi @Zaghini and welcome to the SAS Support Communities!

I see that @Rick_SAS has already provided the explanation and solution. Let me just add a few details specific to your example.

You have encountered a numeric representation issue. Computers store numeric values in the binary system. Most of the numbers with one or more decimals (in the familiar decimal system) have infinitely (!) many digits when converted to the binary system. Similar to 1/3=0.333333... in the decimal system they are periodic fractions. SAS uses a 64-bit floating-point representation, which is rounded if the exact number would require more than 64 bits, let alone infinitely many, in that representation. Hence, most numbers with one or more decimals and in fact all numbers except 0 in your sample data contain a small rounding error that you don't see with standard SAS formats.

Example:

Your first value, 80147.35, written in the binary system, looks like this:

10011100100010011.01011001100110...

The pattern "0110" (highlighted in boldface) is repeated periodically.

Compare this to the internal 64-bit floating-point representation used by SAS (here: under Windows or Unix; see Numerical Accuracy in SAS Software for the documentation).

0100000011110011100100010011010110011001100110011001100110011010

You can recognize the blue digits of the integer part (the first "1" is the so-called "implied bit") and the red digits of the fractional part with eight copies of the repeating 4-digit pattern. (The first 12 bits of the internal representation store the sign and the order of magnitude of the number.) The last two digits, 10 (in purple italics), are the result of rounding 01100110... to only two digits. So the number is rounded up and thus a little greater than the exact value 80147.35. Indeed, converting the internal representation back to the decimal system would result in

80147.35000000000582076609134674072265625

which means an unavoidable rounding error of 1.6*2**-38 ≈ 5.82E-12.

Now, if you add several numbers with rounding errors (or do other calculations with them), it's likely that the result will be contaminated with a rounding error as well. In your example, the mathematically exact result happens to be 0, so that the rounding error, although as small as 2**-38 ≈ 3.637979E-12, becomes apparent.

Rounding the sum with a rounding unit small enough not to influence the results, but greater than the tiny rounding error described above, gives the desired result:

round(sum(Bil22),1e-9) as vincsum22

In your example, knowing that the result can have at most two decimals like all numbers involved, you could also use 0.01=1e-2 as the rounding unit to obtain the same result.

Edit:

The other numbers involved in your sum, converted back and forth as above, are:

-23793.09000000000014551915228366851806640625 = -23793.09 - 1.28*2**-43
- 9750.34000000000014551915228366851806640625 = - 9750.34 - 1.28*2**-43
-22784.59000000000014551915228366851806640625 = -22784.59 - 1.28*2**-43
-23819.330000000001746229827404022216796875   = -23819.33 - 1.92*2**-40

In this particular example the rounding error you obtained for the sum equals the sum of the rounding errors of the summands: 1.6*2**-38 - 3*1.28*2**-43 - 1.92*2**-40 = 2**-38. But this is not always the case. The exact rounding error depends on the addition of the internal representations and on the order of the summands, i.e., a+b+c in SAS (and other software) is not necessarily equal to b+c+a, which is particularly problematic in PROC SQL where "order of observations" is not stable in general. Indeed, I get six different sums ranging from -3.63798E-12 over 0 to 1.455192E-11 when I permute the five summands in all possible ways.

Zaghini · Posted 11-04-2021 03:30 AM

thank you all @Rick_SAS @Kurt_Bremser @FreelanceReinhard, it is my first time in sas communities. You did a great job! I followed your instructions and it works perfectly.

m.

Rounding and scientific notation

Re: Rounding and scientific notation

Re: Rounding and scientific notation

Re: Rounding and scientific notation

Re: Rounding and scientific notation

Re: Rounding and scientific notation