Solved: Re: Round function issue on large number

SHFBD · Posted 11-18-2020 08:52 AM

I have an odd exprience when rounding a number.

One number is rounded down and the other up:

data rounding;
   d1=round(201578736.46500, .01);
   d2=round(276578736.46500, .01);
run;
proc print data=rounding noobs;
run;

The first one rounds to .47 and the other one to .46

Are there some kind of limit on the input to round()?

FreelanceReinh · Posted 11-18-2020 02:34 PM

Hello @SHFBD and welcome to the SAS Support Communities!

As PaigeMiller has mentioned, this has to do with the unavoidable loss of precision that typically occurs when decimal fractions are stored as binary representations.

Let's take a look at how SAS stores these two numbers, 201578736.46500 and 276578736.46500, in a numeric variable of length 8 (bytes) under Windows or Unix.

The decimal number 201578736.465, converted to the binary system (and with some line breaks inserted), looks like this:

1100000000111101100011110000.01110111000010100011110
                                10111000010100011110
                                10111000010100011110
                                101110...

The 20-binary-digit pattern "10111000010100011110" is repeated infinitely many times ("periodic fraction").

The internal 64-bit floating-point representation that SAS under Windows or Unix uses to store the number in an 8-byte numeric variable is based on the above binary representation. However, only 52 of the 64 bits (let alone infinitely many) are available for the binary digits following the first "1". The remaining binary digits are rounded (in this particular example: rounded up), as can be seen with the BINARY64. format:

data _null_;
x=201578736.465;
put x binary64.;
run;

Result:

0100000110101000000001111011000111100000111011100001010001111011

From the mathematical binary representation further above we know that the last two bits, 11, resulted from rounding 1011100001... to 1100000000... Apart from this, the 52 "mantissa bits" (in bold face above) are copied from the periodic binary representation and we recognize the repeating 20-binary-digit pattern (highlighted in blue). If we convert this rounded binary representation back to the decimal system, we obtain

201578736.4650000035762786865234375

and the (so called) numeric representation error of approx. 3.576E-9 becomes obvious. Given that this number is even slightly larger than 201578736.465, we're not surprised that the ROUND function (with a rounding unit of 0.01) rounds it correctly to 201578736.47.

Now we do the same investigation with 276578736.465:

The mathematical conversion to the binary system results in:

10000011111000100000110110000.01110111000010100011110
                                 10111000010100011110
                                 10111000010100011110
                                 101110...

The fractional part is, of course, identical to that of the first number because it represents the decimal fraction of 0.465 in both cases. Only the integer part has changed. Note that it has become longer! Since we have crossed a power of 2, namely 2**28 = 268435456, we now have 29 rather than 28 binary digits on the left of the point. As a consequence, the rounding for the internal representation occurs at a different digit! Indeed, the mantissa is now rounded down, as can be seen again with the BINARY64. format:

data _null_;
x=276578736.465;
put x binary64.;
run;

Result:

0100000110110000011111000100000110110000011101110000101000111101

Once more thanks to the exact periodic fraction, we know that the last bit, 1, resulted from rounding 1011100001... to 1000000000... If we convert this rounded binary representation back to the decimal system, we obtain

276578736.464999973773956298828125

and the representation error of approx. -2.623E-8 becomes obvious. Given that this number is slightly less than 276578736.465, it seems now plausible that it is rounded down to 276578736.46.

For a full explanation, however, we would need to know the internal workings of the ROUND function. Suffice it to say that the ROUND function performs some "extra computations, called fuzzing, to try to make the result agree with decimal arithmetic in the most common situations" (documentation; see also the ROUNDZ function, which "does not fuzz the result"). Apparently, for the second number the amount of fuzzing applied by the ROUND function does not suffice to compensate for the numeric representation error -2.623E-8, which is much larger (EDIT: in absolute value) than it would be for a number with a magnitude in the thousands or so. I would not expect such incorrect rounding to occur for much smaller numbers like 1234.465.

For more details and strategies for handling this issue (such as "defining your own rounding routine") please see http://support.sas.com/resources/papers/dealing-with-numeric-representation-error-in-sas-application... (in particular p. 11 ff.).

View solution in original post

PaigeMiller · Posted 11-18-2020 09:07 AM

Most numbers that are not integers cannot be represented exactly in digital computers. This is called "machine precision". The numbers can be represented only to within ± a small value epsilon. So the problem isn't the ROUND() function, the problem is a systematic problem that all computers have.

--
Paige Miller

FreelanceReinh · Posted 11-18-2020 02:34 PM

Hello @SHFBD and welcome to the SAS Support Communities!

As PaigeMiller has mentioned, this has to do with the unavoidable loss of precision that typically occurs when decimal fractions are stored as binary representations.

Let's take a look at how SAS stores these two numbers, 201578736.46500 and 276578736.46500, in a numeric variable of length 8 (bytes) under Windows or Unix.

The decimal number 201578736.465, converted to the binary system (and with some line breaks inserted), looks like this:

1100000000111101100011110000.01110111000010100011110
                                10111000010100011110
                                10111000010100011110
                                101110...

The 20-binary-digit pattern "10111000010100011110" is repeated infinitely many times ("periodic fraction").

The internal 64-bit floating-point representation that SAS under Windows or Unix uses to store the number in an 8-byte numeric variable is based on the above binary representation. However, only 52 of the 64 bits (let alone infinitely many) are available for the binary digits following the first "1". The remaining binary digits are rounded (in this particular example: rounded up), as can be seen with the BINARY64. format:

data _null_;
x=201578736.465;
put x binary64.;
run;

Result:

0100000110101000000001111011000111100000111011100001010001111011

From the mathematical binary representation further above we know that the last two bits, 11, resulted from rounding 1011100001... to 1100000000... Apart from this, the 52 "mantissa bits" (in bold face above) are copied from the periodic binary representation and we recognize the repeating 20-binary-digit pattern (highlighted in blue). If we convert this rounded binary representation back to the decimal system, we obtain

201578736.4650000035762786865234375

and the (so called) numeric representation error of approx. 3.576E-9 becomes obvious. Given that this number is even slightly larger than 201578736.465, we're not surprised that the ROUND function (with a rounding unit of 0.01) rounds it correctly to 201578736.47.

Now we do the same investigation with 276578736.465:

The mathematical conversion to the binary system results in:

10000011111000100000110110000.01110111000010100011110
                                 10111000010100011110
                                 10111000010100011110
                                 101110...

The fractional part is, of course, identical to that of the first number because it represents the decimal fraction of 0.465 in both cases. Only the integer part has changed. Note that it has become longer! Since we have crossed a power of 2, namely 2**28 = 268435456, we now have 29 rather than 28 binary digits on the left of the point. As a consequence, the rounding for the internal representation occurs at a different digit! Indeed, the mantissa is now rounded down, as can be seen again with the BINARY64. format:

data _null_;
x=276578736.465;
put x binary64.;
run;

Result:

0100000110110000011111000100000110110000011101110000101000111101

Once more thanks to the exact periodic fraction, we know that the last bit, 1, resulted from rounding 1011100001... to 1000000000... If we convert this rounded binary representation back to the decimal system, we obtain

276578736.464999973773956298828125

and the representation error of approx. -2.623E-8 becomes obvious. Given that this number is slightly less than 276578736.465, it seems now plausible that it is rounded down to 276578736.46.

For a full explanation, however, we would need to know the internal workings of the ROUND function. Suffice it to say that the ROUND function performs some "extra computations, called fuzzing, to try to make the result agree with decimal arithmetic in the most common situations" (documentation; see also the ROUNDZ function, which "does not fuzz the result"). Apparently, for the second number the amount of fuzzing applied by the ROUND function does not suffice to compensate for the numeric representation error -2.623E-8, which is much larger (EDIT: in absolute value) than it would be for a number with a magnitude in the thousands or so. I would not expect such incorrect rounding to occur for much smaller numbers like 1234.465.

For more details and strategies for handling this issue (such as "defining your own rounding routine") please see http://support.sas.com/resources/papers/dealing-with-numeric-representation-error-in-sas-application... (in particular p. 11 ff.).

SHFBD · Posted 11-19-2020 01:37 AM

Thank you for a very comprehensive answer.

Maybe I just have to accept this loss of precision, but then I would expect it to be the same for all roundings on 276578736.46500, right? But if I use input(276578736.46500, 15.2) it rounds to .47, aswell as if I use "format=15.2" in PROC SQL?

data rounding;
   d1=round(201578736.46500, .01); /*28 bit integer*/
   d2=round(276578736.46500, .01); /*29 bit integer*/
   d3=round(971578736.46500, .01); /*30 bit integer*/
   d4=input(276578736.46500, 15.2);
run;
proc print data=rounding noobs;
run;

FreelanceReinh · Posted 11-19-2020 04:47 AM

You're welcome.

@SHFBD wrote:

Thank you for a very comprehensive answer.

Maybe I just have to accept this loss of precision, but then I would expect it to be the same for all roundings on 276578736.46500, right? But if I use input(276578736.46500, 15.2) it rounds to .47, aswell as if I use "format=15.2" in PROC SQL?

No, that's not how SAS does it. The ROUND function uses a different (I think: more sophisticated) algorithm than a mere numeric format.

Your example using the INPUT function actually involves an automatic numeric-to-character conversion (see note in the log), which is done using the BEST12. format. So, the relevant intermediate result is that of

put(276578736.46500, best12.)

which is the character string '276578736.47' (rounded up, correctly, but not matching the result of the ROUND function). Then the decimal specification ".2" of the informat 15.2 is ignored because the string contains a decimal point. Applying the format 15.2 is a different case, but the result for this particular number is the same as with BEST12. (except for three leading blanks).

I think you obtain results which are more consistent with the internal binary representation if you switch the DECIMALCONV= system option from COMPATIBLE (the default) to STDIEEE:

options decimalconv=stdieee;

Now both the BEST12. and the 15.2 format round 276578736.465 down to 276578736.46 as does the ROUND function for the reason I explained earlier.

The result in variable d3 can be explained in the same way as in the other two examples. The internal binary floating-point representation of 971578736.46500 (under Windows), converted back to the decimal system is the number

971578736.46500003337860107421875

and the ROUND function (with rounding unit 0.01) rounds it up.

SHFBD · Posted 11-19-2020 05:19 AM

Thank you again 😃

Classroom Training Available!