Re: Expected numeric precision behaviour or unexpected issue? - Page 3

kmw · Posted 12-05-2016 11:54 AM

I support numeric precision in the SAS Technical Support division. Instead of creating a SAS note, I worked with our Publications department to add a section to the documentation, titled "Accuracy on x64 Windows Processors".

ChrisNZ · Posted 12-05-2016 10:04 PM

Thanks @kmw

The updated document states: "The following section shows the conversion process for a decimal number that cannot be represented precisely in floating-point representation."

I don't see how 0.5 cannot be represented precisely in floating-point representation.

I reworked the process described just above the new paragraph, using the value 0.5 instead of 255.75.

3FE0000000000000 is the exact representation.

See below:

This example shows the conversion process for the decimal value 0.5 to floating-point representation.

Use the base 2 number system to write out the value 0.5 in binary.

Note: Each bit in the mantissa represents a fraction whose numerator is 1 and whose denominator is a power of 2; that is, the mantissa is the sum of a series of fractions such as 1/2, 1/4, 1/8, and so on. Therefore, for any floating-point number to be represented exactly, you must be able to express it as such a sum.

Base 2
	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰	2⁻¹	2⁻²
	128	64	32	16	8	4	2	1	1/2	1/4
0.5 =	0 x 2⁷	0 x 2⁶	0 x 2⁵	0 x 2⁴	0 x 2³	0 x 2²	0 x 2¹	0 x 2⁰	1 x 2⁻¹	0 x 2⁻²

So, the value 0.5 is represented in binary format as 0000 0000.10
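The base 2 expansion above can be checked mechanically. The following is a minimal sketch in Python (not SAS), using exact rational arithmetic so that no floating-point rounding is involved; the helper name `to_binary` and the 20-bit cut-off are assumptions made for illustration only.

```python
from fractions import Fraction

def to_binary(text, max_frac_bits=20):
    """Write a non-negative decimal string in base 2.

    Returns (digits, exact); exact is False when the fractional part
    still has a remainder after max_frac_bits bits.
    """
    value = Fraction(text)              # exact rational, no float rounding
    whole = value.numerator // value.denominator
    frac = value - whole
    bits = []
    while frac and len(bits) < max_frac_bits:
        frac *= 2                       # shift the next binary digit out
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit
    digits = format(whole, "b") + "." + ("".join(bits) or "0")
    return digits, frac == 0

for text in ("0.5", "255.75", "0.1"):
    digits, exact = to_binary(text)
    print(text, "->", digits, "(exact)" if exact else "(truncated)")
```

Both 0.5 and 255.75 terminate after one or two fractional bits, whereas 0.1 never terminates in base 2, which is the usual sense of "cannot be represented precisely".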

Move the binary point until there is exactly one nonzero digit to the left of it. This process is called normalizing the value. Normalizing a value in scientific notation is the process by which the exponent is chosen so that the absolute value of the mantissa is at least one but less than ten (in base 2, at least one but less than two). For this number, you move the point one place to the right, that is, -1 places:

1.000 000 00

Because the decimal point was moved -1 places, the exponent is now -1.

The bias is 1023, so add -1 to 1023 to get

1022

Convert the decimal value, 1022, to hexadecimal using the base 16 number system:

Base 16
16⁷	...	16⁴	16³	16²	16¹	16⁰
268,435,456	...	65,536	4096	256	16	1

1022 = 3 x 16² + 15 x 16¹ + 14 x 16⁰, which is 3FE in hexadecimal (15 = F, 14 = E)

The converted hexadecimal value for 1022 will be placed in the exponent portion of the final result.
Convert 3FE to binary format:

0011 1111 1110
3 F E

From the normalized value above (1.000 000 00), delete the first digit and the point (the implied one-bit):

0000 0000

Break these up into nibbles (half bytes) so that you have

0000 0000

To have a complete nibble at the end, add enough zeros to complete 4 bits:

0000 0000

Convert

0000 0000

to its hexadecimal equivalent to get the mantissa portion:

0000 0000

0 0

The final floating-point representation for 0.5 is

3FE0 0000 0000 0000
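The steps above can be sketched in a few lines of Python (not SAS) and compared with the bit pattern the machine actually stores. This is a rough sketch under stated assumptions: it only handles positive values that need no rounding, and the names `encode_by_hand` and `encode_native` are hypothetical helpers, not anything from SAS or its documentation.

```python
import struct
from fractions import Fraction

BIAS = 1023          # exponent bias for 8-byte IEEE 754 doubles
MANTISSA_BITS = 52   # stored bits after the implied leading 1

def encode_by_hand(text):
    """Follow the steps above for a positive value that is exactly
    representable as a normal double; returns 16 hex digits."""
    value = Fraction(text)
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = 0
    # Normalize: shift the binary point until 1 <= value < 2.
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1
    biased = exponent + BIAS                    # -1 + 1023 = 1022 for 0.5
    fraction = (value - 1) * 2**MANTISSA_BITS   # drop the implied one-bit
    if fraction.denominator != 1:
        raise ValueError("value needs rounding; not handled in this sketch")
    # Sign bit is 0, so the top three hex digits are the biased exponent.
    return format((biased << MANTISSA_BITS) | fraction.numerator, "016X")

def encode_native(x):
    """Bit pattern the platform stores for a Python float."""
    return struct.pack(">d", x).hex().upper()

for text in ("0.5", "255.75"):
    print(text, encode_by_hand(text), encode_native(float(text)))
```

For 0.5 both columns give 3FE0000000000000, matching the hand conversion; for 255.75 both give 406FF80000000000.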

High-Performance SAS Coding - Third Edition

mkeintz · Posted 12-05-2016 11:17 PM

I thank @kmw as well. The issue needs documentation.

I agree entirely with @ChrisNZ's comments. I mentally traced the conversion process described in the new documentation, using .500000000000000000000000000 just as Chris describes, and I still do not see how it generates anything but 3FE0 0000 0000 0000.

Something seems to be happening that is not captured in the description as I read it. If you put a 1 in front of the decimal, then the inequality disappears. Is there something different about the conversion when the absolute value is less than 1?

And really, it is hard to accept that 2**-1 cannot be stored accurately because the decimal-format ASCII token being converted to floating point has too many trailing insignificant zeroes.

Question: is this also the behavior of other languages that generate 8-byte floating-point values on Windows from ASCII numeric literals?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 12-06-2016 04:53 AM

@mkeintz No issue for 8-byte numbers in VisualStudio's C++ or C.

#include <iostream>

int main() {

	double a, b;

	a = 0.50000000000000000000;
	b = 0.5000000000000000000000000;
	if (a == b) {
		std::cout << "Same\n";
	}
	else {
		std::cout << "Different\n";
	}

}

Same

#include <stdio.h>

int main() {

	double a, b;

	a = 0.50000000000000000000;
	b = 0.5000000000000000000000000;
	if (a == b) {
		printf("Same\n");
	}
	else {
		printf("Different\n");
	}
	
}

Same


data _null_;
  A=0.50000000000000000000;
  B=0.5000000000000000000000000;
  if A=B then putlog 'Same'; else putlog 'Different';
run;

Different


ChrisNZ · Posted 12-05-2016 11:49 PM

Also, regarding the explanation:

"The routine used to compute the result is slightly different on Windows than on any other host (Linux, UNIX, AIX, and so on)."

It'd be interesting to know if this difference is due to a quirk in Windows or in SAS.


kmw · Posted 12-06-2016 01:59 PM

You are correct that .5 can be represented precisely as shown in the first example of the new section of the documentation. If I'm reading your responses correctly, it seems the line of text just before the new section title is causing the confusion.

It states: "The following section shows the conversion process for a decimal number that cannot be represented precisely in floating-point representation."

It should instead state: "The following section shows the conversion process for a decimal number that cannot be represented precisely in some scenarios."

Also, @ChrisNZ, you asked whether the difference is due to a quirk in Windows or in SAS. It is due to SAS's proprietary way of processing floating-point numbers that go beyond 15 significant digits.

mkeintz · Posted 12-06-2016 02:59 PM

@kmw

Is it safe to say that the SAS proprietary floating point treatment can produce this anomalous impact of trailing non-significant zeroes only for values between -1 and 1?

In other words, can adding excessive trailing non-significant zeroes change the floating-point representation for values greater than 1 (or less than -1)?

thanks,

Mark


ChrisNZ · Posted 12-06-2016 05:26 PM

@kmw The page also states:

"Although these values appear to be alike, the internal representations differ slightly, because the IEEE floating-point representation can only represent 15 digits."

"This issue pertains to floating-point representation on the x64 processors."

I find these 2 statements misleading:

1- The 15-digit representation limit does not in any way explain (even though the word "because" is used) why SAS reads these numbers as follows:

3FE0000000000000 L=21 NB=0.50000000000000000000000

3FE0000000000000 L=22 NB=0.500000000000000000000000

3FDFFFFFFFFFFFFE L=23 NB=0.5000000000000000000000000

3FDFFFFFFFFFFFFE L=24 NB=0.50000000000000000000000000

3FDFFFFFFFFFFFFF L=25 NB=0.500000000000000000000000000

3FE0000000000000 L=26 NB=0.5000000000000000000000000000

3FE0000000000000 L=27 NB=0.50000000000000000000000000000

3FDFFFFFFFFFFFFF L=28 NB=0.500000000000000000000000000000

3FDFFFFFFFFFFFFF L=29 NB=0.5000000000000000000000000000000

3FE0000000000000 L=30 NB=0.50000000000000000000000000000000

3FE0000000000000 L=31 NB=0.500000000000000000000000000000000

3FE0000000000000 L=32 NB=0.5000000000000000000000000000000000
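To put the three bit patterns in the listing above in perspective, here is a minimal Python sketch (not SAS), assuming Python's float is the same 8-byte IEEE 754 double. It decodes each pattern, counts how many representable values it lies below 0.5, and then parses 0.5 with varying numbers of trailing zeros using Python's own string-to-double conversion. The helper names `bits` and `from_bits` are made up for illustration.

```python
import struct

def bits(x):
    """16-digit hex bit pattern of a double, as in the listing above."""
    return struct.pack(">d", x).hex().upper()

def from_bits(hex_digits):
    """Double whose bit pattern is the given 16 hex digits."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]

half = int("3FE0000000000000", 16)
for pattern in ("3FE0000000000000", "3FDFFFFFFFFFFFFF", "3FDFFFFFFFFFFFFE"):
    # For positive doubles, adjacent values have adjacent bit patterns,
    # so the integer difference counts representable steps below 0.5.
    steps = half - int(pattern, 16)
    print(pattern, repr(from_bits(pattern)), f"{steps} step(s) below 0.5")

# Python's float() parses decimal text with correct rounding, so the
# number of trailing zeros should not matter here.
patterns = {bits(float("0.5" + "0" * n)) for n in range(0, 40)}
print(patterns)
```

The two anomalous patterns are the first and second representable doubles below 0.5, and in this sketch every trailing-zero count produces the single pattern 3FE0000000000000. That is consistent with the parsing routine, rather than the 15-digit limit of the format, being where the difference arises.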

2- Likewise, floating-point representation on x64 processors has no issue that would explain this behaviour, which occurs only in SAS and only under Windows.

3- Unrelated to this discussion but related to numerical precision, a much wider and very puzzling point:

The whole numerical precision issue exists because rational numbers were introduced to represent floating-point numbers. This seems like a ludicrous idea.

Rational numbers were introduced to represent the mantissa value when negative exponents were used as part of the algorithm detailed above.
Had the mantissa portion of the number kept positive exponent representations only, with the position of the decimal dot shifted using the exponent portion of the number, this numerical precision issue would not even exist.

There must have been a reason to decide to use negative exponents and to try to represent the decimal portion of numbers as a sum of rational numbers.

Do you know what on earth this justification could be?
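One way to explore this question is to look at what a double actually stores. The sketch below (Python, assuming its float is an IEEE 754 double; `as_scaled_integer` is a hypothetical helper) rewrites a stored value as an integer mantissa times a power of two, which is close to the "integer mantissa with a shifted point" layout described above.

```python
from decimal import Decimal
from fractions import Fraction

def as_scaled_integer(x):
    """Return (m, e) with x == m * 2**e exactly, m an odd integer (or 0)."""
    ratio = Fraction(x)                     # exact value of the stored double
    # The denominator of a finite double is always a power of two.
    e = -(ratio.denominator.bit_length() - 1)
    m = ratio.numerator
    while m and m % 2 == 0:                 # move factors of two into e
        m //= 2
        e += 1
    return m, e

for x in (0.5, 255.75, 0.1):
    m, e = as_scaled_integer(x)
    print(f"{x!r} is stored as {m} * 2**{e} = {Decimal(x)}")
```

Seen this way, the stored value is already an integer scaled by the exponent. Values such as 0.1 still come out inexact because the scale factor is a power of 2 rather than a power of 10, which suggests the base, more than the sign of the exponent, is what limits exact decimal fractions.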

