Solved: Re: int function odd behavior - Page 2

Kurt_Bremser · Posted 07-09-2020 12:28 AM

@Rick_SAS wrote:

Kurt, I haven't seen explicit code from you, so perhaps I am misunderstanding, but I think in principle Paige is correct. If the two numbers are on opposite sides of a cutpoint, the rounding will take them in opposite directions.
data A;
epsilon = 1e-16;
x = 0.00005;
y = x - epsilon;
RndX = round(x, 0.0001);
RndY = round(y, 0.0001);
put RndX=;
put RndY=;
run;

You are right, that's an edge case where round() can't fix it.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ChrisNZ · Posted 07-08-2020 09:38 PM

@PaigeMiller >And I'm still not convinced that round is a solution for all possible floating point numbers.

My experience is that rounding always works if done appropriately. Since the error is always very small, rounding by say a billionth gets rid of it.

To reuse @Rick_SAS 's code:

data A;
  X = 0.00005;
  Y = 0.00005 - 1e-16; 
  A=X=Y; 

  RndX = round(X, 1e-9);
  RndY = round(Y, 1e-9);

  B=RndX=RndY; 
  C=RndY=0.00005;

  putlog A= B= C=;
run;

A=0 B=1 C=1

Though I agree than in principle the decimal number cannot be represented, at least rounding correctly homogenises the binary values.

High-Performance SAS Coding - Third Edition

FreelanceReinh · Posted 07-08-2020 03:52 PM

@Tom wrote:

The MOD() seems to fix this. Not sure if it does for all values or just this specific one.

The real solution is to ROUND() the value(s) before comparing.

Thanks for pointing this out. It's interesting that the MOD function seems to handle a number of these cases correctly, albeit in general only up to four decimals, e.g., mod(2.12345,1) ne 0.12345, whereas equality holds for decimals .0000 through .9999.

I guess this is a result of the "extra computations, called fuzzing, to return an exact zero when the result would otherwise differ from zero because of numerical error" (documentation). The remarkable thing is that this involves creating correct, non-trivial binary digits of the result which are not contained in the argument, especially if the integer part is large.

Example:

data _null_;
x=1414.2135;
y=mod(x,1);
z=.2135;
put x= binary64. / (y z)(+13 =binary64. /);
run;

Result:

x=0100000010010110000110001101101010011111101111100111011011001001
             y=0011111111001011010100111111011111001110110110010001011010000111
             z=0011111111001011010100111111011111001110110110010001011010000111

I've indented the (equal) values of y and z in order to align the parts of the mantissas which represent the fractional parts of the three numbers: For y and z, these are the 39 blue and the 13 green bits. The latter are not contained in the internal representation of x because 10 bits (red) are occupied by the integer part (1414) and 3 additional bits (purple) are needed for the fractional part (without normalization). Moreover, the least significant bit is rounded up.

I virtually always use the ROUND function (typically with rounding units such as 1e-9) when comparing numeric values with a small to moderate number of decimals.

ChrisNZ · Posted 07-09-2020 02:05 AM

@FreelanceReinh > I virtually always use the ROUND function (typically with rounding units such as 1e-9) when comparing numeric values ...

Proud to be aligned with an expert here! 🙂

High-Performance SAS Coding - Third Edition

Rick_SAS · Posted 07-08-2020 11:27 AM

I'm late to this discussion, but a guiding principle for numerical computations is "don't perform exact comparisons on floating-point numbers." You can use the COMPFUZZ function in Base SAS to help compare two values for equality.

For inequalities, the typical fuzzy comparison is to compare a difference with a "small value." "Small" is relative to the magnitude of the numbers. For example: [EDIT: I had signs wrong originally]

Instead of testing x < y, you can test (x - y) < -epsilon
Instead of x > y, you can test (x - y) > epsilon

The article "Avoid testing floating-point values for equality" includes references and some discussion of how to choose epsilon.

FreelanceReinh · Posted 07-09-2020 03:20 PM

Hi @Rick_SAS,

I think the signs of "epsilon" in your inequalities should be reversed (unless epsilon is negative), assuming that the intention is to make the inequality stronger.

Rick_SAS · Posted 07-09-2020 04:13 PM

The equations are what I intended to write. Whether you agree with them depends on what you want to test. If I want to do a fuzzy test of

x > y

then I am willing to allow x to be a tiny bit larger than y due to numerical roundoff. Thus I decide to test

x > y + epsilon

or

x - y > epsilon

A similar argument holds for the less-than inequality.

FreelanceReinh · Posted 07-09-2020 05:27 PM

@Rick_SAS wrote:

The equations are what I intended to write. Whether you agree with them depends on what you want to test. If I want to do a fuzzy test of

x > y

then I am willing to allow x to be a tiny bit larger than y due to numerical roundoff. Thus I decide to test

x > y + epsilon

or

x - y > epsilon

A similar argument holds for the less-than inequality.

Now the inequality is correct and matches my understanding, but in your previous post (quoted below) the signs are reversed, aren't they?

@Rick_SAS wrote:

(...)

Instead of testing x < y, you can test (x - y) < epsilon

Instead of x > y, you can test (x - y) > -epsilon

ChrisNZ · Posted 07-09-2020 06:22 PM

The missing function:

options cmplib=WORK.FUNCS;        

proc fcmp outlib=WORK.FUNCS.MATH;
  function equalz(A,B);
    if      missing(A) & missing(B) then return ( A = B            ) ;
    else if missing(A) | missing(B) then return ( 0                ) ;
    else                                 return ( abs(A-B) < 1e-12 ) ;
  endsub;
run;

High-Performance SAS Coding - Third Edition

Rick_SAS · Posted 07-10-2020 05:39 AM

Yes, you are correct. Thanks for the sharp eyes. I edited my original response to correct the mistake.

FreelanceReinh · Posted 07-08-2020 03:45 PM

Hi @SP_SAS,

Here is what happens internally in the calculation 2.55-2:

In the binary system (mathematically), the decimal number 2.55 looks like this:

10.1000110011001100110011001100110011001100110011001100110011001100...

The pattern "0011" is infinitely repeating.

The internal 64-bit floating-point representations of 2.55, 2.55−2 and 0.55 in SAS under Windows (as documented in Numerical Accuracy in SAS Software) can be displayed with the BINARY64. format:

720  data _null_;
721  do x=2.55, 2.55-2, 0.55;
722    put x binary64.;
723  end;
724  run;

0100000000000100011001100110011001100110011001100110011001100110
0011111111100001100110011001100110011001100110011001100110011000
0011111111100001100110011001100110011001100110011001100110011010

You see the repeating pattern here as well. The binary digits highlighted in blue represent the fractional parts of the numbers. For 2.55 two more bits (red and purple) are needed due to the non-zero integer part: These are the two digits surrounding the point (".") in the mathematical binary representation above. When 2 is subtracted from 2.55, the integer part cancels out, these two bits are removed, the subsequent bits are shifted to the left and leave room for two bits at the end.

In the result 2.55−2 these two least significant bits are simply filled with zeros (highlighted in bold) because the representation of 2.55 doesn't contain the corresponding information. The best possible representation of 0.55, however, contains non-trivial digits in these places: The exact last two digits 01, followed by 10011001... in the mathematical representation, are rounded up to 10.

So, the difference between 0.55 and 2.55−2 in the internal representation is the "1" in the penultimate bit. This "1" stands for the value 2**-52=2.2204...E-16 (see @Tom's variable diff).

Classroom Training Available!