Solved: Re: comparing two values

Batta · Posted 08-12-2021 06:36 AM

Hi guys,

Although this was supposed to be very simple code and to function without any issues, I have the outcome I am not able to solve or see where the problem, either with my proc sql code or SAS data download:

So it's kind of simple, I need to compare if the sum of var2 values is equal to the value of of var3 (the reason for that is that users are entering var3 manually by summing up the values of var2, so the mistakes are present).

The code is:

create table newTable as

select var1, sum(var2) as newVar2, var3

case when sum(var2)=var3 then "OK"

else "CHECK!" end as Var4

from table1

group by var1;

quit;

The problem is that there are records where sum(var2)=var3 but the var4 has the output "CHECK"

for example sum(var2) = 1.9 and var 3 =1.9 but the var4 will be CHECK (out of over 600 records, this happens with 20 records), for all other records works well.

Maybe I should mention that I've checked the format of this sum(var2) and var3: sum(var2) is BEST12 numeric, var3 is 5.1 numeric. Even if I bring them to the same format (BEST12), it would be still the same (though that some format discrepancy may cause that...)

So no matter what I do, I would get something like this:

var1	newVar2	var3	var4
rec1	3.1	3.1	CHECK!
rec2	18.2	18.2	OK
rec3	5	5	OK
rec4	12.4	12.4	CHECK!
rec5	17.8	17.8	OK

I am not able to see the problem in terms of my code, data format... is there anything elsewhere I should look for he?

Thank you very much

FreelanceReinh · Posted 08-12-2021 07:46 AM

Hi @Batta,

Here is an example (using SAS 9.4 under Windows) illustrating Kurt Bremser's points (see Numerical Accuracy in SAS Software for details):

1505  data _null_;
1506  if 1.4+1.7 ne 3.1 then put 'Surprised?';
1507  run;

Surprised?
NOTE: DATA statement used (Total process time):

Default formats and most other numeric formats don't reveal the tiny difference (of about 4.44E-16) between the internal values (i.e., binary representations) of 1.4+1.7 and 3.1. Special formats like HEX16. do so:

data test;
var3=3.1;
var2=1.4; output;
var2=1.7; output;
run;

proc sql;
select sum(var2) as s format=hex16., var3 format=hex16.,
       calculated s-var3 as d, round(calculated s, 1e-9) as r format=hex16.
from test
group by var3;
quit;

Result:

               s              var3         d                 r
--------------------------------------------------------------
4008CCCCCCCCCCCC  4008CCCCCCCCCCCD  -444E-18  4008CCCCCCCCCCCD

View solution in original post

Kurt_Bremser · Posted 08-12-2021 07:05 AM

You run into the usual problem with numeric precision. SAS stores numbers in 8-byte real, which means that there are 52 bits available for the mantissa (the rest is sign and exponent), which translates to a maximum of 15 decimal digits that can be reliably stored.

And most decimal fractions turn into periodic numbers when translated to binary notation.

All this leads to imprecisions during calculations, where you get differences in the 1E-14 range or so.

Use the ROUND function or check for a maximum difference:

case when round(sum(var2),.0001)=round(var3,.0001)

or

case when abs(sum(var2) - var3) lt 1E-5

Do a google search for "sas numeric precision", and you will find lots of stuff to read about this.

Also see https://en.wikipedia.org/wiki/Double-precision_floating-point_format

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Batta · Posted 08-13-2021 08:03 AM

Thank you very much for your help.

Sincerely,

Batta

FreelanceReinh · Posted 08-12-2021 07:46 AM

Hi @Batta,

Here is an example (using SAS 9.4 under Windows) illustrating Kurt Bremser's points (see Numerical Accuracy in SAS Software for details):

1505  data _null_;
1506  if 1.4+1.7 ne 3.1 then put 'Surprised?';
1507  run;

Surprised?
NOTE: DATA statement used (Total process time):

Default formats and most other numeric formats don't reveal the tiny difference (of about 4.44E-16) between the internal values (i.e., binary representations) of 1.4+1.7 and 3.1. Special formats like HEX16. do so:

data test;
var3=3.1;
var2=1.4; output;
var2=1.7; output;
run;

proc sql;
select sum(var2) as s format=hex16., var3 format=hex16.,
       calculated s-var3 as d, round(calculated s, 1e-9) as r format=hex16.
from test
group by var3;
quit;

Result:

               s              var3         d                 r
--------------------------------------------------------------
4008CCCCCCCCCCCC  4008CCCCCCCCCCCD  -444E-18  4008CCCCCCCCCCCD

PaigeMiller · Posted 08-12-2021 07:53 AM

The FUZZ() function can help here, detecting when two values are within roundoff error, and then treating them as equal.

data _null_;
    if 1.4+1.7 ne 3.1 then put 'Surprised?';
run;

data _null_;
     if fuzz(1.4+1.7 - 3.1) then put 'Surprised?';
run;

--
Paige Miller

FreelanceReinh · Posted 08-12-2021 08:36 AM

@PaigeMiller wrote:

The FUZZ() function can help here, detecting when two values are within roundoff error, and then treating them as equal.

Yes, this can be a handy function in many such cases. However, it is less flexible than Kurt Bremser's approaches in that it uses the fixed constant 1E-12 as the maximum tolerated absolute difference. The rounding error in sums of moderately large numbers (or sufficiently many numbers) can exceed this threshold.

Example (where the difference is about -1.8E-12):

1797  data _null_;
1798  if fuzz(4101.4+4101.7 - 8203.1) then put 'Surprised?';
1799  run;

Surprised?

Batta · Posted 08-13-2021 07:55 AM

Thank you very much for this explanation.

Very useful.

Sincerely,

Batta

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away