Hi guys,
Although this was supposed to be very simple code and to function without any issues, I have the outcome I am not able to solve or see where the problem, either with my proc sql code or SAS data download:
So it's kind of simple, I need to compare if the sum of var2 values is equal to the value of of var3 (the reason for that is that users are entering var3 manually by summing up the values of var2, so the mistakes are present).
The code is:
create table newTable as
select var1, sum(var2) as newVar2, var3
case when sum(var2)=var3 then "OK"
else "CHECK!" end as Var4
from table1
group by var1;
quit;
The problem is that there are records where sum(var2)=var3 but the var4 has the output "CHECK"
for example sum(var2) = 1.9 and var 3 =1.9 but the var4 will be CHECK (out of over 600 records, this happens with 20 records), for all other records works well.
Maybe I should mention that I've checked the format of this sum(var2) and var3: sum(var2) is BEST12 numeric, var3 is 5.1 numeric. Even if I bring them to the same format (BEST12), it would be still the same (though that some format discrepancy may cause that...)
So no matter what I do, I would get something like this:
var1 | newVar2 | var3 | var4 |
rec1 | 3.1 | 3.1 | CHECK! |
rec2 | 18.2 | 18.2 | OK |
rec3 | 5 | 5 | OK |
rec4 | 12.4 | 12.4 | CHECK! |
rec5 | 17.8 | 17.8 | OK |
I am not able to see the problem in terms of my code, data format... is there anything elsewhere I should look for he?
Thank you very much
Hi @Batta,
Here is an example (using SAS 9.4 under Windows) illustrating Kurt Bremser's points (see Numerical Accuracy in SAS Software for details):
1505 data _null_; 1506 if 1.4+1.7 ne 3.1 then put 'Surprised?'; 1507 run; Surprised? NOTE: DATA statement used (Total process time):
Default formats and most other numeric formats don't reveal the tiny difference (of about 4.44E-16) between the internal values (i.e., binary representations) of 1.4+1.7 and 3.1. Special formats like HEX16. do so:
data test;
var3=3.1;
var2=1.4; output;
var2=1.7; output;
run;
proc sql;
select sum(var2) as s format=hex16., var3 format=hex16.,
calculated s-var3 as d, round(calculated s, 1e-9) as r format=hex16.
from test
group by var3;
quit;
Result:
s var3 d r -------------------------------------------------------------- 4008CCCCCCCCCCCC 4008CCCCCCCCCCCD -444E-18 4008CCCCCCCCCCCD
You run into the usual problem with numeric precision. SAS stores numbers in 8-byte real, which means that there are 52 bits available for the mantissa (the rest is sign and exponent), which translates to a maximum of 15 decimal digits that can be reliably stored.
And most decimal fractions turn into periodic numbers when translated to binary notation.
All this leads to imprecisions during calculations, where you get differences in the 1E-14 range or so.
Use the ROUND function or check for a maximum difference:
case when round(sum(var2),.0001)=round(var3,.0001)
or
case when abs(sum(var2) - var3) lt 1E-5
Do a google search for "sas numeric precision", and you will find lots of stuff to read about this.
Also see https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Thank you very much for your help.
Sincerely,
Batta
Hi @Batta,
Here is an example (using SAS 9.4 under Windows) illustrating Kurt Bremser's points (see Numerical Accuracy in SAS Software for details):
1505 data _null_; 1506 if 1.4+1.7 ne 3.1 then put 'Surprised?'; 1507 run; Surprised? NOTE: DATA statement used (Total process time):
Default formats and most other numeric formats don't reveal the tiny difference (of about 4.44E-16) between the internal values (i.e., binary representations) of 1.4+1.7 and 3.1. Special formats like HEX16. do so:
data test;
var3=3.1;
var2=1.4; output;
var2=1.7; output;
run;
proc sql;
select sum(var2) as s format=hex16., var3 format=hex16.,
calculated s-var3 as d, round(calculated s, 1e-9) as r format=hex16.
from test
group by var3;
quit;
Result:
s var3 d r -------------------------------------------------------------- 4008CCCCCCCCCCCC 4008CCCCCCCCCCCD -444E-18 4008CCCCCCCCCCCD
The FUZZ() function can help here, detecting when two values are within roundoff error, and then treating them as equal.
data _null_;
if 1.4+1.7 ne 3.1 then put 'Surprised?';
run;
data _null_;
if fuzz(1.4+1.7 - 3.1) then put 'Surprised?';
run;
@PaigeMiller wrote:
The FUZZ() function can help here, detecting when two values are within roundoff error, and then treating them as equal.
Yes, this can be a handy function in many such cases. However, it is less flexible than Kurt Bremser's approaches in that it uses the fixed constant 1E-12 as the maximum tolerated absolute difference. The rounding error in sums of moderately large numbers (or sufficiently many numbers) can exceed this threshold.
Example (where the difference is about -1.8E-12):
1797 data _null_; 1798 if fuzz(4101.4+4101.7 - 8203.1) then put 'Surprised?'; 1799 run; Surprised?
Thank you very much for this explanation.
Very useful.
Sincerely,
Batta
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.