While computer science folks learn this early in their studies, this note describes what for some will be a non-intuitive consequence of finite numeric precision in digital computing. It is NOT a SAS issue – it’s a digital computing issue.
The largest consecutive integer exactly represented in floating point storage (see "Largest Integer Represented Exactly?" - NOT Exactly) as used by SAS on Windows and many other machines is 9,007,199,254,740,992, which can be generated by the CONSTANT function:
A=constant('exactint',8); /* largest consecutive integer for an 8-byte real number */
But as the link notes, there are many non-consecutive integers greater than A that are also stored exactly: for instance, every even number between A and 2*A, every multiple of 4 between 2*A and 4*A, and so on.
Edited note: In fact, above A only integers can be represented, and the spacing between them doubles at each power of 2. (Between 0.5*A and A, too, only integers can be represented, with none skipped.)
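The spacing described above is a property of the 8-byte floating point format rather than of SAS, so it can be checked in other languages too. Here is a minimal sketch in Python (not SAS), assuming Python 3.9 or later for math.ulp and assuming Python floats are IEEE 754 doubles; the variable names are only illustrative:

```python
import math
import sys

# Assumption: Python floats are IEEE 754 doubles (53-bit significand),
# the same 8-byte format the post describes for SAS on Windows.
A = 2.0 ** sys.float_info.mant_dig   # 2**53 = 9007199254740992.0

# math.ulp(x) is the gap between x and the next larger representable float.
gap_below_A = math.ulp(A / 2)   # spacing in [0.5*A, A)
gap_at_A = math.ulp(A)          # spacing in [A, 2*A)
gap_at_2A = math.ulp(2 * A)     # spacing in [2*A, 4*A)

print(f"A = {A:,.0f}")
print(gap_below_A, gap_at_A, gap_at_2A)   # 1.0 2.0 4.0
```

A gap of 1 means every integer is stored, a gap of 2 means only even integers, and a gap of 4 means only multiples of 4.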
This means that the expression A+1 will not result in 9,007,199,254,740,993, because that number can’t be represented. Instead the arithmetic implementation will generate 9,007,199,254,740,992 – the original A. And so will 1+A. At least in this case A+B=B+A. Both of them have the same “rounding error”.
So what about adding 2 to A? Both A+2 and 2+A generate 9,007,199,254,740,994 – as should be intuitively expected – no rounding error. And of course A+B=B+A.
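But consider A+B+C versus C+B+A, with B=C=1. The first generates 9,007,199,254,740,992, while the second generates 9,007,199,254,740,994.
This is because A+1+1 is processed as (A+1)+1, where each addition of 1 is rounded back to A, while 1+1+A is processed as (1+1)+A, i.e. 2+A, which is stored exactly.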
Code to illustrate this follows:
data _null_;
  /* A = 2**53, the largest consecutive integer in an 8-byte real number */
  A=constant('exactint'); B=1; C=1;
  put (A B C) (= +3 );

  /* Two terms: the order does not matter */
  sum_AB=sum(A,B);
  sum_BA=sum(B,A);
  put / '*** SUM(A,B) equals SUM(B,A) *** ' / sum_AB= / sum_BA=;

  /* Three terms: the order can matter */
  sum_ABC=sum(A,B,C);
  sum_CBA=sum(C,B,A);
  put / '*** But SUM(A,B,C) need NOT equal SUM(C,B,A) *** '
      / sum_ABC= / sum_CBA=;

  format A: sum_: comma22.0;
run;
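Since this is a digital computing issue and not a SAS issue, the same experiment can be repeated elsewhere. A rough Python equivalent of the DATA step above, assuming Python floats are IEEE 754 doubles with the default round-to-nearest-even behavior (the parentheses just make the left-to-right evaluation explicit):

```python
# Assumption: Python floats are IEEE 754 doubles, so they behave like
# SAS 8-byte numerics for this experiment.
A = 2.0 ** 53          # same value as constant('exactint') in the post
B = 1.0
C = 1.0

sum_AB = A + B         # A+1 is not representable, so the result rounds to A
sum_BA = B + A         # same operands, same rounding
sum_ABC = (A + B) + C  # each step rounds back to A
sum_CBA = (C + B) + A  # 1+1 = 2 exactly, and A+2 is representable

for name, value in [("sum_AB", sum_AB), ("sum_BA", sum_BA),
                    ("sum_ABC", sum_ABC), ("sum_CBA", sum_CBA)]:
    print(f"{name} = {value:,.0f}")
```

The two-term sums agree with each other, while the two three-term sums differ by 2.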
This is a trivial example of why summary statistics generated from a given data set can change when that data set is sorted. One way to get a more "accurate" sum, say, is to add the values in ascending order of absolute value. But you'd have to have some very pathological data (like mixing values such as 1 and 9,007,199,254,740,992) to get meaningful differences.
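To make the effect of sort order concrete, here is a small Python sketch with made-up "pathological" data (one value of 2**53 and ten values of 1). It uses an explicit loop because, as far as I know, the built-in sum() in recent Python versions applies compensated summation to floats and would hide the effect. math.fsum is included only as a correctly rounded reference:

```python
import math

def running_sum(values):
    """Add floats one at a time, left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

A = 2.0 ** 53
values = [A] + [1.0] * 10     # illustrative data, not from the post

big_first = running_sum(values)                     # every +1 is rounded away
small_first = running_sum(sorted(values, key=abs))  # the ones reach 10 first
reference = math.fsum(values)                       # correctly rounded sum

print(f"{big_first:,.0f}")
print(f"{small_first:,.0f}")
print(f"{reference:,.0f}")
```

With the large value first, all ten additions of 1 are lost. With the data in ascending absolute-value order, the running sum matches the reference.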
> But you'd have to have some very pathological data (like mixing values such as 1 and 9,007,199,254,740,992) to get meaningful differences.
Do you have to have some pathological OCD issue to think of this post? 😉 Interesting, and obvious once it's explained as well as you did. 😁
@ChrisNZ wrote:
> But you'd have to have some very pathological data (like mixing values such as 1 and 9,007,199,254,740,992) to get meaningful differences.
> Do you have to have some pathological OCD issue to think of this post? 😉
Actually I've always found plain vanilla OCD to be sufficient.
> Actually I've always found plain vanilla OCD to be sufficient.
Fair enough. And agree. 🙂
Thanks, @mkeintz. This is an instructive example where it's easy to see how the differences come about.
Here's another example (using SAS 9.4 under Windows) illustrating your point with "real-world" data:
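data test;
  set sashelp.class;
  bmi=round(703*weight/height**2, .1);  /* BMI rounded to one decimal place */
run;

/* One row, with one BMI_<name> column per student */
proc transpose data=test out=bmi prefix=BMI_;
  var bmi;
  id name;
run;

/* Add the same three values in all six possible orders */
data sums;
  set bmi(keep=BMI_Alfred BMI_Carol BMI_Janet);
  array b[3] BMI:;
  sum_ACJ=b[1]+b[2]+b[3];
  sum_CAJ=b[2]+b[1]+b[3];
  sum_AJC=b[1]+b[3]+b[2];
  sum_JAC=b[3]+b[1]+b[2];
  sum_CJA=b[2]+b[3]+b[1];
  sum_JCA=b[3]+b[2]+b[1];
  format s: hex16.;  /* show the stored bits of each sum */
run;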
Result:
BMI_Alfred   BMI_Carol   BMI_Janet
   16.6        18.3        20.2

sum_ACJ            sum_CAJ            sum_AJC
404B8CCCCCCCCCCE   404B8CCCCCCCCCCE   404B8CCCCCCCCCCC

sum_JAC            sum_CJA            sum_JCA
404B8CCCCCCCCCCC   404B8CCCCCCCCCCD   404B8CCCCCCCCCCD
As soon as non-integer values are involved (even with only a single decimal place, except .5), there's a substantial risk of rounding errors in calculations.
Since "A+B always equals B+A," we can expect sum_ACJ=sum_CAJ, sum_AJC=sum_JAC and sum_CJA=sum_JCA, but there's no guarantee for more equalities. Indeed, three different sums occur in the example above, which is just one of dozens of similar cases that occur within the BMI values of SASHELP.CLASS.
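For anyone who wants to check this without SAS, here is a hedged Python sketch of the same six sums. It assumes that the rounded BMI values are stored as the same doubles as the literals 16.6, 18.3 and 20.2, and that Python floats are IEEE 754 doubles. The hex16 helper is a made-up name that mimics the layout of the HEX16. output shown above:

```python
import struct
from itertools import permutations

def hex16(x):
    """Big-endian hex of the 8-byte double, laid out like the SAS output above."""
    return struct.pack(">d", x).hex().upper()

# Rounded BMI values from the result above (A=Alfred, C=Carol, J=Janet)
bmi = {"A": 16.6, "C": 18.3, "J": 20.2}

sums = {}
for order in permutations("ACJ"):
    x, y, z = (bmi[k] for k in order)
    sums["sum_" + "".join(order)] = hex16((x + y) + z)   # left to right

for name, bits in sums.items():
    print(name, bits)

print("55.1   ", hex16(55.1))   # the double closest to the decimal sum
```

Under these assumptions the six results fall into the same three groups as in the SAS output, and only the CJA and JCA orders land on the double closest to 55.1.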