Hi all,
I'm wondering why I'm getting different values for the mean using proc summary - example below:
1) when the input dataset is sorted by variable 'a', I get mean=0.15525
2) when the input dataset is not sorted, I get mean=0.15524999999999
data b;
format a best20.;
input a;
datalines;
0.187
0.171
0.183
0.08
;run;
/*proc sort data=b;by a;run;*/
proc summary data=b nway missing noprint ;
var a;
output out = out_b mean=mean ;
run;
SAS numeric values carry about 15 significant digits of precision, so these are effectively the same answer. You will drive yourself crazy trying to understand the effects of machine precision.
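To make "the same answer" concrete, here is a minimal sketch that compares the two displayed means exactly and then again after rounding both to a tolerance. The variable names and the 1e-12 rounding unit are assumptions chosen for illustration, not something prescribed in this thread.

```sas
data _null_;
  /* the two means as displayed in the question */
  mean_sorted   = 0.15525;
  mean_unsorted = 0.15524999999999;

  /* exact comparison of the stored values */
  exact_equal = (mean_sorted = mean_unsorted);

  /* comparison after rounding both to an assumed unit of 1e-12 */
  close_enough = (round(mean_sorted, 1e-12) = round(mean_unsorted, 1e-12));

  /* write both flags (1 = true, 0 = false) to the log */
  put exact_equal= close_enough=;
run;
```

Pick the rounding unit to suit the scale of your data; it should be well above the last few digits a double can hold but below any difference you actually care about.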
The only difference between the two results is the format.
Dataset B, which you sort, has the format a best20. attached, and that displays the mean as 0.15524999999999.
If you round that result to 8 characters, which is the default, you get mean=0.155250 (= 0.15525).
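One way to check how much of the difference comes from the format is to print a single stored value with several formats. The sketch below is only an illustration: it computes the mean directly in a data step (which may not match the internal arithmetic of proc summary) and writes it to the log with best8., best20., and hex16., the last of which shows the stored floating point representation.

```sas
data _null_;
  /* mean of the four values from the question, in their original order */
  x = (0.187 + 0.171 + 0.183 + 0.08) / 4;

  /* same stored value, three different display formats */
  put x= best8.;
  put x= best20.;
  put x= hex16.;
run;
```

A format changes only how a value is displayed, not what is stored, so if two results show different hex16. output they are genuinely different numbers, whatever format is attached.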
Thanks for the answer, Shmuel, but:
a) Which step rounds it to 8 characters, and why didn't this 'default' rounding apply in the first scenario?
b) Why do you think the format is different? Dataset B has the best20. format, after sorting it still has best20., and the output that proc summary creates has best20. again.
The final dataset 'out_b' has the same best20. format in both scenarios.
Try this to see just how similar your two results are. As others correctly pointed out, small floating point differences are common, expected, and do not indicate anything went wrong.
data b;
format a best20.;
input a;
datalines;
0.187
0.171
0.183
0.08
;
proc summary data=b nway missing noprint ;
var a;
output out = out_b mean=mean1;
run;
proc sort data=b;by a;run;
proc summary data=b nway missing noprint ;
var a;
output out = out_c mean=mean2;
run;
data all(drop=_:);
merge out_b out_c;
diff = mean1 - mean2;
format _numeric_ 20.18;
run;
proc print; run;
Obs                   mean1                   mean2                    diff
  1    0.155249999999990000    0.155250000000000000    -.000000000000000028
Thanks for the answer, WarrenKuhfeld.
I'm not saying that something went wrong or that the difference is huge.
The question is: why does sorting influence small floating point differences?
It changes the order of the floating point arithmetic. Try fiddling around with programs like this, and you will see that different orders give different results.
data x;
x = 100;
x = x + 1/10;
x = x + 1/3;
y = 1/10;
y = y + 1/3;
y = y + 100;
diff = x - y;
format _numeric_ 20.16;
run;
proc print; run;
@m491_2 wrote:
I'm not saying that something went wrong or that the difference is huge.
The question is: why does sorting influence small floating point differences?
I doubt SAS is going to release their underlying code to us so we can see how this happens. As I said, I think the whole idea of trying to figure out why machine precision gives one answer in one situation and a different answer in another situation is not worth the time and effort.
You are right.
It seems that the sort somehow changes the precision of the data, so that proc summary (and proc means too)
calculates the mean as a round value.
By the way, I changed one value from 0.08 to 0.080001
and got the same mean (=0.15525025) before and after the sort.
I have no better answer.
Intermediate results get stored for each sum. They can change slightly depending on which numbers get added to which other numbers. So yes, sorting affects the results.
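As a rough illustration of that point, the sketch below adds the four values from the question in their original order and in ascending order, then compares the two means. It assumes plain left-to-right addition in a data step; the actual accumulation method inside proc summary is not documented in this thread, so the results here need not match the procedure's output exactly.

```sas
data _null_;
  /* original (unsorted) order from the datalines */
  sum_unsorted = 0.187 + 0.171 + 0.183 + 0.08;

  /* ascending order, as the rows appear after proc sort */
  sum_sorted = 0.08 + 0.171 + 0.183 + 0.187;

  mean_unsorted = sum_unsorted / 4;
  mean_sorted   = sum_sorted / 4;
  diff = mean_unsorted - mean_sorted;

  /* hex16. shows the stored bits, so any last-bit difference is visible */
  put mean_unsorted= hex16. mean_sorted= hex16.;
  put diff= e12.;
run;
```

Each intermediate sum is rounded to the nearest representable double, so a different order of additions can round differently along the way.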
support.sas.com/resources/papers/proceedings11/275-2011.pdf
Here are some sources of more information. @PaigeMiller is right though; I would not spend a lot of time worrying about such things.