proc summary calculating mean

New Contributor
Posts: 3

Hi all,

 

I'm wondering why I'm getting different values for the mean from proc summary. Example below:

1) when the input dataset is sorted by variable 'a', I get mean=0.15525

2) when the input dataset is not sorted, I get mean=0.15524999999999

 

data b;
format a best20.;
input a;
datalines;
0.187
0.171
0.183
0.08
;
run;

 

/*proc sort data=b;by a;run;*/

 

proc summary data=b nway missing noprint ;
var a;
output out = out_b mean=mean ;
run;

Respected Advisor
Posts: 2,647

Re: proc summary calculating mean

SAS mathematically has about 14 digits of precision, so these are the same answers. You will drive yourself crazy trying to understand the effects of machine precision.

--
Paige Miller
Trusted Advisor
Posts: 1,822

Re: proc summary calculating mean

The only difference between the two results is the format.

You sorted dataset B, to which you had added the format a best20., which results in mean=0.15524999999999.

If you round that result to 8 characters, which is the default width, you get mean=0.155250 (= 0.15525).

New Contributor
Posts: 3

Re: proc summary calculating mean

Thanks for the answer, Shmuel, but:

a) Which step rounds it to 8 characters, and why didn't this 'default' rounding apply in the first scenario?

b) Why do you think the format is different? Dataset B has the best20. format, after sorting it still has best20., and the output that proc summary creates has best20. again.

The final dataset 'out_b' has the same best20. format in both scenarios.
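
One way to check that a format affects only how a number is displayed, not the value that is stored, is to write the same value to the log under several formats. The following is a minimal sketch, not taken from the thread: the variable name x is hypothetical, and the mean is computed in a DATA step, so it may not match the PROC SUMMARY result bit for bit.

```sas
/* Sketch: one stored value shown under three formats.        */
/* x is a hypothetical variable name used for illustration.   */
data _null_;
   x = (0.187 + 0.171 + 0.183 + 0.08) / 4;
   put 'BEST8.  ' x best8.;    /* short display width                  */
   put 'BEST20. ' x best20.;   /* wide display width                   */
   put 'HEX16.  ' x hex16.;    /* raw bits of the stored 8-byte value  */
run;
```

If two results show the same HEX16. value, they are the same number, whatever the BEST8. or BEST20. display suggests. If the HEX16. values differ, the numbers themselves differ and the format is not the cause.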

 

 

SAS Super FREQ
Posts: 496

Re: proc summary calculating mean

Try this to see just how similar your two results are.  As others correctly pointed out, small floating point differences are common, expected, and do not indicate anything went wrong.

data b;
   format a best20.;
   input a;
   datalines;
0.187
0.171
0.183
0.08
;
 
proc summary data=b nway missing noprint ;
   var a;
   output out = out_b mean=mean1;
run;

proc sort data=b;by a;run;
 
proc summary data=b nway missing noprint ;
   var a;
   output out = out_c mean=mean2;
run;

data all(drop=_:);
   merge out_b out_c;
   diff = mean1 - mean2;
   format _numeric_ 20.18;
run;   

proc print; run;

  Obs                   mean1                   mean2                    diff

   1     0.155249999999990000    0.155250000000000000    -.000000000000000028
New Contributor
Posts: 3

Re: proc summary calculating mean

Posted in reply to WarrenKuhfeld

Thanks for the answer, WarrenKuhfeld.

I'm not saying that something went wrong or that the difference is huge.

The question is: why does sorting influence small floating point differences?

 

 

SAS Super FREQ
Posts: 496

Re: proc summary calculating mean

It changes the order of the floating point arithmetic.  Try fiddling around with programs like this, and you will see that different orders give different results.

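data x;
   /* add the large value first, then the small ones */
   x = 100;
   x = x + 1/10;
   x = x + 1/3;
   /* add the same three values in the opposite order */
   y = 1/10;
   y = y + 1/3;
   y = y + 100;
   diff = x - y;   /* nonzero when the two orders round differently */
   format _numeric_ 20.16;
run;

proc print; run;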
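
The same idea can be tried on the four values from the original question. This is only a sketch: it assumes a plain left-to-right sum in a DATA step, and how PROC SUMMARY accumulates its sums internally is not stated in this thread, so the result may not match the procedure exactly. All variable names here are hypothetical.

```sas
/* Sketch: mean of the thread's four values, summed in two orders. */
data _null_;
   /* order as entered in the datalines */
   sum_unsorted = 0.187 + 0.171 + 0.183 + 0.08;
   /* ascending order, as after PROC SORT by a */
   sum_sorted = 0.08 + 0.171 + 0.183 + 0.187;

   mean_unsorted = sum_unsorted / 4;
   mean_sorted   = sum_sorted / 4;
   diff          = mean_unsorted - mean_sorted;

   /* HEX16. shows the stored bits, 20.18 shows a long decimal display */
   put mean_unsorted= hex16. mean_sorted= hex16.;
   put mean_unsorted= 20.18 mean_sorted= 20.18;
   put diff= e12.;
run;
```

If the two orders round differently at some intermediate step, the HEX16. values differ in their last digits and diff is nonzero; otherwise diff is 0.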
Respected Advisor
Posts: 2,647

Re: proc summary calculating mean


m491_2 wrote:

 

I'm not saying that something went wrong or difference is huge.

Question is why sorting has influence on  small floating point differences?


I doubt SAS is going to release their underlying code to us so we can see how this happens. As I said, I think the whole idea of trying to figure out why machine precision gives one answer in one situation and a different answer in another situation is not worth the time and effort.

--
Paige Miller
Trusted Advisor
Posts: 1,822

Re: proc summary calculating mean

You are right.

 

It seems that the sort somehow changes the precision of the data, so that proc summary (and proc means too) calculates the mean as a round value.

By the way, I changed one value from 0.08 to 0.080001 and got the same mean (=0.15525025) before and after the sort.

 

I have no better answer.

SAS Super FREQ
Posts: 496

Re: proc summary calculating mean

Intermediate results get stored for each sum.  They can change slightly depending on which numbers get added to which other numbers.  So yes, sorting affects the results.

SAS Super FREQ
Posts: 496

Re: proc summary calculating mean

Posted in reply to WarrenKuhfeld

support.sas.com/resources/papers/proceedings11/275-2011.pdf 

http://go.documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docset...

 

Here are some sources of more information. @PaigeMiller is right though; I would not spend a lot of time worrying about such things.
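
If two such means ever need to be compared in a program, a common approach is to compare them within a tolerance, or to round both before comparing, rather than testing for exact equality. A minimal sketch follows; the tolerance 1e-12 and the rounding unit 1e-10 are hypothetical choices for illustration, and the variable names are not from the thread.

```sas
/* Sketch: comparing two nearly equal means. */
data _null_;
   mean1 = (0.187 + 0.171 + 0.183 + 0.08) / 4;
   mean2 = (0.08 + 0.171 + 0.183 + 0.187) / 4;

   tol = 1e-12;   /* hypothetical tolerance; choose one that suits the data */

   exact_equal   = (mean1 = mean2);                /* may be 0 or 1 */
   within_tol    = (abs(mean1 - mean2) < tol);     /* tolerance comparison */
   rounded_equal = (round(mean1, 1e-10) = round(mean2, 1e-10));

   put exact_equal= within_tol= rounded_equal=;
run;
```

Each comparison returns 1 for true and 0 for false. The tolerance and the rounding unit should be well above the expected floating point noise and well below any difference that matters for the analysis.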
