m491_2
Calcite | Level 5

 

Hi all,

 

I'm wondering why I'm getting different values for the mean from proc summary. Example below:

1) when the input dataset is sorted by variable 'a', I get mean=0.15525

2) when the input dataset is not sorted, I get mean=0.15524999999999

 

data b;
format a best20.;
input a;
datalines;
0.187
0.171
0.183
0.08
;run;

 

/*proc sort data=b;by a;run;*/

 

proc summary data=b nway missing noprint ;
var a;
output out = out_b mean=mean ;
run;

PaigeMiller
Diamond | Level 26

SAS mathematically has about 14 digits of precision, so these are the same answers. You will drive yourself crazy trying to understand the effects of machine precision.

--
Paige Miller
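
For readers who want to see where the precision limit sits on their own machine, here is a minimal sketch. It assumes the CONSTANT function's 'MACEPS' and 'EXACTINT' arguments are available in your SAS release; the variable names are only illustrative.

```sas
data _null_;
   /* Machine epsilon: roughly the smallest relative spacing between two doubles on this host */
   eps = constant('maceps');
   /* Largest integer up to which every integer is represented exactly */
   exactint = constant('exactint');
   put eps= e12. exactint= best20.;
run;
```

A relative difference between two results that is on the order of eps is rounding noise rather than a real discrepancy.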
Shmuel
Garnet | Level 18

The only difference between the two results is the format.

You sort dataset B, to which you added the format a best20., and that results in mean=0.15524999999999.

If you round that result to 8 characters, which is the default, you get mean=0.155250 (= 0.15525).

m491_2
Calcite | Level 5

@Shmuel wrote:

The only difference between the two results is the format.

You sort dataset B, to which you added the format a best20., and that results in mean=0.15524999999999.

If you round that result to 8 characters, which is the default, you get mean=0.155250 (= 0.15525).

 

 

Thanks for the answer, Shmuel, but:

a) Which step rounds it to 8 characters, and why didn't this 'default' rounding happen in the first scenario?

b) Why do you think the format is different? Dataset B has the best20. format, it still has best20. after sorting, and the output that proc summary creates has best20. again.

 

The final dataset 'out_b' has the same best20. format in both scenarios.

 

 

WarrenKuhfeld
Rhodochrosite | Level 12

Try this to see just how similar your two results are.  As others correctly pointed out, small floating point differences are common, expected, and do not indicate anything went wrong.

data b;
   format a best20.;
   input a;
   datalines;
0.187
0.171
0.183
0.08
;
 
proc summary data=b nway missing noprint ;
   var a;
   output out = out_b mean=mean1;
run;

proc sort data=b;by a;run;
 
proc summary data=b nway missing noprint ;
   var a;
   output out = out_c mean=mean2;
run;

data all(drop=_:);
   merge out_b out_c;
   diff = mean1 - mean2;
   format _numeric_ 20.18;
run;   

proc print; run;

  Obs                   mean1                   mean2                    diff

   1     0.155249999999990000    0.155250000000000000    -.000000000000000028
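
Since the two means differ only at the limit of double precision, a comparison that should treat them as equal can test the difference against a tolerance or round both sides first. A minimal sketch, reusing mean1 and mean2 from the ALL data set created above; the data set name COMPARE, the flag names, and the 1e-12 tolerance are arbitrary assumptions, not recommended values.

```sas
data compare;
   set all;
   /* Exact comparison: 1 only if the two doubles are bit-for-bit equal */
   exact_equal   = (mean1 = mean2);
   /* Tolerance comparison: 1 if the absolute difference is below an assumed threshold */
   within_tol    = (abs(mean1 - mean2) < 1e-12);
   /* Rounding both sides to the same unit before comparing */
   rounded_equal = (round(mean1, 1e-12) = round(mean2, 1e-12));
run;

proc print data=compare; run;
```

With the nonzero diff shown in the printout above, exact_equal should come out 0, while the other two flags should come out 1.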
m491_2
Calcite | Level 5

Thanks for the answer, WarrenKuhfeld.

 

I'm not saying that something went wrong or that the difference is huge.

The question is: why does sorting influence these small floating-point differences?

 

 

WarrenKuhfeld
Rhodochrosite | Level 12

It changes the order of the floating point arithmetic.  Try fiddling around with programs like this, and you will see that different orders give different results.

data x;
   x = 100;
   x = x + 1/10;
   x = x + 1/3;
   y = 1/10;
   y = y + 1/3;
   y = y + 100;
   diff = x - y;
   format _numeric_ 20.16;
   run;
   
proc print; run;   
PaigeMiller
Diamond | Level 26

@m491_2 wrote:

 

I'm not saying that something went wrong or that the difference is huge.

The question is: why does sorting influence these small floating-point differences?


I doubt SAS is going to release their underlying code to us so we can see how this happens. As I said, I think the whole idea of trying to figure out why machine precision gives one answer in one situation and a different answer in another situation is not worth the time and effort.

--
Paige Miller
Shmuel
Garnet | Level 18

You are right.

 

It seems that the sort somehow changes the precision of the data so that proc summary (and proc means too) calculates the mean as a round value.

 

By the way, I changed one value from 0.08 to 0.080001 and got the same mean (0.15525025) before and after the sort.

 

I have no better answer.

WarrenKuhfeld
Rhodochrosite | Level 12

Intermediate results get stored for each sum.  They can change slightly depending on which numbers get added to which other numbers.  So yes, sorting affects the results.
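
To watch those intermediate results, the running total can be printed after each addition for both orderings. This is only a sketch: the data set name B_SORTED and the variable RUNNING are made up for illustration, and PROC SUMMARY's internal accumulation may not match a simple running sum like this one.

```sas
data b;
   input a;
   datalines;
0.187
0.171
0.183
0.08
;

proc sort data=b out=b_sorted; by a; run;

/* Running sum in the original order; HEX16. shows the stored floating-point bits */
data _null_;
   set b;
   running + a;   /* sum statement: RUNNING starts at 0 and is retained across rows */
   put 'unsorted ' _n_= a= running= hex16.;
run;

/* Same running sum after sorting by a */
data _null_;
   set b_sorted;
   running + a;
   put 'sorted   ' _n_= a= running= hex16.;
run;
```

Comparing the hex values in the log line by line shows where, if anywhere, the two orders start to diverge.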

WarrenKuhfeld
Rhodochrosite | Level 12

support.sas.com/resources/papers/proceedings11/275-2011.pdf 

http://go.documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docset...

 

Here are some sources of more information. @PaigeMiller is right though; I would not spend a lot of time worrying about such things.
