I want to compute the pooled sigma (standard deviation) for the below-mentioned group:
Group | Count | Mean | Sigma |
A | 2 | 661.7 | 45.3 |
B | 2 | 18.06 | 1.64 |
C | 2 | 191.4 | 1.41 |
D | 2 | 5.51 | 0.352 |
E | 2 | 37.265 | 0.898 |
I have calculated it using the below-mentioned program in SAS. The result I am getting for pooled sigma in SAS is 20.2864.
I verified this pooled sigma value for the same dataset using Minitab and the Minitab output for pooled sigma is 20.2686 which is slightly different than the one I am getting in SAS using the below-mentioned PROC SQL program (got it from SAS help).
-------------------
data Deba;
input group $ Count Mean Sigma;
datalines;
A 2 661.7 45.3
B 2 18.06 1.64
C 2 191.4 1.41
D 2 5.51 0.352
E 2 37.265 0.898
;
proc sql;
select *,sqrt(sum((count-1)*sigma**2) / (sum(count)-count(*))) as pool_sigma from Deba
quit;
-------------------------
I am wondering why the Minitab and SAS values are not matching? In MINITAB I am feeding the raw data which is giving me pooled sigma of 20.2686, whereas in SAS I am feeding the above-mentioned data group as I don't know the whole program in SAS to calculate the pooled sigma from my raw data.
Could this be a reason for the difference in MINITAB and SAS results? OR Do I need to use a different formula for pooled sigma?
My raw data is below-
Groups | Results |
A | 693.70 |
A | 629.70 |
B | 19.22 |
B | 16.90 |
C | 192.40 |
C | 190.40 |
D | 5.76 |
D | 5.26 |
E | 36.63 |
E | 37.90 |
Request if anyone could please help me with this.
Since you use ANOVA in Minitab, let's do an ANOVA in SAS
data a;
input group $ results;
cards;
A 693.70
A 629.70
B 19.22
B 16.90
C 192.40
C 190.40
D 5.76
D 5.26
E 36.63
E 37.90
;
run;
ods listing;
proc glm data=a;
class group;
model results=group;
run; quit;
ods listing close;
This produces the following output.
Source DF Squares Mean Square F Value Pr > F Model 4 618341.2342 154585.3085 376.37 <.0001 Error 5 2053.6226 410.7245 Corrected Total 9 620394.8568 R-Square Coeff Var Root MSE results Mean 0.996690 11.08741 20.26634 182.7870
The Root MSE of 20.26634 is the pooled standard deviation.
@debashree wrote:
I am wondering why the Minitab and SAS values are not matching? In MINITAB I am feeding the raw data which is giving me pooled sigma of 20.2686, whereas in SAS I am feeding the above-mentioned data group as I don't know the whole program in SAS to calculate the pooled sigma from my raw data.
Could this be a reason for the difference in MINITAB and SAS results? OR Do I need to use a different formula for pooled sigma?
One computer has the data typed in, where the sigma has been truncated to two decimal places, while another computer actually computed the sigma and has 14 decimal places.
This is just round-off error. So I would not expect them to match.
@debashree wrote:
Thank you @PaigeMiller . So, that confirms my doubt that this is the reason that the results are not matching.
But then is there a program in SAS so that I can calculate Pooled Sigma from my raw data itself in SAS?
I am not able to figure out how to calculate pooled sigma in SAS from raw data.
The problem you are having is not SAS. The problem you are having is that the data is different between the two different softwares (SAS and Minitab)
PROC MEANS/PROC SUMMARY and many other procedures will calculate a pooled standard deviation from any data.
Since you use ANOVA in Minitab, let's do an ANOVA in SAS
data a;
input group $ results;
cards;
A 693.70
A 629.70
B 19.22
B 16.90
C 192.40
C 190.40
D 5.76
D 5.26
E 36.63
E 37.90
;
run;
ods listing;
proc glm data=a;
class group;
model results=group;
run; quit;
ods listing close;
This produces the following output.
Source DF Squares Mean Square F Value Pr > F Model 4 618341.2342 154585.3085 376.37 <.0001 Error 5 2053.6226 410.7245 Corrected Total 9 620394.8568 R-Square Coeff Var Root MSE results Mean 0.996690 11.08741 20.26634 182.7870
The Root MSE of 20.26634 is the pooled standard deviation.
How did you calculate the group sigma in Minitab?
Hello @debashree (and BTW welcome to the SAS Support Communities),
Possibly rounding errors are involved twice: in the raw data and, obviously, in dataset Deba with its rounded Sigma values.
Using the raw data you get yet another slightly different value for the pooled standard deviation (the same as Paige Miller obtained with PROC GLM):
data have;
input group :$1. result;
cards;
A 693.70
A 629.70
B 19.22
B 16.90
C 192.40
C 190.40
D 5.76
D 5.26
E 36.63
E 37.90
;
proc sql;
create table stats as
select group, count(result) as count, mean(result) as mean, std(result) as sigma from have
group by group;
select sqrt(sum((count-1)*sigma**2) / (sum(count)-count(*))) as pool_sigma from stats;
quit;
Result: 20.26634. Rounded to two decimals, i.e. the precision of the raw data, this already matches the Minitab result.
Note that for group D Sigma=0.5/sqrt(2)=0.35355..., which does not match the 0.352 from dataset Deba. However, the exact result values for that group could be (almost) as extreme as 5.755 and 5.265, respectively, without changing their rounded values, leading to Sigma=0.49/sqrt(2)=0.34648... So, 0.352 (if it's not just a typo) is well within the possible range due to rounding errors.
Similarly, other result values can be modified without changing their rounded values. Here is an example (with modifications only in groups A and D), which yields the same rounded values for Mean and Sigma as given in dataset Deba and the pooled standard deviation 20.2686 as you computed with Minitab (use the above PROC SQL step to confirm):
data have;
input group :$1. result;
cards;
A 693.70358
A 629.69642
B 19.22
B 16.90
C 192.40
C 190.40
D 5.759
D 5.261
E 36.63
E 37.90
;
If you go by middle school algebra, here is the simple Data Step:
proc sort data = have out = have1;
by Groups;
run;
data _null_;
do n = 1 by 1 until(last.Groups);
set have1 end = last;
by Groups;
Tsum + Results; *Total Group Sum;
TSS + (Results * Results); *Total Group SS;
if last.Groups then do;
Css = TSS - Tsum * Tsum / n; * Corrected SS;
Sigma = sqrt(Css /(n-1)); * Sigma for Group;
put Sigma =;
call missing(Tsum,TSS);
TCSS + CSS; * Total Corrected SS;
TDF + (n-1); * Total DF;
end;
end;
if last then do;
Sigma = sqrt(TCSS/TDF); * Pooled Sigma;
put TCSS = TDF = Sigma =;
end;
run;
This proves that the pooled Sigma is
20.266339827
For completeness, I mentioned that you can do this in PROC SUMMARY, so I present that code as well. It seems that you ahve to run PROC SUMMARY twice to make this work.
proc summary data=a nway;
class group;
var results;
output out=_stats_ css=;
run;
proc summary data=_stats_(rename=(_freq_=nobs));
var results nobs;
output out=_stats1_ sum=;
run;
data pooled_Std;
set _stats1_;
/* Compute degrees of freedom for error */
dfe=nobs-_freq_;
pooled_std = sqrt(results/dfe);
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.