BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
debashree
Fluorite | Level 6

I want to compute the pooled sigma (standard deviation) for the below-mentioned group:

 

GroupCount Mean Sigma
A2661.745.3
B218.061.64
C2191.41.41
D25.510.352
E237.2650.898

 

I have calculated it using the below-mentioned program in SAS. The result I am getting for pooled sigma in SAS is 20.2864. 

I verified this pooled sigma value for the same dataset using Minitab and the Minitab output for pooled sigma is 20.2686 which is slightly different than the one I am getting in SAS using the below-mentioned PROC SQL program (got it from SAS help). 

-------------------

data Deba;
input group $ Count Mean Sigma;
datalines;
A 2 661.7 45.3
B 2 18.06 1.64
C 2 191.4 1.41
D 2 5.51 0.352
E 2 37.265 0.898
;
proc sql;

select *,sqrt(sum((count-1)*sigma**2) / (sum(count)-count(*))) as pool_sigma from Deba

quit;

-------------------------

 

I am wondering why the Minitab and SAS values are not matching? In MINITAB I am feeding the raw data which is giving me pooled sigma of 20.2686, whereas in SAS I am feeding the above-mentioned data group as I don't know the whole program in SAS to calculate the pooled sigma from my raw data.

Could this be a reason for the difference in MINITAB and SAS results? OR Do I need to use a different formula for pooled sigma?

 

My raw data is below- 

GroupsResults
A693.70
A629.70
B19.22
B16.90
C192.40
C190.40
D5.76
D5.26
E36.63
E37.90

 

Request if anyone could please help me with this. 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Since you use ANOVA in Minitab, let's do an ANOVA in SAS

 

data a;
    input group $ results;
	cards;
A	693.70
A	629.70
B	19.22
B	16.90
C	192.40
C	190.40
D	5.76
D	5.26
E	36.63
E	37.90
;
run;

ods listing;
proc glm data=a;
	class group;
	model results=group;
run; quit;
ods listing close;

This produces the following output.

Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        4     618341.2342     154585.3085     376.37    <.0001

Error                        5       2053.6226        410.7245

Corrected Total              9     620394.8568


R-Square     Coeff Var      Root MSE    results Mean

0.996690      11.08741      20.26634        182.7870

The Root MSE of 20.26634 is the pooled standard deviation.

 

--
Paige Miller

View solution in original post

15 REPLIES 15
PaigeMiller
Diamond | Level 26

@debashree wrote:

I am wondering why the Minitab and SAS values are not matching? In MINITAB I am feeding the raw data which is giving me pooled sigma of 20.2686, whereas in SAS I am feeding the above-mentioned data group as I don't know the whole program in SAS to calculate the pooled sigma from my raw data.

Could this be a reason for the difference in MINITAB and SAS results? OR Do I need to use a different formula for pooled sigma?


One computer has the data typed in, where the sigma has been truncated to two decimal places, while another computer actually computed the sigma and has 14 decimal places.

 

This is just round-off error. So I would not expect them to match.

--
Paige Miller
debashree
Fluorite | Level 6
Thank you @PaigeMiller . So, that confirms my doubt that this is the reason that the results are not matching. 

But then is there a program in SAS so that I can calculate Pooled Sigma from my raw data itself in SAS? 

I am not able to figure out how to calculate pooled sigma in SAS from raw data. 
PaigeMiller
Diamond | Level 26

@debashree wrote:
Thank you @PaigeMiller . So, that confirms my doubt that this is the reason that the results are not matching. 

But then is there a program in SAS so that I can calculate Pooled Sigma from my raw data itself in SAS? 

I am not able to figure out how to calculate pooled sigma in SAS from raw data. 

The problem you are having is not SAS. The problem you are having is that the data is different between the two different softwares (SAS and Minitab)

 

PROC MEANS/PROC SUMMARY and many other procedures will calculate a pooled standard deviation from any data.

--
Paige Miller
debashree
Fluorite | Level 6
@PaigeMiller  Yes, I agree with this.

But I always use SAS for my data analysis and I am struggling to obtain a pooled standard deviation in SAS using my raw data.

Will you please be able to share syntax for proc means/proc summary that can give pooled standard deviation in SAS using my raw data? I didn't come across any. 
PaigeMiller
Diamond | Level 26

Since you use ANOVA in Minitab, let's do an ANOVA in SAS

 

data a;
    input group $ results;
	cards;
A	693.70
A	629.70
B	19.22
B	16.90
C	192.40
C	190.40
D	5.76
D	5.26
E	36.63
E	37.90
;
run;

ods listing;
proc glm data=a;
	class group;
	model results=group;
run; quit;
ods listing close;

This produces the following output.

Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        4     618341.2342     154585.3085     376.37    <.0001

Error                        5       2053.6226        410.7245

Corrected Total              9     620394.8568


R-Square     Coeff Var      Root MSE    results Mean

0.996690      11.08741      20.26634        182.7870

The Root MSE of 20.26634 is the pooled standard deviation.

 

--
Paige Miller
debashree
Fluorite | Level 6
Hello @Kurt_Bremser , 

I calculated the individual group sigma in Minitab using ANOVA. 

Minitab calculates the group sigma automatically and also generates a pooled sigma. An example in Minitab can be found from their link below.

https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/anova/how-to/one-wa...

This is the exact way I am getting my group and pooled sigma in MINITAB.

 
debashree
Fluorite | Level 6
I am looking for a procedure in SAS that can give me pooled standard deviation using my raw data.
Will you please be able to share any? I couldn't find anything in SAS.
FreelanceReinh
Jade | Level 19

Hello @debashree (and BTW welcome to the SAS Support Communities),

 

Possibly rounding errors are involved twice: in the raw data and, obviously, in dataset Deba with its rounded Sigma values.

 

Using the raw data you get yet another slightly different value for the pooled standard deviation (the same as Paige Miller obtained with PROC GLM):

data have;
input group :$1. result;
cards;
A 693.70
A 629.70
B 19.22
B 16.90
C 192.40
C 190.40
D 5.76
D 5.26
E 36.63
E 37.90
;

proc sql;
create table stats as
select group, count(result) as count, mean(result) as mean, std(result) as sigma from have
group by group;

select sqrt(sum((count-1)*sigma**2) / (sum(count)-count(*))) as pool_sigma from stats;
quit;

Result: 20.26634. Rounded to two decimals, i.e. the precision of the raw data, this already matches the Minitab result.

 

Note that for group D Sigma=0.5/sqrt(2)=0.35355..., which does not match the 0.352 from dataset Deba. However, the exact result values for that group could be (almost) as extreme as 5.755 and 5.265, respectively, without changing their rounded values, leading to Sigma=0.49/sqrt(2)=0.34648... So, 0.352 (if it's not just a typo) is well within the possible range due to rounding errors.

 

Similarly, other result values can be modified without changing their rounded values. Here is an example (with modifications only in groups A and D), which yields the same rounded values for Mean and Sigma as given in dataset Deba and the pooled standard deviation 20.2686 as you computed with Minitab (use the above PROC SQL step to confirm):

data have;
input group :$1. result;
cards;
A 693.70358
A 629.69642
B 19.22
B 16.90
C 192.40
C 190.40
D 5.759
D 5.261
E 36.63
E 37.90
;

 

debashree
Fluorite | Level 6
Thank you
KachiM
Rhodochrosite | Level 12

@debashree 

If you go by middle school algebra, here is the simple Data Step:

 

proc sort data = have out = have1;
   by Groups;
run;

data _null_;
   do n = 1 by 1 until(last.Groups);
      set have1 end = last;
      by Groups;
      Tsum + Results; *Total Group Sum;
      TSS  + (Results * Results); *Total Group SS;
      if last.Groups then do;
         Css = TSS - Tsum * Tsum / n; * Corrected SS;
         Sigma = sqrt(Css /(n-1));    * Sigma for Group;
         put Sigma =;
         call missing(Tsum,TSS);
         TCSS + CSS; * Total Corrected SS;
         TDF + (n-1); * Total DF;
      end;
   end;
   if last then do;
      Sigma = sqrt(TCSS/TDF); * Pooled Sigma;
      put TCSS = TDF = Sigma =;
   end;
run;

This proves that the pooled Sigma is 

20.266339827
PaigeMiller
Diamond | Level 26

For completeness, I mentioned that you can do this in PROC SUMMARY, so I present that code as well. It seems that you ahve to run PROC SUMMARY twice to make this work.

 

proc summary data=a nway;
	class group;
	var results;
	output out=_stats_ css=;
run;
proc summary data=_stats_(rename=(_freq_=nobs));
	var results nobs;
	output out=_stats1_ sum=;
run;

data pooled_Std;
	set _stats1_;
	/* Compute degrees of freedom for error */
	dfe=nobs-_freq_;
	pooled_std = sqrt(results/dfe);
run;
--
Paige Miller
debashree
Fluorite | Level 6
Thank you so much. I got it now.
So much happy to have such a supportive environment.
I have been struggling with this last couple of days, and no one could correctly answer my queries. Its the first time I am using SAS support community and genuinely delighted with the support.
debashree
Fluorite | Level 6
Thank you for this too.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 6985 views
  • 12 likes
  • 5 in conversation