I need to calculate the power of equivalence test for paired data. I use the following code:
/* ************************************************************************ */
/* This program is used to calculate the power of an equivalence test */
/* between two groups of paired data based the following data: */
/* mean difference and standard deviation */
/* of the differences between each pair; */
/* coefficient of correlation between the two groups; */
/* sample size (number of pairs); */
/* significance level, and */
/* acceptance interval. */
/* proc POWER is used for the calculation */
*/ */
*/************************************************************************* */
proc power;
pairedmeans
TEST=EQUIV_DIFF
meandiff=1
stddev=2
power = . /*to be calculated*/
npairs =3
corr = 0.9 0.1
alpha =0.05
upper = 7
lower =-7;
ods output output = pow;
run;
In the SAS documentation the paired equivalence test is not described very clear.
My questions are:
@Bim wrote:
I have one more question.
If the correlation is zero in the proc power for paired equivalents test, should I expect that the calculated power of the paired equivalents test is the same as the power for not paired equivalents test?
When I run the following two codes I get different results. The power for the paired test is 0.710 while the power for the not paired equivalents test is 0.899.
Why there is a difference, if the standard deviations, mean difference and sample size are the same in both cases?
The reason for the difference is that different tests are performed: The equivalence test is based on the paired t-test in the former case and on the two-sample t-test in the latter case. The paired t-test uses a different estimator for the standard error than the two-sample t-test (see the formulas for SE in sections "One-Sample Design" and "Two-Independent-Sample Design" of the PROC TTEST documentation). Note that in the tests the standard deviations are estimated from the samples. Our specification sigma=2 is not used there.
As a consequence, the t-statistics have different denominators (while the numerators are the same). Essentially, the different terms are
With our assumption of zero correlation these two may even have the same expected value, but the crucial point is that the degrees of freedom are different: n-1 vs. 2(n-1). In your example with the small n=3 the difference between the corresponding t-distributions (i.e. with df=2 vs. 4) is particularly large.
It's not difficult to verify the power calculation with a simulation (using the two different tests):
%let n=3;
%let mu_diff=1;
%let stddev=2;
%let theta_L=-7;
%let theta_U=7;
%let alpha=0.05;
%let nsim=1e6;
proc format;
value signif
low-&alpha = 'significant'
&alpha<-high = 'not significant';
run;
data pairs;
call streaminit(27182818);
do i=1 to ≁
do id=1 to &n;
x1=rand('normal',0,&stddev);
x2=rand('normal',&mu_diff,&stddev);
output;
end;
end;
run;
proc transpose data=pairs out=twosamp(drop=id rename=(col1=x)) name=grp;
by i id;
var x1 x2;
run;
ods graphics off;
ods exclude all;
ods noresults;
proc ttest data=pairs tost(&theta_L, &theta_U);
by i;
paired x1*x2;
ods output EquivTests=eqtst1(where=(test='Overall'));
run;
proc ttest data=twosamp tost(&theta_L, &theta_U);
by i;
class grp;
ods output EquivTests=eqtst2(where=(variances='Equal' & test='Overall'));
run;
ods exclude none;
ods results;
ods exclude BinomialTest;
proc freq data=eqtst1;
format probt signif.;
tables probt / binomial;
run;
ods exclude BinomialTest;
proc freq data=eqtst2;
format probt signif.;
tables probt / binomial;
run;
Result of the first PROC FREQ step (paired-sample design):
P-Value Cumulative Cumulative Probt Frequency Percent Frequency Percent -------------------------------------------------------------------- significant 709895 70.99 709895 70.99 not significant 290105 29.01 1000000 100.00 Binomial Proportion Probt = significant Proportion 0.7099 ASE 0.0005 95% Lower Conf Limit 0.7090 95% Upper Conf Limit 0.7108 Exact Conf Limits 95% Lower Conf Limit 0.7090 95% Upper Conf Limit 0.7108 Sample Size = 1000000
Result of the second PROC FREQ step (two-independent-sample design):
P-Value Cumulative Cumulative Probt Frequency Percent Frequency Percent -------------------------------------------------------------------- significant 898786 89.88 898786 89.88 not significant 101214 10.12 1000000 100.00 Binomial Proportion Probt = significant Proportion 0.8988 ASE 0.0003 95% Lower Conf Limit 0.8982 95% Upper Conf Limit 0.8994 Exact Conf Limits 95% Lower Conf Limit 0.8982 95% Upper Conf Limit 0.8994 Sample Size = 1000000
Note that I used PROC TRANSPOSE to create the independent samples from the paired samples so that identical data are used for the two tests. The first components of the three pairs comprise group 1 (grp='x1') and the second components of the three pairs comprise group 2 (grp='x2') in the two-independent-sample design.
Hello @Bim and welcome to the SAS Support Communities!
Thanks for asking these interesting questions.
@Bim wrote:
- In my code I use the standard deviation for differences (first, a difference is calculated for each of 3 pairs, and then the standard deviation is calculated from 3 differences). Is it correct to use the standard deviation for differences or should I use the common standard deviation per group?
The documentation says that STDDEV "specifies the standard deviation assumed to be common to both members of a pair." So, in general it is not correct to enter the standard deviation of the differences here*. You can either specify a common standard deviation (STDDEV= option) or the individual standard deviation of each member of the pair (PAIREDSTDDEVS= option, alias: PSTDS=) if the two sigmas are different. So, a specification pstds=(2,2) would be equivalent to stddev=2. (I verified this with your example.)
_____________________
* Edit: A second look at the formulas (see answers to the other questions) reveals: Both the individual standard deviations and the correlation coefficient are used in the power calculation only through the standard deviation of the differences. So, you can enter that value in the STDDEV= option if you specify CORR=0.5 because then sigma_diff=sqrt(2*sigma**2-2*0.5*sigma**2)=sigma.
_____________________
@Bim wrote:
- How the correlation coefficient is used in the calculation of power?
It is used in the formula for the standard deviation of the differences (which is calculated from the two individual standard deviations and the correlation coefficient). See the second link in my answer to your next question.
@Bim wrote:
2. What is a formula for the power calculated in proc power?
You can find the formula in the PROC POWER documentation --> Details --> Computational Methods and Formulas
--> Analyses in the PAIREDMEANS Statement, section "Additive Equivalence Test for Mean Difference with Normal Data (TEST=EQUIV_DIFF)."
Just for fun, let's check if we can replicate the result from PROC POWER by implementing the formula using only Base SAS. Let's assume, just as an example, that the common standard deviation was 2 (as specified in your code).
proc print data=pow;
id corr;
var power;
format power 12.10;
run;
Result:
Corr Power 0.9 0.9999975766 0.1 0.7509335978
Now our own computation:
data param;
N=3;
alpha=0.05;
mu_diff=1;
rho=0.9; /* rho=0.1; */
sigma1=2;
sigma2=2;
sigma_diff=sqrt(sigma1**2+sigma2**2-2*rho*sigma1*sigma2);
theta_U=7;
theta_L=-7;
arg1=tinv(1-alpha,N-1);
arg2a=(mu_diff-theta_U)/(sigma_diff/sqrt(N));
arg2b=(mu_diff-theta_L)/(sigma_diff/sqrt(N));
arg3=sqrt(N-1)*(theta_U-theta_L)/(2*sigma_diff/sqrt(N)*arg1);
f=sqrt(2*constant('PI'))/(gamma((N-1)/2)*2**((N-3)/2));
call symputx('N',N);
call symputx('t',arg1);
call symputx('delta1',arg2a);
call symputx('delta2',arg2b);
call symputx('b',arg3);
run;
%put &=N;
%put &=t;
%put &=delta1;
%put &=delta2;
%put &=b;
%macro func(x);
y=cdf('normal',-&t*&x/sqrt(&N-1)-&delta1)*&x**(&N-2)*pdf('normal',&x);
%mend func;
%AdpSimps(0,&b,1E-12);
%let q1=&integral;
%macro func(x);
y=cdf('normal',&t*&x/sqrt(&N-1)-&delta2)*&x**(&N-2)*pdf('normal',&x);
%mend func;
%AdpSimps(0,&b,1E-12);
%let q2=&integral;
%put &=q1;
%put &=q2;
data check;
set param;
power=f*(&q1-&q2);
run;
proc print data=check noobs;
var power;
format power 12.10;
run;
(The code of macro %AdpSimps can be found in this SESUG 2007 paper.)
Result:
power 0.9999975765
And with rho=0.1:
power 0.7509335977
So, the results from the "manual" calculation differ by only 1E-10 from those obtained with PROC POWER, suggesting that PROC POWER indeed uses this formula.
@Bim wrote:
I have one more question.
If the correlation is zero in the proc power for paired equivalents test, should I expect that the calculated power of the paired equivalents test is the same as the power for not paired equivalents test?
When I run the following two codes I get different results. The power for the paired test is 0.710 while the power for the not paired equivalents test is 0.899.
Why there is a difference, if the standard deviations, mean difference and sample size are the same in both cases?
The reason for the difference is that different tests are performed: The equivalence test is based on the paired t-test in the former case and on the two-sample t-test in the latter case. The paired t-test uses a different estimator for the standard error than the two-sample t-test (see the formulas for SE in sections "One-Sample Design" and "Two-Independent-Sample Design" of the PROC TTEST documentation). Note that in the tests the standard deviations are estimated from the samples. Our specification sigma=2 is not used there.
As a consequence, the t-statistics have different denominators (while the numerators are the same). Essentially, the different terms are
With our assumption of zero correlation these two may even have the same expected value, but the crucial point is that the degrees of freedom are different: n-1 vs. 2(n-1). In your example with the small n=3 the difference between the corresponding t-distributions (i.e. with df=2 vs. 4) is particularly large.
It's not difficult to verify the power calculation with a simulation (using the two different tests):
%let n=3;
%let mu_diff=1;
%let stddev=2;
%let theta_L=-7;
%let theta_U=7;
%let alpha=0.05;
%let nsim=1e6;
proc format;
value signif
low-&alpha = 'significant'
&alpha<-high = 'not significant';
run;
data pairs;
call streaminit(27182818);
do i=1 to ≁
do id=1 to &n;
x1=rand('normal',0,&stddev);
x2=rand('normal',&mu_diff,&stddev);
output;
end;
end;
run;
proc transpose data=pairs out=twosamp(drop=id rename=(col1=x)) name=grp;
by i id;
var x1 x2;
run;
ods graphics off;
ods exclude all;
ods noresults;
proc ttest data=pairs tost(&theta_L, &theta_U);
by i;
paired x1*x2;
ods output EquivTests=eqtst1(where=(test='Overall'));
run;
proc ttest data=twosamp tost(&theta_L, &theta_U);
by i;
class grp;
ods output EquivTests=eqtst2(where=(variances='Equal' & test='Overall'));
run;
ods exclude none;
ods results;
ods exclude BinomialTest;
proc freq data=eqtst1;
format probt signif.;
tables probt / binomial;
run;
ods exclude BinomialTest;
proc freq data=eqtst2;
format probt signif.;
tables probt / binomial;
run;
Result of the first PROC FREQ step (paired-sample design):
P-Value Cumulative Cumulative Probt Frequency Percent Frequency Percent -------------------------------------------------------------------- significant 709895 70.99 709895 70.99 not significant 290105 29.01 1000000 100.00 Binomial Proportion Probt = significant Proportion 0.7099 ASE 0.0005 95% Lower Conf Limit 0.7090 95% Upper Conf Limit 0.7108 Exact Conf Limits 95% Lower Conf Limit 0.7090 95% Upper Conf Limit 0.7108 Sample Size = 1000000
Result of the second PROC FREQ step (two-independent-sample design):
P-Value Cumulative Cumulative Probt Frequency Percent Frequency Percent -------------------------------------------------------------------- significant 898786 89.88 898786 89.88 not significant 101214 10.12 1000000 100.00 Binomial Proportion Probt = significant Proportion 0.8988 ASE 0.0003 95% Lower Conf Limit 0.8982 95% Upper Conf Limit 0.8994 Exact Conf Limits 95% Lower Conf Limit 0.8982 95% Upper Conf Limit 0.8994 Sample Size = 1000000
Note that I used PROC TRANSPOSE to create the independent samples from the paired samples so that identical data are used for the two tests. The first components of the three pairs comprise group 1 (grp='x1') and the second components of the three pairs comprise group 2 (grp='x2') in the two-independent-sample design.
Thank you! The replies were very helpful and have solved my problem.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.