Solved: Understanding discrepancy between proc power and hand calculated sampl...

atbeczkiewicz · Posted 03-28-2019 02:00 PM

After reading through the "Computational Methods and Formulas" for proc power within the User Guide as well as other posts within these communities about sample size calculation, I'm still having a hard time understanding why there is a discrepancy between the sample size calculated by SAS 9.4 and the one I obtain by hand.

I understand that onesamplefreq results in different values because the formula used by SAS accounts for type II error by including zpower whereas the formula typically used for hand calculations of this does not.

My problem is that I do not understand where the discrepancy when using onesamplemeans is introduced.

Example

If I'm trying to calculate sample size needed for a halfwidth of 50 (stddev=250) with 95% confidence by hand I do:

N = [z(1-alpha/2) * sigma / halfwidth]^2 = [1.96*250/50]^2 = 96.04 > 97

Using proc power I do:

proc power;
  onesamplemeans ci = t
  halfwidth = 50
  stddev=250
  probwidth= 0.95
  ntotal=.
  ;
run

which gets 120

Deriving a formula for sample size from the half-width formula provided in the User Guide for onesamplemeans results in essentially the same equation I'm using for hand calculations. While the difference does not really matter to me in this instance, I would greatly appreciate any help in understanding how SAS is arriving at this number so that I know for studies I'm designing later this year.

Thanks,

aaron

FreelanceReinh · Posted 03-29-2019 07:11 AM

To verify power or sample size calculations I like to do a simulation. For your question this is fairly easy:

%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n=120;      /* sample size */
%let n_alt=97;   /* alternative lower sample size */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N normal random variates each */

data sim / view=sim;
call streaminit(27182818);
do i=1 to &nsim;
  do k=1 to &n;
    x=rand('normal',0,&s);
    output;
  end;
end;
run;

/* Compute sample means and sample standard deviations */

proc summary data=sim;
by i;
*where k<=&n_alt;
var x;
output out=stats(drop=_:) mean=m std=s;
run;

/* Compute relevant characteristics of CIs for the mean */

data ci;
set stats;
h=tinv(1-&alpha/2,&n-1)*s/sqrt(&n); /* half-width of CI */
valid=(-h<=m<=h); /* indicator "CI contains true mean" */
narrow=(h<=&h);   /* indicator "CI is sufficiently narrow" */
run;

/* Check quality criteria for the CIs */

ods exclude BinomialTest;
proc freq data=ci;
tables valid narrow / bin(level='1');
tables valid*narrow;
run;

Results:

With the sample size n=120 suggested by PROC POWER a proportion of 0.9493 (95% CI 0.9479 to 0.9506) of the 100,000 simulated confidence intervals were valid, i.e., contained the true mean zero. Also, a proportion of 0.9520 (95% CI 0.9506 to 0.9533) of the confidence intervals had a half-width of at most 50. Good.

However, after reducing the sample size to your suggested n=97 (by activating the WHERE statement commented out above) the two proportions deteriorate to 0.9209 (95% CI 0.9192 to 0.9226) and 0.9343 (95% CI 0.9327 to 0.9358), respectively. So, there is strong evidence that n=97 is insufficient if the goal is 0.95.

View solution in original post

FreelanceReinh · Posted 03-28-2019 07:22 PM

Hello @atbeczkiewicz and welcome to the SAS Support Communities!

I haven't checked your formula, but my "hand" (i.e. Base SAS) calculation of the power* obtained with sample sizes 97 (your suggestion), 119 and 120 (PROC POWER result) using the formula in the documentation (section Confidence Interval for Mean (CI=T), Pr(half-width<=h)=..., two-sided) would look something like this:

data chk;
h=50;
sigma=250;
alpha=0.05;
do n=97, 119, 120;
  power=cdf('chisq',h**2*n*(n-1)/(sigma**2*tinv(1-alpha/2,n-1)**2),n-1);
  output;
end;
run;

Result:

Obs     h    sigma    alpha     n      power

 1     50     250      0.05     97    0.47682
 2     50     250      0.05    119    0.94304
 3     50     250      0.05    120    0.95130

So, for a power of 0.95 you'd need n=120 -- exactly the result of PROC POWER.

Edit:

* (sloppy use of the term "power" in the sense "probability that half-width of CI does not exceed h")

FreelanceReinh · Posted 03-29-2019 07:11 AM

To verify power or sample size calculations I like to do a simulation. For your question this is fairly easy:

%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n=120;      /* sample size */
%let n_alt=97;   /* alternative lower sample size */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N normal random variates each */

data sim / view=sim;
call streaminit(27182818);
do i=1 to &nsim;
  do k=1 to &n;
    x=rand('normal',0,&s);
    output;
  end;
end;
run;

/* Compute sample means and sample standard deviations */

proc summary data=sim;
by i;
*where k<=&n_alt;
var x;
output out=stats(drop=_:) mean=m std=s;
run;

/* Compute relevant characteristics of CIs for the mean */

data ci;
set stats;
h=tinv(1-&alpha/2,&n-1)*s/sqrt(&n); /* half-width of CI */
valid=(-h<=m<=h); /* indicator "CI contains true mean" */
narrow=(h<=&h);   /* indicator "CI is sufficiently narrow" */
run;

/* Check quality criteria for the CIs */

ods exclude BinomialTest;
proc freq data=ci;
tables valid narrow / bin(level='1');
tables valid*narrow;
run;

Results:

With the sample size n=120 suggested by PROC POWER a proportion of 0.9493 (95% CI 0.9479 to 0.9506) of the 100,000 simulated confidence intervals were valid, i.e., contained the true mean zero. Also, a proportion of 0.9520 (95% CI 0.9506 to 0.9533) of the confidence intervals had a half-width of at most 50. Good.

However, after reducing the sample size to your suggested n=97 (by activating the WHERE statement commented out above) the two proportions deteriorate to 0.9209 (95% CI 0.9192 to 0.9226) and 0.9343 (95% CI 0.9327 to 0.9358), respectively. So, there is strong evidence that n=97 is insufficient if the goal is 0.95.

atbeczkiewicz · Posted 03-29-2019 08:32 AM

Thanks @FreelanceReinh!

Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

Re: Understanding discrepancy between proc power and hand calculated sample size

SAS Innovate 2025: Call for Content