BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
atbeczkiewicz
Calcite | Level 5

After reading through the "Computational Methods and Formulas" for proc power within the User Guide as well as other posts within these communities about sample size calculation, I'm still having a hard time understanding why there is a discrepancy between the sample size calculated by SAS 9.4 and the one I obtain by hand.

 

I understand that onesamplefreq results in different values because the formula used by SAS accounts for type II error by including zpower whereas the formula typically used for hand calculations of this does not.

 

My problem is that I do not understand where the discrepancy when using onesamplemeans is introduced.

 

Example

If I'm trying to calculate sample size needed for a halfwidth of 50 (stddev=250) with 95% confidence by hand I do:

N = [z(1-alpha/2) * sigma / halfwidth]^2 = [1.96*250/50]^2 = 96.04 > 97

Using proc power I do:

proc power;
  onesamplemeans ci = t
  halfwidth = 50
  stddev=250
  probwidth= 0.95
  ntotal=.
  ;
run

which gets 120

 

Deriving a formula for sample size from the half-width formula provided in the User Guide for onesamplemeans results in essentially the same equation I'm using for hand calculations. While the difference does not really matter to me in this instance, I would greatly appreciate any help in understanding how SAS is arriving at this number so that I know for studies I'm designing later this year.

Thanks,

aaron

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

To verify power or sample size calculations I like to do a simulation. For your question this is fairly easy:

%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n=120;      /* sample size */
%let n_alt=97;   /* alternative lower sample size */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N normal random variates each */

data sim / view=sim;
call streaminit(27182818);
do i=1 to ≁
  do k=1 to &n;
    x=rand('normal',0,&s);
    output;
  end;
end;
run;

/* Compute sample means and sample standard deviations */

proc summary data=sim;
by i;
*where k<=&n_alt;
var x;
output out=stats(drop=_:) mean=m std=s;
run;

/* Compute relevant characteristics of CIs for the mean */

data ci;
set stats;
h=tinv(1-&alpha/2,&n-1)*s/sqrt(&n); /* half-width of CI */
valid=(-h<=m<=h); /* indicator "CI contains true mean" */
narrow=(h<=&h);   /* indicator "CI is sufficiently narrow" */
run;

/* Check quality criteria for the CIs */

ods exclude BinomialTest;
proc freq data=ci;
tables valid narrow / bin(level='1');
tables valid*narrow;
run;

Results:

With the sample size n=120 suggested by PROC POWER a proportion of 0.9493 (95% CI 0.9479 to 0.9506) of the 100,000 simulated confidence intervals were valid, i.e., contained the true mean zero. Also, a proportion of 0.9520 (95% CI 0.9506 to 0.9533) of the confidence intervals had a half-width of at most 50. Good.

 

However, after reducing the sample size to your suggested n=97 (by activating the WHERE statement commented out above) the two proportions deteriorate to 0.9209 (95% CI 0.9192 to 0.9226) and 0.9343 (95% CI 0.9327 to 0.9358), respectively. So, there is strong evidence that n=97 is insufficient if the goal is 0.95.

 

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @atbeczkiewicz and welcome to the SAS Support Communities!

 

I haven't checked your formula, but my "hand" (i.e. Base SAS) calculation of the power* obtained with sample sizes 97 (your suggestion), 119 and 120 (PROC POWER result) using the formula in the documentation (section Confidence Interval for Mean (CI=T), Pr(half-width<=h)=..., two-sided) would look something like this:

data chk;
h=50;
sigma=250;
alpha=0.05;
do n=97, 119, 120;
  power=cdf('chisq',h**2*n*(n-1)/(sigma**2*tinv(1-alpha/2,n-1)**2),n-1);
  output;
end;
run;

Result:

Obs     h    sigma    alpha     n      power

 1     50     250      0.05     97    0.47682
 2     50     250      0.05    119    0.94304
 3     50     250      0.05    120    0.95130

So, for a power of 0.95 you'd need n=120 -- exactly the result of PROC POWER.

 

Edit:

* (sloppy use of the term "power" in the sense "probability that half-width of CI does not exceed h")

FreelanceReinh
Jade | Level 19

To verify power or sample size calculations I like to do a simulation. For your question this is fairly easy:

%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n=120;      /* sample size */
%let n_alt=97;   /* alternative lower sample size */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N normal random variates each */

data sim / view=sim;
call streaminit(27182818);
do i=1 to &nsim;
  do k=1 to &n;
    x=rand('normal',0,&s);
    output;
  end;
end;
run;

/* Compute sample means and sample standard deviations */

proc summary data=sim;
by i;
*where k<=&n_alt;
var x;
output out=stats(drop=_:) mean=m std=s;
run;

/* Compute relevant characteristics of CIs for the mean */

data ci;
set stats;
h=tinv(1-&alpha/2,&n-1)*s/sqrt(&n); /* half-width of CI */
valid=(-h<=m<=h); /* indicator "CI contains true mean" */
narrow=(h<=&h);   /* indicator "CI is sufficiently narrow" */
run;

/* Check quality criteria for the CIs */

ods exclude BinomialTest;
proc freq data=ci;
tables valid narrow / bin(level='1');
tables valid*narrow;
run;

Results:

With the sample size n=120 suggested by PROC POWER a proportion of 0.9493 (95% CI 0.9479 to 0.9506) of the 100,000 simulated confidence intervals were valid, i.e., contained the true mean zero. Also, a proportion of 0.9520 (95% CI 0.9506 to 0.9533) of the confidence intervals had a half-width of at most 50. Good.

 

However, after reducing the sample size to your suggested n=97 (by activating the WHERE statement commented out above) the two proportions deteriorate to 0.9209 (95% CI 0.9192 to 0.9226) and 0.9343 (95% CI 0.9327 to 0.9358), respectively. So, there is strong evidence that n=97 is insufficient if the goal is 0.95.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1003 views
  • 4 likes
  • 2 in conversation