- Understanding discrepancy between proc power and hand calculated sampl...

Posted 03-28-2019 02:00 PM

After reading through the "Computational Methods and Formulas" for proc power within the User Guide as well as other posts within these communities about sample size calculation, I'm still having a hard time understanding why there is a discrepancy between the sample size calculated by SAS 9.4 and the one I obtain by hand.

I understand that onesamplefreq results in different values because the formula used by SAS accounts for type II error by including zpower whereas the formula typically used for hand calculations of this does not.

My problem is that I do not understand where the discrepancy is introduced when using onesamplemeans.

Example

If I'm trying to calculate sample size needed for a halfwidth of 50 (stddev=250) with 95% confidence by hand I do:

N = [z(1-alpha/2) * sigma / halfwidth]^2 = [1.96*250/50]^2 = 96.04, rounded up to 97

Using proc power I do:

```
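proc power;
  onesamplemeans ci = t   /* t-based confidence interval for the mean */
    halfwidth = 50        /* target half-width */
    stddev = 250          /* assumed standard deviation */
    probwidth = 0.95      /* required Pr(half-width <= 50) */
    ntotal = .            /* solve for the sample size */
  ;
run;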
```

which gives N = 120.

Deriving a formula for sample size from the half-width formula provided in the User Guide for onesamplemeans results in essentially the same equation I'm using for hand calculations. While the difference does not really matter to me in this instance, I would greatly appreciate any help in understanding how SAS is arriving at this number so that I know for studies I'm designing later this year.

Thanks,

aaron

Accepted Solutions

To verify power or sample size calculations I like to do a simulation. For your question this is fairly easy:

```
%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n=120;      /* sample size */
%let n_alt=97;   /* alternative lower sample size */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N normal random variates each */
data sim / view=sim;
  call streaminit(27182818);
  do i=1 to &nsim;
    do k=1 to &n;
      x=rand('normal',0,&s);
      output;
    end;
  end;
run;

/* Compute sample means and sample standard deviations */
proc summary data=sim;
  by i;
  *where k<=&n_alt;
  var x;
  output out=stats(drop=_:) mean=m std=s;
run;

/* Compute relevant characteristics of CIs for the mean */
data ci;
  set stats;
  h=tinv(1-&alpha/2,&n-1)*s/sqrt(&n); /* half-width of CI */
  valid=(-h<=m<=h);                   /* indicator "CI contains true mean" */
  narrow=(h<=&h);                     /* indicator "CI is sufficiently narrow" */
run;

/* Check quality criteria for the CIs */
ods exclude BinomialTest;
proc freq data=ci;
  tables valid narrow / bin(level='1');
  tables valid*narrow;
run;
```

__Results:__

With the sample size n=120 suggested by PROC POWER a proportion of 0.9493 (95% CI 0.9479 to 0.9506) of the 100,000 simulated confidence intervals were valid, i.e., contained the true mean zero. Also, a proportion of 0.9520 (95% CI 0.9506 to 0.9533) of the confidence intervals had a half-width of at most 50. Good.

However, after reducing the sample size to your suggested n=97 (by activating the WHERE statement commented out above) the two proportions deteriorate to 0.9209 (95% CI 0.9192 to 0.9226) and 0.9343 (95% CI 0.9327 to 0.9358), respectively. So, there is strong evidence that n=97 is insufficient if the goal is 0.95.
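One detail worth checking when rerunning this: the DATA step above computes the half-width with `&n` (120) in both the t quantile and the square root, even when the WHERE statement limits each sample to `&n_alt` observations. The following is only a sketch of a variant that takes the sample size from the data instead. The names `n_use`, `sim2`, `stats2`, `ci2` and `n_obs` are made up for illustration and are not part of the original post.

```
/* Sketch: same simulation, but the CI uses the number of observations
   actually summarised (n_obs) rather than a fixed macro value.
   n_use is a hypothetical macro variable; set it to 97 or 120. */
%let alpha=0.05; /* significance level */
%let s=250;      /* standard deviation of normal variate */
%let h=50;       /* target maximum half-width of CI for the mean */
%let n_use=97;   /* sample size to evaluate (assumption) */
%let nsim=1e5;   /* number of samples */

/* Simulate &NSIM samples of &N_USE normal random variates each */
data sim2 / view=sim2;
  call streaminit(27182818);
  do i=1 to &nsim;
    do k=1 to &n_use;
      x=rand('normal',0,&s);
      output;
    end;
  end;
run;

/* Keep the per-sample count alongside the mean and standard deviation */
proc summary data=sim2;
  by i;
  var x;
  output out=stats2(drop=_:) n=n_obs mean=m std=s;
run;

/* Half-width based on the actual number of observations in each sample */
data ci2;
  set stats2;
  h=tinv(1-&alpha/2,n_obs-1)*s/sqrt(n_obs);
  valid=(-h<=m<=h);  /* indicator "CI contains true mean" */
  narrow=(h<=&h);    /* indicator "CI is sufficiently narrow" */
run;

proc freq data=ci2;
  tables valid narrow / binomial(level='1');
run;
```

With the sample size matched like this, the proportion of valid intervals should stay close to 0.95 for any n, because the t interval has exact coverage for normal data. The proportion of sufficiently narrow intervals should approximate the closed-form probability, which another reply in this thread reports as 0.47682 for n=97 and 0.95130 for n=120.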

3 REPLIES 3

Hello @atbeczkiewicz and welcome to the SAS Support Communities!

I haven't checked your formula, but my "hand" (i.e. Base SAS) calculation of the power* obtained with sample sizes 97 (your suggestion), 119 and 120 (PROC POWER result) using the formula in the documentation (section *Confidence Interval for Mean (CI=T)*, Pr(half-width<=h)=..., two-sided) would look something like this:

```
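data chk;
  h=50;        /* target half-width */
  sigma=250;   /* assumed standard deviation */
  alpha=0.05;  /* 1 - confidence level */
  do n=97, 119, 120;
    /* Pr(half-width <= h): chi-square CDF with n-1 degrees of freedom */
    power=cdf('chisq',h**2*n*(n-1)/(sigma**2*tinv(1-alpha/2,n-1)**2),n-1);
    output;
  end;
run;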
```

Result:

| Obs | h | sigma | alpha | n | power |
|---|---|---|---|---|---|
| 1 | 50 | 250 | 0.05 | 97 | 0.47682 |
| 2 | 50 | 250 | 0.05 | 119 | 0.94304 |
| 3 | 50 | 250 | 0.05 | 120 | 0.95130 |

So, for a power of 0.95 you'd need n=120 -- exactly the result of PROC POWER.

Edit:

* (sloppy use of the term "power" in the sense "probability that half-width of CI does not exceed h")
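For readers who want to see where the argument of the chi-square CDF in the DATA step comes from, here is a short derivation sketch. It assumes normally distributed data; the symbols H (random half-width), S (sample standard deviation) and t (the t quantile) are my notation and not copied from the documentation.

```latex
% Half-width of the two-sided t interval, with S the sample standard deviation
H = t_{1-\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}

% For normal data, (n-1) S^2 / \sigma^2 follows a chi-square distribution
% with n-1 degrees of freedom, so
\Pr(H \le h)
  = \Pr\!\left(S^2 \le \frac{h^2\,n}{t_{1-\alpha/2,\,n-1}^2}\right)
  = \Pr\!\left(\chi^2_{n-1} \le
      \frac{h^2\,n\,(n-1)}{\sigma^2\,t_{1-\alpha/2,\,n-1}^2}\right)

% The hand formula replaces S by \sigma and t by z:
n = \left(\frac{z_{1-\alpha/2}\,\sigma}{h}\right)^2
```

Read this way, the hand formula only makes the half-width equal to h when S happens to equal sigma. Since S exceeds sigma in roughly half of all samples, the probability of reaching the target half-width at n=97 is close to one half (0.47682 in the table above), whereas PROBWIDTH=0.95 asks for n large enough to push that probability to 0.95.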

Thanks @FreelanceReinh!
