mekono
Fluorite | Level 6

Hi SAS users!

 

I'm trying to calculate the sample size needed for a new analysis on a new data source, based on findings from a preliminary analysis of a different dataset. The preliminary analysis had 3 groups: a control group and 2 treatment groups (1 for compliant subjects and 1 for non-compliant subjects). The preliminary findings are below.

 

              Compliant   Non-Compliant   Control
                    500             300      2000
Total Cost       $6,955          $9,368    $7,705

 

I've been trying to use PROC POWER with ONEWAYANOVA to estimate the number of subjects needed for the new analysis, but I've run into issues, particularly with which standard deviation to enter and what to put in the group means.

 

 

PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 500| 300 |2000
STDDEV = 1.0
NPERGROUP = .
POWER = 0.8;
RUN;

Accepted Solutions
SteveDenham
Jade | Level 19

To use PROC POWER you must have an estimate of the variability around the means.  I assume the means are the dollar amounts presented.  The estimate of variability may be the same across all groups, or different by group.  I am going to do a PROC POWER with a constant standard deviation of $1000, and the means you present:  I used this code:

PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 500| 300 |2000
STDDEV = 1.0
NPERGROUP = .
POWER = 0.8;
RUN;

And obtained this result:

Computed N per Group

Actual Power    N per Group
       0.875              5

 

This says 5 patients per group with these group means and a common standard deviation of 1000 would have an 87.5% chance of providing an F test significant at the 0.05 level. Now I suspect that the other row of numbers is some other type of average, and that is what you want to compare. To get an estimate, you MUST have some estimate of the variability in those numbers.
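As a hedged sketch of how much the SD assumption drives the answer: PROC POWER accepts several STDDEV values in one run and reports a required N per group for each. The SD values below are illustrative placeholders, not estimates from the data, and the dollar amounts are assumed to be the group means.

```sas
/* Sketch: required N per group under several assumed common SDs.   */
/* The SD values are hypothetical scenarios, not data estimates.    */
proc power;
  onewayanova test = overall
    alpha      = 0.05
    groupmeans = 6955 | 9368 | 7705   /* assumed: mean total cost per group */
    stddev     = 1000 2000 3000       /* placeholder SD scenarios */
    npergroup  = .                    /* solve for N per group */
    power      = 0.8;
run;
```

Running several scenarios side by side shows how sensitive the required N is to the variability estimate before you commit to one.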

 

SteveDenham

 

 

mekono
Fluorite | Level 6

Thank you, Steve! I have calculated the standard deviation among the average total cost values across the 3 groups (1992), and inserting that into the code increases the N needed per group to 24.

Two questions-

In the new dataset, I'm expecting over a thousand claims in each group, so finding that I need only 24 or more cases is helpful, but it doesn't tell me how many hundreds or thousands of cases I need. Is there something I can do to narrow down the specific number needed? Or would you suggest testing the power after I've identified the volumes in the new dataset?

 

Is the large value of the SD causing the volume of N needed per group to be lower?

 

SteveDenham
Jade | Level 19

I will start from the bottom up. Larger SDs mean that a bigger sample size will be needed to detect a statistically significant difference. As an example, when I used 1000 as an SD, I got an N of 5 per group. When you essentially doubled the SD to 1992, you got an N of 24 per group.
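To make the dependence on the SD explicit: for a balanced one-way ANOVA with n subjects in each of k groups, the power of the overall F test is governed by the noncentrality parameter below. The symbols are introduced here for illustration and do not come from the thread.

```latex
\lambda = \frac{n \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{\sigma^2},
\qquad
\bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mu_i
```

Because the common SD enters as a square in the denominator, doubling it divides the noncentrality by 4, so n must grow roughly fourfold to restore the same power.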

 

Now on to the other question. Frankly, the way you state this confuses me. If you need 24 per group to show a difference, and you have thousands, then there should be little concern that your conclusions may be due to an unusual sample. Calculating power after data collection but before analysis can provide peace of mind that you need not go out and get more data. Just for fun, let's assume the means and standard deviations are unchanged from your calculation, but we turn this around to calculate the power with 1000 cases per group. I get >0.999. How can you use this information? Well, if there is a fixed cost for collecting each case, you would save (3000 - 72) * (fixed cost) dollars by just using the first 24 cases that are collected in each group. Even if the cost were a dollar per case, that is a saving of over 2900 dollars.
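The turned-around calculation can be sketched as follows: fix NPERGROUP and set POWER to missing so that PROC POWER solves for power instead. This assumes the dollar amounts are the group means and uses the SD of 1992 quoted earlier in the thread.

```sas
/* Sketch: solve for power at a fixed 1000 cases per group. */
proc power;
  onewayanova test = overall
    alpha      = 0.05
    groupmeans = 6955 | 9368 | 7705   /* assumed: mean total cost per group */
    stddev     = 1992                 /* SD quoted earlier in the thread */
    npergroup  = 1000
    power      = .;                   /* solve for power */
run;
```

If the new groups turn out to be unbalanced, the GROUPNS= option can take the place of NPERGROUP= to supply one size per group.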

 

SteveDenham

mekono
Fluorite | Level 6
Thanks, your explanation makes perfect sense!
SteveDenham
Jade | Level 19

Well, when I pasted in the code, I used your code rather than the changes I said I was making. It should have been:

PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 6955 | 9368 | 7705
STDDEV = 1000
NPERGROUP = .
POWER = 0.8;
RUN;

SteveDenham

 

 

 
