Hi SAS users!
I'm trying to calculate the sample size needed for a new analysis on a new data source, based on findings from a preliminary analysis of a different dataset. The preliminary analysis had 3 groups: a control group and 2 treatment groups (1 for compliant subjects and 1 for non-compliant subjects). The preliminary findings are below.
| | Compliant | Non-Compliant | Control |
|---|---|---|---|
| | 500 | 300 | 2000 |
| Total Cost | $6,955 | $9,368 | $7,705 |
I've been trying to use the one-way ANOVA analysis in PROC POWER to estimate the number of subjects needed for the new analysis, but I have run into issues, particularly with which standard deviation to enter and what belongs in the group means option.
PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 500| 300 |2000
STDDEV = 1.0
NPERGROUP = .
POWER = 0.8;
RUN;
To use PROC POWER you must have an estimate of the variability around the means. I assume the means are the dollar amounts presented. The estimate of variability may be the same across all groups, or different by group. I ran PROC POWER with a constant standard deviation of $1000 and the means you present, using this code:
PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 500| 300 |2000
STDDEV = 1.0
NPERGROUP = .
POWER = 0.8;
RUN;
And obtained this result:
Computed N per Group

Actual Power | N per Group |
---|---|
0.875 | 5 |
This says that 5 patients per group, with these group means and a common standard deviation of 1000, would have an 87.5% chance of producing an F test significant at the 0.05 level. Now I suspect that the other row of numbers is some other type of average, and that is what you want to compare. Either way, to get a sample size estimate, you MUST have some estimate of the variability in those numbers.
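As a hedged sketch of where that variability estimate could come from: if the subject-level costs from the preliminary analysis are available, the Root MSE from a one-way model is a pooled within-group standard deviation that could be used for STDDEV=. The dataset name `prelim` and the variables `group` and `total_cost` are hypothetical placeholders, not names from this thread.

```sas
/* Hypothetical names: prelim (dataset), group (3 levels), total_cost (dollars). */

/* Per-group n, mean, and SD, to check whether a common SD looks plausible. */
proc means data=prelim n mean std;
  class group;
  var total_cost;
run;

/* One-way ANOVA fit. The "Root MSE" in the fit statistics is the pooled */
/* within-group SD, a candidate value for STDDEV= in PROC POWER.         */
proc glm data=prelim;
  class group;
  model total_cost = group;
run;
quit;
```

Note that this is the spread of individual costs within a group, which is a different quantity from the standard deviation of the three group means.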
SteveDenham
Thank you Steve! I have calculated the standard deviation among the average total cost values across the 3 groups (1992) and inserting that into the code increases the N needed per group to 24.
Two questions:
In the new dataset, I'm expecting over a thousand claims in each group, so finding that I need 24 or more cases per group is helpful, but it doesn't tell me how many hundreds or thousands of cases I need. Is there something I can do to narrow down the specific number needed? Or would you suggest testing the power after I've identified the volumes in the new dataset?
Is the large value of the SD causing the volume of N needed per group to be lower?
I will start from the bottom up. Larger SDs mean that a bigger sample size will be needed to detect a statistically significant difference. As an example, when I used 1000 as an SD, I got an N of 5 per group. When you essentially doubled the SD to 1992, you got an N of 24 per group.
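One way to see the effect of the SD directly is to give STDDEV= more than one value, in which case PROC POWER reports a separate result for each scenario. This is a minimal sketch that assumes the dollar amounts are the group means; the two SD values are simply the ones discussed in this thread.

```sas
/* Assumption: the dollar amounts are the group means.             */
/* Listing two values in STDDEV= requests one scenario per value,  */
/* so the required N per group can be compared side by side.       */
proc power;
  onewayanova test = overall
    alpha      = 0.05
    groupmeans = 6955 | 9368 | 7705
    stddev     = 1000 1992
    npergroup  = .
    power      = 0.8;
run;
```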
Now on to the other question. Frankly, the way you state this confuses me. If you need 24 per group to show a difference, and you have thousands, then there should be little concern that your conclusions may be due to an unusual sample. A power calculation done after data collection but before analysis can provide peace of mind that you need not go out and get more data. Just for fun, let's assume the means and standard deviations are unchanged from your calculation, but we turn this around to calculate the power with 1000 cases per group. I get >0.999. How can you use this information? Well, if there is a fixed cost for collecting each case, you would save (3000 - 72) * (fixed cost) dollars by just using the first 24 cases that are collected in each group. Even if the cost were a dollar per case, that is a saving of over 2900 dollars.
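The reverse calculation described above can be sketched as follows, with NPERGROUP= fixed and POWER= left missing so that power is the quantity solved for. This assumes the dollar amounts as group means and an SD of 1992; substitute whatever values were used in your own run.

```sas
/* Assumptions: dollar amounts as group means, common SD of 1992. */
/* POWER = . asks PROC POWER to solve for power at the given N.   */
proc power;
  onewayanova test = overall
    alpha      = 0.05
    groupmeans = 6955 | 9368 | 7705
    stddev     = 1992
    npergroup  = 1000
    power      = .;
run;
```

If the groups in the new dataset turn out to be unequal in size, the GROUPNS= option accepts per-group sample sizes in place of NPERGROUP=.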
SteveDenham
Well, when I pasted in the code, I used your code rather than the version with the changes I said I was making. It should have been:
PROC POWER;
ONEWAYANOVA TEST = OVERALL
ALPHA = 0.05
GROUPMEANS = 6955 | 9368 | 7705
STDDEV = 1000
NPERGROUP = .
POWER = 0.8;
RUN;
SteveDenham