07-26-2017 06:10 PM
I am a rookie in the statistics field. Recently I conducted a two-sample t-test for a pre/post comparison. There are five data points before and five after the intervention. The client wants to know if the occurrence of a harmful event reduces after the intervention. The data is pulled retrospectively.
Although there is an observed 30% event count reduction after the intervention, the p-value for the t-test is .20. The client asked me if there is a way to calculate how much more data we need for the p-value to be lower than .05. We are pretty certain the event reduction trend will continue.
I know that for a difference between two groups with small sample sizes to be statistically significant, the difference needs to be very large. Is it possible to calculate the smallest sample size and difference required for the p-value to be smaller than .05? How should I approach such a question? What method should I look into?
07-26-2017 06:29 PM
The basic methodology is power analysis. This links to a SAS overview.
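For readers without SAS at hand, the kind of prospective power calculation being recommended here can be sketched in Python. This is a minimal illustration using scipy's noncentral t distribution for a two-sided, two-sample t-test with equal group sizes and equal variances; the effect size and target power below are placeholders, not numbers from this thread:

```python
import numpy as np
from scipy import stats

def two_sample_power(n_per_group, effect_size, alpha=0.05):
    """Power of a two-sided, two-sample t-test (equal n, equal variance).

    effect_size is Cohen's d = (mu1 - mu2) / sigma.
    """
    df = 2 * n_per_group - 2
    # Noncentrality parameter under the alternative hypothesis
    ncp = effect_size * np.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(reject H0) = P(|T| > t_crit) under the noncentral t
    return (1 - stats.nct.cdf(t_crit, df, ncp)
            + stats.nct.cdf(-t_crit, df, ncp))

def n_for_power(effect_size, target_power=0.80, alpha=0.05):
    """Smallest n per group reaching target_power for a given effect size."""
    n = 2
    while two_sample_power(n, effect_size, alpha) < target_power:
        n += 1
    return n

# Even a "large" effect of d = 1.0 needs far more than 5 per group
# to reach 80% power at alpha = 0.05:
print(n_for_power(1.0))
```

This is the standard prospective calculation (roughly what PROC POWER's TWOSAMPLEMEANS analysis does); the caveats about applying it retrospectively to an already-observed dataset are discussed later in this thread.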
But before diving into specific details about power analysis...
Does your entire dataset currently consist of 10 observations (5 pre and 5 post)? Are you trying to determine how many more post observations you need to achieve a p-value less than 0.05?
Or do you have multiple subjects, each with 5 pre and 5 post? And are you trying to determine how many more post observations you need to achieve a p-value less than 0.05?
Or are you planning a new study, rather than fishing for significance in the current study?
07-27-2017 10:31 AM
Thanks for your help!
Currently the entire dataset consists of 10 observations, 5 pre and 5 post. Yes, I'm trying to determine how many more observations we need for the post period in order to show a significant difference. We have enough data for the pre period, and my client wants to know how many more time periods we need to wait and collect data for the difference to be statistically significant.
07-27-2017 10:58 AM
It's not a matter of time periods; your N is too small regardless of how long you collect data. Your experiment is underpowered.
07-27-2017 12:31 PM
Thank you for the clarifications.
I see several issues here.
One issue is inflated Type I error. Notably "Adding sample size to an already completed experiment in order to increase power will increase the Type I error rate (alpha) unless extraordinary measures are taken...." http://depts.washington.edu/oawhome/wordpress/wp-content/uploads/2013/10/Improved_Stopping_Rules.pdf (p 3).
Another issue is that it looks like you have only one realization of the intervention. Although n=1 provides some information, it is merely anecdotal; to demonstrate the impact of the intervention, you would need independent replication.
Another issue is that these observations through time may not be independent, i.e., there may be autocorrelation. Lack of independence reduces the effective degrees of freedom--each observation does not carry a full complement of information. This impacts your assessment of sample size when sample size is the number of observations.
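To make the autocorrelation point concrete: for a stationary AR(1) process with lag-1 correlation rho, a standard large-sample approximation for the effective number of independent observations is n_eff ≈ n(1 − rho)/(1 + rho). A tiny Python illustration (the rho values are hypothetical, not estimated from the poster's data):

```python
def effective_n(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series
    with lag-1 autocorrelation rho: n_eff = n * (1 - rho) / (1 + rho)."""
    return n * (1 - rho) / (1 + rho)

# With only 5 post observations, even moderate positive autocorrelation
# shrinks the information content quickly:
print(effective_n(5, 0.5))   # roughly 1.67 "independent" observations
print(effective_n(5, 0.0))   # 5.0 -- no autocorrelation, no loss
```

So if the five post-intervention periods are positively autocorrelated, the t-test is working with even fewer effective degrees of freedom than the already-small n suggests.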
Another issue is the use of statistics derived from the dataset to estimate the sample size desired for that same dataset. This problem is addressed in the retrospective power analysis literature.
So, in summary and in my opinion, although you could retrospectively estimate the number of post observations needed to detect a particular effect, the value of that effort is minimal and does not advance your effort to assess the impact of the intervention in any meaningful way.
07-26-2017 06:47 PM
This example from PROC POWER may help.