yaozhang
Fluorite | Level 6

I am a rookie in the statistics field. Recently I conducted a two-sample t-test for a pre/post comparison. There are five data points before and five after the intervention. The client wants to know whether the occurrence of a harmful event decreases after the intervention. The data were pulled retrospectively.

 

Although there is an observed 30% reduction in event counts after the intervention, the p-value for the t-test is .20. The client asked me whether there is a way to calculate how much more data we need for the p-value to fall below .05. We are fairly certain the downward trend in events will continue.

 

I know that for a difference between two groups with small sample sizes to be statistically significant, the difference needs to be very large. Is it possible to calculate the smallest sample size and difference required for the p-value to be smaller than .05? How should I approach such a question? What method should I look into?

sld
Rhodochrosite | Level 12

The basic methodology is power analysis. This SAS overview is a good starting point:

https://support.sas.com/documentation/onlinedoc/stat/142/intropss.pdf
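In SAS itself this is what PROC POWER (covered in the linked overview) is for. As a rough illustration of what such a calculation does, here is a minimal Python sketch of the standard normal-approximation formula for the per-group sample size of a two-sample t-test; the function name and defaults are my own choices, not anything from SAS.

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample t-test.

    effect_size is Cohen's d (mean difference in pooled-SD units).
    Uses the normal approximation; the exact t-based answer is
    slightly larger, which is what PROC POWER would report.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

# Even a conventionally "large" effect (d = 0.8) needs about
# 25 observations per group for 80% power at alpha = .05.
print(round(n_per_group(0.8)))
```

Note how quickly the required n grows as the effect shrinks: halving d roughly quadruples the sample size.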

 

But before diving into specific details about power analysis...

 

Does your entire dataset currently consist of 10 observations (5 pre and 5 post)? Are you trying to determine how many more post observations you need to achieve a p-value less than 0.05?

 

Or do you have multiple subjects, each with 5 pre and 5 post? And are you trying to determine how many more post observations you need to achieve a p-value less than 0.05?

 

Or are you planning a new study, rather than fishing for significance in the current study?

 

yaozhang
Fluorite | Level 6

Thanks for your help! 

 

Currently the entire dataset consists of 10 observations, 5 pre and 5 post. Yes, I'm trying to determine how many more observations we need for the post period in order to show a significant difference. We have enough data for the pre period, and my client wants to know how many more time periods we need to wait and collect data for the difference to become statistically significant.

Reeza
Super User

It's not a matter of time periods: your N is too small regardless of how long you collect data. Your experiment is underpowered.

 

sld
Rhodochrosite | Level 12

Thank you for the clarifications.

 

I see several issues here.

 

One issue is inflated Type I error. Notably "Adding sample size to an already completed experiment in order to increase power will increase the Type I error rate (alpha) unless extraordinary measures are taken...." http://depts.washington.edu/oawhome/wordpress/wp-content/uploads/2013/10/Improved_Stopping_Rules.pdf (p 3).
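The inflation is easy to demonstrate by simulation. The sketch below (my own illustration, not from the linked paper) generates data with no true effect, then compares a single test at the final sample size against the "peek after every batch and stop when significant" strategy the client is proposing. The p-value uses a normal approximation to the pooled t statistic, which is crude at small n but makes the comparison self-contained.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def p_value(a, b):
    """Approximate two-sided p for a pooled two-sample t statistic,
    using the normal distribution (rough for small n)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return 2 * (1 - NormalDist().cdf(abs(t)))

random.seed(1)
trials, alpha = 2000, 0.05
fixed = sequential = 0
for _ in range(trials):
    # Null is true: both "pre" and "post" come from the same distribution.
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if p_value(a, b) < alpha:                 # one test at the final n
        fixed += 1
    # Peek after every batch of 5, stop as soon as p < .05.
    if any(p_value(a[:n], b[:n]) < alpha for n in (5, 10, 15, 20)):
        sequential += 1

print(f"fixed: {fixed / trials:.3f}  sequential: {sequential / trials:.3f}")
```

The sequential strategy rejects a true null noticeably more often than the single fixed-n test, even though every individual peek uses alpha = .05. That excess is exactly the Type I inflation the quote warns about.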

 

Another issue is that it looks like you have only one realization of the intervention. Although n=1 provides some information, it is merely anecdotal; to demonstrate the impact of the intervention, you would need independent replication.

 

Another issue is that these observations through time may not be independent, i.e., there may be autocorrelation. Lack of independence reduces the effective degrees of freedom--each observation does not carry a full complement of information. This impacts your assessment of sample size when sample size is the number of observations.
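For one common model of that dependence, an AR(1) process with lag-one correlation rho, the effective sample size has a simple large-n approximation, n_eff ≈ n(1 - rho)/(1 + rho). A small sketch (my own illustration of that textbook formula):

```python
def effective_n(n, rho):
    """Effective sample size of n equally spaced observations under a
    simple AR(1) autocorrelation model (large-n approximation)."""
    return n * (1 - rho) / (1 + rho)

# With moderate autocorrelation (rho = 0.5), 5 monthly counts carry
# roughly the information of fewer than 2 independent observations.
print(round(effective_n(5, 0.5), 2))
```

So if the event counts are positively autocorrelated, the 5-and-5 design is even weaker than it looks.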

 

Another issue is the use of statistics derived from the dataset to estimate the sample size desired for that same dataset. This problem is addressed in the retrospective power analysis literature.

 

So, in summary and in my opinion: although you could retrospectively estimate the number of post observations needed to detect a particular effect, the value of that effort is minimal and does not advance your assessment of the intervention's impact in any meaningful way.

 

