Calcite | Level 5

## Sample Size Calculation for Factorial Design

Hi Everyone,

I am currently learning sample size calculations using SAS and I came across this question:

Under ideal packaging conditions, the concentration of the active ingredient in a vacuum-packed dry powdered product should be independent of storage temperature and humidity at the time of packaging, and there should be no difference between initial and end-of-shelf-life concentrations. An experiment to study the active ingredient's concentration is planned using a 3x2x2 experiment design in temperature, humidity, and age, respectively. The variable levels are: 20°C, 25°C, and 30°C for temperature; 25 and 50 percent for humidity; and 0 and 6 months for age. Determine the number of replicates required to detect an effect size of 30 ppm with 90% power when the standard error of the model is expected to be 20 ppm.

I do not know how to approach this problem since I do not have any data to use for sample size calculations. I tried using PROC GLMPOWER, but it requires an input dataset in order to run the calculation. Is there a way to find the sample size with just the effect size and standard error?

Ammonite | Level 13

## Re: Sample Size Calculation for Factorial Design

This link may assist you in determining the sample size:

https://surveysystem.com/sscalc.htm

Rhodochrosite | Level 12

## Re: Sample Size Calculation for Factorial Design

The question that you cite is ill-posed and cannot be answered as written:  If you have a 3x2x2 factorial, you have MANY comparisons that might differ by at least 30 ppm. Which pair of the 3 levels of factor A (temperature) differ by at least 30 ppm? Do the levels of factor B (humidity) or C (age) differ by at least 30 ppm? Do comparisons within interactions differ by at least 30 ppm? Do ALL comparisons in the entire model need to differ by at least 30 ppm?

Even if your statistical model is not mixed, you can use the GLIMMIX procedure to determine sample size in complex models, by using exemplary datasets. (GLMPOWER also uses exemplary datasets.) An exemplary dataset represents an alternative hypothesis for which you would like to assess power--it is what you think the mean structure of your data will look like, and it replaces actual data in the procedure. Once you understand the concept of exemplary data, you'll see why you do not need an actual set of data.

For a 3-way factorial, coming up with an exemplary dataset is a nontrivial problem requiring much thought and a good familiarity with the context of the study, because you have to envision the entirety of the 3-factor outcomes of interest (so, really a set of alternative hypotheses: effect of A and effect of B and effect of C, and interaction of A and B, etc.). See "PROC GLIMMIX as a Teaching and Planning Tool for Experiment Design" for an example of the process.
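To make the idea of an exemplary dataset concrete, here is a minimal Python sketch (not SAS, and not what GLMPOWER or GLIMMIX do internally). It assumes, purely for illustration, that only age shifts the mean by 30 ppm and that temperature and humidity have no effect. The baseline of 100 ppm and all function names are hypothetical. It builds the 12 cell means and computes the noncentrality parameter of the F test for the age main effect in a balanced design, which is the quantity that drives power.

```python
from itertools import product

TEMPS = (20, 25, 30)      # degrees C
HUMIDITIES = (25, 50)     # percent
AGES = (0, 6)             # months

BASELINE_PPM = 100.0      # hypothetical baseline, not from the question
AGE_SHIFT_PPM = 30.0      # assumed: the 30 ppm effect applies to age only
SIGMA_PPM = 20.0          # error standard deviation from the question


def exemplary_means():
    """Cell means under the assumed alternative: one row per design cell."""
    return {
        (t, h, a): BASELINE_PPM + (AGE_SHIFT_PPM if a == 6 else 0.0)
        for t, h, a in product(TEMPS, HUMIDITIES, AGES)
    }


def age_noncentrality(means, replicates, sigma):
    """Noncentrality of the age main-effect F test in a balanced design:
    sum over age levels of n_level * (marginal mean - grand mean)^2 / sigma^2.
    """
    grand = sum(means.values()) / len(means)
    ss_hypothesis = 0.0
    for age in AGES:
        cells = [m for (_, _, a), m in means.items() if a == age]
        marginal = sum(cells) / len(cells)
        n_level = replicates * len(cells)   # observations at this age level
        ss_hypothesis += n_level * (marginal - grand) ** 2
    return ss_hypothesis / sigma ** 2


means = exemplary_means()
lam = age_noncentrality(means, replicates=2, sigma=SIGMA_PPM)
print(f"{len(means)} cells, noncentrality for age with 2 replicates: {lam}")
```

Changing the cell means (for example, letting the age shift differ by temperature) changes the alternative hypothesis, and with it the noncentrality for each main effect and interaction. That is why the exemplary dataset has to come from your understanding of the study.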

For your question, I might imagine that the 30 ppm applies to the difference in age. But what if the difference in age is not expected to be the same for all temperatures and humidities? Then you need to assess power for interactions and it gets more complicated.
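Under that simplest reading (a 30 ppm difference between the two age levels, no interactions), a rough first answer needs only the effect size and the standard deviation. The Python sketch below uses the usual normal-approximation formula for comparing two means, n per level = 2 (z_{1-alpha/2} + z_{power})^2 (sigma / delta)^2. The two-sided alpha of 0.05 is my assumption (the question does not state it) and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def replicates_for_age_effect(delta, sigma, power=0.90, alpha=0.05,
                              cells_per_age_level=6):
    """Normal-approximation sample size for a two-level main effect.

    Returns (observations needed per age level, replicates of the full
    3x2x2 design). Each replicate contributes 3 x 2 = 6 observations
    to each age level.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test (assumed)
    z_beta = z.inv_cdf(power)
    n_per_level = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return n_per_level, ceil(n_per_level / cells_per_age_level)


n_per_level, replicates = replicates_for_age_effect(delta=30.0, sigma=20.0)
print(f"about {n_per_level:.1f} observations per age level, "
      f"so {replicates} replicates of the 12-cell design")
```

This works out to roughly 9.3 observations per age level, which 2 replicates (12 observations per age level) would cover. Treat it as a starting point only: the normal approximation ignores the limited error degrees of freedom, so an exact noncentral-F calculation, such as the one GLMPOWER performs on an exemplary dataset, will be somewhat more conservative.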

I hope this helps.

Calcite | Level 5

## Re: Sample Size Calculation for Factorial Design

Thank you! This is really helpful!
