Hello. I am trying to analyze data on more than 300 SNPs (single nucleotide polymorphisms).
I previously asked about using simpleM for p-value correction and received an answer.
(FYI, simpleM: https://support.sas.com/resources/papers/proceedings09/240-2009.pdf)
However, I am completely lost when it comes to applying it in practice.
1. When correcting for multiple testing, should I put all 300 SNP variables (continuous variables) into a single Cox proportional hazards model at once (a), or should I fit a separate model for each of the 300 SNPs (b)?
2. Is it correct to compare the unadjusted SNP p-values against a significance level of 0.05/m (where m is the number obtained from simpleM)?
I ask because PROC MULTTEST has no option for the simpleM method, so I cannot simply use it to adjust the p-values.
Many thanks in advance.
Example code is below:
(a)
proc phreg data=mydata;
class confounders;
model time*status(0)=confounders con_confounders biomarkers1-biomarkers300/RL; /* a SAS numbered range list needs the full variable name at both ends */
run;
(b)
proc phreg data=mydata;
class confounders;
model time*status(0)=confounders con_confounders biomarkers1/RL;
run;
...
proc phreg data=mydata;
class confounders;
model time*status(0)=confounders con_confounders biomarkers300/RL;
run;
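For option (b), rather than writing out 300 PROC PHREG steps by hand, a macro loop can fit each model and stack the SNP rows of the parameter-estimates tables into one data set of raw p-values. This is a minimal sketch that has not been run against your data: the macro name fit_each_snp and the data set names pe_1 ... and snp_pvalues are hypothetical, and it assumes your SNPs are named biomarkers1 to biomarkers300 and that confounders/con_confounders stand in for your real covariates.

```sas
/* Sketch: fit one Cox model per SNP and collect the SNP p-values.       */
/* Hypothetical names: fit_each_snp, pe_1 ... pe_&nsnp, snp_pvalues.     */
/* Assumes SNPs are named biomarkers1-biomarkers<nsnp>; confounders and  */
/* con_confounders are placeholders for your own covariates.             */
%macro fit_each_snp(data=, nsnp=);
   %local i;
   %do i = 1 %to &nsnp;
      /* Save the parameter estimates of this model as pe_&i */
      ods output ParameterEstimates=pe_&i;
      proc phreg data=&data;
         class confounders;
         model time*status(0) = confounders con_confounders biomarkers&i / rl;
      run;
   %end;

   /* Stack all tables and keep only the SNP row of each model */
   data snp_pvalues;
      length Parameter $32;   /* avoid truncating longer SNP names */
      set pe_1-pe_&nsnp;
      where upcase(Parameter) =: 'BIOMARKERS';
      keep Parameter Estimate HazardRatio ProbChiSq;
   run;
%mend fit_each_snp;
```

For the data in the question, the call would be %fit_each_snp(data=mydata, nsnp=300). The ProbChiSq column of snp_pvalues then holds the raw Wald p-value for each SNP.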
Answering your questions in order:
1. You can do either, but I think the more common approach in genetic studies is to fit 300 models, each with just one of the SNPs as a predictor variable.
2. Yes. You calculate m using the simpleM method, and then declare p-values less than 0.05/m to be 'statistically significant'.
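The first step of answer 2, computing m, can be sketched with PROC PRINCOMP. As I understand simpleM, m is the number of principal components needed to explain 99.5% of the variance of the SNP correlation matrix. This sketch assumes the Pearson correlation of the SNP variables is an acceptable stand-in for the correlation matrix used in the simpleM paper (please check that against the paper); the macro name simplem, the toy data set, and the macro variables meff and alpha_adj are hypothetical.

```sas
/* Sketch of the simpleM idea: m = number of principal components needed */
/* to explain 99.5% of the variance of the SNP correlation matrix.       */
/* Assumption: PROC PRINCOMP on the Pearson correlation matrix of the    */
/* SNP variables stands in for the matrix used by simpleM.               */
/* Hypothetical names: simplem, _eig, toy, meff, alpha_adj.              */
%macro simplem(data=, vars=, cutoff=0.995);
   ods output Eigenvalues=_eig;
   proc princomp data=&data;   /* uses the correlation matrix by default */
      var &vars;
   run;

   /* First component number whose cumulative proportion reaches the cutoff */
   data _null_;
      set _eig end=last;
      retain found .;
      if missing(found) and Cumulative >= &cutoff then found = Number;
      if last then call symputx('meff', found, 'G');
   run;
%mend simplem;

/* Toy data: x2 is an exact copy of x1, so only two dimensions carry variance */
data toy;
   input x1 x2 x3;
   datalines;
1 1 2
2 2 1
3 3 3
4 4 1
;
run;

%simplem(data=toy, vars=x1-x3)
%let alpha_adj = %sysevalf(0.05 / &meff);
%put NOTE: m = &meff, per-test threshold = &alpha_adj;
```

With the toy data, m comes out as 2 and the per-test threshold as 0.025. For the question's data the call would be %simplem(data=mydata, vars=biomarkers1-biomarkers300), and &alpha_adj is then the cut-off to compare the raw SNP p-values with.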
Thank you for your kind response.
Just to clarify: when correcting for multiple testing, I only adjust the significance level (to 0.05/m) and do not correct the resulting p-values, correct?
After adjusting the significance level to 0.05/m, the threshold becomes very small and only one SNP is significant.
I'm not sure if this is correct.
To clarify my previous comment, there are actually two equivalent options for this kind of multiple-testing correction. You can either 1) compare the (unadjusted) p-values to the threshold 0.05/m and declare any value below it to be statistically significant, or 2) compute adjusted p-values by multiplying your raw p-values by m, and declare any adjusted p-value < 0.05 to be statistically significant. The first approach is more common in statistical genetics (e.g., this is why 5 x 10^-8 is the usual threshold for statistical significance in genome-wide association studies). The second approach is the one PROC MULTTEST takes (see the Bonferroni section of the "p-Value Adjustments" page in the SAS Help Center).
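The two options agree because, for m > 0, p < 0.05/m holds exactly when m*p < 0.05, and capping the adjusted value at 1 does not change that comparison. A small data step shows both rules side by side; m = 40 and the three p-values are made-up illustration values, not results from this data.

```sas
/* Sketch: the two equivalent ways to apply the threshold.          */
/* m = 40 and the p-values are made-up illustration values.         */
%let m = 40;

data compare;
   input snp $ raw_p;
   /* Option 1: compare the raw p-value with 0.05/m */
   sig_threshold = (raw_p < 0.05 / &m);
   /* Option 2: adjusted p-value = m * raw p (capped at 1), compared with 0.05 */
   adj_p        = min(1, &m * raw_p);
   sig_adjusted = (adj_p < 0.05);
   datalines;
snp_a 0.0005
snp_b 0.002
snp_c 0.03
;
run;

proc print data=compare noobs;
run;
```

Both flags agree on every row: only snp_a (raw p = 0.0005, below 0.05/40 = 0.00125; adjusted p = 0.02) is significant under either rule.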
To your point about just one significant SNP: that seems very plausible. Bonferroni corrections are generally regarded as conservative (i.e., they tend to declare p-values non-significant more often than necessary). I would recommend reading through that Help Center page for other, potentially less conservative, ways to adjust for multiple tests.
The Holm step-down method available in PROC MULTTEST is generally applicable for adjusting a set of p-values. It controls the family-wise error rate, remains valid whatever dependence exists among the tests, and is less conservative than Bonferroni: it rejects every hypothesis that Bonferroni rejects, and possibly more.
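Here is a sketch of the Holm approach, using made-up p-values for five hypothetical SNPs. PROC MULTTEST accepts raw p-values through INPVALUES= (the p-value variable is named raw_p), and the HOLM option requests the step-down adjustment. The data step after it spells out the same rule by hand, only to show why Holm is less conservative: the i-th smallest of k p-values is multiplied by k - i + 1 instead of k, and a running maximum keeps the adjusted values in order.

```sas
/* Sketch: Holm step-down adjustment; the p-values are made-up values */
/* for five hypothetical SNPs, not results from this study.           */
data pvals;
   input snp $ raw_p;
   datalines;
snp_a 0.001
snp_b 0.009
snp_c 0.012
snp_d 0.04
snp_e 0.3
;
run;

/* PROC MULTTEST reads raw p-values from the variable raw_p */
proc multtest inpvalues=pvals bonferroni holm;
run;

/* The same adjustments written out by hand, for illustration */
proc sort data=pvals out=sorted;
   by raw_p;
run;

data holm;
   set sorted nobs=k;          /* k = number of tests */
   retain holm_p 0;
   bon_p  = min(1, k * raw_p);
   /* the i-th smallest p-value (i = _n_) is multiplied by k - i + 1; */
   /* the running maximum keeps the adjusted values non-decreasing    */
   holm_p = max(holm_p, min(1, (k - _n_ + 1) * raw_p));
run;

proc print data=holm noobs;
run;
```

With these illustration values, the Bonferroni-adjusted p-values are 0.005, 0.045, 0.06, 0.2 and 1, so two SNPs fall below 0.05; Holm gives 0.005, 0.036, 0.036, 0.08 and 0.3, so three do.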