Hi!
I have a question about input file for stouffer-liptak correction. So it is proc MULTTEST with option stouffer, but input file with p values... How this one should look like? raw_p values in column or?
Some additional stuff?
Anyone used this one.
My data set looks like this:
test1 raw_p1a raw_p1b raw_p1c
.
.
.
test36789 raw_p36789a raw_p36789b raw_p36789c
thanks in advance!
That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.
Steve Denham
From SAS 9.2.3 online help and example of using an input set of p-values
The MULTTEST Procedure |
data a;
input Test$ Raw_P @@;
datalines;
test01 0.28282 test02 0.30688 test03 0.71022
test04 0.77175 test05 0.78180 test06 0.88581
test07 0.54685 test08 0.84978 test09 0.24228
test10 0.58977 test11 0.03498 test12 0.41607
test13 0.31631 test14 0.05254 test15 0.45061
test16 0.75758 test17 0.12496 test18 0.49485
test19 0.21572 test20 0.50505 test21 0.94372
test22 0.81260 test23 0.77596 test24 0.36889
;
proc multtest inpvalues=a holm hoc fdr;
run;
Thanks!
I am aware of this one but this actually produce one new value for each raw_p. Maybe I am wrong but isn't it that stouffer should produce one new p from multiple p? In my case I would like to get new p from raw_pa, raw_pb and raw_pc for every test I have.
Like in formula Z=sum(z(pi)/sqrt k) where Z is thr the standard normal variable under h0, z(pi) is the p value from test i transformed to z and k is number of tests. Truth is that I could do this by hand but then I noticed this option stouffer and hoped to write in the paper used this from SAS instead of formula 😉
Try:
proc multtest data=yourdata stouffer;
by testidno;
run;
This assumes that what you have as test1, test2, etc. is kept in the variable testidno, and that the dataset is sorted by testidno.
Steve Denham
Thanks Steve!
it looks like I am doing something wrong... I got response ERROR: No base test has been assigned, and ERROR: No class statement.
So I actually wonder am I trying to do what I actually don't understand. I have three p values from three cattle breeds and I would like to combine them. Is this method good one?
did you use PROC MULTTEST statement option INPVALUES.
INPVALUES=SAS-data-set names an input SAS data set that includes a variable containing raw p-values. The MULTTEST procedure adjusts the collection of raw p-values for multiplicity. Resampling-based adjustments are not permitted with this type of data input. The CLASS, CONTRAST, FREQ, STRATA, and TEST statements are ignored when an INPVALUES= data set is specified. The INPVALUES= and DATA= options cannot both be specified. The pvalue-name enables you to specify the name of the p-value column from your data set. By default, pvalue-name=’raw_p’. The INPVALUES= data set can contain variables in addition to the raw p-values variable; see Example 60.5 for an example.
So I set up a test situation using the Stouffer option, and the data from Example 60.5, splitting into 6 testing groups. It does NOT return a single combined p value for grouped tests, but returns individual adjusted p values for each test. Perhaps you could view the maximum adjusted value as a combination.
data a;
input Test$ Raw_P @@;
datalines;
test01 0.28282 test02 0.30688 test03 0.71022
test04 0.77175 test05 0.78180 test06 0.88581
test07 0.54685 test08 0.84978 test09 0.24228
test10 0.58977 test11 0.03498 test12 0.41607
test13 0.31631 test14 0.05254 test15 0.45061
test16 0.75758 test17 0.12496 test18 0.49485
test19 0.21572 test20 0.50505 test21 0.94372
test22 0.81260 test23 0.77596 test24 0.36889
;
data aa;
set a;
grptest=mod(_n_,6);
run;
proc sort data=aa;
by grptest;
run;
ods trace on;
proc multtest inpvalues=aa stouffer;
by grptest;
run;
ods trace off;
As you can see from the output from this, there is an adjusted value for every test, and not a single value, although I think the maximum value may be the exact equivalent of the formulat you proposed. At this point, I would appeal to the independence of the p values, and use either the formula you proposed, or calculate a joint probability that all three tests reject their respective null hypotheses. Something like p_joint=1-((1-rawp_a)*(1-rawp_b)*(1-rawp_c)). You could loop either of these methods pretty easily to go through all 36789 loci (just guessing at what you are looking at). If the max adjusted value from the STOUFFER option matches your formula, then I think you are home free. I would suggest testing it out on three or four incidences.
You will have to reshape your data to use MULTTEST, I think, with a single raw_p, and additional variables testid and breed (taking on A, B, C). Take a look at the structure of dataset aa in the example above.
Steve Denham
Steve I don't know how to thank you
You guessed right. I have SNP data. I am estimating p for every SNP in three breeds.
When I was thinking about combining three values i used papers linked (http://www.biomedcentral.com/1471-2164/13/48 and http://www.plosone.org/article/info:doi/10.1371/journal.pone.0064280) . They were mentioning Stouffer. I looked in SAS and found this option. It looks like I can't use it, so I will calculate "by hand". Let me ask one more thing.
So if I will going to calculate stouffer by hand i should trnsform p value into z value. I believe I should use z = probit(1 -p_value).
Do you agree?
That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.
Steve Denham
Steve, thank you again!
Guys, I am still open for new suggestions!
Wish you all best!
Cheers!
Maja
One more thing. Make sure that those raw p values are one-sided, so that the transformation to a z value is one-to-one. Here is my code for the example data in 60.5
/* Example 60.5 data */
data a;
input Test$ Raw_P @@;
datalines;
test01 0.28282 test02 0.30688 test03 0.71022
test04 0.77175 test05 0.78180 test06 0.88581
test07 0.54685 test08 0.84978 test09 0.24228
test10 0.58977 test11 0.03498 test12 0.41607
test13 0.31631 test14 0.05254 test15 0.45061
test16 0.75758 test17 0.12496 test18 0.49485
test19 0.21572 test20 0.50505 test21 0.94372
test22 0.81260 test23 0.77596 test24 0.36889
;
/* Create 6 groups of 4 pvals each */
data aa;
set a;
grptest=mod(_n_,6);
run;
/* Sort for bygroup processing */
proc sort data=aa;
by grptest;
run;
/* See what MULTTEST gives for Stouffer-Liptak, and see what datasets are made */
ods trace on;
proc multtest inpvalues=aa stouffer;
by grptest;
run;
ods trace off;
/* Transform pvals to z scores */
data b;
set aa;
z=quantile('NORMALl', 1-raw_p);
run;
/* Intermediate step to sum z scores within and across groups */
proc means noprint data=b;
class grptest;
var z;
output out=grpz sum=sumz;
run;
/* Calculate Stouffer combined value */
data grp_pval;
set grpz;
stouffer_p_val=1-cdf('NORMAL',sumz/sqrt(_freq_));
run;
Steve Denham
Awesome! My p values are two sided. Actually I obtained them from t values. Do you believe that should make one sided p? Or to apply different formula. Sorry for being borring but I am really happy someone is actually giving me useful advices.
I would say plug in the t values for the z values. However, you may have different degrees of freedom associated with the t values within a SNP, so they may not be totally comparable. Therefore, I would use the cdf('T", t-value, df) to get one-sided p values, then feed them back through what we have above.
Steve Denham
Thank you
:smileycool:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.