Solved: Re: Stouffer-Liptak p value correction

MajaFerencakovic · Posted 12-17-2013 02:11 PM

Hi!

I have a question about input file for stouffer-liptak correction. So it is proc MULTTEST with option stouffer, but input file with p values... How this one should look like? raw_p values in column or?

Some additional stuff?

Anyone used this one.

My data set looks like this:

test1 raw_p1a raw_p1b raw_p1c

.

test36789 raw_p36789a raw_p36789b raw_p36789c

thanks in advance!

SteveDenham · Posted 12-19-2013 09:37 AM

That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.

Steve Denham

View solution in original post

ballardw · Posted 12-17-2013 04:24 PM

From SAS 9.2.3 online help and example of using an input set of p-values

The MULTTEST Procedure

Example 58.5 Inputting Raw p-Values

data a;

input Test$ Raw_P @@;

datalines;

test01 0.28282 test02 0.30688 test03 0.71022

test04 0.77175 test05 0.78180 test06 0.88581

test07 0.54685 test08 0.84978 test09 0.24228

test10 0.58977 test11 0.03498 test12 0.41607

test13 0.31631 test14 0.05254 test15 0.45061

test16 0.75758 test17 0.12496 test18 0.49485

test19 0.21572 test20 0.50505 test21 0.94372

test22 0.81260 test23 0.77596 test24 0.36889

;

proc multtest inpvalues=a holm hoc fdr;

run;

MajaFerencakovic · Posted 12-17-2013 04:41 PM

Thanks!

I am aware of this one but this actually produce one new value for each raw_p. Maybe I am wrong but isn't it that stouffer should produce one new p from multiple p? In my case I would like to get new p from raw_pa, raw_pb and raw_pc for every test I have.

Like in formula Z=sum(z(pi)/sqrt k) where Z is thr the standard normal variable under h0, z(pi) is the p value from test i transformed to z and k is number of tests. Truth is that I could do this by hand but then I noticed this option stouffer and hoped to write in the paper used this from SAS instead of formula 😉

SteveDenham · Posted 12-18-2013 01:14 PM

Try:

proc multtest data=yourdata stouffer;

by testidno;

run;

This assumes that what you have as test1, test2, etc. is kept in the variable testidno, and that the dataset is sorted by testidno.

Steve Denham

MajaFerencakovic · Posted 12-19-2013 07:15 AM

Thanks Steve!

it looks like I am doing something wrong... I got response ERROR: No base test has been assigned, and ERROR: No class statement.

So I actually wonder am I trying to do what I actually don't understand. I have three p values from three cattle breeds and I would like to combine them. Is this method good one?

data_null__ · Posted 12-19-2013 07:36 AM

did you use PROC MULTTEST statement option INPVALUES.

INPVALUES=SAS-data-set names an input SAS data set that includes a variable containing raw p-values. The MULTTEST procedure adjusts the collection of raw p-values for multiplicity. Resampling-based adjustments are not permitted with this type of data input. The CLASS, CONTRAST, FREQ, STRATA, and TEST statements are ignored when an INPVALUES= data set is specified. The INPVALUES= and DATA= options cannot both be specified. The pvalue-name enables you to specify the name of the p-value column from your data set. By default, pvalue-name=’raw_p’. The INPVALUES= data set can contain variables in addition to the raw p-values variable; see Example 60.5 for an example.

SteveDenham · Posted 12-19-2013 09:02 AM

So I set up a test situation using the Stouffer option, and the data from Example 60.5, splitting into 6 testing groups. It does NOT return a single combined p value for grouped tests, but returns individual adjusted p values for each test. Perhaps you could view the maximum adjusted value as a combination.

data a;

input Test$ Raw_P @@;

datalines;

test01 0.28282 test02 0.30688 test03 0.71022

test04 0.77175 test05 0.78180 test06 0.88581

test07 0.54685 test08 0.84978 test09 0.24228

test10 0.58977 test11 0.03498 test12 0.41607

test13 0.31631 test14 0.05254 test15 0.45061

test16 0.75758 test17 0.12496 test18 0.49485

test19 0.21572 test20 0.50505 test21 0.94372

test22 0.81260 test23 0.77596 test24 0.36889

;

data aa;

set a;

grptest=mod(_n_,6);

run;

proc sort data=aa;

by grptest;

run;

ods trace on;

proc multtest inpvalues=aa stouffer;

by grptest;

run;

ods trace off;

As you can see from the output from this, there is an adjusted value for every test, and not a single value, although I think the maximum value may be the exact equivalent of the formulat you proposed. At this point, I would appeal to the independence of the p values, and use either the formula you proposed, or calculate a joint probability that all three tests reject their respective null hypotheses. Something like p_joint=1-((1-rawp_a)*(1-rawp_b)*(1-rawp_c)). You could loop either of these methods pretty easily to go through all 36789 loci (just guessing at what you are looking at). If the max adjusted value from the STOUFFER option matches your formula, then I think you are home free. I would suggest testing it out on three or four incidences.

You will have to reshape your data to use MULTTEST, I think, with a single raw_p, and additional variables testid and breed (taking on A, B, C). Take a look at the structure of dataset aa in the example above.

Steve Denham

MajaFerencakovic · Posted 12-19-2013 09:32 AM

Steve I don't know how to thank you

You guessed right. I have SNP data. I am estimating p for every SNP in three breeds.

When I was thinking about combining three values i used papers linked (http://www.biomedcentral.com/1471-2164/13/48 and http://www.plosone.org/article/info:doi/10.1371/journal.pone.0064280) . They were mentioning Stouffer. I looked in SAS and found this option. It looks like I can't use it, so I will calculate "by hand". Let me ask one more thing.

So if I will going to calculate stouffer by hand i should trnsform p value into z value. I believe I should use z = probit(1 -p_value).

Do you agree?

SteveDenham · Posted 12-19-2013 09:37 AM

That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.

Steve Denham

MajaFerencakovic · Posted 12-19-2013 09:46 AM

Steve, thank you again!

Guys, I am still open for new suggestions!

Wish you all best!

Cheers!

Maja

SteveDenham · Posted 12-19-2013 11:06 AM

One more thing. Make sure that those raw p values are one-sided, so that the transformation to a z value is one-to-one. Here is my code for the example data in 60.5
/* Example 60.5 data */

data a;
   input Test$ Raw_P @@;
   datalines;
test01 0.28282    test02 0.30688    test03 0.71022
test04 0.77175    test05 0.78180    test06 0.88581
test07 0.54685    test08 0.84978    test09 0.24228
test10 0.58977    test11 0.03498    test12 0.41607
test13 0.31631    test14 0.05254    test15 0.45061
test16 0.75758    test17 0.12496    test18 0.49485
test19 0.21572    test20 0.50505    test21 0.94372
test22 0.81260    test23 0.77596    test24 0.36889
;

/* Create 6 groups of 4 pvals each */
data aa;
set a;
grptest=mod(_n_,6);
run;

/* Sort for bygroup processing */
proc sort data=aa;
by grptest;
run;

/* See what MULTTEST gives for Stouffer-Liptak, and see what datasets are made */
ods trace on;
proc multtest inpvalues=aa stouffer;
by grptest;
run;
ods trace off;

/* Transform pvals to z scores */

data b;
set aa;
z=quantile('NORMALl', 1-raw_p);
run;

/* Intermediate step to sum z scores within and across groups */

proc means noprint data=b;
class grptest;
var z;
output out=grpz sum=sumz;
run;

/* Calculate Stouffer combined value */

data grp_pval;
set grpz;
stouffer_p_val=1-cdf('NORMAL',sumz/sqrt(_freq_));
run;

Steve Denham

MajaFerencakovic · Posted 12-19-2013 01:05 PM

Awesome! My p values are two sided. Actually I obtained them from t values. Do you believe that should make one sided p? Or to apply different formula. Sorry for being borring but I am really happy someone is actually giving me useful advices.

SteveDenham · Posted 12-19-2013 01:44 PM

I would say plug in the t values for the z values. However, you may have different degrees of freedom associated with the t values within a SNP, so they may not be totally comparable. Therefore, I would use the cdf('T", t-value, df) to get one-sided p values, then feed them back through what we have above.

Steve Denham

MajaFerencakovic · Posted 12-19-2013 03:02 PM

Thank you

:smileycool:

Example 58.5 Inputting Raw p-Values

Classroom Training Available!