BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MajaFerencakovic
Fluorite | Level 6

Hi!

I have a question about input file for stouffer-liptak correction. So it is proc MULTTEST with option stouffer, but input file with p values... How this one should look like? raw_p values in column or?

Some additional stuff?

Anyone used this one.

My data set looks like this:

test1 raw_p1a raw_p1b raw_p1c

.

.

.

test36789 raw_p36789a  raw_p36789b  raw_p36789c

thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.

Steve Denham

View solution in original post

13 REPLIES 13
ballardw
Super User

From SAS 9.2.3 online help and example of using an input set of p-values

The MULTTEST Procedure

 

Example 58.5 Inputting Raw p-Values

 

data a;

input Test$ Raw_P @@;

datalines;

test01 0.28282 test02 0.30688 test03 0.71022

test04 0.77175 test05 0.78180 test06 0.88581

test07 0.54685 test08 0.84978 test09 0.24228

test10 0.58977 test11 0.03498 test12 0.41607

test13 0.31631 test14 0.05254 test15 0.45061

test16 0.75758 test17 0.12496 test18 0.49485

test19 0.21572 test20 0.50505 test21 0.94372

test22 0.81260 test23 0.77596 test24 0.36889

;

proc multtest inpvalues=a holm hoc fdr;

run;


MajaFerencakovic
Fluorite | Level 6

Thanks!

I am aware of this one but this actually produce one new value for each raw_p. Maybe I am wrong but isn't it that stouffer should produce one new p from multiple p? In my case I would like to get new p from raw_pa, raw_pb and raw_pc for every test I have.

Like in formula Z=sum(z(pi)/sqrt k) where Z is thr the standard normal variable under h0, z(pi) is the p value from test i transformed to z and k is number of tests. Truth is that I could do this by hand but then I noticed this option stouffer and hoped to write in the paper used this from SAS instead of formula 😉

SteveDenham
Jade | Level 19

Try:

proc multtest data=yourdata stouffer;

by testidno;

run;

This assumes that what you have as test1, test2, etc. is kept in the variable testidno, and that the dataset is sorted by testidno.

Steve Denham

MajaFerencakovic
Fluorite | Level 6

Thanks Steve!

it looks like  I am doing something wrong... I got response ERROR: No base test has been assigned, and ERROR: No class statement.

So I actually wonder am I trying to do what I actually don't understand. I have three p values from three cattle breeds and I would like to combine them. Is this method good one?

data_null__
Jade | Level 19

did you use PROC MULTTEST statement option INPVALUES.

INPVALUES=SAS-data-set names an input SAS data set that includes a variable containing raw p-values. The MULTTEST procedure adjusts the collection of raw p-values for multiplicity. Resampling-based adjustments are not permitted with this type of data input. The CLASS, CONTRAST, FREQ, STRATA, and TEST statements are ignored when an INPVALUES= data set is specified. The INPVALUES= and DATA= options cannot both be specified. The pvalue-name enables you to specify the name of the p-value column from your data set. By default, pvalue-name=’raw_p’. The INPVALUES= data set can contain variables in addition to the raw p-values variable; see Example 60.5 for an example.

SteveDenham
Jade | Level 19

So I set up a test situation using the Stouffer option, and the data from Example 60.5, splitting into 6 testing groups.  It does NOT return a single combined p value for grouped tests, but returns individual adjusted p values for each test.  Perhaps you could view the maximum adjusted value as a combination.

data a;

   input Test$ Raw_P @@;

   datalines;

test01  0.28282    test02  0.30688    test03  0.71022   

test04  0.77175    test05  0.78180    test06  0.88581   

test07  0.54685    test08  0.84978    test09  0.24228   

test10  0.58977    test11  0.03498    test12  0.41607   

test13  0.31631    test14  0.05254    test15  0.45061   

test16  0.75758    test17  0.12496    test18  0.49485   

test19  0.21572    test20  0.50505    test21  0.94372   

test22  0.81260    test23  0.77596    test24  0.36889   

;

data aa;

set a;

grptest=mod(_n_,6);

run;

proc sort data=aa;

by grptest;

run;

ods trace on;

proc multtest inpvalues=aa stouffer;

by grptest;

run;

ods trace off;

As you can see from the output from this, there is an adjusted value for every test, and not a single value, although I think the maximum value may be the exact equivalent of the formulat you proposed.  At this point, I would appeal to the independence of the p values, and use either the formula you proposed, or calculate a joint probability that all three tests reject their respective null hypotheses.  Something like p_joint=1-((1-rawp_a)*(1-rawp_b)*(1-rawp_c)).  You could loop either of these methods pretty easily to go through all 36789 loci (just guessing at what you are looking at).  If the max adjusted value from the STOUFFER option matches your formula, then I think you are home free.  I would suggest testing it out on three or four incidences.

You will have to reshape your data to use MULTTEST, I think, with a single raw_p, and additional variables testid and breed (taking on A, B, C).  Take a look at the structure of dataset aa in the example above.

Steve Denham

MajaFerencakovic
Fluorite | Level 6

Steve I don't know how to thank you Smiley Wink

You guessed right. I have SNP data. I am estimating p for every SNP in three breeds.

When I was thinking about combining three values i used papers linked (http://www.biomedcentral.com/1471-2164/13/48 and http://www.plosone.org/article/info:doi/10.1371/journal.pone.0064280) . They were mentioning Stouffer. I looked in SAS and found this option. It looks like I can't use it, so I will calculate "by hand". Let me ask one more thing.

So if I will going to calculate stouffer by hand i should trnsform p value into z value. I believe I should use z = probit(1 -p_value).

Do you agree?

SteveDenham
Jade | Level 19

That is the transformation I would use, and is the one found in the PROC MULTTEST documentation.

Steve Denham

MajaFerencakovic
Fluorite | Level 6

Steve, thank you again!

Guys, I am still open for new suggestions!

Wish you all best!

Cheers! Smiley Happy

Maja

SteveDenham
Jade | Level 19

One more thing.  Make sure that those raw p values are one-sided, so that the transformation to a z value is one-to-one.  Here is my code for the example data in 60.5
/* Example 60.5 data */

data a;
   input Test$ Raw_P @@;
   datalines;
test01  0.28282    test02  0.30688    test03  0.71022   
test04  0.77175    test05  0.78180    test06  0.88581   
test07  0.54685    test08  0.84978    test09  0.24228   
test10  0.58977    test11  0.03498    test12  0.41607   
test13  0.31631    test14  0.05254    test15  0.45061   
test16  0.75758    test17  0.12496    test18  0.49485   
test19  0.21572    test20  0.50505    test21  0.94372   
test22  0.81260    test23  0.77596    test24  0.36889   
;

/* Create 6 groups of 4 pvals each */
data aa;
set a;
grptest=mod(_n_,6);
run;

/* Sort for bygroup processing */
proc sort data=aa;
by grptest;
run;

/* See what MULTTEST gives for Stouffer-Liptak, and see what datasets are made */
ods trace on;
proc multtest inpvalues=aa stouffer;
by grptest;
run;
ods trace off;

/* Transform pvals to z scores */

data b;
set aa;
z=quantile('NORMALl', 1-raw_p);
run;

/* Intermediate step to sum z scores within and across groups */

proc means noprint data=b;
class grptest;
var z;
output out=grpz sum=sumz;
run;

/* Calculate Stouffer combined value */

data grp_pval;
set grpz;
stouffer_p_val=1-cdf('NORMAL',sumz/sqrt(_freq_));
run;

Steve Denham

MajaFerencakovic
Fluorite | Level 6

Awesome! My p values are two sided. Actually I obtained them from t values. Do you believe that should make one sided p? Or to apply different formula. Sorry for being borring but I am really happy someone is actually giving me useful advices.

SteveDenham
Jade | Level 19

I would say plug in the t values for the z values.  However, you may have different degrees of freedom associated with the t values within a SNP, so they may not be totally comparable.  Therefore, I would use the cdf('T", t-value, df) to get one-sided p values, then feed them back through what we have above.

Steve Denham

MajaFerencakovic
Fluorite | Level 6

Thank you

:smileycool:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1328 views
  • 6 likes
  • 4 in conversation