Solved: Re: Proc univariate confidence interval and p-value mismatch

JoakimE · Posted 01-10-2022 09:42 AM

Hi,

I've been using the following code to calculate Wilcoxon signed rank p-values with corresponding (or at least what I think is corresponding...) confidence intervals:

ods output TestsForLocation=wilcoxon;
proc univariate data=comp normal cipctldf;
by param;
var Diff;
output out=pctl pctlpts=50 pctlpre=p cipctldf=(lowerpre=LCL upperpre=UCL);
histogram Diff / normal;
run;

However, I find that for some of my by-variables I get a significant p-value, i.e. p-value < 0.05, but the confidence interval for the median (calculated using pctlpts=50 and the cipctldf option) covers 0, i.e. the lower limit for the group difference is below 0 and the upper limit is above 0. How is that possible? I feel like I'm missing something here.

KR,

Joakim

PaigeMiller · Posted 01-10-2022 09:54 AM

@JoakimE wrote:

However, I find that for some of my by-variables I get a significant p-value, i.e. p-value < 0.05

For what hypothesis test do you find a significant p-value?

--
Paige Miller

Rick_SAS · Posted 01-10-2022 02:25 PM

I think the answer is that you are using two different tests, so there is no reason to expect them to agree on any one particular sample. Think about tests for normality: given a sample, some tests might reject the hypothesis of normality whereas another test might fail to reject.

I assume you are using the "Signed Rank" statistic (S) and its associated p value. The Wilcoxon signed rank test is a nonparametric test and uses ranks of the data to compute the statistic for the hypothesis Median=0. It assumes that the data distribution is symmetric.

The "distribution-free" confidence limits are also nonparametric and use ranks, as described by Hahn and Meeker (1991).

Although the p-values and the distribution-free CIs are both nonparametric, they are not using the same formulas. Therefore, it is possible that the p-value is significant but the CI contains zero, or vice versa.

Said differently, the 95% CI is expected to contain 0 for about 95% of random samples from a (symmetric) distribution for which the true median is 0. The signed-rank test is likewise expected to fail to reject Median=0 for about 95% of such samples. But there is no guarantee that the two statistics will give the same conclusion on any given sample. Intuitively, there should be a lot of overlap, but if you simulate many samples, the samples rejected by the CI test need not be the same as the samples rejected by the signed-rank test.
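
One way to see that the two procedures rest on different calculations is to look at how an order-statistic interval for the median gets its coverage. The following is a minimal sketch, assuming the distribution-free limits are the order statistics X(l) and X(u) with ranks placed symmetrically about the middle of the sample, which is how the PROC UNIVARIATE documentation appears to describe them. The data set name coverage, the sample size n=30, and the range of l are illustrative choices.

```sas
/* Sketch: coverage of the interval (X(l), X(u)) for the median when n = 30. */
/* Assumes continuous data, so B = number of observations below the true     */
/* median follows a Binomial(n, 0.5) distribution.                           */
data coverage;
   n = 30;
   p = 0.5;
   do l = 8 to 12;
      u = n - l + 1;   /* symmetric choice of the upper rank (an assumption) */
      /* the interval covers the median when l <= B <= u-1 */
      cover = cdf('BINOMIAL', u-1, p, n) - cdf('BINOMIAL', l-1, p, n);
      output;
   end;
run;

proc print data=coverage noobs;
run;
```

Because B is discrete, the attainable coverage jumps from one pair of ranks to the next, so the interval generally cannot hit 95% exactly. The signed-rank p-value comes from a different reference distribution, which is one reason the two decisions need not line up on a given sample.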

Rick_SAS · Posted 01-10-2022 02:39 PM

And here's a program that shows that my intuition is correct. The program simulates 100 samples of size 30 from the standard normal distribution. For 97 samples, the Wilcoxon test and the CIs reach the same conclusion to reject or fail to reject H0. However, there are two samples that the Wilcoxon test rejects but the CI does not, and there is one sample that the CI rejects but the Wilcoxon test does not.

/* simulate normal data */
data comp;
call streaminit(123);
do param=1 to 100;
   do i = 1 to 30;
      Diff = rand("Normal", 0);
      output;
   end;
end;
run;

ods exclude all;
proc univariate data=comp normal cipctldf mu0=0;
   by param;
   var Diff;
   output out=pctl pctlpts=50 pctlpre=p cipctldf=(lowerpre=LCL upperpre=UCL);
   ods output TestsForLocation=wilcoxon(where=(Test='Signed Rank'));
run;
ods exclude none;

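/* flag samples for which the signed-rank test rejects H0 at the 0.05 level */
data RejectWilcoxon;
   set wilcoxon;
   WReject = (pValue<0.05);
   keep param pValue WReject;
run;

/* flag samples for which the distribution-free CI for the median excludes 0 */
data RejectCI;
   set Pctl;
   CIReject = (LCL50>0 | UCL50<0);
run;

/* combine the two indicators by sample */
data All;
   merge RejectWilcoxon RejectCI;
   by param;
run;

/* cross-tabulate the two decisions */
proc freq data=All;
   tables WReject*CIReject / norow nocol nopercent;
run;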
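
To see which samples the two methods disagree on, one could list the discordant rows. This is a minimal sketch, assuming the merged data set All from the program above and the variable names p50, LCL50 and UCL50 created by its OUTPUT statement.

```sas
/* list the samples where the signed-rank test and the CI disagree */
proc print data=All(where=(WReject ne CIReject)) noobs;
   var param pValue p50 LCL50 UCL50 WReject CIReject;
run;
```

In the discordant rows you would expect the p-value to sit near 0.05 or one of the confidence limits to sit near 0, but that is an expectation rather than a guarantee.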

JoakimE · Posted 01-11-2022 09:37 AM

Thanks Rick, that seems reasonable to me. And you are correct, I am comparing the Hahn and Meeker CIs with the Wilcoxon signed rank p-value. Clearly the two should not be presented together as the conclusion could be different for the two methods.

Ok, so a follow-up question. Is there a corresponding CI I could use coupled with the signed rank p-value? Is there some option for that in proc univariate?

Rick_SAS · Posted 01-11-2022 10:02 AM

> a follow-up question. Is there a corresponding CI I could use coupled with the signed rank p-value? Is there some option for that in proc univariate?

I don't think so. The signed-rank test just adds up a bunch of numbers based on the relative size of the data. You get the p value by comparing the observed statistic with the distribution of the statistic under the null hypothesis. This does not give a confidence interval.
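
To make "adds up a bunch of numbers" concrete, here is a minimal sketch that computes a signed-rank statistic by hand for one simulated sample. It assumes the comp data set from the simulation program earlier in the thread and the centered form S = (sum of the ranks of the positive differences) - n(n+1)/4, which I believe is the form PROC UNIVARIATE reports. The data set names one and ranked and the variables AbsDiff and R are hypothetical.

```sas
/* keep one sample and drop values equal to the hypothesized location 0 */
data one;
   set comp(where=(param=1));
   if Diff ne 0;
   AbsDiff = abs(Diff);
run;

/* rank the absolute differences; ties get the average rank */
proc rank data=one out=ranked ties=mean;
   var AbsDiff;
   ranks R;
run;

/* add the ranks of the positive differences and center the sum */
proc sql;
   select sum(R*(Diff>0)) - count(*)*(count(*)+1)/4 as S
   from ranked;
quit;
```

The result can be compared with the Signed Rank row of the TestsForLocation table for param=1.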

I assume this is some sort of simulation study? I think you could report that the signed-rank test failed to reject the null hypothesis on XX% of the simulated samples and that the distribution-free CIs contained 0 in YY% of the samples. Presumably, those numbers will be close, such as 94.6% and 95.1%. Or, you can choose to report only one of the tests.
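
A minimal sketch of how those two percentages could be obtained, assuming the data set All and the 0/1 indicators WReject and CIReject from the simulation program earlier in the thread. The mean of a 0/1 indicator is the proportion of samples for which that method rejects, and one minus the mean is the proportion for which it does not.

```sas
/* proportion of simulated samples rejected by each method */
proc means data=All n mean;
   var WReject CIReject;
run;
```

With only 100 samples these proportions are rough; a larger number of simulated samples would give steadier estimates.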

JoakimE · Posted 01-12-2022 09:01 AM

Thanks for your input Rick! This helps a lot!

Rick_SAS · Posted 02-01-2022 08:42 AM

This question is a special case of the general questions, "Can two hypothesis tests differ? Can one reject the null hypothesis and the other fail to reject?" I wrote some thoughts on these questions. The answer is yes. This can and does happen, and it is not uncommon.

How often do different statistical tests agree? A simulation study
