Solved: Re: Derivation of Wilcoxon Signed Rank test p-value for small n

pywils · Posted 01-25-2022 08:37 PM

Hello,

I am having difficulty understanding how SAS derives the p value for the Wilcoxon Signed Rank test for small n.

I understand how to get S, the test statistic for the Wilcoxon Signed Rank Test (see documentation here). However, I don't understand how to go from S to the pvalue. The documentation (link again) says, for small n, "the significance of S is computed from the exact distribution of S, where the distribution is a convolution of scaled binomial distributions." What exactly does that mean?

Here is an example:

data test3;
input v1
	  v2;
datalines;
2 1
1 0 
3 0 
1 0 
;

data test4;
	set test3;
	diffVar = v1 - v2;
run; 

proc univariate data = test4;
	var diffVar;
run;

We see that S = 5, but why is Pr >= |S| = 0.1250?

Thank you!

FreelanceReinh · Posted 01-26-2022 07:12 AM

Hello @pywils,

Interesting question. Looking into pp. 129 f. of the book by Lehmann and D’Abrera (1975) the documentation refers to, I think the argument (applied to your example) is as follows:

The ranks of the four diffVar values 1, 1, 3, 1 are 2, 2, 4, 2, respectively (where 2 is the average rank of the original ranks 1, 2 and 3, which are assigned to the three tied values diffVar=1). Under the null hypothesis (of symmetry about 0) the sign of each of the diffVar values could be positive or negative with probability 1/2. Due to independence, the 2**4=16 possible sign combinations are equally likely (probability 1/16).

The first term in the formula of the Wilcoxon signed-rank test statistic S is the sum of the ranks of positive differences:

Positive ranks	None	2	4	2, 2	2, 4	2, 2, 2	2, 2, 4	2, 2, 2, 4
Probability	1/16	3/16	1/16	3/16	3/16	1/16	3/16	1/16

For example, 3 of the 16 possible sign combinations yield one positive rank 2 and one positive rank 4, hence the probability 3/16. Each of the four ranks 2, 2, 4, 2 contributes its value -- independently -- with probability 1/2 to the sum. So their distributions are Bernoulli distributions (binomial with parameters n=1 and p=1/2), but with a "scale" factor of 2 or 4, respectively. The distribution of a sum of independent random variables is called a convolution. A simulation of this "convolution of scaled binomial distributions" and then S (after subtracting the expectation 4*(4+1)/4=5) could look like this:

data sim;
call streaminit(314159);
array r[4] _temporary_;
do i=1 to 1e6;
  r[1]=2*rand('bern',0.5);
  r[2]=2*rand('bern',0.5);
  r[3]=4*rand('bern',0.5);
  r[4]=2*rand('bern',0.5);
  S=sum(of r[*])-5;
  output;
end;
run;

proc freq data=sim;
tables S;
run;

Your observed case "2, 2, 2, 4" yields S=5 and is the only one with S>=5, hence the two-sided p-value is twice the probability of obtaining S=5, i.e. 2*1/16=1/8=0.125.

View solution in original post

FreelanceReinh · Posted 01-26-2022 07:12 AM

Hello @pywils,

Interesting question. Looking into pp. 129 f. of the book by Lehmann and D’Abrera (1975) the documentation refers to, I think the argument (applied to your example) is as follows:

The ranks of the four diffVar values 1, 1, 3, 1 are 2, 2, 4, 2, respectively (where 2 is the average rank of the original ranks 1, 2 and 3, which are assigned to the three tied values diffVar=1). Under the null hypothesis (of symmetry about 0) the sign of each of the diffVar values could be positive or negative with probability 1/2. Due to independence, the 2**4=16 possible sign combinations are equally likely (probability 1/16).

The first term in the formula of the Wilcoxon signed-rank test statistic S is the sum of the ranks of positive differences:

Positive ranks	None	2	4	2, 2	2, 4	2, 2, 2	2, 2, 4	2, 2, 2, 4
Probability	1/16	3/16	1/16	3/16	3/16	1/16	3/16	1/16

For example, 3 of the 16 possible sign combinations yield one positive rank 2 and one positive rank 4, hence the probability 3/16. Each of the four ranks 2, 2, 4, 2 contributes its value -- independently -- with probability 1/2 to the sum. So their distributions are Bernoulli distributions (binomial with parameters n=1 and p=1/2), but with a "scale" factor of 2 or 4, respectively. The distribution of a sum of independent random variables is called a convolution. A simulation of this "convolution of scaled binomial distributions" and then S (after subtracting the expectation 4*(4+1)/4=5) could look like this:

data sim;
call streaminit(314159);
array r[4] _temporary_;
do i=1 to 1e6;
  r[1]=2*rand('bern',0.5);
  r[2]=2*rand('bern',0.5);
  r[3]=4*rand('bern',0.5);
  r[4]=2*rand('bern',0.5);
  S=sum(of r[*])-5;
  output;
end;
run;

proc freq data=sim;
tables S;
run;

Your observed case "2, 2, 2, 4" yields S=5 and is the only one with S>=5, hence the two-sided p-value is twice the probability of obtaining S=5, i.e. 2*1/16=1/8=0.125.

pywils · Posted 01-27-2022 07:14 PM

Thank you!

ballardw · Posted 01-26-2022 11:35 AM

When you get something like "the significance of S is computed from the exact distribution of S". That means that ALL permutations of the values are created. Then you count the numbers that have x or more of interest.

For a smaller sample (4) you can look at this:

data small;
   do i=0,1;
   do j=0,1;
   do k=0,1;
   do l=0,1;
      samp = cats(i,j,k,l);
      num =sum(i,j,k,l);
      output;
   end;
   end;
   end;
   end;
run;

proc freq data=small;
   tables samp num;
run;

Each of the results has the same probability of occuring 0.0625 (6.25%). So you can look at the totals of the 1(the + in sign rank) and see that the probability of getting 3 or more, as an example, is the sum of ways you get 3 or 4. 25+6.25% or 31.25. or P(s >=3)=.3125 , P(s=4)=.0625.

Left as an exercise is to create the distribution for 5 elements and see what P(s=5) comes out as.

Yes there is more efficient coding for larger sample sizes. The above is very easy to follow for beginners.

Derivation of Wilcoxon Signed Rank test p-value for small n

Re: Derivation of Wilcoxon Signed Rank test p-value for small n

Re: Derivation of Wilcoxon Signed Rank test p-value for small n

Re: Derivation of Wilcoxon Signed Rank test p-value for small n

Re: Derivation of Wilcoxon Signed Rank test p-value for small n

Ready to join fellow brilliant minds for the SAS Hackathon?