Hello, Dave. Although I have spent the past month working on the very specific issue of pooling Wilcoxon test results, I still found joining the conversation here fruitful. It suddenly dawned on me that the methods I mentioned may be too complicated, and that a small modification of your method may be a good choice. Still, I have some questions regarding your code.
(1) Combine the sum of ranks or the z-statistic? In your code, the variable you pooled via PROC MIANALYZE was the sum of ranks, which may violate the rationale of Rubin's rules for pooling estimates, since Rubin's rules are based on the asymptotic normality of the pooled estimate. In the Wilcoxon sum-of-rank test, it is the z-statistic rather than the sum of ranks that follows an asymptotic standard normal distribution. Therefore, we should pool the z-statistics instead.
(2) Potential necessity of specifying the EDF= option in PROC MIANALYZE. I wonder whether you forgot to specify the EDF= option to override the infinite degrees of freedom that PROC MIANALYZE uses by default.
So, in conclusion, I think the most convenient way of pooling results of Wilcoxon sum-of-rank tests is as follows: (1) Obtain the z-statistic of each imputed sample; (2) Pool them via PROC MIANALYZE; (3) Obtain the results.
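As a minimal sketch of step (1), the z-statistic for one imputed sample can be computed from the rank sum with the usual normal approximation. The function names `average_ranks` and `rank_sum_z` are hypothetical, and the sketch assumes no continuity correction and no tie adjustment of the variance, so its values may differ slightly from what SAS reports.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(x, y):
    """z-statistic of the rank-sum test for group x versus group y.

    Assumption: plain normal approximation, without continuity
    correction and without a tie correction to the variance.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                # sum of ranks in group x
    mean_w = n1 * (n + 1) / 2          # expected rank sum under the null
    var_w = n1 * n2 * (n + 1) / 12     # variance of the rank sum under the null
    return (w - mean_w) / sqrt(var_w)
```

Applying `rank_sum_z` to the two groups in each imputed data set gives the M z-statistics that are then pooled in step (2).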
The rationale is as follows. The z-statistic measures the departure from the null hypothesis in each sample, and a z-statistic of 0 corresponds to no evidence against the null hypothesis. Pooling the Wilcoxon test results therefore translates into a one-sample t-test problem: we have M sample values of a certain statistic (in this case the z-statistic), each following an asymptotic normal distribution, and we would like to see whether the population mean of the statistic is 0. Pooling the z-statistics of the imputed samples is no different from pooling imputed-sample means or standard deviations in multiple imputation, which can easily be done in PROC MIANALYZE.
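For reference, the standard form of Rubin's combining rules, written for the z-statistics, is sketched below. The symbols are assumptions introduced here: z_m is the z-statistic from imputed sample m, and U_m is its within-imputation variance (its squared standard error).

```latex
\begin{align}
\bar{z} &= \frac{1}{M}\sum_{m=1}^{M} z_m,
&
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} U_m, \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\left(z_m - \bar{z}\right)^2,
&
T &= \bar{U} + \left(1 + \frac{1}{M}\right) B, \\
t &= \frac{\bar{z}}{\sqrt{T}},
&
\nu_M &= (M-1)\left[1 + \frac{\bar{U}}{\left(1 + \frac{1}{M}\right) B}\right]^2 .
\end{align}
```

The pooled statistic t is referred to a t distribution with \nu_M degrees of freedom to test whether the mean of the z-statistics is 0.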
The biggest challenge in doing so is to ascertain the standard error of each z-statistic, which is required by PROC MIANALYZE. At first I had no idea how to compute it, given that we have only one z-statistic per sample, so it is impossible to compute either a sample standard deviation or a sample standard error. But Licht's work enlightened me by pointing out that the z-statistics generated from Wilcoxon sum-of-rank tests essentially follow a standard normal distribution. When pooling the z-statistics, each sample yields only one z-statistic, so the population standard error of every z-statistic is 1/sqrt(1)=1. That problem was solved! We can simply write code instructing SAS to add a variable (a column of all 1s) and use it as the standard errors.
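As a minimal sketch of the pooling step, the following applies Rubin's rules to a list of z-statistics with every within-imputation standard error set to 1. The function name `pool_z` and its return keys are hypothetical, and the sketch only mirrors the arithmetic, not the actual PROC MIANALYZE output. The p-value is left out because the Python standard library has no t-distribution CDF.

```python
from math import sqrt, inf

def pool_z(z_values, se=1.0):
    """Pool z-statistics from M imputed samples with Rubin's rules.

    Assumption: each z-statistic has the same standard error `se`
    (1.0 by default, following the standard-normal argument).
    """
    m = len(z_values)
    q_bar = sum(z_values) / m                            # pooled estimate
    u_bar = se ** 2                                      # within-imputation variance
    b = sum((z - q_bar) ** 2 for z in z_values) / (m - 1)  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                      # total variance
    # Classic Rubin degrees of freedom; infinite if the z-values are all equal.
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2 if b > 0 else inf
    return {
        "estimate": q_bar,
        "std_err": sqrt(total),
        "df": df,
        "t": q_bar / sqrt(total),
    }
```

For example, `pool_z([1.0, 2.0, 3.0])` gives a pooled estimate of 2.0 with between-imputation variance 1, so the total variance is 1 + (4/3) and the t statistic is 2 divided by its square root.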
Now we finally come to the EDF= issue. Admittedly, I have not read any literature introducing the concept of effective degrees of freedom in multiple imputation aside from material pertaining to SAS, and I still find the explanation in SAS Help not that clear. So I am also unsure about the exact definition of EDF and whether we should specify this option here. In my view, it is unnecessary to specify the EDF= option here, given that EDF= stands for the complete-data degrees of freedom of each statistic being combined. Since (1) the z-statistics follow a standard normal distribution, to which the concept of degrees of freedom does not apply, and (2) the t distribution approaches the standard normal as its degrees of freedom grow, perhaps we can regard each z-statistic as having infinite degrees of freedom, which is the default of PROC MIANALYZE. There is therefore no need to correct the effective degrees of freedom to a finite value.
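To illustrate what a finite EDF would change, here is a minimal sketch of the small-sample adjustment of Barnard and Rubin (1999), which, as far as I understand, is the adjustment applied when EDF= is given. The function name `mi_degrees_of_freedom` and its arguments are hypothetical: `b` is the between-imputation variance, `u_bar` the average within-imputation variance, `m` the number of imputations, and `edf` the complete-data degrees of freedom. It assumes `b > 0`.

```python
from math import inf

def mi_degrees_of_freedom(b, u_bar, m, edf=inf):
    """Degrees of freedom for the pooled t statistic (requires b > 0).

    With edf=inf this is the classic Rubin formula; with a finite edf
    it applies the Barnard-Rubin small-sample adjustment.
    """
    r = (1 + 1 / m) * b / u_bar              # relative increase in variance
    nu_m = (m - 1) * (1 + 1 / r) ** 2        # classic degrees of freedom
    if edf == inf:
        return nu_m
    total = u_bar + (1 + 1 / m) * b
    gamma = (1 + 1 / m) * b / total          # share of variance due to missingness
    nu_obs = (1 - gamma) * edf * (edf + 1) / (edf + 3)
    return 1 / (1 / nu_m + 1 / nu_obs)       # adjusted degrees of freedom
```

With `b=1`, `u_bar=1`, and `m=3`, the default gives 6.125 degrees of freedom, while `edf=20` shrinks this to about 3.44. As `edf` grows, the adjusted value approaches the default, which is consistent with treating each z-statistic as having infinite degrees of freedom.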