How to combine results from Wilcoxon Rank Sum Test for multiple imputed data sets from PROC MI in SAS

Endpoint information: Seizure counts are collected every day, so counts will be missing for some days. We compute the average seizure frequency per 28 days for an interval, that is, (seizure count for the interval) / (days with an available seizure count during the interval) * 28. The intervals are, for example, the baseline period (28 days) and the DB period (99 days). The endpoint is then the percent change from baseline in seizure frequency: (per-28-day seizure frequency during DB - per-28-day seizure frequency during baseline) / (per-28-day seizure frequency during baseline) * 100%.

We will impute the seizure count for each day on which it is missing, so we will have m (say 10) imputed seizure count data sets.

Q1. After imputation, we plan to calculate the endpoint for each imputed data set. Is this correct? Or can we stack all 10 data sets and then calculate the endpoint?

Q2. Assume we calculate the endpoint for each imputed data set and then run the Wilcoxon Rank Sum test on each. We will have 10 p-values, 10 corresponding 'z' values, and so on. How should we combine them to get one pooled p-value? How should we make inferences based on the 10 imputed data sets?

Thanks.
Janet

Thanks a lot for your clear explanation. First, I now know I should not run the analysis on the pooled (stacked) imputed data sets but should analyze each one separately. Second, I have been reading about Rubin's rule these days, and your summary is so clear that I understand it much better. Third, I understand that an 'exact' method is still a research topic at this point. But for Rubin's rule, which variable from the Wilcoxon Rank Sum Test output should I put into PROC MIANALYZE: 'z', S, sumofscore, or sumofscore - expectofSum? I have not thought it through yet. Any suggestions? Thanks again.

Hi Season: I read both of your replies. They are very informative, and both you and Dave have so much knowledge. I have benefited a lot from them and really appreciate it. I saved this discussion.
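To make the "analyze each imputed data set separately" point concrete for the endpoint defined in the question, here is a minimal Python sketch (Python is used only for illustration; the thread itself is about SAS). The function names and the tiny two-imputation example are hypothetical, and the sketch handles a single subject, whereas a real analysis would repeat this for every subject within every imputation.

```python
def per_28_day_frequency(daily_counts):
    """Seizures per 28 days: total count / days with an available count * 28."""
    observed = [c for c in daily_counts if c is not None]
    if not observed:
        raise ValueError("no days with an available seizure count")
    return sum(observed) / len(observed) * 28


def percent_change_from_baseline(baseline_counts, db_counts):
    """Percent change in per-28-day seizure frequency from baseline to DB."""
    base = per_28_day_frequency(baseline_counts)
    db = per_28_day_frequency(db_counts)
    if base == 0:
        raise ValueError("baseline frequency is zero; percent change is undefined")
    return (db - base) / base * 100


# Hypothetical diary for one subject under m = 2 imputations
# (28 baseline days, 99 DB days); after imputation no day is missing.
imputations = [
    {"baseline": [1] * 28, "db": [1] * 49 + [0] * 50},
    {"baseline": [1] * 28, "db": [1] * 33 + [0] * 66},
]

# One endpoint value per imputed data set, not one value from the stacked data.
endpoints = [
    percent_change_from_baseline(d["baseline"], d["db"]) for d in imputations
]
print(endpoints)
```

Each entry of `endpoints` would then feed the Wilcoxon Rank Sum test run on its own imputed data set, giving one test result per imputation to be combined afterwards.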
These days I have been searching and reading about this issue, that is, pooling results from the Wilcoxon Rank Sum test after getting analysis results from m multiply imputed data sets. So far, it seems there is no consensus solution online.

1) I found an online post that says, "But you will also not be able to use MIANALYZE to combine the nonparametric test but instead will need to combine the actual Chi-Square test statistics", and that refers to a macro from Allison, https://www.sas.upenn.edu/~allison/combchi.sas. This looks like one of the methods you mentioned. It is for chi-square statistics.

2) Maybe there are some R packages for this, but I have not identified a specific one yet.

3) I basically agree with you on "In your code, the variable you pooled via PROC MIANALYZE was sum-of-rank, which may violate the rationale of Rubin's rule of pooling estimands, since Rubin's rule was based upon asymptotic normal distribution of the pooled estimand. In Wilcoxon sum-of-rank test, it is the z-statistic rather than the sum of ranks that follow an asymptotic normal distribution. Therefore, we should pool the z-statistics instead."

4) I strongly believe that 'z' from the Wilcoxon rank-sum test follows the standard normal distribution well: z ~ Normal(0, 1), so, as you wrote, sigma is just 1. If we had many imputed data sets (say 100), I have thought of running PROC UNIVARIATE to check whether 'z' follows a standard normal distribution.

5) I have tried this method on my data, putting 'z' with a standard error of 1 into PROC MIANALYZE. Here is where I cannot completely agree with you. In the output, the estimate of 'z' is, as everyone knows, just the simple arithmetic mean. There is a "t for H0: Parameter=Theta0" column, and the value under it is fairly close to the estimate of 'z'. There is also a p-value, Pr > |t|. So I sense that this p-value assumes the average of 'z' follows a non-central t distribution with non-centrality parameter Theta0 under H0? If my understanding is correct, then I doubt this p-value is the 'pooled' p-value we want.
That is because what we want is a 'best' z that follows a normal distribution, and our pooled p-value should come from that 'best' z via the normal distribution directly. I would think that simply using the average 'z' to get a p-value from the normal distribution is a reasonable solution.

6) Point 5 above takes me back to my initial thinking in my question. I was trying to get a 'pooled' statistic (later I realized that 'z' can be used directly, same as your thought), such as a pooled sum of scores across the data sets, a new expected sum of scores, a pooled standard deviation under H0, and so on. The idea is not mature yet.

Again, thanks a lot.
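For comparison, the two candidate p-values discussed in points 5 and 6 can be sketched side by side. The Python below (used only for illustration) applies the standard Rubin's rules formulas to the per-imputation z-statistics with an assumed within-imputation variance of 1, and also computes the p-value from the plain average z. The z values and the function names are hypothetical, and the t p-value is obtained by simple numerical integration rather than from a statistics library.

```python
import math
from statistics import NormalDist, mean, variance


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value from a central t distribution (Simpson's rule)."""
    if df > 1e6:  # effectively normal; also avoids lgamma round-off
        return 2 * (1 - NormalDist().cdf(abs(t_stat)))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)


def pool_z_rubin(z_values, within_var=1.0):
    """Rubin's rules for z-statistics, assuming each has variance within_var."""
    m = len(z_values)
    z_bar = mean(z_values)                 # pooled estimate
    between = variance(z_values)           # between-imputation variance (m - 1)
    total_var = within_var + (1 + 1 / m) * between
    t_stat = z_bar / math.sqrt(total_var)  # null value taken as 0
    if between > 0:
        ratio = (1 + 1 / m) * between / within_var
        df = (m - 1) * (1 + 1 / ratio) ** 2
    else:
        df = math.inf
    return {
        "z_bar": z_bar,
        "total_var": total_var,
        "t_stat": t_stat,
        "df": df,
        "p_rubin": t_two_sided_p(t_stat, df),
        "p_average_z": 2 * (1 - NormalDist().cdf(abs(z_bar))),
    }


# Hypothetical z-statistics from m = 5 imputed data sets.
result = pool_z_rubin([2.1, 1.8, 2.4, 1.9, 2.3])
print(result)
```

Under these formulas the t statistic is the average z divided by the square root of the total variance, 1 + (1 + 1/m) * B, which is why it comes out close to, but smaller than, the average z. As far as I understand the PROC MIANALYZE output, Theta0 is the null value of the parameter (0 by default) and the reference distribution is a central t, so the two p-values differ only through the between-imputation variance B: the average-z p-value ignores it, and the Rubin's rules p-value inflates the variance to account for it.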