05-17-2017 12:38 PM
I am running some simulations in SAS, a component of which involves the Wilcoxon rank-sum test to compare two "treatment" groups, which I am implementing with PROC NPAR1WAY. Specifically, I am extracting the value of the Z statistic from the Normal approximation to the Wilcoxon test for a series of specified alternative hypotheses.
However, I noticed that, between simulations, the SIGN of the Z statistic was switching between negative and positive, despite the fixed (positive) effect size (and a large sample size). Upon further investigation, it appears that PROC NPAR1WAY is changing the "reference" group for which it calculates the Z statistic. It always chooses the first value of the class variable that appears in the dataset as the group for which the Z statistic is calculated (that is, if the group associated with that first value has a positive effect, it gets a positive Z statistic, and a negative Z statistic if it has a negative effect).
Here is some toy data demonstrating this behavior:
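data A_ref;
  do i = 1 to 100;
    if i<=50 then do;   /* first 50 rows: group A, mean 0 */
      y = rand('Normal',0,1);
      trt = 'A';
    end;
    else do;            /* last 50 rows: group B, mean 1 */
      y = rand('Normal',1,1);
      trt = 'B';
    end;
    output;
  end;
run;
/* Same data, but with group B appearing first */
PROC SORT data=A_ref out=B_ref; by descending trt; run;
PROC NPAR1WAY data=A_ref wilcoxon; class trt; var y; run;
PROC NPAR1WAY data=B_ref wilcoxon; class trt; var y; run;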
You can see that the only difference between "A_ref" and "B_ref" is the order in which the treatment groups appear in the dataset.
And as you can then see from the output of the two PROC NPAR1WAY calls, the results are identical, except for the sign of the Z statistic (and the order in which the groups appear in the box plots, etc.).
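To see why only the sign changes, here is a short sketch of the algebra. The notation is my own, not taken from the SAS output: W_A and W_B are the rank sums of the two groups, n_A and n_B the group sizes, and the subscript 0 denotes the null hypothesis. I am assuming no ties and ignoring the continuity correction; neither should affect the symmetry, but I have not written those cases out.

```latex
\begin{align*}
W_A + W_B &= \frac{N(N+1)}{2}, \qquad N = n_A + n_B,\\
\mathrm{E}_0[W_g] &= \frac{n_g (N+1)}{2}, \qquad g \in \{A, B\},\\
W_A - \mathrm{E}_0[W_A] &= \frac{N(N+1)}{2} - W_B - \frac{n_A (N+1)}{2}
  = -\bigl(W_B - \mathrm{E}_0[W_B]\bigr),\\
\mathrm{Var}_0[W_A] &= \mathrm{Var}_0[W_B] = \frac{n_A n_B (N+1)}{12},\\
Z_A &= \frac{W_A - \mathrm{E}_0[W_A]}{\sqrt{\mathrm{Var}_0[W_A]}} = -Z_B.
\end{align*}
```

So whichever group's rank sum the statistic is built from, the magnitude of Z (and hence the two-sided p-value) is the same, and only the sign depends on the choice of group. For the toy data, n_A = n_B = 50, so the expected rank sum under the null is 2525 for either group and the null standard deviation is about 145.1.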
Now, after viewing the documentation for PROC NPAR1WAY, I can't seem to find a way to set the class order within the proc itself. There is no "order" option in the proc OR class statements. Is there a way to do this that I am missing? Obviously I can manually sort the datasets before the call to make sure they are in the order I want, but this seems a bit clunky, especially since so many other SAS procedures allow for custom ordering of class variables within the proc.
05-17-2017 01:48 PM
That actually is what I have been using for now. The main reason I am hesitant to rely on it going forward is that in the future we may actually be interested in differentiating between "negative" and "positive" results. This matters less for large effect sizes, but for small or null effect sizes we may be interested in situations where, due to random sampling variability, a test statistic indicates a "negative" effect, since the context of our simulation is group sequential analyses with different types of stopping rules.