I'm conducting Hodges-Lehmann estimation with the NPAR1WAY procedure. My data contains duplicate values, so to be able to run an exact test with the EXACT statement I added 0.00000001 to each duplicate. Later I gave up on the idea of an exact test, but I noticed that my results (my goal is the confidence interval) differ between the two versions of the data.
PROC NPAR1WAY data=data1 HL;
   CLASS treat;
   VAR response;
   ODS SELECT HodgesLehmann;
RUN;

PROC NPAR1WAY data=data2 HL;
   CLASS treat;
   VAR response;
   ODS SELECT HodgesLehmann;
RUN;
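For reference, data2 was built from data1 roughly like the sketch below (the duplicate handling here is illustrative, not my exact code, and it only breaks two-way ties; a value occurring three or more times would still leave some ties):

/* Illustrative sketch only: add a tiny offset to duplicated response
   values so the data have no exact ties. */
proc sort data=data1 out=data2;
   by response;
run;

data data2;
   set data2;
   by response;
   if not first.response then response = response + 0.00000001;
run;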
Then I did the same in R; for both data sets I get (-1.8819, 0.0556).
I also used a macro for Hodges-Lehmann estimation that I wrote based on the documentation, and it also gives (-1.8819, 0.0556) for both data sets.
Can anybody please explain to me why there is such a big difference in the results from such a small increase?
Is there some nuance in PROC NPAR1WAY that is triggered when there are no ties in the data?
Like what's already been mentioned, the PROC NPAR1WAY computation takes tied values into account. The difference between the macro results and the NPAR1WAY results for Hodges-Lehmann confidence limits might be due to the computation of Var_0(S), which is the null-hypothesis variance of the Wilcoxon rank sum statistic. The macro might be using the expression for Var_0(S) that's appropriate when there are no ties, while NPAR1WAY uses the more general expression that accounts for ties.
For details about how PROC NPAR1WAY computes Var_0(S), see the documentation section Hodges-Lehmann Estimation (paragraph 5) and the section Simple Linear Rank Tests for Two-Sample Data (last paragraph). The reference is Randles and Wolfe (1979).
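As a quick illustration with made-up group sizes: with ties, the null variance is Var_0(S) = (n1*n2/12) * [ (n+1) - sum_j(t_j**3 - t_j) / (n*(n-1)) ], where t_j is the size of the j-th tie group and n = n1 + n2; without ties the correction term vanishes and this reduces to n1*n2*(n+1)/12. A small sketch comparing the two forms:

/* Sketch with made-up group sizes: compare the no-ties null variance
   of the Wilcoxon rank-sum statistic S with the tie-corrected form
   from Randles and Wolfe (1979). Here two values are tied in pairs. */
data _null_;
   n1 = 10;  n2 = 10;
   n  = n1 + n2;
   tie_sum      = (2**3 - 2) + (2**3 - 2);   /* sum of t_j**3 - t_j */
   var0_no_ties = n1*n2*(n + 1)/12;
   var0_ties    = (n1*n2/12)*((n + 1) - tie_sum/(n*(n - 1)));
   put var0_no_ties= var0_ties=;
run;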
Can anybody please explain to me why there is such a big difference in the results from such a small increase?
Since I don't have your data, I would imagine the reason for the difference is that NPAR1WAY does not rely on linearity (as in a linear regression). Duplicates, or very near duplicates where the difference is 0.00000001, change the sum of the ranks in each group, so it's not a 0.00000001 change, it's a unit change in rank.
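A tiny made-up example of what I mean: with ties, the default midranks are assigned, and the perturbed copy of the same data gets whole-number ranks instead.

/* Made-up illustration: the tied values get midranks (2.5 and 2.5);
   after a 1e-8 perturbation the same observations get ranks 2 and 3. */
data tied;
   input x @@;
   datalines;
1 2 2 3
;

data untied;
   input x @@;
   datalines;
1 2 2.00000001 3
;

proc rank data=tied out=tied_r;     var x; ranks r; run;
proc rank data=untied out=untied_r; var x; ranks r; run;

proc print data=tied_r;   run;
proc print data=untied_r; run;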
Thank you so much for the response!
I don't quite understand, though: is the sum of ranks used in calculating the confidence interval for the H-L estimated shift? I thought we take the ranks themselves and the values carrying those ranks, and that the sum of ranks is only needed to calculate the p-value. 🤔
In addition, the wilcox.test function in R gives the same results for both data sets, which confuses me.
I've attached my files; maybe that will help.
I can't tell you what R is doing, and I don't know whether it does the same things as PROC NPAR1WAY. Sometimes they don't do the same things, so different results are not necessarily a problem. Furthermore, there's no reason to believe software A gives the right answer and software B is wrong without a lot more evidence.
NPAR1WAY uses ranks and sums them to do statistical tests. So changing a variable's value by 0.00000001 looks like a tiny change, but the rank changes by 1. Not a tiny change.
Calling @StatDave
When I read the documentation for the HL exact confidence limits in NPAR1WAY, there is a bit about the scoring function of (y, x) pairs: it is 1 where y > x, 0 where y < x, and 1/2 for tied values. So by adding that small amount you changed the function result for some of your data pairs by 0.5 per pair.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_npar1way_details19.htm
Start reading at Exact Confidence Limits
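A toy illustration of that scoring (made-up values, not your data): in SAS, (y > x) evaluates to 1 or 0, and the 0.5*(y = x) term supplies the 1/2 for ties.

/* A 1e-8 bump to y moves this pair's contribution from 0.5 to 1. */
data _null_;
   x = 2;  y = 2;
   phi_tied = (y > x) + 0.5*(y = x);
   y = y + 1e-8;
   phi_bumped = (y > x) + 0.5*(y = x);
   put phi_tied= phi_bumped=;
run;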
It helps, thank you very much! I managed to get the identical value after accounting for ties in the macro's rank-sum variance calculation.