bongo
Fluorite | Level 6

I'm computing a Hodges-Lehmann estimate with the NPAR1WAY procedure. My data contains duplicate values, so to be able to run an exact test with the EXACT statement I added 0.00000001 to each duplicate. Later I gave up on the idea of an exact test, but I noticed that the results of the procedure (my purpose is the CI) differ between the original and the perturbed data.

PROC NPAR1WAY data=data1 HL;
CLASS treat;
VAR response;
ODS SELECT HodgesLehmann;
RUN;
PROC NPAR1WAY data=data2 HL;
CLASS treat;
VAR response;
ODS SELECT HodgesLehmann;
RUN;
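
For reference, a minimal sketch of the kind of perturbation described above, assuming the posted names data1, data2, and response (the duplicate-detection logic here is only an illustration, not necessarily what was actually run):

/* Sort so duplicate responses are adjacent, then nudge every
   repeated occurrence of a value upward by 1e-8 to break ties. */
PROC SORT data=data1 out=data2;
BY response;
RUN;

DATA data2;
SET data2;
BY response;
if first.response then bump = 0;    /* first occurrence: leave as is */
else bump + 1;                      /* later occurrences: count them */
response = response + bump * 1e-8;  /* each duplicate becomes unique */
drop bump;
RUN;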

[Attached screenshot (Снимок.JPG) showing the differing Hodges-Lehmann confidence limits from the two runs]

Then I do the same in R, and for both data sets I get the values (-1.8819, 0.0556).

I also used a macro for Hodges-Lehmann estimation that I wrote based on the documentation, and it also gives (-1.8819, 0.0556) for both data sets.

Can anybody please explain why there is such a big difference in the results from such a small increase?

Is there some nuance in PROC NPAR1WAY that is triggered only when there are no repeated values in the data?

 


7 REPLIES
PaigeMiller
Diamond | Level 26

"Can anybody please explain why there is such a big difference in the results from such a small increase?"

Since I don't have your data I can only guess, but NPAR1WAY does not work on the raw values the way linear regression does; it works on ranks. Duplicates, or near duplicates that differ by only 0.00000001, still change the ranking, and with it the sum of the ranks in each category. So it's not a 0.00000001 change, it's a change of a whole rank.

--
Paige Miller
bongo
Fluorite | Level 6

Thank you so much for the response!

I don't quite understand, though: is the sum of ranks actually used in calculating the confidence interval for the H-L estimated shift? I mean, we take the ranks themselves and the values that carry those ranks, and the sum of ranks is needed to calculate the p-value. 🤔

In addition, the wilcox.test function in R gives the same results for both data sets, which confuses me.

 

I'm attaching my files; maybe this will help.

PaigeMiller
Diamond | Level 26

I can't tell you what R is doing; I don't know whether it does the same thing as PROC NPAR1WAY. Often two packages are not doing the same thing, in which case different results are not a problem (if only I knew what R is doing, which I don't). Furthermore, there's no reason to believe software A gives the right answer and software B is wrong without a lot more evidence.

 

NPAR1WAY uses ranks, and sums them to do statistical tests. So changing a variable's value by 0.00000001 looks like a tiny change, but the rank changes by a whole unit. Not a tiny change.
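
A tiny illustration of that point, using hypothetical toy data and PROC RANK (TIES=MEAN, the default, corresponds to the average ranks that rank tests assign to tied values):

DATA ties;
input x @@;
datalines;
1 5 5 9
;

PROC RANK data=ties out=ranked ties=mean;
VAR x;
RANKS r;
RUN;

PROC PRINT data=ranked noobs;
RUN;

The two 5s share the average rank 2.5. Change one of them to 5.00000001 and they get ranks 2 and 3 instead, so a 1e-8 change in the data moves each of those ranks by 0.5, along with the rank sum of whichever group holds them.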

--
Paige Miller
Ksharp
Super User

Calling @StatDave 

ballardw
Super User

When I read the documentation for the HL exact confidence limits in NPAR1WAY, there is a bit about a pair-scoring function of the (y, x) values: it equals 1 where y > x, 0 where y < x, and 1/2 where the values are tied. So by adding that small amount you changed the function's result for some of your data pairs by 1/2 per pair.

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_npar1way_details19.htm

Start reading at Exact Confidence Limits
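
Here is a small sketch of that pair-scoring idea with made-up numbers (it computes a Mann-Whitney-type count over all (y, x) pairs to illustrate the 1/2 convention; it is not NPAR1WAY's internal code):

DATA _null_;
array ygrp[3] _temporary_ (2 5 7);  /* toy sample Y */
array xgrp[3] _temporary_ (1 5 6);  /* toy sample X */
do i = 1 to dim(ygrp);
  do j = 1 to dim(xgrp);
    if ygrp[i] > xgrp[j] then u + 1;         /* y > x scores 1   */
    else if ygrp[i] = xgrp[j] then u + 0.5;  /* a tie scores 1/2 */
  end;
end;
put "Pair-scoring statistic: " u;  /* the 5-vs-5 pair contributes 0.5 */
RUN;

Nudge the tied 5 in Y up by 0.00000001 and that pair's contribution jumps from 0.5 to 1, which is exactly the kind of shift described above.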

Watts
SAS Employee (Accepted Solution)

As has already been mentioned, the PROC NPAR1WAY computation takes tied values into account. The difference between the macro results and the NPAR1WAY results for Hodges-Lehmann confidence limits might be due to the computation of Var_0(S), which is the null-hypothesis variance of the Wilcoxon rank sum statistic. The macro might be using the expression for Var_0(S) that's appropriate when there are no ties, while NPAR1WAY uses the more general expression that accounts for ties.

 

For details about how PROC NPAR1WAY computes Var_0(S), see the documentation section Hodges-Lehmann Estimation (paragraph 5) and the section Simple Linear Rank Tests for Two-Sample Data (last paragraph). The reference is Randles and Wolfe (1979).  
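
For concreteness, the standard tie-corrected expression (as given in Randles and Wolfe 1979) is

Var_0(S) = (n1*n2/12) * [ (n+1) - sum_j(t_j^3 - t_j) / (n*(n-1)) ]

where n = n1 + n2 and t_j is the number of observations tied at the j-th distinct value. With no ties every t_j = 1, the correction term vanishes, and this reduces to the familiar n1*n2*(n+1)/12. A sketch of the computation, assuming the posted names data1, treat, and response and a treat level of 1 for the first group (check the linked documentation for NPAR1WAY's exact notation):

PROC SQL noprint;
select count(*) into :n trimmed from data1;
select count(*) into :n1 trimmed from data1 where treat = 1;  /* assumed coding */
create table tiecorr as
select sum(t*t*t - t) as tsum
from (select count(*) as t from data1 group by response);     /* tie-group sizes */
QUIT;

DATA _null_;
set tiecorr;
n = &n; n1 = &n1; n2 = n - n1;
var0 = (n1*n2/12) * ( (n+1) - tsum/(n*(n-1)) );
put "Tie-corrected Var_0(S) = " var0;
RUN;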

bongo
Fluorite | Level 6

It helps, thank you very much! I managed to get identical values after handling ties when calculating the sum of ranks in the macro.

 

