bongo
Fluorite | Level 6

I'm computing a Hodges-Lehmann estimate with the NPAR1WAY procedure. My data contains duplicate values, so to be able to run an exact test with the EXACT statement I added 0.00000001 to each duplicate. Later I gave up on the idea of an exact test, but I noticed that the results of the procedure (my purpose is the CI) differ between the original and the perturbed data.

PROC NPAR1WAY data=data1 HL;
CLASS treat;
VAR response;
ODS SELECT HodgesLehmann;
RUN;
PROC NPAR1WAY data=data2 HL;
CLASS treat;
VAR response;
ODS SELECT HodgesLehmann;
RUN;
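
For reference, a minimal sketch of the kind of perturbation described above, assuming the posted names data1, data2, and response (the duplicate-detection logic here is only an illustration, not necessarily what was actually run):

/* Sort so duplicate responses are adjacent, then nudge every
   repeated occurrence of a value upward by 1e-8 to break ties. */
PROC SORT data=data1 out=data2;
BY response;
RUN;

DATA data2;
SET data2;
BY response;
if first.response then bump = 0;    /* first occurrence: leave as is */
else bump + 1;                      /* later occurrences: count them */
response = response + bump * 1e-8;  /* each duplicate becomes unique */
drop bump;
RUN;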

[Attached screenshot (Снимок.JPG) showing the differing Hodges-Lehmann confidence limits from the two runs]

Then I do the same in R, and for both data sets I get the values (-1.8819, 0.0556).

I also used a macro for Hodges-Lehmann estimation that I wrote based on the documentation, and it also gives (-1.8819, 0.0556) for both data sets.

Can anybody please explain why there is such a big difference in the results from such a small increase?

Is there some nuance in PROC NPAR1WAY that is triggered only when there are no repeated values in the data?

 


7 REPLIES
PaigeMiller
Diamond | Level 26

"Can anybody please explain why there is such a big difference in the results from such a small increase?"

Since I don't have your data I can only guess, but NPAR1WAY does not work on the raw values the way linear regression does; it works on ranks. Duplicates, or near duplicates that differ by only 0.00000001, still change the ranking, and with it the sum of the ranks in each category. So it's not a 0.00000001 change, it's a change of a whole rank.

--
Paige Miller
bongo
Fluorite | Level 6

Thank you so much for the response!

I don't quite understand, though: is the sum of ranks actually used in calculating the confidence interval for the H-L estimated shift? I mean, we take the ranks themselves and the values that carry those ranks, and the sum of ranks is needed to calculate the p-value. 🤔

In addition, the wilcox.test function in R gives the same results for both data sets, which confuses me.

 

I'm attaching my files; maybe this will help.

PaigeMiller
Diamond | Level 26

I can't tell you what R is doing; I don't know whether it does the same thing as PROC NPAR1WAY. Often two packages are not doing the same thing, in which case different results are not a problem (if only I knew what R is doing, which I don't). Furthermore, there's no reason to believe software A gives the right answer and software B is wrong without a lot more evidence.

 

NPAR1WAY uses ranks, and sums them to do statistical tests. So changing a variable's value by 0.00000001 looks like a tiny change, but the rank changes by a whole unit. Not a tiny change.
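
A tiny illustration of that point, using hypothetical toy data and PROC RANK (TIES=MEAN, the default, corresponds to the average ranks that rank tests assign to tied values):

DATA ties;
input x @@;
datalines;
1 5 5 9
;

PROC RANK data=ties out=ranked ties=mean;
VAR x;
RANKS r;
RUN;

PROC PRINT data=ranked noobs;
RUN;

The two 5s share the average rank 2.5. Change one of them to 5.00000001 and they get ranks 2 and 3 instead, so a 1e-8 change in the data moves each of those ranks by 0.5, along with the rank sum of whichever group holds them.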

--
Paige Miller
Ksharp
Super User

Calling @StatDave 

ballardw
Super User

When I read the documentation for the HL exact confidence limits in NPAR1WAY, there is a bit about a pair-scoring function of the (y, x) values: it equals 1 where y > x, 0 where y < x, and 1/2 where the values are tied. So by adding that small amount you changed the function's result for some of your data pairs by 1/2 per pair.

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_npar1way_details19.htm

Start reading at Exact Confidence Limits
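
Here is a small sketch of that pair-scoring idea with made-up numbers (it computes a Mann-Whitney-type count over all (y, x) pairs to illustrate the 1/2 convention; it is not NPAR1WAY's internal code):

DATA _null_;
array ygrp[3] _temporary_ (2 5 7);  /* toy sample Y */
array xgrp[3] _temporary_ (1 5 6);  /* toy sample X */
do i = 1 to dim(ygrp);
  do j = 1 to dim(xgrp);
    if ygrp[i] > xgrp[j] then u + 1;         /* y > x scores 1   */
    else if ygrp[i] = xgrp[j] then u + 0.5;  /* a tie scores 1/2 */
  end;
end;
put "Pair-scoring statistic: " u;  /* the 5-vs-5 pair contributes 0.5 */
RUN;

Nudge the tied 5 in Y up by 0.00000001 and that pair's contribution jumps from 0.5 to 1, which is exactly the kind of shift described above.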

Watts
SAS Employee (Accepted Solution)

As has already been mentioned, the PROC NPAR1WAY computation takes tied values into account. The difference between the macro results and the NPAR1WAY results for Hodges-Lehmann confidence limits might be due to the computation of Var_0(S), which is the null-hypothesis variance of the Wilcoxon rank sum statistic. The macro might be using the expression for Var_0(S) that's appropriate when there are no ties, while NPAR1WAY uses the more general expression that accounts for ties.

 

For details about how PROC NPAR1WAY computes Var_0(S), see the documentation section Hodges-Lehmann Estimation (paragraph 5) and the section Simple Linear Rank Tests for Two-Sample Data (last paragraph). The reference is Randles and Wolfe (1979).  
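
For concreteness, the standard tie-corrected expression (as given in Randles and Wolfe 1979) is

Var_0(S) = (n1*n2/12) * [ (n+1) - sum_j(t_j^3 - t_j) / (n*(n-1)) ]

where n = n1 + n2 and t_j is the number of observations tied at the j-th distinct value. With no ties every t_j = 1, the correction term vanishes, and this reduces to the familiar n1*n2*(n+1)/12. A sketch of the computation, assuming the posted names data1, treat, and response and a treat level of 1 for the first group (check the linked documentation for NPAR1WAY's exact notation):

PROC SQL noprint;
select count(*) into :n trimmed from data1;
select count(*) into :n1 trimmed from data1 where treat = 1;  /* assumed coding */
create table tiecorr as
select sum(t*t*t - t) as tsum
from (select count(*) as t from data1 group by response);     /* tie-group sizes */
QUIT;

DATA _null_;
set tiecorr;
n = &n; n1 = &n1; n2 = n - n1;
var0 = (n1*n2/12) * ( (n+1) - tsum/(n*(n-1)) );
put "Tie-corrected Var_0(S) = " var0;
RUN;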

bongo
Fluorite | Level 6

It helps, thank you very much! I managed to get identical values after handling ties when calculating the sum of ranks in the macro.

 

