ATomczyk
Fluorite | Level 6

Dear Community,

We are reaching out to better understand the PROC UNIVARIATE signed rank statistic S and its associated p-value (Pr >= |S|), as described here: PROC UNIVARIATE: Performing a Sign Test Using Paired Data (sas.com)

When the sample size is greater than 20, our S statistic matches, but the p-value does not (we are comparing against R, which uses a normal approximation).

 

Could you clarify which algorithm is used to calculate the p-values?

  • Is a normal, Student's t, or Monte Carlo approximation applied?
  • Is a continuity correction added?
  • Which approach is followed? We have noticed that the p-values are not based on the Bauer or Hodges-Lehmann algorithms.

 

Rick_SAS
SAS Super FREQ

What is the purpose of your study? 

 

The statistic itself is sometimes reported differently in different software. This has been discussed before. There are several different statistics that can be used for the signed rank test. See "On the computation of the Wilcoxon signed rank statistic."

 

Regarding the p-values and continuity corrections, there is a modification to the test statistic due to Pratt, which will affect the p-values. For a discussion of that and other issues, see "Modifications of the Wilcoxon signed rank test and exact p-values."

 

Both articles contain references.

ATomczyk
Fluorite | Level 6

Hi @Rick_SAS ,

 

Massive thanks for coming back to us. We really appreciate you looking into this and referring us to the articles.

This study is part of the CAMIS project (https://psiaims.github.io/CAMIS/). We compare statistical methods across software and describe similarities/differences and reasons for them.

 

With the Wilcoxon signed-rank test, I am aware that SAS reports the centered statistic S (Signed Rank) rather than the more common T+, and that the two are equivalent.
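To make the equivalence concrete, here is a minimal Python sketch. It assumes the usual relationship S = T+ - n(n+1)/4, where n is the number of nonzero differences, and it assumes no tied absolute differences (as in the dataset discussed here). The function name `signed_rank_stats` is hypothetical and only for illustration.

```python
def signed_rank_stats(diffs):
    """Return (n, T_plus, S) for paired differences.

    Assumes no ties among the absolute differences; zero
    differences are dropped before ranking.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    # Rank the absolute differences, 1 = smallest.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    # T+ is the sum of ranks belonging to positive differences.
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    # S is T+ centered at its expected value under the null hypothesis.
    s = t_plus - n * (n + 1) / 4
    return n, t_plus, s


n, t_plus, s = signed_rank_stats([1.5, -0.5, 2.0, -3.0, 4.5])
print(n, t_plus, s)
```

Because S and T+ differ only by a constant that depends on n, any p-value difference between packages has to come from the reference distribution, not from the statistic.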

Nevertheless, I am struggling to replicate the SAS p-value (0.0093 in SAS versus 0.0095 in StatXact or R).

The dataset in question has 240 observations (so >20), no ties, and no zero differences.

 

Could you please specify which method SAS uses to calculate the p-value?

 

 

ballardw
Super User

One thing I always ask when someone says "values don't match" is "what is the other value that didn't match?" If the SAS result is 0.5278 and a different program reports 0.528, then I would say they are likely the same but rounded differently. If you export that SAS output to a data set, I am almost certain that the stored value will have more digits and that 0.5278 is simply that value rounded to 4 decimal places, because that is what a format width of 6 displays: one character for the leading 0, one for the decimal point, and 4 for the decimal places. If the value is small enough, the same 6-character format shows <.0001 instead.

 

Without the data and the actual code submitted, we can't be sure whether any SAS options were applied that might affect the calculations. And of course we have no idea what you submitted in R.

ATomczyk
Fluorite | Level 6

Hi @ballardw ,

 

Thank you for looking into that. 

As I mentioned in my response above, the study is part of the CAMIS project, where we look into differences between statistical methods across software.

 

Thank you for mentioning the rounding! We also see that rounding explains many "non-matching" results. I have already ruled it out here, as the difference is 0.0093 versus 0.009535.

 

The dataset has 240 observations, with no ties and no zero differences.

In this case SAS should apply the non-exact method as the only option (if I understood the specification correctly).

 

Do you know what distribution is used for the test statistic? Is it with or without continuity correction?
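For context, the large-sample normal approximation on the comparison side can be sketched in Python as below. This is an illustration under stated assumptions, not the exact code we ran: it assumes no ties, takes the centered statistic S = T+ - n(n+1)/4 as input, and applies a continuity correction of 0.5 toward zero, which is my understanding of the default in R's `wilcox.test` (`correct = TRUE`). The function name `normal_approx_p` is hypothetical.

```python
import math


def normal_approx_p(s, n, correct=True):
    """Two-sided p-value for the centered signed rank statistic S.

    Uses a normal approximation with variance n(n+1)(2n+1)/24,
    which assumes no tied absolute differences.
    """
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # Continuity correction: shrink |S| by 0.5 before standardizing.
    correction = math.copysign(0.5, s) if (correct and s != 0) else 0.0
    z = (s - correction) / sigma
    # 2 * (1 - Phi(|z|)) written via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# The correction always makes the p-value slightly larger.
print(normal_approx_p(50.0, 25, correct=True))
print(normal_approx_p(50.0, 25, correct=False))
```

Toggling `correct` shows how much of a discrepancy the continuity correction alone can account for at a given n.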

 

I am happy to share more details, dataset and the matching calculations in R and StatXact.

However, the most helpful support from the SAS Community would be to share the details of the function that calculates the p-value.

 

Thanks for your help again. It is so nice to see people getting involved in studies like that!

Mike_N
SAS Employee

The method for calculating the p-value is at the bottom of this page: SAS Help Center: Tests for Location. For non-exact p-values, it looks like PROC UNIVARIATE uses a t distribution with a complicated transformation of S. The linked page also gives references for this method.
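As a rough illustration of that t-based approach, here is a Python sketch. It assumes my reading of the formula on the linked page: for n > 20, the quantity S * sqrt((n - 1) / (n*V - S^2)) is treated as Student's t with n - 1 degrees of freedom, where V = n(n+1)(2n+1)/24 minus a tie correction of sum(t*(t+1)*(t-1))/48 over tie group sizes t. Please verify the formula against the documentation before relying on it. All function names are hypothetical, and the t tail probability is computed with a standard continued-fraction evaluation of the incomplete beta function so that only the standard library is needed.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_t_p(t, df):
    """Two-sided tail probability of Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_approx_signed_rank_p(s, n, tie_sizes=()):
    """t-based p-value for S (assumed formula, see the lead-in)."""
    v = n * (n + 1) * (2 * n + 1) / 24 \
        - sum(t * (t + 1) * (t - 1) for t in tie_sizes) / 48
    t_stat = s * math.sqrt((n - 1) / (n * v - s * s))
    return t_stat, two_sided_t_p(t_stat, n - 1)


t_stat, p = t_approx_signed_rank_p(50.0, 25)
print(t_stat, p)
```

If this reading is right, a t reference distribution with no continuity correction would give a slightly different p-value than a continuity-corrected normal approximation, even when S matches exactly.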
