Statistical Procedures

ATomczyk · Posted 06-10-2024 05:41 AM

Dear Community,

We are reaching out to better understand the PROC UNIVARIATE Signed Rank S statistic and the associated Pr >=|S| (p-value), as described here: PROC UNIVARIATE: Performing a Sign Test Using Paired Data (sas.com)

When the sample size is >20, we match the S statistic but cannot match the p-value (we are comparing against R, which uses a normal approximation).

Could you clarify what algorithm is followed for calculating p values?

Is a normal approximation, a Student's t approximation, or a Monte Carlo method applied?
Is a continuity correction added?
Which approach is followed? We have noticed that the p-values are not based on the Bauer or Hodges-Lehmann algorithms.
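For reference, the normal approximation we are comparing against can be sketched roughly as below. This is only an illustration in Python, not SAS or R source code: the function name `normal_approx_p` and its arguments are hypothetical, and it assumes no ties and no zero differences. The `correct` flag applies the usual 0.5 continuity correction, which R's `wilcox.test` uses by default (`correct = TRUE`) when it falls back to the normal approximation.

```python
import math


def normal_approx_p(t_plus, n, correct=True):
    """Two-sided normal-approximation p-value for the signed-rank test.

    t_plus: sum of the ranks of the positive differences (T+)
    n:      number of nonzero differences (no ties assumed)
    """
    mean = n * (n + 1) / 4                    # null mean of T+
    var = n * (n + 1) * (2 * n + 1) / 24      # null variance of T+ without ties
    centered = t_plus - mean
    if correct and centered != 0:
        # Continuity correction: move the centered statistic 0.5 toward zero.
        centered -= math.copysign(0.5, centered)
    z = centered / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
```

Calling it once with `correct=True` and once with `correct=False` on the same T+ shows how much of a discrepancy the continuity correction alone can explain.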

Rick_SAS · Posted 06-10-2024 06:27 AM

What is the purpose of your study?

The statistic itself is sometimes reported differently in different software. This has been discussed before. There are several different statistics that can be used for the signed rank test. See "On the computation of the Wilcoxon signed rank statistic"

Regarding the p-values and continuity corrections, there is a modification to the test statistic due to Pratt, which will affect the p-values. For a discussion of that and other issues, see "Modifications of the Wilcoxon signed rank test and exact p-values."

Both articles contain references.

ATomczyk · Posted 07-17-2024 08:35 AM

Hi @Rick_SAS ,

Massive thanks for coming back to us. We really appreciate you looking into this and referring us to the articles.

This study is part of the CAMIS project (https://psiaims.github.io/CAMIS/). We compare statistical methods across software and describe similarities/differences and reasons for them.

With the Wilcoxon signed-rank test, I am aware that SAS reports the test statistic S (Signed Rank) rather than the more common T+, and that the two are equivalent.

Nevertheless, I am struggling to replicate the SAS calculation of the p-value (0.0093 in SAS versus 0.0095 in StatXact or R).
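To spell out that equivalence: per the SAS documentation, S is T+ shifted by its null mean, S = T+ - n(n+1)/4, where n is the number of nonzero differences, while R's `wilcox.test` reports V, which is T+ itself. A minimal Python sketch of the conversion (the helper names are hypothetical, chosen only for illustration):

```python
def s_from_t_plus(t_plus, n):
    """Convert T+ (sum of positive ranks) to the centered statistic S."""
    return t_plus - n * (n + 1) / 4


def t_plus_from_s(s, n):
    """Inverse conversion: recover T+ from S."""
    return s + n * (n + 1) / 4
```

Because the shift depends only on n, the two statistics carry the same information, so any remaining disagreement has to come from how the p-value is computed.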

The considered dataset has 240 observations (so >20), no ties and no "0" differences.

Could you please specify what method is used in SAS to calculate the p value?

ballardw · Posted 06-10-2024 09:58 AM

One thing I always ask when someone says "values don't match" is "what is the other value that didn't match?" If the SAS result for something is 0.5278 and a different program reports 0.528, then I would say they are likely the same value rounded differently. If you export that SAS output to a data set, I am almost certain that it will show more digits, and that the 0.5278 has been rounded to 4 decimal places because that is what a format width of 6 will show: 6 characters, one for the typical leading 0, one for the decimal point, and 4 more for decimal places. Or, if the value is small, you get <.0001 in those 6 characters.
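As a rough illustration of that rounding check, here is a small Python sketch. It only mimics the idea of a 6-character, 4-decimal display and is not the SAS format itself; both function names are hypothetical.

```python
def display_4dp(p):
    """Mimic a 6-character p-value display with 4 decimal places."""
    return "<.0001" if p < 0.0001 else f"{p:.4f}"


def same_after_rounding(a, b, digits):
    """True if two reported values agree once both are rounded to `digits` places."""
    return round(a, digits) == round(b, digits)
```

For example, `same_after_rounding(0.5278, 0.528, 3)` compares the two values at the precision of the coarser one, which is the comparison that matters before concluding that two programs disagree.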

Without data and actual code submitted we can't be sure if any SAS options may have been applied that might affect the calculations. And of course we have no idea what you submitted in R.

ATomczyk · Posted 07-17-2024 08:50 AM

Hi @ballardw ,

Thank you for looking into that.

As I mentioned in my response above, the study is part of the CAMIS project, where we look into differences between statistical methods across software.

Thank you for mentioning the rounding! We also see that it explains many "not matching" results. I have already ruled it out here, as the difference is 0.0093 vs. 0.009535.

The dataset has 240 observations, no ties, and no "0" differences.

In this case SAS should apply the non-exact method as the only option (if I understood the specification correctly).

Do you know what distribution is used for the test statistic? Is it with or without continuity correction?

I am happy to share more details, dataset and the matching calculations in R and StatXact.

However, it would be sufficient, and much appreciated, if the SAS Community could share the details of the function that calculates the p-value.

Thanks for your help again. It is so nice to see people getting involved in studies like that!

Mike_N · Posted 07-17-2024 09:34 AM

The method for calculating the p-value is at the bottom of this page: SAS Help Center: Tests for Location. For non-exact p-values, it looks like PROC UNIVARIATE uses a t distribution with a complicated transformation of S. The linked page also gives references for this method.
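As I read the linked page, when the number of nonzero differences n exceeds 20, the quantity S * sqrt((n - 1) / (n*V - S^2)) is treated as a Student's t variate with n - 1 degrees of freedom, where V = n(n+1)(2n+1)/24 minus a tie adjustment of sum(t_i(t_i+1)(t_i-1))/48 over groups of tied absolute values. Below is a minimal Python sketch of that calculation, not the SAS implementation. The function names are hypothetical, and the t tail probability is obtained by simple numerical integration so that only the standard library is needed.

```python
import math


def t_two_sided_p(t, df, steps=4000):
    """Two-sided Student's t tail probability via Simpson's rule (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3              # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * central)


def signed_rank_t_approx(s, n, tie_sizes=()):
    """Return (t, two-sided p) for the t approximation described above.

    s:         centered signed-rank statistic S
    n:         number of nonzero differences (intended for n > 20)
    tie_sizes: sizes of groups tied in absolute value (empty if no ties)
    """
    v = (n * (n + 1) * (2 * n + 1) / 24
         - sum(k * (k + 1) * (k - 1) for k in tie_sizes) / 48)
    t = s * math.sqrt((n - 1) / (n * v - s * s))
    return t, t_two_sided_p(t, n - 1)
```

A call such as `signed_rank_t_approx(2792, 240)` uses a made-up S, not the poster's data. Comparing its result with a normal approximation for the same S shows that the two methods need not agree to four decimal places, even with no ties and no zero differences.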
