## Mann-Whitney U test

Using the Mann-Whitney U test (Wilcoxon rank-sum test), I am comparing two groups to see whether they are statistically different. Because the two groups have almost the same median and mean, I expected the p-value to be very high. But the p-value was < 0.0001 (attached). Any ideas why the p-value is significant?

The p-value from the t-test was about 0.15.

SAS code:

proc NPAR1WAY data=nis.cdiselect wilcoxon;

class primary;

var age;

run;

## Re: Mann-Whitney U test

The expected values differ significantly, so review how expected values are calculated.

You have data sets that are large and imbalanced in size, so almost any difference will be statistically significant; however, that does not imply practical significance.
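
To see how large and how unbalanced the two groups are, and how small the difference is in practical terms, a per-group summary along these lines may help. This is only a minimal sketch; it assumes nothing beyond the data set and variable names from the original post, and it is unweighted.

```sas
/* Per-group sample size, mean, spread, median and quartiles for AGE (unweighted). */
proc means data=nis.cdiselect n mean std median q1 q3 maxdec=2;
  class primary;
  var age;
run;
```

With very large group sizes, even a difference in means or rank distributions that is far too small to matter clinically can produce a tiny p-value, so it is worth reading the summary statistics alongside the test result.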

## Re: Mann-Whitney U test

Can you show the code and results for the t-test?

PG

## Re: Mann-Whitney U test

proc ttest data=nis.cdiselect;

class primary;

var age;

weight trendwt;

run;

The result of the above is attached.

After adding the weight statement (trendwt), the p-value was 0.7473.

According to the Central Limit Theorem, can I say that the sample is normally distributed because the sample size is >= 30 (the data set nis.cdiselect has a weighted frequency of 3.5 million)? And can I therefore use the t-test?

Thanks.

## Re: Mann-Whitney U test

Under the CLT, it is the mean of the sample that is normally distributed, not the sample itself. You can check the sample distribution visually if you like.

What is the hypothesis? If it's that the means are the same, then yes, you can use the CLT to assume the means are normally distributed and use a t-test to test for a significant difference.

What happens if you look at the distribution curves, say as histograms?
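
One possible way to look at the two distributions side by side is a comparative histogram. The sketch below assumes ODS Graphics is enabled and uses only the data set and variable names from the original post; it is one option among several, not the only way to draw this.

```sas
/* Comparative histograms of AGE, one panel per level of PRIMARY. */
/* NOPRINT suppresses the descriptive tables; the histograms are still drawn. */
proc univariate data=nis.cdiselect noprint;
  class primary;
  var age;
  histogram age;
run;
```

Stacked panels on a common axis make it easier to see skewness or a shift in one group that matching means and medians would hide.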


## Re: Mann-Whitney U test

You cannot assign weights to observations in the Wilcoxon rank sum test provided by NPAR1WAY. The weights that you are using might be designed expressly to balance the two samples.

PG

## Re: Mann-Whitney U test

I didn't use a WEIGHT statement for the Mann-Whitney test (SAS won't run it with a WEIGHT statement in any case).

## Re: Mann-Whitney U test

Exactly. This means you aren't using the same data in the two tests, which makes them inconsistent, so you cannot compare the results.

It's a weighted test versus an unweighted test.

## Re: Mann-Whitney U test

I ran the unweighted t-test as well (see my initial post). It gave a p-value of about 0.15.

## Re: Mann-Whitney U test

As I and others have mentioned, what do the distributions look like?

If you can see a difference, or the distributions are markedly different, that offers evidence in a particular direction that wouldn't be seen with a traditional box plot. You could have something similar to Anscombe's Quartet to some degree.

## Re: Mann-Whitney U test

It is a left-skewed distribution.  Thanks.

## Re: Mann-Whitney U test

How many ties do you have in the ranked values?

From the documentation for NPAR1WAY:

The asymptotic tests might be less accurate when the distribution of the data is heavily tied. For such data, it might be appropriate to use the exact tests provided by PROC NPAR1WAY as described in the section Exact Tests.

Do you see a similar result if you take a random sample of say 10% of the records?

Or have you looked at any graphic representation of the data?

Does this graph imply equal or unequal medians to you:

```
proc sgplot data=nis.cdiselect;
  vbar age / group=primary
             groupdisplay=cluster
             stat=freq;
run;
```
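
The 10% random sample suggested above could be drawn with PROC SURVEYSELECT and the same test rerun on it. This is a sketch under stated assumptions: the output data set name `work.cdi_sample10` and the seed value are hypothetical choices made here for illustration.

```sas
/* Draw a 10% simple random sample; SEED= makes the draw reproducible. */
/* work.cdi_sample10 and the seed value are illustrative names/values. */
proc surveyselect data=nis.cdiselect out=work.cdi_sample10
                  method=srs samprate=0.10 seed=12345;
run;

/* Rerun the same Wilcoxon rank-sum test on the subsample. */
proc npar1way data=work.cdi_sample10 wilcoxon;
  class primary;
  var age;
run;
```

If the p-value grows substantially on the subsample while the group summaries stay about the same, that points to sample size rather than a large difference as the reason for the original result.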

## Re: Mann-Whitney U test

Do you mean adding "exact wilcoxon"? The result was the same.

I don't know how to check how many ties I have.

I also have not learned how to take a random sample of 10% of my records.

Does "equal or unequal medians" mean whether the mean is close to the median? If not, please explain. Mean is, median is 72 for both groups.

For your information, I added the results from the sas code that you provided.  Thanks.

## Re: Mann-Whitney U test

If the AGE variable is, as seems likely, "age in years" stored as an integer, you can use proc freq to get exact counts. The graph results show 1) as many as 18,000 ties within just one of the groups and 2) age rounding to 5-year increments (those spikes). The larger spike at around 90 (hard to tell with the overlapping tick labels) is also interesting; it may indicate that some other factor was used for that age, possibly a group of people "at least 90". It is the third (blue) or second (red) largest count, in an area of the histogram where the surrounding ages show a declining count.

The graph indicates to me that there is a difference in medians, as the "blue" group seems to have more of its members towards the upper age range than the red group. Notice that around the 20s range the blue bar is maybe not quite twice as tall as the red one, but up around the 60s (?) the blue bar is well over twice as tall. If the medians were similar, the ratio of the bar heights would be more similar across a wide range of the data.
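
The proc freq suggestion above for counting ties could look something like the following. It is a minimal sketch: the output data set name `work.age_counts` is a hypothetical name chosen here, and listing only the 20 largest counts is an arbitrary choice.

```sas
/* Count observations at each AGE value within each level of PRIMARY. */
/* work.age_counts is an illustrative output data set name. */
proc freq data=nis.cdiselect noprint;
  tables primary*age / out=work.age_counts;
run;

/* Any AGE value with COUNT > 1 is tied within its group; */
/* sort so the most heavily tied values come first. */
proc sort data=work.age_counts;
  by descending count;
run;

proc print data=work.age_counts(obs=20);
run;
```

Because the rank-sum test ranks the pooled data, ties across the two groups also matter; replacing `primary*age` with `age` in the TABLES statement gives the pooled counts.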

## Re: Mann-Whitney U test

Thank you so much! That's quite helpful. So, apart from the normality assumption, is the distinction that the two-sample t-test looks for a difference in means, while the Mann-Whitney U test looks for a difference in medians?
