Hello all,
I have been trying to figure out how to test whether the medians of two independent samples are statistically significantly different from each other, and Google directed me to the following link, where the NPAR1WAY procedure is used: https://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_npar1way_ex...
Below are my data and the SAS code. My issue is how to interpret the results. There are two tables that report two different statistics. The first is the "Wilcoxon Two-Sample Test", where Pr < Z is 0.20 (i.e. not statistically significantly different). The second is the "Median Two-Sample Test", where Pr < Z is 0.175 (i.e. statistically significantly different). First, I don't know what these two tests measure, and second, I don't know why they give contradicting results (I'm guessing they measure different things, so the results are not really contradictory, but I don't know what each one measures).
I would really appreciate your advice and sorry for the lengthy data!
data los;
input Treatment $ Length_of_stay Freq;
datalines;
No 1 475
No 2 304
No 3 364
No 4 364
No 5 334
No 6 320
No 7 277
No 8 256
No 9 208
No 10 192
No 11 173
No 12 152
No 13 114
No 14 123
No 15 95
No 16 96
No 17 94
No 18 81
No 19 69
No 20 75
No 21 50
No 22 60
No 23 53
No 24 49
No 25 45
No 26 40
No 27 46
No 28 37
No 29 36
No 30 31
No 31 33
No 32 25
No 33 26
No 34 21
No 35 33
No 36 23
No 37 21
No 38 27
No 39 16
No 40 31
No 41 24
No 42 27
No 43 16
No 44 17
No 45 13
No 46 11
No 47 20
No 48 15
No 49 13
No 50 16
No 51 8
No 52 12
No 53 17
No 54 10
No 55 6
No 56 9
No 57 11
No 58 8
No 59 10
No 60 8
No 61 7
No 62 10
No 63 8
No 64 11
No 65 6
No 66 4
No 67 9
No 68 6
No 69 4
No 70 10
No 71 5
No 72 1
No 73 7
No 74 4
No 75 4
No 76 4
No 77 6
No 78 9
No 79 2
No 80 2
No 81 2
No 82 4
No 83 2
No 84 3
No 85 4
No 86 3
No 87 3
No 88 2
No 89 3
No 90 3
No 91 3
No 92 2
No 93 3
No 94 1
No 95 2
No 96 2
No 97 1
No 98 2
No 102 3
No 104 1
No 108 2
No 113 2
No 114 1
No 115 3
No 117 1
No 119 2
No 120 1
No 121 2
No 122 1
No 123 2
No 124 1
No 125 2
No 126 2
No 128 2
No 130 2
No 133 1
No 135 1
No 141 1
No 143 1
No 144 2
No 145 1
No 148 1
No 149 1
No 151 1
No 152 1
No 153 1
No 162 1
No 168 1
No 170 1
No 176 2
No 177 1
No 180 1
No 181 1
No 187 1
No 192 1
No 194 1
No 201 1
No 212 1
No 215 1
No 227 1
No 260 1
No 261 1
No 313 1
No 326 1
No 370 1
No 383 1
No 387 1
No 395 1
No 402 1
No 409 1
No 431 1
Yes 1 617
Yes 2 471
Yes 3 489
Yes 4 460
Yes 5 447
Yes 6 398
Yes 7 393
Yes 8 289
Yes 9 290
Yes 10 280
Yes 11 269
Yes 12 204
Yes 13 209
Yes 14 138
Yes 15 142
Yes 16 148
Yes 17 135
Yes 18 129
Yes 19 113
Yes 20 98
Yes 21 91
Yes 22 80
Yes 23 70
Yes 24 85
Yes 25 66
Yes 26 59
Yes 27 57
Yes 28 54
Yes 29 56
Yes 30 36
Yes 31 45
Yes 32 46
Yes 33 52
Yes 34 49
Yes 35 32
Yes 36 31
Yes 37 41
Yes 38 30
Yes 39 24
Yes 40 35
Yes 41 20
Yes 42 14
Yes 43 24
Yes 44 20
Yes 45 30
Yes 46 18
Yes 47 14
Yes 48 24
Yes 49 22
Yes 50 21
Yes 51 21
Yes 52 9
Yes 53 17
Yes 54 18
Yes 55 8
Yes 56 13
Yes 57 12
Yes 58 15
Yes 59 14
Yes 60 12
Yes 61 13
Yes 62 9
Yes 63 7
Yes 64 4
Yes 65 10
Yes 66 12
Yes 67 11
Yes 68 7
Yes 69 13
Yes 70 7
Yes 71 10
Yes 72 3
Yes 73 6
Yes 74 11
Yes 75 8
Yes 76 2
Yes 77 1
Yes 78 3
Yes 79 2
Yes 80 5
Yes 81 1
Yes 82 2
Yes 83 2
Yes 84 6
Yes 85 3
Yes 86 2
Yes 87 4
Yes 88 3
Yes 89 2
Yes 90 6
Yes 91 2
Yes 92 2
Yes 93 4
Yes 94 2
Yes 95 1
Yes 97 1
Yes 98 2
Yes 99 1
Yes 100 3
Yes 102 3
Yes 104 1
Yes 105 1
Yes 106 3
Yes 108 1
Yes 109 3
Yes 111 2
Yes 114 3
Yes 115 4
Yes 116 2
Yes 117 2
Yes 119 1
Yes 121 5
Yes 122 1
Yes 123 4
Yes 124 2
Yes 125 1
Yes 127 1
Yes 128 1
Yes 131 1
Yes 132 1
Yes 133 1
Yes 137 1
Yes 140 1
Yes 141 1
Yes 144 1
Yes 145 1
Yes 146 1
Yes 148 1
Yes 149 1
Yes 151 1
Yes 153 2
Yes 158 1
Yes 160 1
Yes 165 1
Yes 166 1
Yes 168 1
Yes 173 2
Yes 175 1
Yes 176 1
Yes 182 1
Yes 184 1
Yes 185 1
Yes 187 1
Yes 195 2
Yes 199 1
Yes 220 1
Yes 224 1
Yes 228 2
Yes 232 2
Yes 242 1
Yes 247 1
Yes 300 1
Yes 316 1
Yes 322 1
Yes 336 1
Yes 340 1
Yes 375 1
Yes 576 1
;
Accepted Solutions
A possible element of explanation for the sensitivity of the median test might be illustrated by the cumulative distribution function obtained with:
proc univariate data=los;
class treatment;
var Length_of_stay;
freq freq;
histogram;
cdfplot / Weibull(theta=est) statref=median overlay;
ods output CDFplot=losCDF;
run;
title "Empirical distribution function for Length_of_stay";
proc sgplot data=losCDF;
where CDFx > 0;
series x=ECDFx y=ECDFy / group=class1;
refline 50 / axis=y label="Median" labelloc=inside;
xaxis type=log;
run;
Note that the greatest difference between the two curves is located almost exactly at the median.
The same thing shows up in the EDF statistics from:
proc npar1way data=los edf plots=none;
class treatment;
freq freq;
var Length_of_stay;
run;
i.e. the median happens to be the value where the two distributions differ the most.
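To see where the two empirical distributions differ most without reading it off the plot, the gap can also be tabulated by hand. This is only a minimal sketch, assuming the los data set from the original post; the data set names los_sorted, cum, cum_no, cum_yes and diff and the variables F_no, F_yes, last_no, last_yes and abs_diff are made up for the illustration.

```sas
/* Sort so that BY-group processing and the later merge are valid */
proc sort data=los out=los_sorted;
  by treatment length_of_stay;
run;

/* Cumulative percentages of Length_of_stay within each treatment group */
proc freq data=los_sorted noprint;
  by treatment;
  tables length_of_stay / out=cum outcum;
  weight freq;
run;

/* One data set per group, keeping only the value and its cumulative percent */
data cum_no(rename=(cum_pct=F_no)) cum_yes(rename=(cum_pct=F_yes));
  set cum;
  keep length_of_stay cum_pct;
  if treatment = 'No' then output cum_no;
  else output cum_yes;
run;

/* Align the two step functions; carry the last value forward where a
   length of stay occurs in only one of the groups */
data diff;
  merge cum_no cum_yes;
  by length_of_stay;
  retain last_no last_yes 0;
  if not missing(F_no)  then last_no  = F_no;
  if not missing(F_yes) then last_yes = F_yes;
  abs_diff = abs(last_no - last_yes);   /* gap in percentage points */
  keep length_of_stay last_no last_yes abs_diff;
run;

/* Largest gaps first */
proc sort data=diff;
  by descending abs_diff;
run;

proc print data=diff(obs=5);
run;
```

The first rows of the printout show the lengths of stay at which the two cumulative curves are furthest apart, which can then be compared with the empirical medians.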
It will help a lot in answering your question if you show the actual NPAR1WAY code that you used.
Oh man! I completely missed it! Here is the data and the NPAR1WAY code:
data los;
input Treatment $ Length_of_stay Freq;
datalines;
No 1 475
No 2 304
No 3 364
No 4 364
No 5 334
No 6 320
No 7 277
No 8 256
No 9 208
No 10 192
No 11 173
No 12 152
No 13 114
No 14 123
No 15 95
No 16 96
No 17 94
No 18 81
No 19 69
No 20 75
No 21 50
No 22 60
No 23 53
No 24 49
No 25 45
No 26 40
No 27 46
No 28 37
No 29 36
No 30 31
No 31 33
No 32 25
No 33 26
No 34 21
No 35 33
No 36 23
No 37 21
No 38 27
No 39 16
No 40 31
No 41 24
No 42 27
No 43 16
No 44 17
No 45 13
No 46 11
No 47 20
No 48 15
No 49 13
No 50 16
No 51 8
No 52 12
No 53 17
No 54 10
No 55 6
No 56 9
No 57 11
No 58 8
No 59 10
No 60 8
No 61 7
No 62 10
No 63 8
No 64 11
No 65 6
No 66 4
No 67 9
No 68 6
No 69 4
No 70 10
No 71 5
No 72 1
No 73 7
No 74 4
No 75 4
No 76 4
No 77 6
No 78 9
No 79 2
No 80 2
No 81 2
No 82 4
No 83 2
No 84 3
No 85 4
No 86 3
No 87 3
No 88 2
No 89 3
No 90 3
No 91 3
No 92 2
No 93 3
No 94 1
No 95 2
No 96 2
No 97 1
No 98 2
No 102 3
No 104 1
No 108 2
No 113 2
No 114 1
No 115 3
No 117 1
No 119 2
No 120 1
No 121 2
No 122 1
No 123 2
No 124 1
No 125 2
No 126 2
No 128 2
No 130 2
No 133 1
No 135 1
No 141 1
No 143 1
No 144 2
No 145 1
No 148 1
No 149 1
No 151 1
No 152 1
No 153 1
No 162 1
No 168 1
No 170 1
No 176 2
No 177 1
No 180 1
No 181 1
No 187 1
No 192 1
No 194 1
No 201 1
No 212 1
No 215 1
No 227 1
No 260 1
No 261 1
No 313 1
No 326 1
No 370 1
No 383 1
No 387 1
No 395 1
No 402 1
No 409 1
No 431 1
Yes 1 617
Yes 2 471
Yes 3 489
Yes 4 460
Yes 5 447
Yes 6 398
Yes 7 393
Yes 8 289
Yes 9 290
Yes 10 280
Yes 11 269
Yes 12 204
Yes 13 209
Yes 14 138
Yes 15 142
Yes 16 148
Yes 17 135
Yes 18 129
Yes 19 113
Yes 20 98
Yes 21 91
Yes 22 80
Yes 23 70
Yes 24 85
Yes 25 66
Yes 26 59
Yes 27 57
Yes 28 54
Yes 29 56
Yes 30 36
Yes 31 45
Yes 32 46
Yes 33 52
Yes 34 49
Yes 35 32
Yes 36 31
Yes 37 41
Yes 38 30
Yes 39 24
Yes 40 35
Yes 41 20
Yes 42 14
Yes 43 24
Yes 44 20
Yes 45 30
Yes 46 18
Yes 47 14
Yes 48 24
Yes 49 22
Yes 50 21
Yes 51 21
Yes 52 9
Yes 53 17
Yes 54 18
Yes 55 8
Yes 56 13
Yes 57 12
Yes 58 15
Yes 59 14
Yes 60 12
Yes 61 13
Yes 62 9
Yes 63 7
Yes 64 4
Yes 65 10
Yes 66 12
Yes 67 11
Yes 68 7
Yes 69 13
Yes 70 7
Yes 71 10
Yes 72 3
Yes 73 6
Yes 74 11
Yes 75 8
Yes 76 2
Yes 77 1
Yes 78 3
Yes 79 2
Yes 80 5
Yes 81 1
Yes 82 2
Yes 83 2
Yes 84 6
Yes 85 3
Yes 86 2
Yes 87 4
Yes 88 3
Yes 89 2
Yes 90 6
Yes 91 2
Yes 92 2
Yes 93 4
Yes 94 2
Yes 95 1
Yes 97 1
Yes 98 2
Yes 99 1
Yes 100 3
Yes 102 3
Yes 104 1
Yes 105 1
Yes 106 3
Yes 108 1
Yes 109 3
Yes 111 2
Yes 114 3
Yes 115 4
Yes 116 2
Yes 117 2
Yes 119 1
Yes 121 5
Yes 122 1
Yes 123 4
Yes 124 2
Yes 125 1
Yes 127 1
Yes 128 1
Yes 131 1
Yes 132 1
Yes 133 1
Yes 137 1
Yes 140 1
Yes 141 1
Yes 144 1
Yes 145 1
Yes 146 1
Yes 148 1
Yes 149 1
Yes 151 1
Yes 153 2
Yes 158 1
Yes 160 1
Yes 165 1
Yes 166 1
Yes 168 1
Yes 173 2
Yes 175 1
Yes 176 1
Yes 182 1
Yes 184 1
Yes 185 1
Yes 187 1
Yes 195 2
Yes 199 1
Yes 220 1
Yes 224 1
Yes 228 2
Yes 232 2
Yes 242 1
Yes 247 1
Yes 300 1
Yes 316 1
Yes 322 1
Yes 336 1
Yes 340 1
Yes 375 1
Yes 576 1
;
ods graphics on;
proc npar1way data=los wilcoxon median
plots=(wilcoxonboxplot medianplot);
class treatment;
var length_of_stay;
freq freq;
run;
ods graphics off;
My NPAR1WAY results using your data do not show any contradiction between the two tests:
Interesting! Below are my results:
Sorry, I misinterpreted your data. Ignore my message above.
My stats match the ones posted by @Recep.
You must not be using the same data. Please show the SAS log generated by PROC NPAR1WAY.
I believe @PGStats said to disregard his results. The results I posted should be the correct ones.
The results are different, which suggests that your data are not stable or that you have many outliers.
Use this to check:
proc univariate data=have;
histogram var;
run;
Always take what's on Wikipedia with a grain of salt, but in this case, it makes a key point. From the Median test page (bold emphasis added by me):
It is crucial to note, however, that the null hypothesis verified by the Wilcoxon–Mann–Whitney U (and so the Kruskal–Wallis test) is not about medians. The test is sensitive also to differences in scale parameters and symmetry. As a consequence, if the Wilcoxon–Mann–Whitney U test rejects the null hypothesis, one cannot say that the rejection was caused only by the shift in medians. It is easy to prove by simulations, where samples with equal medians, yet different scales and shapes, lead the Wilcoxon–Mann–Whitney U test to fail completely.
I am pretty sure that something like this is what is going on with these data - the shapes are different enough that the p values from the two tests don't align well. It is even more interesting (at least to me) that the Median test comes out as "significant" while the Wilcoxon does not.
SteveDenham
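The simulation mentioned in the quote can be sketched in a few lines. This is only an illustration under stated assumptions: the data set sim, the group labels, the seed and the sample size are all made up, and the two distributions (a standard normal, and a unit exponential shifted by log(2)) were chosen simply because both have a population median of 0 while differing in shape.

```sas
data sim;
  call streaminit(20240501);              /* arbitrary seed */
  length group $ 6;
  do i = 1 to 2000;
    /* symmetric, median 0 */
    group = 'normal';
    x = rand('normal');
    output;
    /* right-skewed; the median of a unit exponential is log(2),
       so subtracting log(2) gives a median of 0 as well */
    group = 'expo';
    x = rand('exponential') - log(2);
    output;
  end;
  drop i;
run;

/* Same two tests as in the thread, applied to the simulated data */
proc npar1way data=sim wilcoxon median plots=none;
  class group;
  var x;
run;
```

Because the probability that an observation from one group exceeds an observation from the other differs from 0.5 here even though the medians are equal, one would expect the Wilcoxon test to reject with samples of this size, while the median test has no true difference in medians to detect. The exact p-values depend on the seed.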
All, please ignore this post, as it is incorrect. Steve
Another way to approach this:
Suppose the length of stay is Weibull distributed. You could use PROC RELIABILITY (if you have SAS/QC licensed) to get the fitted medians for each group, and look at the confidence bounds for those. This code would do that (code edited, to what was used to generate the results below):
proc reliability data=los;
class treatment;
distribution weibull;
model length_of_stay = treatment;
probplot length_of_stay = treatment;
freq freq; /* this was in the code I used, as it is necessary to get the correct values */
run;
The probability plots support the use of the Weibull distribution.
Results regarding the medians:
@SteveDenham wrote:
Suppose the length of stay is Weibull distributed. You could use PROC RELIABILITY (if you have SAS/QC licensed) to get the fitted medians for each group, and look at the confidence bounds for those.
This is an interesting approach. I'm curious to see the results after taking the frequencies into account, I guess with the FREQ statement
freq freq;
but I can't test it because SAS/QC is not included in my SAS license.
I believe that PROC RELIABILITY is more reliable than my intuition, but I would have expected fitted medians around 8 and 9 (i.e., the empirical medians) after introducing the FREQ statement, not values in the 70s as before.
@SteveDenham, I'm not very familiar with PROC RELIABILITY, but @FreelanceReinh is right about the medians. The PROC MEANS below shows that they are 8 and 9 for "No" and "Yes", respectively.
proc sort data=los;by treatment;
proc means data=los median;
by treatment;
var length_of_stay;
weight freq;
run;
Hospital length of stay is known for its notoriously right-skewed distribution. For that reason we compare the medians of the two groups rather than the means. I wonder whether the shape of the distribution plays a role when comparing medians, because I was under the impression that quantiles are taken into account when comparing medians.
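For what it's worth, the idea behind a median test can be reproduced by hand: find the pooled median, flag each observation as above it or not, and compare the two groups in a 2x2 table. This is only a sketch of that idea, assuming the los data set from the original post; the names pooled, pooled_median, los_flag and above are made up, and because there are many observations tied at the pooled median, the chi-square here should not be expected to match the NPAR1WAY median test exactly.

```sas
/* Pooled median over both treatment groups, honoring the frequencies */
proc means data=los noprint;
  var length_of_stay;
  freq freq;
  output out=pooled median=pooled_median;
run;

/* Flag each length of stay as above (1) or not above (0) the pooled median */
data los_flag;
  if _n_ = 1 then set pooled(keep=pooled_median);
  set los;
  above = (length_of_stay > pooled_median);
run;

/* 2x2 table of treatment by flag; WEIGHT makes each row count Freq times */
proc freq data=los_flag;
  tables treatment*above / chisq;
  weight freq;
run;
```

The table only records which side of the pooled median each stay falls on, so this kind of test ignores how far out the long stays are, whereas the Wilcoxon test uses the full ranking of the observations.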