Hi,
I’m running some hypothesis tests on my data to compare the median length of stay (LOS) between 2 groups (treatment and control, categorical variable=treatment (y/n)), using the following code:
Proc NPAR1WAY data=LOS median wilcoxon;
class treatment;
var LOS;
run;
However, I also want to test for differences in median LOS between the patients who were treated/untreated and then either survived or died (mortality variable), but I can’t figure out how to do this…
I know that only one class variable is allowed in PROC NPAR1WAY, and I get an error when I try to run the following code:
Proc NPAR1WAY data=LOS median wilcoxon;
class treatment mortality;
var LOS;
run;
But is there another proc I can use to compare median LOS classified by both treatment and mortality variables?
I include some example data for context.
Appreciate all support and happy to clarify anything that may be unclear.
data example_data;
input ID treatment mortality los;
datalines;
1 1 1 96
2 1 1 71
3 1 1 92
4 0 0 99
5 1 0 41
6 0 0 37
7 1 0 17
8 1 1 65
9 0 1 7
10 1 0 12
;
run;
Thank you @Ksharp for your reply.
I had tried this but was unsure whether it was correct to combine these variables.
When I create the new concatenated variable and re-run the analysis, am I correct that the p-value to read is the one from the "Median One-Way Analysis" output? (In my example above, p=0.0658, so there is no significant difference in median LOS among the treated/untreated patients who died/survived.)
Thank you again
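For readers following along, the combined-variable approach discussed here could look roughly like the sketch below. It assumes the example_data data set from the original post; the names combined and new_class are placeholders chosen for illustration, and the exact code from the earlier reply may have differed.

```sas
/* Sketch only: builds one grouping variable from treatment and mortality. */
/* "combined" and "new_class" are illustrative names, not from the thread's original reply. */
data combined;
  set example_data;
  length new_class $ 3;
  /* CATX joins the two 0/1 codes with an underscore, e.g. 1_0 = treated, survived */
  new_class = catx('_', treatment, mortality);
run;

/* With more than two groups, the WILCOXON option reports the Kruskal-Wallis test */
proc npar1way data=combined median wilcoxon;
  class new_class;
  var los;
run;
```

With four groups, the median test is an overall test across all levels of new_class; it does not say which groups differ from each other.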
Yeah. I think you are right.
But the median test in PROC NPAR1WAY is not very powerful. Maybe @StatDave knows a better PROC.
If you are willing to assume a distribution for your LOS response, then you can probably get a more powerful test. For a non-negative response like length of stay, a distribution like gamma or inverse gaussian might be reasonable. With the data you show and using the combined predictors, the following finds a strongly significant effect - but you have to be comfortable with the distributional assumption. The LSMEANS statement gives multiple comparisons among the groups. The Mean column gives the estimates on the mean (original) scale.
proc genmod;
class new_class;
model los=new_class / dist=gamma link=log type3;
lsmeans new_class / ilink diff;
run;
Thank you very much - that is really interesting to be aware of these different (more powerful) tests!
Hi @StatDave ,
Since the output from PROC GENMOD is means, is it correct to use this proc (instead of NPAR1WAY) to compare median LOS? And when I have ≥2 class variables, would I still use PROC NPAR1WAY to compute the medians and then use PROC GENMOD to test for any difference?
Using the ilink option with a log link should result in a location estimate that approximates the median (geometric mean) rather than the expected value, but again it becomes a matter of distributional assumptions because it is not the same for gamma, inverse Gaussian or lognormal.
If you really, really want to look at differences in medians, you might consider bootstrapping as an approach.
SteveDenham
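As a rough illustration of the bootstrapping idea, one possible sketch resamples LOS with replacement within each treatment/mortality group and summarizes the resampled medians. It assumes the example_data data set from the original post; all other data set and variable names (combined, new_class, sorted, boot, boot_medians, boot_ci), the seed, and the 1000 replicates are arbitrary choices for illustration. It produces percentile intervals for each group's median rather than a formal test.

```sas
/* Sketch only: percentile bootstrap intervals for the group medians. */
data combined;
  set example_data;
  length new_class $ 3;
  new_class = catx('_', treatment, mortality);  /* illustrative grouping variable */
run;

/* PROC SURVEYSELECT needs the input sorted by the STRATA variable */
proc sort data=combined out=sorted;
  by new_class;
run;

/* Resample with replacement (METHOD=URS) within each group, keeping group sizes */
proc surveyselect data=sorted out=boot seed=12345
                  method=urs samprate=1 outhits reps=1000 noprint;
  strata new_class;
run;

/* Median LOS for each replicate and group */
proc means data=boot noprint nway;
  class replicate new_class;
  var los;
  output out=boot_medians median=median_los;
run;

/* 2.5th and 97.5th percentiles of the bootstrap medians, per group */
proc univariate data=boot_medians noprint;
  class new_class;
  var median_los;
  output out=boot_ci pctlpts=2.5 97.5 pctlpre=p_;
run;
```

To compare two groups directly, the medians could be differenced within each replicate before taking percentiles. With groups as small as those in the example data, the resulting intervals will be very unstable.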
GENMOD allows you to estimate the mean of whichever distribution is specified, not the median. The LSMEANS statement I showed provides estimates of the gamma mean at each level of the predictor. If you want to estimate the median, or other quantile, that is what quantile regression is for, which PROC QUANTREG can do.
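A minimal sketch of a median regression with PROC QUANTREG, assuming the example_data data set from the original post, might look like the following. Including the interaction term and using a Wald test are illustrative choices rather than a recommendation, and with only ten observations the estimates will be very imprecise.

```sas
/* Sketch only: models the median (quantile 0.5) of LOS on both factors. */
proc quantreg data=example_data;
  class treatment mortality;
  model los = treatment mortality treatment*mortality / quantile=0.5;
  /* Tests whether the treatment effect on median LOS differs by mortality status */
  test treatment*mortality / wald;
run;
```

This keeps treatment and mortality as separate class variables instead of concatenating them, so the main effects and the interaction can be examined individually.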
If you have the LOS and other variables, I wonder if survival analysis isn't an option as well, but with no censoring may be equivalent to genmod.