I want to use quantile regression to compare the median of the outcome variable between two groups. The median for one group is 72, and for the other group it is 40, so I expected a median difference of 72 - 40 = 32. However, the code below estimates a median difference of 21 instead of 32.
Could you please advise on how to resolve this discrepancy?
proc quantreg data=both_MIS ci=resampling;
   class trtgroup;
   model LOS_hours = trtgroup / quantile=0.5;
   estimate 'Diff in Medians' trtgroup 1 -1 / CL;
run;
Note: when I use this code at the 0.75 quantile level, the results match with no discrepancy. I only have this issue at the 0.5 quantile.
Thanks!
> The median for one group is 72, and for the other group, it is 40. Therefore, the expected median difference should be 72 - 40 = 32.
No, that's not how quantile regression works. You do not necessarily get the same estimates as you would by computing the median for each group separately and then subtracting. For example, the following code uses PROC MEANS to compute the median MPG_City value separately for Origin=Asia and Origin=USA. The results are 20.5 for Origin=Asia and 18 for Origin=USA, so a naive estimate of the difference is 2.5. The program then uses your code to perform a quantile regression. The QUANTREG estimates are 20 and 18, for a difference of 2.0.
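/* Keep only the two groups being compared */
data cars;
   set sashelp.cars;
   if Origin in ('Asia' 'USA');
run;

/* Naive approach: sample median of each group, computed separately */
proc means data=cars median;
   class Origin;
   var MPG_City;
run;

/* Quantile regression at the median, with the group difference as an ESTIMATE */
proc quantreg data=cars ci=resampling;
   class Origin;
   model MPG_City = Origin / quantile=0.5;
   estimate 'Diff in Medians' Origin 1 -1 / CL;
run;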
For a discussion of how QUANTREG differs from the naive estimate, see "Quantile regression: Better than connecting the sample quantiles of binned data."
For an additional example of how the QUANTREG estimates can differ from univariate estimates, see "Quantile estimates and the difference of medians in SAS."
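One way to see why the two approaches can disagree is to write down what quantile regression minimizes. The following is a sketch in standard textbook notation (the symbols are an assumption for illustration, not taken from the SAS documentation): the coefficients at quantile level $\tau$ minimize the sum of "check" losses of the residuals.

```latex
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}\{r < 0\}\bigr)
```

For $\tau = 0.5$ and a single group with an even number $n$ of observations, the loss reduces to $\tfrac{1}{2}\sum_i |y_i - m|$, and every $m$ between the two middle order statistics $y_{(n/2)}$ and $y_{(n/2+1)}$ attains the minimum. When those two values differ, the median is not unique: an optimizer will typically return one of the data values, whereas an averaging definition returns the midpoint. That is consistent with 20 versus 20.5 for Origin=Asia in the example above.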
> Is it correct to say that sometimes quantile regression gives the same results, but occasionally it might differ?
Yes. And as balardw points out, the estimates also depend on the univariate method used to estimate the quantiles. As with all statistics, there are many ways to form an estimate from the data.
Something else to consider is exactly which "median" you are using. By default, PROC MEANS uses QNTLDEF=5, but there are four other definitions, and they can produce different medians depending on your data.
Run @Rick_SAS's example with other values of QNTLDEF. Three have medians of 20 and 18 and two have 20.5 and 18.
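A minimal sketch of that experiment, assuming the same sashelp.cars subset as in the example. The macro name median_by_qntldef is hypothetical and only serves to loop over the five definitions; it is not part of the original posts.

```sas
/* Hypothetical helper macro: rerun PROC MEANS once per quantile definition */
%macro median_by_qntldef;
   %do def = 1 %to 5;
      title "Median of MPG_City by Origin, QNTLDEF=&def";
      proc means data=sashelp.cars median qntldef=&def;
         where Origin in ('Asia', 'USA');   /* same two groups as the example */
         class Origin;
         var MPG_City;
      run;
   %end;
   title;   /* clear the title when done */
%mend median_by_qntldef;

%median_by_qntldef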
In PROC UNIVARIATE, the PCTLDEF= option similarly selects among alternative percentile definitions.
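For instance, the following sketch requests the group medians from PROC UNIVARIATE under one non-default definition. The choice of PCTLDEF=4 and the output names medians_def4 and Median_MPG_City are arbitrary, for illustration only.

```sas
/* Group medians under a non-default percentile definition */
proc univariate data=sashelp.cars pctldef=4 noprint;
   where Origin in ('Asia', 'USA');   /* same two groups as the example */
   class Origin;
   var MPG_City;
   output out=medians_def4 median=Median_MPG_City;   /* one row per Origin */
run;

proc print data=medians_def4 noobs;
run;
```

Changing the PCTLDEF= value and comparing the printed medians shows whether the definition matters for your data.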