Re: Medians from Proc Glimmix

jcn · Posted 03-28-2023 06:58 PM

Back in 2015, you (Steve) responded to a post entitled "Re:Non-normal data; PROC GLIMMIX". AgResearch7 was asking about dist=negbin link=log GLIMMIX model statements on blood serum components. You answered...

I would say some of the endpoints have a normal distribution of errors (remember that is the assumption, NOT that the endpoint itself is normally distributed) The others are probably best fit with a lognormal distribution.

As far as the error structure, a lot depends on the spacing of the measurements in time. AR(1) and ARH(1) assume that the measurements are equally spaced in time. CS and CSH assume that the correlation between close together time points is the same as more separated in time points. However, you only have two time points (Day 0 and Day 26), so CSH and CS would make perfect sense. Fit both, pick the one that yields the smaller corrected AIC value. Try the following:

PROC GLIMMIX data=yourdata;

CLASS TRT DAY ID;

MODEL GLUCOSE = TRT day trt*day/dist=lognormal ddfm=kr2;

Random day /residual subject = id(trt) type =CSH;

LSMEANS TRT day/DIFF ADJUST=TUKEY;

LSmeans trt*day/slicediff=day adjust=tukey adjdfe=row; /* I really dislike Tukey's adjustment for repeated measures data. You might look into adjust=simulate */

ODS OUTPUT lsmeans=lsmeans;

RUN;QUIT;

Now, to get lognormal estimates back onto the original scale you'll have to post-process the lsmeans dataset, and this is where some mathematical statistics enters the picture. Simply exponentiating the estimate will give an estimate of the median value. If you want to get an estimate of the expected value and of the standard error of the expected value, you'll need the following code:

data btlsmeans;

set lsmeans;

omega=exp(stderr*stderr);

btlsmean=exp(estimate)*sqrt(omega);

btvar=exp(2*estimate)*omega*(omega-1);

btsem=sqrt(btvar);

run;

Steve Denham

I ran your bt code on my Glimmix model and wanted to get medians to report. My code is...

Proc GLIMMIX data=AKI_recur_inc2;
class AKI_recur_cat3(ref='No AKI');
model vent_days_n = AKI_recur_cat3 / solution dist=lognormal ddfm=kr2;
format AKI_recur_cat3 recurf.;

estimate 'pairwise 1 v 2' AKI_recur_cat3 1 -1 0 / cl e or;
lsmeans AKI_recur_cat3 /e diff cl;
ODS OUTPUT lsmeans=lsmeans;
run;

DATA btlsmeans; set lsmeans;
omega=exp(stderr*stderr);
btlsmean=exp(estimate)*sqrt(omega);
btvar=exp(2*estimate)*omega*(omega-1);
btsem=sqrt(btvar);
run;

This outcome is 'vent days' but I also ran another outcome 'length of stay'. Using your code, it seems that 'btlsmean' values look appropriate (given the median IQR by category) for 'vent days' yet the 'btvar' values look appropriate for LOS. Since one usually reports Q1 & Q3 with medians, what is appropriate to report from this model? Thank you!!

SteveDenham · Posted 03-29-2023 09:50 AM

Some things to think about: Exponentiating the estimate obtained from GLIMMIX gives you the geometric mean, which is a good approximation of the median on the untransformed scale. I don't think there is a good way to get Q1 and Q3 estimates, but you might try creating a confidence interval on the estimate with an alpha of 0.25, and then exponentiating the endpoints. Better would be to simulate 1000 or so samples from the population defined by the parameters you have found, and look at the quartiles of that distribution.

SteveDenham

jcn · Posted 03-29-2023 10:50 AM

@SteveDenham Thank you so much Steve! What is the difference between btvar and btlsmeans. Why would one look correct for one outcome and the other look correct for the other outcome? Is SE appropriate to report with 'median' instead of IQR since it is an estimate from a model? Would some other code be more appropriate since you know what I am trying to produce? TU!!

SteveDenham · Posted 04-05-2023 11:07 AM

btvar = back-transformed variance

btlsmean = back-transformed expected value

I guess I am having a conceptual issue here. I jumped into modeling this because I had some prior information about the distribution of the variables in question, i.e., that they were log normally distributed. If you believe in the distribution, a distribution-free estimate of location is probably not needed. It just turns out that the geometric mean obtained from a sample from a lognormal distribution is a good approximation to the median.

So, are you going to make a defensible assumption about the distribution of your response variables? if so, then GLIMMIX or GENMOD are appropriate. If you truly want to estimate the median, you should consider PROC QUANTREG. Work through the examples in the documentation..

SteveDenham

Medians from Proc Glimmix