Solved: Re: Mixed modelling, help with correct procedure and code

das · Posted 07-20-2018 08:39 PM

I would appreciate help with how to code the following experiment. I'm using SAS 9.3.

The experiment used blood from six donors.

The blood from each donor was divided into four culture wells, and each well incubated with one of a combination of two drugs (A and B), each drug either absent (i.e., diluent only; coded 0) or present at a predetermined concentration (coded 1). Only blood from one donor can be run at a given time.

I've approached this treatment phase as the full factorial of two drugs (A|B) but could see doing it as one treatment factor (trt) with four levels (trt0=A0B0, trt1=A1B0, trt2=A0B1, trt3=A1B1). Is there an advantage of one versus the other?

The activation state (%activated) of two types of T-lymphocyte (T-type = CD4 or CD8) is determined from each well.

The main interest is the effect of the treatment conditions on %activation within each T-cell type (all comparisons). It is of secondary interest to know if the treatment effects differ between the two types of T-cell.

Is MIXED or GLIMMIX more appropriate?

Does subject=donor or T-type(donor) or should it be subject=donor and group=T-type?

Should T-type be in the model statement?

Donor is random but what about T-type? Should T-type be in the model statement?

Do I need to have a repeated statement?

Here is something that works but I fear is far too simplistic:

proc glimmix data=work.CD4CD8 ;
	class donor T_type A B ;
	model activation = T_type|A|B ;
	random _residual_ / subject=donor group=T_type ;
run;

Any advice or direction greatly appreciated.

Dave

sld · Posted 07-23-2018 10:59 PM

The design is definitely a mixed model. The MIXED procedure assumes that the response is normally distributed (conditional on the predictors); GLIMMIX allows other distributional assumptions, among them normal, beta and binomial.

Generally speaking, but there are always exceptions, a percent (or proportion) response uses either a beta or a binomial distribution, so GLIMMIX is typically more appropriate than MIXED. A proportion that is obtained as a ratio of counts (e.g., number of "successes" out of number of "trials") calls for a binomial distribution; a proportion measured directly calls for a beta distribution, as noted by @plf515.

If I have a two-way (here, 2x2) factorial, I usually specify the model as A x B rather than a single factor with 4 levels. But the statistical model is the same either way; it's just a different parameterization, and one form may deliver what you want more directly than the other. You get to choose.

I would consider the following model AS A STARTING POINT (for a beta distribution; syntax would differ for binomial):

proc glimmix data= work.cd4cd8 ;
  class donor a b t_type;
  model activation = a | b | t_type / dist=beta;
  random intercept a*b / subject=donor;
  lsmeans a | b | t_type / ilink;
  run;

Keep in mind that there are a lot of options that might be better than the default options implied in the code above. Default options are not always the best choice for generalized linear mixed models.

You may want to start with a normal distribution assumption within GLIMMIX before attempting more challenging models, even though a normal distribution is probably not a valid choice. It's good to get your feet under you with general linear mixed models before you dive into generalized linear mixed models.

I hope this helps.

View solution in original post

plf515 · Posted 07-21-2018 09:37 AM

GLIMMIX vs. MIXED - Since the dependent variable is a percentage (and presumably bound by 0 and 100) a linear multilevel model may not be right. You might try changing the % to a proportion and using a BETA distribution (DIST = BETA on the MODEL statement).

das · Posted 07-21-2018 01:30 PM

Thank you. I wondered about that. Yes, in this case this is bounded, values can only be 0 to 100. I'll see how this affects the modeling but it seems correct regardless.

sld · Posted 07-23-2018 10:59 PM

The design is definitely a mixed model. The MIXED procedure assumes that the response is normally distributed (conditional on the predictors); GLIMMIX allows other distributional assumptions, among them normal, beta and binomial.

Generally speaking, but there are always exceptions, a percent (or proportion) response uses either a beta or a binomial distribution, so GLIMMIX is typically more appropriate than MIXED. A proportion that is obtained as a ratio of counts (e.g., number of "successes" out of number of "trials") calls for a binomial distribution; a proportion measured directly calls for a beta distribution, as noted by @plf515.

If I have a two-way (here, 2x2) factorial, I usually specify the model as A x B rather than a single factor with 4 levels. But the statistical model is the same either way; it's just a different parameterization, and one form may deliver what you want more directly than the other. You get to choose.

I would consider the following model AS A STARTING POINT (for a beta distribution; syntax would differ for binomial):

proc glimmix data= work.cd4cd8 ;
  class donor a b t_type;
  model activation = a | b | t_type / dist=beta;
  random intercept a*b / subject=donor;
  lsmeans a | b | t_type / ilink;
  run;

Keep in mind that there are a lot of options that might be better than the default options implied in the code above. Default options are not always the best choice for generalized linear mixed models.

You may want to start with a normal distribution assumption within GLIMMIX before attempting more challenging models, even though a normal distribution is probably not a valid choice. It's good to get your feet under you with general linear mixed models before you dive into generalized linear mixed models.

I hope this helps.

das · Posted 07-23-2018 11:11 PM

Thanks for all the description. That really helps me learn. I tried the beta distribution today and that definitely improved the fit stats. But your explanation makes me wonder if binomial is the correct distribution since it is the fraction of a population of cells identified as activated by flow cytometry. I'll try that tomorrow and post some code. Thanks for the reply!

sld · Posted 07-23-2018 11:27 PM

Some "more friendly" (i.e., less mathematical) introductions into distributional choices:

For the beta distribution, A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables.

For binomial, etc., The arcsine is asinine: the analysis of proportions in ecology.

Looking forward to seeing what you find out....

das · Posted 07-30-2018 07:40 PM

So this is the solution I finally settled on.

proc glimmix data=work.CD4CD8 asycov plots=all;
	class donor T_type A B ;
	model activated = T_type|A|B / dist=binomial DDFM=KENWARDROGER ;
	random _residual_ / subject=T_type(donor) type=un ;
	lsmeans T_type*A*B / ilink slicediff=T_type adjust=tukey  ADJDFE=ROW plots=meanplot(ilink join cl sliceby=T_type) ;
	output out=gmxout pred(blup ilink)=pred;
run;

The output indicates very good fit. I really appreciate the input from this community.

Dave

sld · Posted 07-31-2018 12:29 AM

I'm glad you found the Community to be helpful.

Before we drop this topic, I want to follow up because I'm not sure that I am entirely happy with your model.

(1) The response variable is activated and the distribution is binomial. Does activated take values of 0 or 1, or is it a proportion?

(2) An important distinction between the normal distribution and distributions available in generalized linear (mixed) model procedures is that the variance is a function of the mean for these non-normal distributions. Consequently, although there are residuals, there is no such thing as residual variance because once the mean is estimated, the variance is also known. And consequently then, I do not think I would use

	random _residual_ / subject=T_type(donor) type=un ;

at least, not without a lot of thought about what it is doing and whether it is valid.

das · Posted 07-31-2018 12:43 AM

The response is a proportion varying between 0 and 1. The fit stat Gen. Chi-Square / DF = 1.00.

Dave

sld · Posted 07-31-2018 12:57 AM

If the response is a proportion, then I probably would use the beta distribution rather than the binomial.

The binomial distribution theoretically is appropriate either for binary (Bernoulli) data (taking values of 0 or 1) or for responses specified as "number of successes" out of "number of trials" using the syntax

MODEL events/trials = <fixed-effects> </ model-options>;

where "events" is number_of_activated cells and "trials" is total_number_of_cells.