BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
das
Obsidian | Level 7 das
Obsidian | Level 7

I would appreciate help with how to code the following experiment. I'm using SAS 9.3.

 

The experiment used blood from six donors.

 

The blood from each donor was divided into four culture wells, and each well incubated with one of a combination of two drugs (A and B), each drug either absent (i.e., diluent only; coded 0) or present at a predetermined concentration (coded 1). Only blood from one donor can be run at a given time. 

 

I've approached this treatment phase as the full factorial of two drugs (A|B) but could see doing it as one treatment factor (trt) with four levels (trt0=A0B0, trt1=A1B0, trt2=A0B1, trt3=A1B1). Is there an advantage of one versus the other?

 

The activation state (%activated) of two types of T-lymphocyte (T-type = CD4 or CD8) is determined from each well.

 

The main interest is the effect of the treatment conditions on %activation within each T-cell type (all comparisons). It is of secondary interest to know if the treatment effects differ between the two types of T-cell.

 

Is MIXED or GLIMMIX more appropriate?

Does subject=donor or T-type(donor) or should it be subject=donor and group=T-type?

Should T-type be in the model statement?

Donor is random but what about T-type? Should T-type be in the model statement?

Do I need to have a repeated statement?

 

Here is something that works but I fear is far too simplistic:

proc glimmix data=work.CD4CD8 ;
	class donor T_type A B ;
	model activation = T_type|A|B ;
	random _residual_ / subject=donor group=T_type ;
run;

 

Any advice or direction greatly appreciated.

 

Dave

1 ACCEPTED SOLUTION

Accepted Solutions
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The design is definitely a mixed model. The MIXED procedure assumes that the response is normally distributed (conditional on the predictors); GLIMMIX allows other distributional assumptions, among them normal, beta and binomial.

 

Generally speaking, but there are always exceptions, a percent (or proportion) response uses either a beta or a binomial distribution, so GLIMMIX is typically more appropriate than MIXED. A proportion that is obtained as a ratio of counts (e.g., number of "successes" out of number of "trials") calls for a binomial distribution; a proportion measured directly calls for a beta distribution, as noted by @plf515

 

If I have a two-way (here, 2x2) factorial, I usually specify the model as A x B rather than a single factor with 4 levels. But the statistical model is the same either way; it's just a different parameterization, and one form may deliver what you want more directly than the other. You get to choose.

 

I would consider the following model AS A STARTING POINT (for a beta distribution; syntax would differ for binomial):

 

 

proc glimmix data= work.cd4cd8 ;
  class donor a b t_type;
  model activation = a | b | t_type / dist=beta;
  random intercept a*b / subject=donor;
  lsmeans a | b | t_type / ilink;
  run;

 

Keep in mind that there are a lot of options that might be better than the default options implied in the code above. Default options are not always the best choice for generalized linear mixed models.

 

 

You may want to start with a normal distribution assumption within GLIMMIX before attempting more challenging models, even though a normal distribution is probably not a valid choice. It's good to get your feet under you with general linear mixed models before you dive into generalized linear mixed models. 

 

I hope this helps.

 

View solution in original post

9 REPLIES 9
plf515
Lapis Lazuli | Level 10

GLIMMIX vs. MIXED - Since the dependent variable is a percentage (and presumably bound by 0 and 100) a linear multilevel model may not be right.  You might try changing the % to a proportion and using a BETA distribution (DIST = BETA on the MODEL statement).

das
Obsidian | Level 7 das
Obsidian | Level 7
Thank you. I wondered about that. Yes, in this case this is bounded, values can only be 0 to 100. I'll see how this affects the modeling but it seems correct regardless.
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The design is definitely a mixed model. The MIXED procedure assumes that the response is normally distributed (conditional on the predictors); GLIMMIX allows other distributional assumptions, among them normal, beta and binomial.

 

Generally speaking, but there are always exceptions, a percent (or proportion) response uses either a beta or a binomial distribution, so GLIMMIX is typically more appropriate than MIXED. A proportion that is obtained as a ratio of counts (e.g., number of "successes" out of number of "trials") calls for a binomial distribution; a proportion measured directly calls for a beta distribution, as noted by @plf515

 

If I have a two-way (here, 2x2) factorial, I usually specify the model as A x B rather than a single factor with 4 levels. But the statistical model is the same either way; it's just a different parameterization, and one form may deliver what you want more directly than the other. You get to choose.

 

I would consider the following model AS A STARTING POINT (for a beta distribution; syntax would differ for binomial):

 

 

proc glimmix data= work.cd4cd8 ;
  class donor a b t_type;
  model activation = a | b | t_type / dist=beta;
  random intercept a*b / subject=donor;
  lsmeans a | b | t_type / ilink;
  run;

 

Keep in mind that there are a lot of options that might be better than the default options implied in the code above. Default options are not always the best choice for generalized linear mixed models.

 

 

You may want to start with a normal distribution assumption within GLIMMIX before attempting more challenging models, even though a normal distribution is probably not a valid choice. It's good to get your feet under you with general linear mixed models before you dive into generalized linear mixed models. 

 

I hope this helps.

 

das
Obsidian | Level 7 das
Obsidian | Level 7
Thanks for all the description. That really helps me learn. I tried the beta distribution today and that definitely improved the fit stats. But your explanation makes me wonder if binomial is the correct distribution since it is the fraction of a population of cells identified as activated by flow cytometry. I'll try that tomorrow and post some code. Thanks for the reply!
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Some "more friendly" (i.e., less mathematical) introductions into distributional choices:

 

For the beta distribution, A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables

 

For binomial, etc., The arcsine is asinine: the analysis of proportions in ecology.

 

Looking forward to seeing what you find out....

 

das
Obsidian | Level 7 das
Obsidian | Level 7

So this is the solution I finally settled on.

 

proc glimmix data=work.CD4CD8 asycov plots=all;
	class donor T_type A B ;
	model activated = T_type|A|B / dist=binomial DDFM=KENWARDROGER ;
	random _residual_ / subject=T_type(donor) type=un ;
	lsmeans T_type*A*B / ilink slicediff=T_type adjust=tukey  ADJDFE=ROW plots=meanplot(ilink join cl sliceby=T_type) ;
	output out=gmxout pred(blup ilink)=pred;
run;

The output indicates very good fit. I really appreciate the input from this community.

 

 

Dave

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I'm glad you found the Community to be helpful.

 

Before we drop this topic, I want to follow up because I'm not sure that I am entirely happy with your model.

 

(1) The response variable is activated and the distribution is binomial. Does activated take values of 0 or 1, or is it a proportion? 

 

(2) An important distinction between the normal distribution and distributions available in generalized linear (mixed) model procedures is that the variance is a function of the mean for these non-normal distributions. Consequently, although there are residuals, there is no such thing as residual variance because once the mean is estimated, the variance is also known. And consequently then, I do not think I would use 

 

	random _residual_ / subject=T_type(donor) type=un ;

at least, not without a lot of thought about what it is doing and whether it is valid. 

 

das
Obsidian | Level 7 das
Obsidian | Level 7

The response is a proportion varying between 0 and 1. The fit stat Gen. Chi-Square / DF = 1.00.

 

Dave

 

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

If the response is a proportion, then I probably would use the beta distribution rather than the binomial.

 

The binomial distribution theoretically is appropriate either for binary (Bernoulli) data (taking values of 0 or 1) or for responses specified as "number of successes" out of "number of trials" using the syntax

 

MODEL events/trials = <fixed-effects> </ model-options>;

 

where "events" is number_of_activated cells and "trials" is total_number_of_cells.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1810 views
  • 0 likes
  • 3 in conversation