BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kellychan84
Fluorite | Level 6
proc glimmix data=DF_digestibility; 
  class treatment block;
  model fIDFdigestibility = treatment/dist=lognormal; 
  random block; 
  lsmeans treatment/tdiff lines;
  ods output lsmeans=third;
  covtest "block=0" 0 .;
  output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
  title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;
data BackTransformedData; 
  set third;
  btlsmean = exp(estimate) ;
  btse_mean = exp(estimate)*stderr;
run;
proc print data=second;
run;
proc sgplot data=second;
  scatter y=smresid x=pred;
  refline 0;
run;
proc sgplot data=second;
  scatter y=smresid x=treatment;
  refline 0;
run;
proc sgplot data=second;
  vbox smresid/group=treatment datalabel;
run;
proc sgscatter data=second;
  plot sresid*(pred treatment block);
run;
proc univariate data=second normal plot;
  var sresid;
  histogram sresid / normal kernel;
run;

kellychan84_0-1629486415751.png

 

 
1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

What you learned in class is appropriate for linear models, but not for mixed models.  This code that you ran:

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/tdiff lines;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

is spot-on.  You have accounted for the lack of homogeneity in the two groups with the random _residual_/group=treatment.  You don't need to transform if heterogeneity was your only concern.  But there is something else happening I think.  Your response is a percent, based on the ratio of two continuous variables and is contained in the open interval from 0 to 1.  I would recommend a fit with a beta distribution, such as this:

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment/dist=beta;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/diff lines ilink;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

That should take care of your concerns (at least based on the shared box plot).  The default link for a beta distribution is logistic (so I would suppose that the transformation to choose is logistic).  However, it makes more sense to talk about the distribution in this case.

 

I wouldn't be surprised if the COVTEST was not significant for your data, assuming a beta distribution..

 

SteveDenham

 

 

View solution in original post

15 REPLIES 15
PaigeMiller
Diamond | Level 26

The variable fIDFdigestibility does not have to be normally distributed in order to be modeled properly by PROC GLIMMIX. If you want to use a normal distribution when fitting the model, only the RESIDUALS have to be normally distributed. Naturally, if you are fitting the model based upon some other (non-normal) distribution, then you need to have fIDFdigestibility follow that distribution. It's really not clear if you intend to fit the model assuming a normal distribution.

--
Paige Miller
kellychan84
Fluorite | Level 6

@PaigeMiller  thank you for your reply. You are right! It is the RESIDUALS that have to be normally distributed. I think the normality result from SAS I attached is for the RESIDUALS.  

 

Let me explain my question in another way. I want to transform my data here because the RESIDUALS are showing unequal variance between two treatment groups when I run the homogeneity test (code below). 

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/tdiff lines;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

 

kellychan84_0-1629488931520.png

 

jiltao
SAS Super FREQ

The COVTEST result tells you that the variances are different among the treatment groups. So your RANDOM _RESIDUAL_ / GROUP=TREATMENT; statement is necessary, because it fits an unequal variance model to your data. You do not need to do data transformations. Constant variance is never one of the model assumptions in PROC MIXED / PROC GLIMMIX. You just model it, which is what your RANDOM _RESIDUAL_ statement does.

 

Hope this helps,

Jill

kellychan84
Fluorite | Level 6

@jiltao Thank you for your answer. I am still confusing.

If I find out the variances between the two treatment groups are unequal, am I supposed to transform them before I compare them by F test? The unequal covariances are also indicated in the below plots as data are showing some patterns or skewed to one side.

kellychan84_0-1629491731446.pngkellychan84_1-1629491746997.png

kellychan84_2-1629491807613.png

 

 

 

sbxkoenk
SAS Super FREQ

Hello @kellychan84 ,

 

I cannot say it better than @jiltao 2 posts above (unequal variance model), but maybe you understand it better if @SteveDenham tells you the same (heterogeneous variance model)😉😉

See here:

https://communities.sas.com/t5/Statistical-Procedures/Is-there-a-way-to-test-for-equal-variances-in-...

 

Kind regards,

Koen

SteveDenham
Jade | Level 19

The key to all of this is in @jiltao 's post:

 

Constant variance is never one of the model assumptions in PROC MIXED / PROC GLIMMIX

 

You model the heterogeneity, and stop testing for it, or even really examining it.  The assumption of homogeneous variance is NOT important in mixed model analyses.

 

SteveDenham

kellychan84
Fluorite | Level 6

@SteveDenham Thank you very much for your answer, Steve! I was told in my statistical class that you should always check your residuals (random, independent of treatment and design effects, homogenous) before we look at the results. And if you can see some patterns in your residuals, you should transform your data to a better model before you can compare the treatment effect. I did see some patterns in the residuals in this specific data set. What are your suggestions on this? Thanks!

SteveDenham
Jade | Level 19

What you learned in class is appropriate for linear models, but not for mixed models.  This code that you ran:

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/tdiff lines;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

is spot-on.  You have accounted for the lack of homogeneity in the two groups with the random _residual_/group=treatment.  You don't need to transform if heterogeneity was your only concern.  But there is something else happening I think.  Your response is a percent, based on the ratio of two continuous variables and is contained in the open interval from 0 to 1.  I would recommend a fit with a beta distribution, such as this:

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment/dist=beta;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/diff lines ilink;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

That should take care of your concerns (at least based on the shared box plot).  The default link for a beta distribution is logistic (so I would suppose that the transformation to choose is logistic).  However, it makes more sense to talk about the distribution in this case.

 

I wouldn't be surprised if the COVTEST was not significant for your data, assuming a beta distribution..

 

SteveDenham

 

 

kellychan84
Fluorite | Level 6

Thank you very much @SteveDenham! very clear explanation!

 

Can I consult you with one more question? If the result is showing equal variance, in this case, can I use the following code without " random _residual_ / group =treatment" and " covtest homogeneity ".

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment /dist=beta; ;
random block; 
lsmeans treatment/diff lines ilink;
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

  

SteveDenham
Jade | Level 19

Yes, that is often the direction taken. 

 

Just a note, I would imagine that no substantial differences in the results are noticeable whether those lines are kept in or removed.  My feeling here is to always assume that there is some heterogeneity in variance between groups, and it is worth keeping the code in to get a more empirical estimate of the variability within groups.  On the other hand, dropping that code almost always results in a more stable system of the mixed model equations, eliminating Hessian issues and nonpositive G matrix concerns in many cases.

 

SteveDenham

kellychan84
Fluorite | Level 6

Thank you so much @SteveDenham

 

You are right, adding or removing that coding did not change the results much!

 

By the way, when I run the code with " dist=beta", it turned out an error: "ERROR: Invalid or missing data." I don't know how to fix this. My response is digestibility which is listed as 1-100% instead of 0-1, does it matter?

 

proc glimmix data=DF_digestibility;
class treatment block;
model fIDFdigestibility = treatment/dist=beta;
random block; 
random _residual_/group=treatment; 
lsmeans treatment/diff lines ilink;
covtest homogeneity; 
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "1-Fecal Insoluble Dietary Fiber Digestibility (%) ANOVA Results";
run;

 

 

sbxkoenk
SAS Super FREQ

Hello @kellychan84 ,

 

Yes, it matters!

Beta regression is used for proportions and therefore the response variable should be between 0 and 1, noninclusive. Values outside this range are not used in the analysis in PROC GLIMMIX.

 

Good luck with your analysis,

Koen

SteveDenham
Jade | Level 19

Following up on @sbxkoenk 's remark - divide all of your response values by 100 to get them out of percentage units and into grams digested/grams consumed units (kg digested/kg consumed for large animals).  You can do this inside PROC GLIMMIX, but I recommend doing it in a DATA step prior to running PROC GLIMMIX.

 

SteveDenham

kellychan84
Fluorite | Level 6

Sounds great, thank you so much for all your help! I really appreciate it!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 1217 views
  • 8 likes
  • 5 in conversation