BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nlpurumi
Obsidian | Level 7

proc glimmix data=a1.prediss1 plots=(studentpanel boxplot(fixed student));
class phase_info1 Phoneme_stress subject;
model ResidualError_Sum= phase_info1/solution /*dist=gamma*/;
output out=a1.prediss1 pred=pred resid=r;  ---> I wonder if this one line would give me a residual variable, which I can test on the normality assumption later.
random intercept phase_info1 / subject=subject;
random _residual_ / group=phase_info1;
run;

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I have to say I am a little concerned with this code:

proc glimmix data=a1.prediss1 plots=(studentpanel boxplot(fixed student));
class phase_info1 Phoneme_stress subject;
model GMP_log= phase_info1/solution dist=gamma;
output out=a1.prediss1 pred=pred resid=r;
random intercept phase_info1 / subject=subject;
random _residual_ / group=phase_info1;
run;

If you did this earlier you may be overwriting your input data set with sometimes unexpected results.

 

 

None of your code references the a1.prediss1 data set that follows proc glimmix. So it looks like your test for normality gets the same result because you tested a different data set than the Glimmix output.

View solution in original post

15 REPLIES 15
Reeza
Super User

1. What happens when you run it?

2. The Docs seem to indicate it's correct, do you have a reason you're questioning it?

 

http://documentation.sas.com/?docsetId=statug&docsetVersion=14.3&docsetTarget=statug_glimmix_syntax1...

nlpurumi
Obsidian | Level 7

Yes. I generated residual variable after proc mixed and changed the model to proc glimmix with gamma distribution.

I expected the resultant residual should be different, but the normality test results were the same.

I questioned whether the code was correct.

How about the codes afterwards to test the normality assumption, do you see any errors?

Thanks in advance.

 

proc glimmix data=a1.prediss1 plots=(studentpanel boxplot(fixed student));
class phase_info1 Phoneme_stress subject;
model GMP_log= phase_info1/solution dist=gamma;
output out=a1.prediss1 pred=pred resid=r;
random intercept phase_info1 / subject=subject;
random _residual_ / group=phase_info1;
run;
data R1; set R;
absr=ABS(RESID);
run;
proc univariate data=R1 normal;
BY phase_info1;
var Resid;
histogram / normal
ctext = blue;
run;
proc sort data=R1;
by phase_info1 Phoneme_stress subject;
run;
ods graphics on;
proc univariate data=R1 normal;
var GMP_log;
histogram / normal
ctext = blue;
run;

ballardw
Super User

I have to say I am a little concerned with this code:

proc glimmix data=a1.prediss1 plots=(studentpanel boxplot(fixed student));
class phase_info1 Phoneme_stress subject;
model GMP_log= phase_info1/solution dist=gamma;
output out=a1.prediss1 pred=pred resid=r;
random intercept phase_info1 / subject=subject;
random _residual_ / group=phase_info1;
run;

If you did this earlier you may be overwriting your input data set with sometimes unexpected results.

 

 

None of your code references the a1.prediss1 data set that follows proc glimmix. So it looks like your test for normality gets the same result because you tested a different data set than the Glimmix output.

nlpurumi
Obsidian | Level 7

Thanks for pointing this out.

I did not realize what I did wrong.

Then, if I change the code to this form, will I be able to use the residual output from proc glimmix?

 

Thanks in advance.

 

proc glimmix data=a1.prediss1 plots=(studentpanel boxplot(fixed student));
class phase_info1 Phoneme_stress subject;
model GMP_log= phase_info1/solution dist=gamma;
output out=R pred=pred resid=r;
random intercept phase_info1 / subject=subject;
random _residual_ / group=phase_info1;
run;

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I recommend that you devote some effort to learning more about generalized linear models and then generalized linear mixed models. 

 

Why did you switch from a normal distribution assumption to a gamma distribution assumption?

 

I presume that "GMP_log" is a log transformation of GMP. The gamma distribution uses a log link. Is it appropriate to essentially "double up" with a log transformation?

 

If you specify a distribution other than normal (for example, gamma), why would you choose to test normality of residuals from a generalized linear (mixed) model?

 

What is the purpose of 

random _residual_ / group=phase_info1;

and how does that work with a gamma distribution (compared to a normal distribution) and how does it mesh with your desire to assess the distribution of the pooled residuals?

 

I think that you are trying to fit a complicated model without an adequate understanding of what you are doing, always a treacherous undertaking. As I've suggested to you previously, you'll benefit from taking the time to learn about the statistical methodology before you dive into coding. Granted, it does take time and effort, but it is time and effort well spent.

 

I hope this helps.

 

 

 

 

 

nlpurumi
Obsidian | Level 7

I see why you ask all these questions.

I will study GLM and GLMM before I answer your questions. Thanks for your concern and advice!

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I'll look forward to hearing from you!

nlpurumi
Obsidian | Level 7

Here are my answers:

 

Why did you switch from a normal distribution assumption to a gamma distribution assumption?

 

 

I presume that "GMP_log" is a log transformation of GMP. The gamma distribution uses a log link. Is it appropriate to essentially "double up" with a log transformation?

 

I first did square-root and log tranform for the dependent variable in order to see if the data meets normal distribution assumption.

The double up occurred after I found none of these transformed data met the normality assumption.

I specified the distribution as gamma in the proc glimmix while I still maintained the transformed DV.

I was trying to make the model converge with transformed DV because while the model converged well with gaussian distribution, it did not easily converge with the gamma distribution.

 

 

If you specify a distribution other than normal (for example, gamma), why would you choose to test normality of residuals from a generalized linear (mixed) model?

 

I tested normality assumption first and went to choose gamma distribution later. But, I realize that you pointed out why I tested the normality assumption again after I decided to go with gamma distribution, and I think that is a good point. I was listening to one statistician's advice, and I think he was wrong to advise me to test it one more time. Does this answer your question?

 

What is the purpose of 

random _residual_ / group=phase_info1;

and how does that work with a gamma distribution (compared to a normal distribution) and how does it mesh with your desire to assess the distribution of the pooled residuals?

 

I've learned that in proc glimmix, people replace repeated statement to 'random _residual_' statement.

However, I did not use above statement to make this adjustment, but because my data did no meet the homogeneous of variance assumption. So for me, although "phase_info1" really is a repeated variable, this statement was nothing to do with the repeated condition, but to make adjustments for heterogeneity of variance problem.

Also, as far as I remember, the statement I used 'random _residual_' statement slided by the condition variable was something I learned from you in another previous thread of mine at sas.com as a remedy for heteroscadasticity problem.

As I write here, I think maybe I will need to use above residual value to test homogeneity of variance one more time after I make random _residual_ statement to check whether this statement really ensures this assumption. What do you think?

Is there a better remedy than this statement to accommodate for heteroscadasticity problem?

 

If you have any further questions, please let me know.

 

If I ask one question about GLM, I am learning that GLM allows to have more distribution options than normal distribution.

If I decide to test data in GLMM and choose proc glimmix command, can I just ignore whether data meets normal distribution assumption or not?

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

@nlpurumi wrote:

Why did you switch from a normal distribution assumption to a gamma distribution assumption?

 

I presume that "GMP_log" is a log transformation of GMP. The gamma distribution uses a log link. Is it appropriate to essentially "double up" with a log transformation?

 

I first did square-root and log tranform for the dependent variable in order to see if the data meets normal distribution assumption.

The double up occurred after I found none of these transformed data met the normality assumption.

I specified the distribution as gamma in the proc glimmix while I still maintained the transformed DV.

I was trying to make the model converge with transformed DV because while the model converged well with gaussian distribution, it did not easily converge with the gamma distribution.

 

 

You really would benefit from learning more about the theoretical foundations of generalized linear models (GLM). I cannot do justice to all the details here.

 

For a GLM, we assume that the data conditional on the predictors follow the specified (e.g., gamma) distribution. That means that the residuals are not expected to follow a normal distribution. So any assessments that you make about residuals from a GLM (that does not assume a normal distribution) and the normal distribution are invalid as well as unnecessary.

 


 I tested normality assumption first and went to choose gamma distribution later. But, I realize that you pointed out why I tested the normality assumption again after I decided to go with gamma distribution, and I think that is a good point. I was listening to one statistician's advice, and I think he was wrong to advise me to test it one more time. Does this answer your question?

 

If you use a gamma distribution in a GLM, you do not expect the residuals to follow a normal distribution.

 

 


What is the purpose of 

random _residual_ / group=phase_info1;

and how does that work with a gamma distribution (compared to a normal distribution) and how does it mesh with your desire to assess the distribution of the pooled residuals?

 

I've learned that in proc glimmix, people replace repeated statement to 'random _residual_' statement.

However, I did not use above statement to make this adjustment, but because my data did no meet the homogeneous of variance assumption. So for me, although "phase_info1" really is a repeated variable, this statement was nothing to do with the repeated condition, but to make adjustments for heterogeneity of variance problem.

Also, as far as I remember, the statement I used 'random _residual_' statement slided by the condition variable was something I learned from you in another previous thread of mine at sas.com as a remedy for heteroscadasticity problem.

As I write here, I think maybe I will need to use above residual value to test homogeneity of variance one more time after I make random _residual_ statement to check whether this statement really ensures this assumption. What do you think?

Is there a better remedy than this statement to accommodate for heteroscadasticity problem?

 

GLMs with non-normal distributions are different from GLMs (or LMMs) with normal distributions in that for a normal distribution, the mean and the variance are independent but for GLMs, the variance is a function of the mean. Importantly, this means that we do not need to (and actually cannot) obtain a estimate of variance separate from the mean (this concept/constraint is why I encourage you to study GLMs before you start to use them). Consequently, for the gamma distribution, the variance increases as the mean increases by definition, and the concept of heterogeneity is more complicated. If you look at residuals on the link scale, you may find that there is no evidence of heterogeneity of variance. Also consequently, although there are residuals, there is no such thing as "residual variance". So specifying "random _residual_" is unnecessary, either with or without "group=".

 


If I ask one question about GLM, I am learning that GLM allows to have more distribution options than normal distribution.

If I decide to test data in GLMM and choose proc glimmix command, can I just ignore whether data meets normal distribution assumption or not?


 

Yes and no. In a GLMM, the random effects are assumed to be normally distributed, but the residuals are not.

 

You clearly are brave enough to tackle these complicated models. Good for you! Take the time to learn with you are dealing with.

 

I hope this helps.

 

nlpurumi
Obsidian | Level 7

Thanks for your advice and all of your explanations.

Could you recommend me any book to learn more about GLM and GLMM?

 

Here are my responses:

1. If I understood your note correctly, as long as I assume GLM that has non-normal distribution, I do not need to assess normal distribution. Also, GLMM is an extension of GLM, if I go with GLMM with gamma distribution, I do not need to worry about normal distribution. 

 

In my data set, most of the dependent variables violated normality assumption and I chose gamma distribution for GLMM.

So, if I chose GLMM with gamma distribution, I would not need to worry further about normality assumption.

 

I got this message clearly, but after reading your additional comment about normality test on random effects, I became puzzled again. You mentioned that we do not need to test normality assumption on residuals, but on random effects.

How do we test normality assumption on the random effects?

I assume there will be several variables included in the random effects in GLMM depending on the model I am testing.

I wonder how you compose your SAS code in order to reflect these changes. Or should I dig into more to learn what it really means by random effects?

 

2. I got the point that I did not need to make adjustment in SAS code assuming this one line 'random _residual_' would take care of the heteroscadasticity problem. That is actually good because the model did not converge when I chose 'gamma' distribution and when I left this one line active. I had to make either one of them deactivated, in order to obtain the results.

I also understand that variance of GLM becomes complicated because it is dependent on mean.

I also understand that residual tends to be homogeneous in GLM and does not requires homogeneity of variance test (although I did not completely understand what the link scale is.)

Does it mean any sort of homogeneity of variance test is unnecessary for GLM or GLMM?

If it is still required, on what variable do we test it?

 

I now know maybe what I did for homoscedasticitiy assumption test on residual might have been unnecessary procedure. After this testing, I had only one dependent variable out of many variables violated this assumption. I still don't know a good way to remedy for this violation in the further analysis. Should I just ignore this violation and run GLMM like usual and say this result is not trustworthy because one of the assumptions was violated?

 

Thanks for your reply in advance.

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

When you have a better understanding of LMM (general linear mixed models), GLM (generalized linear models) and GLMM (generalized linear mixed models), your questions will largely, if not entirely, be resolved.

 

There are many websites that address GLMs; use your favorite search engine. An Introduction to Generalized Linear Models, Third Edition (Chapman & Hall/CRC Texts in Statistical... by Annette Dobson is a good resource.

 

SAS® for Mixed Models, Second Edition is a good SAS-centric resource for LMMs.

 

Putting GLM and LMM together to form GLMM, Generalized Linear Mixed Models: Modern Concepts, Methods and Applications is a good, SAS-centric resource. But it is dense and not an easy read; you first need a good understanding of GLMs and LMMs. There are several good websites that discuss GLMMs, so a web search is useful.

 

nlpurumi
Obsidian | Level 7

Thanks for sharing good resources to learn those models.

 

Now, I am considering whether proving any violation of linearity assumption in the variance would be a sufficient assumption test for variance aspect of the GLMM with non-normal distribution.

 

Thanks for your advice in advance.

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The author of this vignette for the DHARMa package in R lays out the challenges of assessing fit in GL(M)Ms and presents his version of an approach that you might find useful or at least interesting:

 

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html

 

 

nlpurumi
Obsidian | Level 7

Thanks for sharing good resources.

When I read this article, it says they did some overdispersion correction to account for heteroscedasticity which takes care of different levels of dispersion in the model on page 23/45.

I wonder if this is possible in the following SAS code. It appears that the article is not talking about SAS codes.

 

proc glimmix data=dissertation1 plots=(studentpanel boxplot(fixed student));
class subjectid group slc ;
model Hamm_Par_syl_ver4= group|slc/solution dist=gamma;
output out=preddata pred=pred resid=r;
random intercept /*slc*/ / subject=subjectid(group) ;
/*random _residual_ / group=slc;*/
lsmeans group*slc / plot=meanplot(sliceby=group join cl);
slice group*slc / sliceby=group diff alpha=.0083;
slice group*slc / sliceby=slc diff alpha=.0083;
lsmeans slc/diff alpha=.0167;
lsmeans group/diff alpha=.025;
lsmestimate group*slc "Group effect: SLC 1 v 2" 1 -1 0 -1 1 0,
"Group effect: SLC 1 v 3" 1 0 -1 -1 0 1,
"Group effect: SLC 2 v 3" 0 1 -1 0-1 1
/ adjust=simulate(seed=29847);
run;

 

Thanks for your advice in advance.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 3511 views
  • 4 likes
  • 4 in conversation