Solved: Re: Proc GLIMMIX and CV

mthorne · Posted 09-01-2021 01:01 PM

Does PROC GLIMMIX produce a coefficient of variation (CV) and R-square that are equivalent to what PROC GLM produces?

PaigeMiller · Posted 09-01-2021 01:13 PM

I suppose the answer depends on what you mean by "equivalent".

Long thread on this topic: "Proc Mixed - R-Squared" (SAS Support Communities)

--
Paige Miller

mthorne · Posted 09-01-2021 01:18 PM

Thanks, I will check out this thread! I find the older GLM analysis very useful and informative, but the newer GLIMMIX output, while more powerful at times, is not that informative.

SteveDenham · Posted 09-02-2021 07:44 AM

It is as informative as GLM. The issue is that statisticians don't agree on measures of goodness of fit or effect size for linear mixed models, let alone generalized linear mixed models. Applying the methods used in GLM for either of these is anti-informative (IMO).
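For reference, here is a minimal sketch of how the two GLM statistics in question are defined for an ordinary fixed-effects one-way ANOVA. It is written in Python rather than SAS only so the arithmetic is explicit. The data and the function name `glm_fit_stats` are made up for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical one-way layout: three treatments, four replicates each.
data = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [15.0, 14.0, 16.0, 17.0],
    "C": [20.0, 19.0, 21.0, 22.0],
}

def glm_fit_stats(groups):
    """Return (R-square, coefficient of variation in %) for a one-way OLS fit."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    # Corrected total sum of squares around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # Residual sum of squares around each treatment mean.
    ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_error = len(values) - len(groups)
    root_mse = sqrt(ss_error / df_error)
    r_square = 1.0 - ss_error / ss_total
    coeff_var = 100.0 * root_mse / grand_mean
    return r_square, coeff_var

r_square, coeff_var = glm_fit_stats(data)
print(f"R-square = {r_square:.4f}, Coeff Var = {coeff_var:.2f}%")
```

Both numbers rest on a single residual sum of squares. A mixed model splits the variation into several variance components, and a generalized model ties the variance to the mean, so neither statistic carries over in one obvious way.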

SteveDenham

mthorne · Posted 09-02-2021 12:08 PM

@SteveDenham Thanks, Steve! It would be helpful if there were ways to understand how these newer models work. It was pretty easy to grasp ANOVA because we all had to work the math out by hand at some point, but I'm not sure that is possible with GLIMMIX or MIXED. It would also be helpful to understand how different data affect the models and which options to use, especially with experimental data where control treatments can have large numbers, effective treatments low numbers, and other treatments something in between. I am not yet comfortable reading goodness-of-fit output and really knowing what it means, and I think this is a problem for a lot of us doing applied agricultural research and trying to make inferences from our data. Any thoughts would be appreciated!

SteveDenham · Posted 09-02-2021 01:05 PM

@mthorne, if you don't have a copy of SAS for Mixed Models (any edition), get one. There are a lot of ag examples in there. The third edition, with Walt Stroup as the lead author, will help you think about generalized linear mixed models - something we should get used to using. In the Preface to Generalized Linear Mixed Models: Modern Concepts, Methods and Applications (which is actually a textbook by Walt), the fourth paragraph begins:

"I have a colleague whose mantra is 'Never knowingly teach something that you will have to unteach later.' Much of a GLMM (generalized linear mixed model) course consists of unlearning dysfunctional habit of mind accumulated while learning the 'y = Xβ + e mindset.'"

It turns out that many of the answers I give in this community are my attempt at helping people unlearn those habits.

Ag experiments, whether with crops or animals, have a great advantage over the observational studies found in medicine and business. You set things up in a way that is easily modeled, and once you decide on the statistical question and the inference space you want to use, everything falls into place. At least until you have missing data...

SteveDenham

mthorne · Posted 09-15-2021 02:36 PM

Thank you @SteveDenham for your comments. I do have both of the texts you suggest but haven't yet been able to go through SAS for Mixed Models. The other text is helpful in some regards and not yet in others. There is a lot of material that maybe only a statistics professor would fully understand, but I do try to reach out for help.

I keep asking "what would my data need to look like for these models to be appropriate" and "what would need to be different for the model to work." One issue that is a perennial problem for me is that effective treatments often have no variance among replications, i.e., all zeros or 100s, and this will often cause the lsmeans to have no estimate or make no sense, even if it is just one treatment out of 12. All zeros can be an indication of a very successful treatment, but it will blow up the analysis and leave me unable to determine differences between treatments. I have used the negative binomial and Poisson distributions in the GLIMMIX model, and sometimes they just don't work. The old methods, while maybe less robust or correct, will always run and produce an output that seems reasonable. For those of us trying to use the newer methods, this is a real issue. Unlearning the old would be more effective if the new was more understandable, I think.
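One likely reason an all-zero treatment breaks the analysis can be sketched numerically. With a logit link (or a log link for Poisson), a treatment with no events has a likelihood that keeps improving as its estimate moves toward minus infinity, so no finite estimate exists on the link scale. The example below is a sketch under assumptions, not what GLIMMIX does internally. The counts (0 or 12 events out of 100 trials) and the function name `binomial_loglik` are hypothetical.

```python
from math import exp, log, log1p

def binomial_loglik(eta, events, trials):
    """Binomial log-likelihood (constant term dropped) at logit-scale value eta."""
    # events*log(p) + (trials-events)*log(1-p), with p = 1 / (1 + exp(-eta))
    return events * eta - trials * log1p(exp(eta))

etas = [-2.0, -5.0, -10.0, -20.0]

# Hypothetical treatment with no events at all: 0 out of 100.
zero_trt = [binomial_loglik(e, 0, 100) for e in etas]

# Hypothetical treatment with some events: 12 out of 100.
some_trt = [binomial_loglik(e, 12, 100) for e in etas]
eta_hat = log(12 / 88)  # finite maximum-likelihood estimate, logit(0.12)

for e, lz, ls in zip(etas, zero_trt, some_trt):
    print(f"eta = {e:6.1f}   all-zero: {lz:12.6f}   12/100: {ls:10.3f}")
print(f"12/100 is maximized at eta = {eta_hat:.3f}")
```

For the all-zero treatment the log-likelihood rises toward 0 at every step and never peaks. An optimizer either stops at a very negative value with a huge standard error or fails to converge. That would be consistent with lsmeans that are missing or make no sense.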

Thanks again,

Mark

SteveDenham · Posted 09-22-2021 02:11 PM

@mthorne - what to do with zeroes? Well, the beta distribution can't be fit, as it is only supported on the open interval (0,1). Binomial should be fine, as the observed proportion (events out of trials) can be exactly 0 or 1. Everything depends on the data generating process - if you are counting things or counting events out of trials, you should explore Poisson/negative binomial in the first case and binomial in the second. If you are measuring something, normal and lognormal are logical starting points. If you are measuring a ratio of continuous variables, beta is good. If you are modeling waiting times, a gamma distribution seems appropriate.
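The point about supports can be checked with a few lines of arithmetic. This is a minimal sketch in Python. The parameter values and the function names `beta_logpdf` and `binomial_logpmf` are assumptions chosen for illustration. Returning minus infinity for a value outside (0,1) is a convention adopted here.

```python
from math import comb, inf, isfinite, lgamma, log

def beta_logpdf(y, a, b):
    """Log density of Beta(a, b); -inf outside the open interval (0, 1)."""
    if not 0.0 < y < 1.0:
        return -inf  # convention used in this sketch: no finite contribution
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1.0) * log(y) + (b - 1.0) * log(1.0 - y)

def binomial_logpmf(events, trials, p):
    """Log probability of `events` successes in `trials`, for 0 < p < 1."""
    return (log(comb(trials, events))
            + events * log(p)
            + (trials - events) * log(1.0 - p))

# A hypothetical plot with 0 events out of 25 trials.
as_proportion = 0 / 25
print("beta log density at 0.0   :", beta_logpdf(as_proportion, 2.0, 5.0))
print("beta log density at 0.3   :", round(beta_logpdf(0.3, 2.0, 5.0), 4))
print("binomial log pmf of 0/25  :", round(binomial_logpmf(0, 25, 0.05), 4))
```

A single observed 0 (or 1) makes the beta log-likelihood non-finite whatever the parameters, whereas the binomial gives 0 events out of 25 a perfectly ordinary probability.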

/* SOAPBOX ON

To be honest I use GLM for two things - multivariate analysis for a designed experiment and examining heterogeneity of variance. I have been using PROC MIXED for over 30 years now, so it seems second nature to me. I want the REML estimates rather than the OLS estimates. I want to infer to larger inference spaces than inferring to repeating the identical experiment on identical components - down that path lies the issue of replicability, as the experiments and components are never truly identical.

SOAPBOX OFF */

SteveDenham
