I am using PROC GENMOD to run a linear model with many predictors, some categorical. The procedure outputs "Analysis of Maximum Likelihood Parameter Estimates." What does it mean that some Estimates are zero (and their standard errors and confidence limits are also zero)? Also, where do I find R square in GENMOD output?
Many thanks!
The zeroes are the result of the parameterization used in GENMOD. The design matrix is not of full rank, so the parameter for the last level of each categorical variable is set to zero. Because that parameter is fixed rather than estimated, its standard error and confidence limits are reported as zero too. If you want to test the effect as a whole, add TYPE3 as an option in the MODEL statement. If you want the value for each level (which is a linear combination of the estimates), add an LSMEANS statement. The estimates in the solution vector are deviations from the last level, not the effects themselves.
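A minimal sketch of those two additions, assuming the SASHELP.CLASS sample data set stands in for your data (the response and predictors here are illustrative choices, not the ones from the question):

```sas
/* Illustrative only: SASHELP.CLASS stands in for the poster's data. */
proc genmod data=sashelp.class;
   class sex;                          /* categorical predictor */
   model weight = sex height age
         / dist=normal link=identity   /* ordinary linear model fit by ML */
           type3;                      /* Type 3 test for each effect */
   lsmeans sex / cl;                   /* estimate and confidence limits per level */
run;
```

With the default settings, the row for the last level of Sex (M) should be the one shown with a zero estimate in the parameter estimates table, while the LSMEANS table reports a value for both levels.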
And then comes the matter of R squared. For a generalized linear model, the standard definition used for OLS does not carry over. No sums of squares are calculated, because the solution is found using maximum likelihood estimation. Thus, you cannot partition the variance using method of moments and get an R squared. Goodness of fit for a generalized model is measured in a lot of ways, but the two most commonly considered are Pearson's chi-square divided by its degrees of freedom and information criteria (I like corrected AIC, but each has its proponents). The first measures dispersion relative to the fitted model: values greater than 1 indicate over-dispersion, and values less than 1 indicate under-dispersion. Information criteria are useful for ranking models fit to the same data, and can be used to judge how much information loss is associated with each model.
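Both quantities appear in the "Criteria For Assessing Goodness Of Fit" table that GENMOD prints by default. As a small sketch, again assuming SASHELP.CLASS as stand-in data, you can restrict the output to just that table (its ODS name is ModelFit):

```sas
/* Illustrative only: print just the goodness-of-fit table. */
ods select ModelFit;                   /* applies to the next procedure step */
proc genmod data=sashelp.class;
   class sex;
   model weight = sex height age / dist=normal link=identity;
run;
```

Look at the Value/DF column in the Pearson Chi-Square row, and at the AICC row when comparing candidate models fit to the same data. Reading Value/DF as over- or under-dispersion is mainly of interest for distributions such as the Poisson or binomial, where the variance is tied to the mean.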
Or you can search the internet for all kinds of pseudo-R squared, get the formulas, and plug in likelihoods or variance components to get a number. The problem is that the error variance is not independent of the mean for most of the distributions being fit (normal and lognormal are the exceptions), so there is relative overweighting of variables with large means (or means near 0.5 in the case of the binomial).
The best ideas I have seen for goodness of fit in generalized models are plots, such as the various types of residuals against the predicted values (X beta).
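One way to get such a plot, sketched under the assumption that SASHELP.CLASS stands in for your data (the output data set and variable names below are made up for the example):

```sas
/* Illustrative only: save residuals and the linear predictor, then plot them. */
proc genmod data=sashelp.class;
   class sex;
   model weight = sex height age / dist=normal link=identity;
   output out=work.fit_diag            /* hypothetical output data set name */
          xbeta=xb                     /* linear predictor, X beta */
          reschi=pearson_resid         /* Pearson residuals */
          resdev=deviance_resid;       /* deviance residuals */
run;

proc sgplot data=work.fit_diag;
   scatter x=xb y=pearson_resid;       /* residuals against X beta */
   refline 0 / axis=y;                 /* reference line at zero */
run;
```

Swap deviance_resid in for pearson_resid to look at the other residual type. A funnel shape or a clear trend in the scatter would suggest the variance function or the linear predictor needs another look.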
SteveDenham
I wrote a description of why this happens in PROC GLM, but it also applies to PROC GENMOD.
One R-square measure available for generalized linear models is from the RsquareV macro.