Solved: genmod distribution

pink_poodle · Posted 08-27-2020 12:13 PM

This is a follow-up question for my nonlinear regression post:

https://communities.sas.com/t5/Statistical-Procedures/nonlinear-regression/m-p/679512#M32691

@SteveDenham and anyone else with helpful suggestions,

I am doing a nonlinear regression with GENMOD. The outcome is continuous and its distribution looks normal-ish, but the Shapiro-Wilk test says that it is not normal. What would be the next step? Should I apply some kind of transformation and then fit a normal distribution? The other distributions available in GENMOD are listed here, but normal seems like the best option:

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax22.htm&docsetVersion...

Many thanks!

Rick_SAS · Posted 08-27-2020 03:03 PM

The variables in a linear regression do not need to be normally distributed. If you are interested in inferential statistics such as confidence intervals or hypothesis tests, you can check the RESIDUALS for normality.

So my advice is to look at the diagnostic plots (see the same link) and the Fit Statistics table to assess how well the model fits the data.
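A minimal sketch of that residual check, assuming a data set named work.mydata with a response y and a single predictor x (all three names are hypothetical placeholders, not from the original question):

```sas
ods graphics on;

/* Fit the model and request the built-in diagnostic plots.      */
/* work.mydata, y, and x are hypothetical names.                 */
proc genmod data=work.mydata plots=all;
   model y = x / dist=normal link=identity;
   /* Save predicted values and raw residuals for a closer look. */
   output out=work.diag pred=yhat resraw=resid;
run;

/* Judge normality of the RESIDUALS visually, not the raw response. */
proc univariate data=work.diag;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;
```

If the points in the Q-Q plot fall roughly along the reference line, the normal assumption for the residuals is probably reasonable for inference.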

pink_poodle · Posted 08-27-2020 09:55 PM

I have two types of continuous outcome variables, and my intention was to fit a nonlinear regression to each of them. One outcome variable was Poisson-distributed, so I used GENMOD to fit a Poisson regression. It is not a linear regression on the original scale; the logarithm of the mean is linear in the predictors. For it, the distribution of the outcome variable y has to be Poisson, the DIST= option in the MODEL statement is POISSON, and the link function is log.
My other outcome variable is truly continuous, with fractional values, and not normally distributed. What does the DIST=NORMAL option in the GENMOD MODEL statement mean? If I use it, does the distribution of the outcome have to be normal? I see that the link function when I run GENMOD with DIST=NORMAL is identity. Is it applying a simple linear model (and if so, why give it information about the distribution)? My intention is to fit a nonlinear model for a continuous, normal-ish outcome variable.
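For reference, the Poisson setup described above would look roughly like this. It is only a sketch: the data set work.mydata and the variables count_y and x are hypothetical names.

```sas
/* Poisson regression: log(mean of count_y) is modeled as linear in x. */
/* work.mydata, count_y, and x are hypothetical names.                 */
proc genmod data=work.mydata;
   model count_y = x / dist=poisson link=log;
run;
```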

SteveDenham · Posted 08-28-2020 09:40 AM

I think the first endpoint is well covered.

On to questions about the second:

Why specify dist=normal? Well, GENMOD is for fitting generalized linear models, and the normal/Gaussian distribution is one of the family of distributions that can be fit. It is the default. The analysis would be similar to GLM, except that the estimates are obtained by maximum likelihood rather than ordinary least squares.
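One way to see the similarity is to fit the same model both ways and compare the output. This is a sketch under assumed names (work.mydata, y, and x are hypothetical):

```sas
/* Ordinary least squares fit. */
proc glm data=work.mydata;
   model y = x / solution;
run;
quit;

/* Same linear predictor, fit by maximum likelihood. */
proc genmod data=work.mydata;
   model y = x / dist=normal link=identity;
run;
```

The coefficient estimates should agree. The standard errors can differ slightly, because by default GENMOD reports a maximum likelihood estimate of the scale parameter rather than the least squares residual standard deviation.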

You say the second is not normal. Recall that the assumption of normality is required only for the residuals, not for the variable itself. So follow @Rick_SAS's advice and look at the diagnostic plots. If it turns out that the residuals are NOT normally distributed (and please don't do a test; use your judgment on the results of the diagnostic plots), then you can consider other distributions. Which distributions are candidates may depend on the functional form you are trying to fit, as some are not defined at zero (for instance, a beta or a gamma distribution).

And that brings me to the main point on the second variable. Could you share the function you wish to fit? You know, y=f(X), where we need to know the f(X). If it is in fact nonlinear (involves exponentiation, logs, trig functions, or is a rational polynomial rather than a straight polynomial), then you have two options: use PROC NLIN/NLMIXED, or use an EFFECT statement in GENMOD to fit a spline. A plot of the response variable as a function of the independent variable will be very useful for this decision.
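A rough sketch of the plot and the two routes, with several assumptions stated up front: work.mydata, y, and x are hypothetical names; the exponential form and its starting values in PROC NLIN are purely illustrative, since the actual f(X) has not been given; and the spline step is written with PROC GLIMMIX, which documents the EFFECT statement, because the list of procedures that accept EFFECT varies and is worth checking in the syntax documentation for your release.

```sas
/* Step 1: plot the response against the predictor with a smoother. */
proc sgplot data=work.mydata;
   loess x=x y=y;
run;

/* Option A: a parametric nonlinear model.                        */
/* The form a*exp(b*x) and the starting values are illustrative.  */
proc nlin data=work.mydata;
   parms a=1 b=0.1;
   model y = a*exp(b*x);
run;

/* Option B: a flexible spline fit through an EFFECT statement.   */
/* Natural cubic spline with 5 knots placed at percentiles of x.  */
proc glimmix data=work.mydata;
   effect spl = spline(x / naturalcubic basis=tpf(noint)
                           knotmethod=percentiles(5));
   model y = spl / dist=normal link=identity solution;
run;
```

Option A suits a case where the functional form is known from theory; Option B is a fallback when the plot shows curvature but no particular form suggests itself.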

SteveDenham

pink_poodle · Posted 08-29-2020 08:24 PM

Thank you very much for helpful suggestions!
