pink_poodle
Barite | Level 11

This is a follow-up question for my nonlinear regression post:

https://communities.sas.com/t5/Statistical-Procedures/nonlinear-regression/m-p/679512#M32691

@SteveDenham and anyone else with helpful suggestions, 

I am doing a non-linear regression with GENMOD. The outcome is continuous and its distribution looks normal-ish, but the Shapiro-Wilk test says that it is not normal. What would be the next step? Should I apply some kind of transformation and then fit a normal distribution? The other distributions available in GENMOD are listed here, but normal seems like the best option:

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax22.htm&docsetVersion...

Many thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

I think the first endpoint is well covered.

 

On to questions about the second:

Why specify dist=normal?  Well, GENMOD fits generalized linear models, and the normal/Gaussian distribution is one of the family of distributions that can be fit.  It is the default.  The analysis would be similar to GLM, except that the parameter estimates are obtained by maximum likelihood rather than ordinary least squares.
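As a rough illustration of that point, the two steps below fit the same straight-line model, once by maximum likelihood in GENMOD and once by least squares in GLM. The data set name mydata and the variables y and x are placeholders, not taken from the original question.

```sas
/* Hypothetical names: data set mydata, response y, predictor x */

/* Normal distribution with identity link, fit by maximum likelihood.
   DIST=NORMAL is the GENMOD default, so it is spelled out only for clarity. */
proc genmod data=mydata;
   model y = x / dist=normal link=identity;
run;

/* The same mean model fit by ordinary least squares */
proc glm data=mydata;
   model y = x;
run;
quit;
```

The regression coefficients from the two steps should agree; the scale estimate can differ slightly because maximum likelihood and least squares do not use the same divisor for the residual variance.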

 

You say the second is not normal.  Recall that the assumption of normality is only required for the residuals, and not for the variable itself.  So follow @Rick_SAS 's advice and look at the diagnostic plots.  If it turns out that the residuals are NOT normally distributed (and please don't do a test, use your judgment on the results of the diagnostic plots), then you can consider other distributions.  Those distributions may depend on the functional form you are trying to fit, as some are not defined at zero (for instance, a beta or a gamma distribution).  And that brings me to the main point on the second variable. Could you share the function you wish to fit - you know, y=f(X), where we need to know the f(X)?  If it is in fact non-linear (involves exponentiation, logs, trig functions, or is a rational polynomial rather than a straight polynomial), then you have two options: use PROC NLIN/NLMIXED, or use an EFFECT statement in GENMOD to fit a spline.  A plot of the response variable as a function of the independent variable will be very useful for this decision.
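A minimal sketch of the two options just mentioned, under loud assumptions: the data set mydata, the variables y and x, the exponential form a*exp(b*x), its starting values, and the choice of 5 knots are all made up for illustration, since the actual f(X) was not given in the thread.

```sas
/* Hypothetical names and functional form; replace with your own */

/* Option 1: a truly nonlinear mean function fit with PROC NLIN.
   The model a*exp(b*x) and the starting values are only placeholders. */
proc nlin data=mydata;
   parms a=1 b=0.1;
   model y = a*exp(b*x);
run;

/* Option 2: a flexible curve in GENMOD through a spline effect.
   The default B-spline basis is used; knots are placed at percentiles of x. */
proc genmod data=mydata;
   effect spl = spline(x / knotmethod=percentiles(5));
   model y = spl / dist=normal link=identity;
run;
```

The spline version stays inside the generalized linear model framework, so it does not need starting values, but its coefficients have no direct interpretation the way a and b would.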

 

SteveDenham


4 REPLIES
Rick_SAS
SAS Super FREQ

The variables in a linear regression do not need to be normally distributed. If you are interested in inferential statistics such as confidence intervals or hypothesis tests, you can check the RESIDUALS for normality. 

 

So my advice is to look at the diagnostic plots (see the same link) and the Fit Statistics table to assess how well the model fits the data.
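One possible way to produce those diagnostics is sketched below. The names mydata, y, x, diag, yhat, and resid are placeholders rather than anything from the original post.

```sas
ods graphics on;

/* Hypothetical names: data set mydata, response y, predictor x */
proc genmod data=mydata plots=all;
   model y = x / dist=normal link=identity;
   /* Save predicted values and raw residuals for further inspection */
   output out=diag pred=yhat resraw=resid;
run;

/* Judge normality of the residuals by eye: Q-Q plot and histogram */
proc univariate data=diag;
   qqplot resid / normal(mu=est sigma=est);
   histogram resid / normal;
run;
```

The point of the second step is to examine the residuals, not the raw response, which is what the normality assumption actually concerns.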

pink_poodle
Barite | Level 11

I have two types of continuous outcome variables, and my intention was to fit a nonlinear regression on each of them. One outcome variable was Poisson-distributed, so I used GENMOD on it for Poisson regression. It is not a linear regression; the logarithm of the mean is linear in the predictors. For it, the distribution of the outcome variable y has to be Poisson, so the DIST= option in the MODEL statement is poisson and the link function is log.
My other outcome variable is truly continuous, with fractional values, and is not normally distributed. What is the meaning of the DIST=normal option in the GENMOD MODEL statement? If I use it, does the distribution of the outcome have to be normal? I see that the link function is identity when I run GENMOD with DIST=normal. Is it applying a simple linear model (and if so, why give it information about the distribution)? My intention is to fit a nonlinear model for a continuous outcome variable whose distribution is normal-ish.
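For reference, the Poisson setup described in this post would look roughly like the sketch below; the data set mydata and the variables count and x are placeholder names, not the poster's actual ones.

```sas
/* Hypothetical names: data set mydata, count response, predictor x */
proc genmod data=mydata;
   /* log(mean of count) is modeled as a linear function of x */
   model count = x / dist=poisson link=log;
run;
```

Swapping in dist=normal link=identity changes both the assumed error distribution and the scale on which the mean is linear in the predictors.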

pink_poodle
Barite | Level 11
Thank you very much for helpful suggestions!
