chepchep
Fluorite | Level 6

Hello,

I am trying to model a continuous outcome variable that is highly skewed. The model includes several predictor variables, both continuous and categorical. The Q-Q plot of the residuals is shown below. As you can see, the normality assumption is clearly violated. I tried log-transforming the outcome variable, but that doesn't seem to fix the problem. Does anybody have an idea of how to remedy this issue? Does the central limit theorem apply here?

Thanks.

Here is the code used:

proc glmselect data=b;
   class a b c d e / param=reference;
   model y = a b c d e f;
   output out=check r=residuals;
run;

proc univariate data=check;
   var residuals;
   histogram residuals / normal kernel;
   qqplot residuals / normal(mu=est sigma=est);
run;

[Figure: normal Q-Q plot of the residuals (chepchep_0-1614950220928.png)]

Accepted Solution
Rick_SAS
SAS Super FREQ

Personally, I would work on developing a better model. Use regression diagnostic plots to analyze whether you should include second-order interaction terms in the model. Since you are using PROC GLMSELECT, you can add in all second-order terms and use variable selection to see if any interactions improve the fit enough to make it into the final model.
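
The suggestion above could be sketched roughly as follows. This is only an illustration, assuming the same data set and variable names as in the original question; the selection method, the SBC criterion, the HIERARCHY= setting, and the output data set name check2 are assumptions chosen for the example, not settings recommended in the thread.

```sas
/* Sketch: offer all main effects and two-way interactions to the selection. */
/* The bar operator with @2 expands to main effects plus two-way crossings.  */
proc glmselect data=b plots=all;
   class a b c d e / param=reference;
   model y = a|b|c|d|e|f @2
         / selection=stepwise(select=sbc)  /* assumed criterion            */
           hierarchy=single;               /* keep main effects with their */
                                           /* interactions                 */
   output out=check2 r=residuals p=predicted;  /* check2 is a made-up name */
run;
```

The residuals in check2 can then be re-examined with the same PROC UNIVARIATE step used in the question to see whether the added terms changed the residual distribution.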

9 REPLIES
StatDave
SAS Super FREQ

Positive-valued, skewed responses are often modeled using the gamma or inverse Gaussian distribution, both of which are available through the DIST= option in PROC GENMOD.
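
A minimal sketch of such a model, assuming the data set and variables from the original question and assuming that y is strictly positive. The log link and the names check_gamma, mu_hat, and std_resid are choices made for this example rather than anything specified in the thread.

```sas
/* Sketch: gamma regression with a log link for a positive, skewed response. */
proc genmod data=b;
   class a b c d e / param=ref;
   model y = a b c d e f / dist=gamma link=log;
   /* To try the inverse Gaussian instead, change to dist=igaussian. */
   output out=check_gamma        /* hypothetical output data set        */
          pred=mu_hat            /* predicted mean                      */
          stdreschi=std_resid;   /* standardized Pearson residual       */
run;
```

With a generalized linear model the raw residuals are not expected to be normal, so plots of the standardized residuals against the predicted mean are usually more informative than a normal Q-Q plot of raw residuals.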

chepchep
Fluorite | Level 6

Thanks for your input. How do I know which of the two to use? Could either one work?

StatDave
SAS Super FREQ

You can use PROC SEVERITY in SAS/ETS to assess the fit of several distributions, including the gamma, the inverse Gaussian, and others. For example:

proc severity data=b crit=aicc;
   loss y;
   dist _predefined_;
run;
chepchep
Fluorite | Level 6

Thank you so much for your input. I used PROC SEVERITY to select the distribution that best fits my data, and the Burr distribution was selected. This is a distribution that I am not very familiar with. How do I fit a Burr distribution in SAS?

Here is the partial output from PROC SEVERITY:

Distribution  Converged  AICC    Selected
Burr          Yes        361869  Yes
Exp           Yes        382973  No
Gamma         Yes        371532  No
Igauss        Yes        368401  No
Logn          Yes        365528  No
Pareto        Yes        382977  No
Gpd           Yes        382975  No
Weibull       Yes        377067  No

StatDave
SAS Super FREQ

I suggest you look at the plots (CDF/EDF and PDF) to visually assess how close the other distributions are to the EDF of the observed data. It's not so much a matter of picking the one with the lowest AICC as it is rejecting distributions that clearly don't fit well and picking one that does fit reasonably well.
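
One way to request those plots is sketched below, assuming the same data set as in the earlier PROC SEVERITY call. Restricting the candidates to the four distributions with the lowest AICC in the table above is a choice made for this example, not a recommendation from the thread.

```sas
/* Sketch: overlay fitted CDFs on the EDF and compare fitted PDFs. */
proc severity data=b crit=aicc plots=(cdf pdf);
   loss y;
   dist burr logn igauss gamma;  /* candidates chosen for illustration */
run;
```

If the gamma or inverse Gaussian curve tracks the EDF nearly as well as the Burr curve, either of those may be a reasonable choice, since both can be fit with regression effects in PROC GENMOD.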

chepchep
Fluorite | Level 6

Thank you so much!

PGStats
Opal | Level 21

What does the histogram of the residuals look like? Is there more than one mode? This would signal that you are missing some important effect, or some important interaction(s).

PG
chepchep
Fluorite | Level 6

This is what the histogram looks like:

[Figure: histogram of the residuals (chepchep_0-1614968368545.png)]
