Statistical Procedures

chepchep · Posted 03-05-2021 08:21 AM

Hello,

I am trying to model a continuous outcome variable that is highly skewed. The model has several predictor variables, both continuous and categorical. The Q-Q plot of the residuals is shown below. As you can see, the normality assumption is clearly violated. I tried log-transforming the outcome variable, but that doesn't seem to fix the problem. Does anybody have an idea of how to remedy this issue? Does the central limit theorem apply here?

Thanks.

Here is the code used:

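/* Fit the linear model and save the residuals for diagnostics */
proc glmselect data=b;
   class a b c d e / param=reference;   /* categorical predictors */
   model y = a b c d e f;               /* f is not in CLASS, so it is treated as continuous */
   output out=check r=residuals;
run;

/* Check the residuals against a normal distribution */
proc univariate data=check;
   var residuals;
   histogram residuals / normal kernel;
   qqplot residuals / normal(mu=est sigma=est);
run;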

Rick_SAS · Posted 03-05-2021 01:31 PM

Personally, I would work on developing a better model. Use regression diagnostic plots to analyze whether you should include second-order interaction terms in the model. Since you are using PROC GLMSELECT, you can add in all second-order terms and use variable selection to see if any interactions improve the fit enough to make it into the final model.
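
One way to set this up is sketched below, reusing the data set and variable names from the original post. The choice of stepwise selection with SBC, the HIERARCHY=SINGLE option, the quadratic term f*f, and the output data set name check2 are assumptions made for illustration, not something stated in the thread.

```sas
/* Sketch: offer all main effects and two-way interactions to the selection. */
proc glmselect data=b;
   class a b c d e / param=reference;
   /* The bar operator with @2 expands to the main effects plus all two-way
      interactions; f*f adds a quadratic term for the continuous predictor f. */
   model y = a|b|c|d|e|f @2 f*f
         / selection=stepwise(select=sbc) hierarchy=single;
   /* Save the residuals of the selected model so the diagnostics can be rerun. */
   output out=check2 r=residuals;
run;
```

The residuals in check2 can then be passed to the same PROC UNIVARIATE step used in the original post to see whether the added terms improve the Q-Q plot.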

StatDave · Posted 03-05-2021 09:19 AM

Positive, skewed responses are often modeled using the gamma or inverse Gaussian distribution, both of which are available through the DIST= option in PROC GENMOD.
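
As a rough sketch of what that could look like with the variables from the original post: the log link, the output data set name gcheck, and the output variable names are assumptions for illustration, and the response y must be strictly positive for either distribution.

```sas
/* Sketch: gamma regression with a log link (an assumed, commonly used choice). */
proc genmod data=b;
   class a b c d e / param=ref;
   /* Replace dist=gamma with dist=igaussian to try the inverse Gaussian. */
   model y = a b c d e f / dist=gamma link=log;
   /* Keep the predicted means and deviance residuals for checking the fit. */
   output out=gcheck pred=mu resdev=devres;
run;
```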

chepchep · Posted 03-05-2021 01:22 PM

Thanks for your input. How do I know which of the two to use? Can either one work?

StatDave · Posted 03-05-2021 01:59 PM

You can use PROC SEVERITY in SAS/ETS to assess the fit of several distributions, including the gamma, the inverse Gaussian, and others. For example:

proc severity data=b crit=aicc;
   loss y;
   dist _predefined_;
run;

chepchep · Posted 03-08-2021 10:47 AM

Thank you so much for your input. I used PROC SEVERITY to select the distribution that best fits my data, and the Burr distribution was selected. This is a distribution that I am not very familiar with. How do I fit a Burr distribution in SAS?

Here is the partial output from proc severity:

Distribution	Converged	AICC	Selected
Burr	Yes	361869	Yes
Exp	Yes	382973	No
Gamma	Yes	371532	No
Igauss	Yes	368401	No
Logn	Yes	365528	No
Pareto	Yes	382977	No
Gpd	Yes	382975	No
Weibull	Yes	377067	No

StatDave · Posted 03-08-2021 11:00 AM

I suggest you look at the plots (CDF/EDF and PDF) to visually assess how close the other distributions are to the EDF of the observed data. It's not so much a matter of picking the one with the lowest AICC as it is rejecting distributions that clearly don't fit well and picking one that does fit reasonably well.
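
A minimal sketch of how those plots might be requested, assuming ODS Graphics is available. The short list of candidate distributions is only an illustration drawn from the lower AICC values in the output above.

```sas
ods graphics on;

/* Sketch: refit a few candidates and request all available comparison plots. */
proc severity data=b crit=aicc plots=all;
   loss y;
   /* Predefined distribution names as they appear in the PROC SEVERITY output. */
   dist burr logn igauss gamma;
run;
```

Note that this fits the marginal distribution of y without any of the predictors, so it is a guide to plausible distribution families rather than a check of the regression model itself.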

chepchep · Posted 03-08-2021 11:34 AM

Thank you so much!

PGStats · Posted 03-05-2021 01:01 PM

What does the histogram of the residuals look like? Is there more than one mode? This would signal that you are missing some important effect, or some important interaction(s).

PG

chepchep · Posted 03-05-2021 01:20 PM

This is what the histogram looks like:

Analyzing positively skewed continuous outcome variable
