Re: What constitutes a non-normal distribution of residuals?

Piers_C · Posted 10-14-2016 03:37 AM

Thanks for the advice; I will give this a go. A colleague of mine also suggested subsampling the large set of residuals and testing the smaller subsamples for normality. Perhaps if the subsamples also failed the KS and/or other tests, this might be a stronger indication of an undiagnosed problem?

Rick_SAS · Posted 10-14-2016 09:11 AM

How are you conducting this analysis? GLM? GLIMMIX? Suppose you determine that the errors are slightly heavy-tailed. How will that change the way you conduct the analysis?

Are you just "verifying assumptions" or is there a real problem that you are trying to resolve?

Piers_C · Posted 10-14-2016 10:13 AM

I am using proc mixed.

My model code is:

title 'TOTAL Hi frequency HI v LO FULL MLM';
proc mixed data=mlm_hi covtest;
   class sub sess fbin tbin;
   model t_diff = tbin|fbin sess / solution outpredm=pred_hi;
   repeated / subject=sub(sess) type=sp(gau) (tbin fbin);
   lsmeans tbin*fbin;
run;

Rather than trying to solve a known problem, I am really trying to check that I have not violated any assumptions, hence checking the residuals. I also followed your suggestion of plotting the predicted values against the residuals, and the plot definitely does not have a fan structure. I will also try the Q-Q plots.
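For readers following along, the residual-versus-predicted check described above could be sketched roughly as follows. This assumes the pred_hi data set written by the OUTPREDM= option, which stores the predicted means in Pred and the corresponding (marginal) residuals in Resid; the plot title and marker settings are illustrative choices only.

```sas
/* Sketch: residuals versus predicted means from the OUTPREDM= data set. */
/* Assumes pred_hi was created by the PROC MIXED step above.             */
title 'Marginal residuals versus predicted means';
proc sgplot data=pred_hi;
   /* Semi-transparent markers help when there are many observations */
   scatter x=Pred y=Resid / markerattrs=(size=3) transparency=0.8;
   /* Reference line at zero: look for fanning or curvature around it */
   refline 0 / axis=y;
run;
title;
```

A fan shape (spread that grows or shrinks with Pred) would suggest nonconstant variance; a roughly even band around zero would not.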

Rick_SAS · Posted 10-14-2016 09:25 AM

Regarding your friend's suggestion to subsample, I think that would not be helpful. The outcome of a normality test depends heavily on the sample size. If the subsamples have size 100,000, the test will reject normality for all of them. If the subsamples have size 5, the test will fail to reject normality for every one. For some size in between (500?) you might see about 50% of the subsamples reject and 50% fail to reject.
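The dependence on sample size can be explored with a small simulation. The sketch below is illustrative only: the t distribution with 10 degrees of freedom is an assumed stand-in for "slightly heavy-tailed" residuals, and the data set names (sim, sub, tests, ks), the macro name subsample_test, the seeds, and the number of replicates are all hypothetical choices. The rejection proportions you see will depend on the actual shape of your residuals.

```sas
/* Assumed example data: 100,000 slightly heavy-tailed values, t(10) */
data sim;
   call streaminit(12345);
   do i = 1 to 100000;
      r = rand('T', 10);
      output;
   end;
run;

/* Hypothetical helper: draw 100 random subsamples of size n and    */
/* report the proportion for which the KS test rejects at alpha=.05 */
%macro subsample_test(n);
   proc surveyselect data=sim out=sub method=srs
                     sampsize=&n reps=100 seed=1 noprint;
   run;

   ods exclude all;                    /* suppress printed output */
   proc univariate data=sub normal;
      by Replicate;
      var r;
      ods output TestsForNormality=tests;
   run;
   ods exclude none;

   data ks;
      set tests;
      where Test like 'Kolmogorov%';   /* keep only the KS rows */
      reject = (pValue < 0.05);        /* 1 if normality is rejected */
   run;

   title "Subsample size &n: proportion of subsamples rejecting normality (KS)";
   proc means data=ks mean;
      var reject;
   run;
   title;
%mend subsample_test;

%subsample_test(5)
%subsample_test(500)
%subsample_test(20000)
```

Comparing the three reported proportions shows how the same population can "pass" or "fail" a normality test purely as a function of the subsample size.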

Look at the normal Q-Q plot, which will graphically indicate whether the data are approximately normal:

/* x is a placeholder for the variable you want to check */
proc univariate normal;
   qqplot x / normal;
run;
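Applied to the model discussed in this thread, the same idea might look like the sketch below. It assumes the pred_hi data set from the OUTPREDM= option, in which the residuals are stored in the variable Resid. The MU=EST and SIGMA=EST suboptions are an addition here: they draw a reference line using the sample mean and standard deviation.

```sas
/* Sketch: normal Q-Q plot of the PROC MIXED residuals.        */
/* Assumes pred_hi (from OUTPREDM=) contains the variable Resid. */
proc univariate data=pred_hi;
   var Resid;
   /* Reference line based on the estimated mean and std. deviation */
   qqplot Resid / normal(mu=est sigma=est);
run;
```

Points that follow the reference line closely indicate approximate normality; systematic departures in the tails indicate heavy or light tails.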