Proc Glimmix and residuals

tammy2 · Posted 03-06-2019 01:56 PM

Hello,

I was hoping to get some assistance with my SAS code and some questions I have regarding interpreting the residual plots.

Description of Data:

I am interested in examining whether two groups (variable=gs) show differences in scores on a questionnaire measure over time (2 time points). I am also interested in examining whether the groups differ as a function of the time to expected disease onset (var=EYO). Importantly, not all participants have data for both time points; some only have data for the first time point. Additionally, the groups are nested in families.

The scores on the questionnaires can range from 0 to 180. When I plotted the data I found that it is positively skewed, so I decided to try a Poisson distribution to account for this. When I ran the code below, the residual vs. linear predictor plot seemed to indicate heteroskedasticity (see attached file). At this point I am not sure whether this is something I need to resolve, or what steps to take to resolve it.

I am currently using SAS 9.4

My code is:

Proc glimmix data=lib.cbi_final_long Plots=studentpanel method=laplace;
  Where time in (1 2);
  Class id time gs (ref="neg") family;
  Model cbi_total = time gs EYO gs*EYO gs*time
        / dist=poisson link=log solution;
  Random intercept / subject=family type=UN;
  Random intercept / subject=id(family) type=UN;
  lsmeans gs / ilink;
  output out=new1 pred(ilink)=predi stderr(ilink)=sepredi
         pred=pred stderr=sepred
         resid=resid student=student;
Run;

Thank you very much for your assistance.

Best,

Tamara

Haris · Posted 03-09-2019 06:46 PM

The pattern in your plot seems to be a product of your choice of distribution more than anything else. Poisson would not be my choice for a continuous variable like yours. Have you tried lognormal? If you find that the variance is not equal in your two groups, you can add a 'GROUP=GS' option to your RANDOM statement to allow the variance estimates to differ between the two groups.
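As a rough illustration of the GROUP= suggestion, the sketch below reuses the data set and variable names from the original post. The choice of which RANDOM statements carry GROUP=GS, and the switch to DIST=LOGNORMAL, are assumptions made for the example rather than a recommended final model.

```sas
/* Sketch only: let variance estimates differ between levels of gs. */
/* Note: under dist=lognormal, zero or negative responses are       */
/* excluded from the analysis.                                      */
proc glimmix data=lib.cbi_final_long;
  where time in (1 2);
  class id time gs (ref="neg") family;
  model cbi_total = time gs EYO gs*EYO gs*time
        / dist=lognormal solution;
  random intercept / subject=family;
  /* GROUP=gs: a separate between-person variance for each group */
  random intercept / subject=id(family) group=gs;
  /* R-side residual variance, also estimated separately by group */
  random _residual_ / group=gs;
run;
```

The "Covariance Parameter Estimates" table should then list one estimate per level of gs for each statement that carries GROUP=GS, which makes it easy to see whether the two groups really differ in spread.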

Second, you are not modeling the repeated measures over time. I would use an R-side random effect: 'RANDOM Time / sub=ID residual type=AR(1)'.

Are you sure you need TYPE=UN for your G-side matrices? Most commonly, measurements for different individuals in a study are not correlated and are modeled as TYPE=VC.

tammy2 · Posted 03-11-2019 09:25 AM

Thank you very much for your detailed response and suggestions. I am a novice at SAS and GLMMs so I really appreciate your feedback.

I tried to change the distribution and my current code is below. I have attached a picture of my studentized residuals, which still seem inaccurate. As well, there were several notes in the log file indicating that some observations were not used since I have zeros in my data (the participant's score on the questionnaire can range from 0-180).

Are there other major errors in my code or other suggestions I can try?

Thank you very much,

Tamara

CURRENT CODE:

Proc glimmix data=lib.cbi_final_long Plots=all;
  Where time in (1 2);
  Class id time gs (ref="neg") family;
  Model cbi_total = time gs EYO gs*EYO gs*time
        / dist=lognormal solution;
  Random intercept / subject=family type=vc;
  Random intercept / subject=id(family) type=vc;
  Random Time / subject=id residual type=AR(1);
Run;

NOTES IN LOG FILE:

NOTE: Some observations are not used in the analysis because of: zero or negative response (n=355).
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: A linear combination of covariance parameters is confounded with the residual variance.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 33.67 seconds
cpu time 32.75 seconds

Haris · Posted 03-11-2019 02:58 PM

Tamara,

The lognormal distribution cannot accept zero, because the log of zero is undefined. You need to add a constant to your outcome, e.g. CBI_Total+1, and remember that you did so when interpreting the results. That will keep the observations with zero scores in the analysis.
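A minimal sketch of the add-a-constant step, assuming the data set and variables from the posted code. The names work.cbi_shifted and cbi_total1 are made up for this example, and the covariance structure is cut down to the two random intercepts to keep the sketch short.

```sas
/* Sketch only: shift the outcome by 1 so zero scores are not dropped */
data work.cbi_shifted;            /* hypothetical data set name */
  set lib.cbi_final_long;
  cbi_total1 = cbi_total + 1;     /* hypothetical variable name */
run;

proc glimmix data=work.cbi_shifted;
  where time in (1 2);
  class id time gs (ref="neg") family;
  /* the model is now fitted to log(cbi_total + 1) */
  model cbi_total1 = time gs EYO gs*EYO gs*time
        / dist=lognormal solution;
  random intercept / subject=family;
  random intercept / subject=id(family);
run;
```

With the shifted outcome, the log note about observations dropped for a zero or negative response should no longer appear, and all estimates refer to the log of the score plus one.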

Your residual distribution does not look dependent on the linear predictor. I suspect that the banding is a property of your outcome, not the model. Plot residuals with CBI_Total on the X-axis. What do you see?

Your residual distribution looks quite normal except for a spike on the lower end. Looks like you have an atypically common response or something like that. Take a look at the distribution of your outcome score by Time and see if you can spot what the deviations from normality are. If it is an atypically common single Total score, what is that and why, what does it tell you about the instrument or the population?

tammy2 · Posted 03-12-2019 10:24 AM

Thank you very much for your suggestions and feedback.

The residual plots for the CBI_Total score with the constant added look similar to the previous residual plots; there is still banding in the residual vs. linear predictor plot.

When I plotted the CBI_Total score with the studentized residuals (see attached), it resembles a cubic curve. I am not too sure what time means in terms of my linear predictors?

When I plotted the CBI Total score by time (see attachment), I do see that there are more responses in the 0-15 range (the questionnaire scores can range from 0-180). I would expect the majority of participants to have scores in the lower range rather than the higher range (higher scores on this questionnaire indicate greater behavioural problems, and this population is asymptomatic), so I would think this is a typical response for this population.

Please let me know if there are other suggestions or feedback. Thank you very much in advance for your assistance.

Thank you,

Tamara

Code I used to plot residuals:

Proc glimmix data=lib.cbi_final_long Plots=all;
  Where time in (1 2);
  Class id time gs (ref="neg") family;
  Model cbi_total = time gs EYO gs*EYO gs*time
        / ddfm=kr2 dist=lognormal solution;
  Random intercept / subject=family type=vc;
  Random intercept / subject=id(family) type=vc;
  Random Time / subject=id residual type=AR(1);
  covtest 'between subject variance =0?' zeroG;
  output out=plot1 pred(ilink)=predi stderr(ilink)=sepredi
         pred=pred stderr=sepred
         resid=resid student=student;
Run;

title scatterplot of residuals by cbi total score;
proc sgplot data=plot1;
scatter x=cbi_total y=student;
run;

Haris · Posted 03-12-2019 10:56 AM

I would not worry about banding as long as there is no systematic increase/decrease in residuals by Linear Predictor. If your outcome is highly concentrated in some categories, you may want to analyze it as a discrete outcome rather than continuous.

Which distribution fit your data better: lognormal or Poisson? By the looks of your residual plot, you may need to look at other zero-truncated distributions as well for a good match.

Plots at Time 1 and Time 2 should be histograms. That will enable you to see clustering. Scatterplots are superimposed and you can't see the frequency of responses at each level.
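One way to draw the per-time histograms is sketched below, assuming the same data set as in the earlier posts. The bin width of 5 is an arbitrary choice for a 0-180 scale and can be changed freely.

```sas
/* Sketch only: one histogram of the outcome per time point */
title "Distribution of CBI total score by time";
proc sgpanel data=lib.cbi_final_long;
  where time in (1 2);
  panelby time / columns=1;        /* stack the panels so the bins line up */
  histogram cbi_total / binwidth=5;
run;
title;
```

Stacking the panels in a single column makes it easier to compare where the scores pile up at each time point, which overlapping markers in a scatterplot tend to hide.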

tammy2 · Posted 03-12-2019 11:27 AM

Great, thank you very much for those suggestions. I will look into those distributions and see which one is better.

Best,

Tamara
