tammy2
Calcite | Level 5

Hello,

 

I was hoping to get some assistance with my SAS code and some questions I have regarding interpreting the residual plots.

 

Description of Data: 

I am interested in examining whether two groups (variable=gs) show differences in scores on a questionnaire measure over time (2 time points). I am also interested in examining whether there are differences in the groups as a function of the time to expected disease onset (var=EYO).  Importantly, not all participants have data for the two time points, some only have data for the first time point. Additionally, the groups are nested in families. 

 

The scores on the questionnaire can range from 0 to 180. When I plotted the data I found that it is positively skewed, so I decided to try a Poisson distribution to account for this. When I ran the code below, the residual vs. linear predictor plot seemed to indicate heteroskedasticity (see attached file). At this point I am not sure whether this is something I need to resolve and, if so, what steps I should take to resolve it.

 

I am currently using SAS 9.4

 

My code is:

Proc glimmix data=lib.cbi_final_long Plots=studentpanel method=laplace;
Where time in (1 2);
Class id time gs (ref="neg") family;
Model cbi_total =time gs EYO
gs*EYO gs*time
/dist=poisson link=log solution;
Random intercept /subject=family type=UN;
Random intercept /subject=id (family) type=UN;
lsmeans gs /ilink;
output out=new1 pred(ilink)=predi stderr(ilink)=sepredi pred=pred stderr=sepred
resid=resid student=student;
Run;

 

Thank you very much for your assistance.

 

Best,

Tamara 

 

Haris
Lapis Lazuli | Level 10
The pattern in your plot seems to be a product of your choice of distribution more than anything else. Poisson would not be my choice for the continuous variable that you have. Have you tried lognormal? If you find that the variance is not equal in your two groups, you can add a 'GROUP=GS' option to your RANDOM statement to allow the variance estimates to differ between the two groups.

Second, you are not modeling repeated measures by time. I would use a random r-side effect 'RANDOM Time / sub=ID residual type=AR(1)'.

Are you sure you need TYPE=UN for your g-side matrices? Most commonly, measurements for different individuals in a study are not correlated and are modeled as TYPE=VC.
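
A minimal sketch of the GROUP= suggestion, reusing the data set and variable names from the original post. Placing GROUP=GS on the person-level random intercept and adding the COVTEST statement are illustrative assumptions, not something specified above. Note that DIST=LOGNORMAL excludes zero or negative responses from the analysis.

```sas
/* Sketch only: heterogeneous between-person variance by group */
proc glimmix data=lib.cbi_final_long;
   where time in (1 2);
   class id time gs (ref="neg") family;
   model cbi_total = time gs EYO gs*EYO gs*time
         / dist=lognormal solution;
   /* family-level random intercept with a single variance component */
   random intercept / subject=family type=vc;
   /* GROUP=gs requests a separate variance estimate for each level of gs */
   random intercept / subject=id(family) type=vc group=gs;
   /* assumption: test whether the group-specific variances can be pooled */
   covtest 'common variance across gs?' homogeneity;
run;
```

If the homogeneity test gives no evidence of a difference, the GROUP= option can be dropped and a single variance estimated.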
tammy2
Calcite | Level 5

Thank you very much for your detailed response and suggestions. I am a novice at SAS and GLMMs so I really appreciate your feedback.

 

I tried to change the distribution and my current code is below. I have attached a picture of my studentized residuals, which still seem inaccurate. As well, there were several notes in the log file indicating that some observations were not used since I have zeros in my data (the participant's score on the questionnaire can range from 0-180).

 

Are there other major errors in my code or other suggestions I can try?

 

Thank you very much,

Tamara

 

 

CURRENT CODE:

Proc glimmix data=lib.cbi_final_long Plots=all;
Where time in (1 2);
Class id time gs (ref="neg") family;
Model cbi_total =time gs EYO
gs*EYO gs*time
/dist=lognormal solution;
Random intercept /subject=family type=vc;
Random intercept /subject=id (family) type=vc;
Random Time / subject=id residual type=AR(1);
Run;

 

NOTES IN LOG FILE: 

NOTE: Some observations are not used in the analysis because of: zero or negative response (n=355).
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: A linear combination of covariance parameters is confounded with the residual variance.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 33.67 seconds
cpu time 32.75 seconds

 

Haris
Lapis Lazuli | Level 10

Tamara, 

 

LogNormal distribution cannot accept zero as the log of zero is undefined.  You need to add a constant to your outcome and remember that you did that when interpreting the results; e.g. CBI_Total+1.  That will retain the zeros.
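
A minimal sketch of the shift, assuming a hypothetical work data set `cbi_shifted` and a hypothetical variable `cbi_total_p1`; neither name comes from the thread. The covariance structure is trimmed to the two random intercepts to keep the example short.

```sas
/* Sketch only: cbi_shifted and cbi_total_p1 are hypothetical names */
data work.cbi_shifted;
   set lib.cbi_final_long;
   where time in (1 2);
   /* shift by 1 so zero scores survive the log; missing stays missing */
   if not missing(cbi_total) then cbi_total_p1 = cbi_total + 1;
run;

proc glimmix data=work.cbi_shifted plots=studentpanel;
   class id time gs (ref="neg") family;
   model cbi_total_p1 = time gs EYO gs*EYO gs*time
         / dist=lognormal solution;
   random intercept / subject=family type=vc;
   random intercept / subject=id(family) type=vc;
run;
```

The estimates from this model are on the scale of log(CBI_Total + 1), so any back-transformed value needs the constant subtracted again.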

 

Your residual distribution does not look dependent on the linear predictor.  I suspect that the banding is the property of your outcome not the model.  Plot residuals with CBI_Total on the X-axis.  What do you see?

Your residual distribution looks quite normal except for a spike on the lower end.  Looks like you have an atypically common response or something like that.  Take a look at the distribution of your outcome score by Time and see if you can spot what the deviations from normality are.  If it is an atypically common single Total score, what is that and why, what does it tell you about the instrument or the population?
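
One way to check for an atypically common single score is a frequency table sorted by count. This is a sketch using the data set and variable names from the original post; the choice of PROC FREQ is an assumption about how one might look.

```sas
/* Sketch only: list CBI total scores from most to least frequent */
proc freq data=lib.cbi_final_long order=freq;
   where time in (1 2);
   /* NOCUM drops the cumulative columns to keep the table compact */
   tables cbi_total / nocum;
run;
```

The first rows of the table show which scores dominate, for example whether the spike is at zero.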

 

tammy2
Calcite | Level 5

Thank you very much for your suggestions and feedback. 

 

The residual plots for the CBI_Total score with the constant added look similar to the previous residual plots; there is still banding in the residual vs. linear predictor plot.

 

When I plotted the CBI_Total score with the studentized residuals (see attached), it resembles a cubic curve. I am not too sure what time means in terms of my linear predictors?

 

When I plotted the CBI Total score with time (see attachment), I do see that there are more responses ranging from 0-15 (the questionnaire scores can range from 0-180). I would assume that the majority of participants will have scores in the lower range versus the higher range (higher scores on this questionnaire indicates greater behavioural problems and this population is asymptomatic), so I would think this is a typical response for this population. 

 

Please let me know if there are other suggestions or feedback. Thank you very much in advance for your assistance. 

 

Thank you,

Tamara

 

 

Code I used to plot residuals:

 

Proc glimmix data=lib.cbi_final_long Plots=all;
Where time in (1 2);
Class id time gs (ref="neg") family;
Model cbi_total =time gs EYO
gs*EYO gs*time
/ddfm=kr2 dist=lognormal solution;
Random intercept /subject=family type=vc;
Random intercept /subject=id (family) type=vc;
Random Time / subject=id residual type=AR(1);
covtest 'between subject variance =0?' zeroG;
output out=plot1 pred(ilink)=predi stderr(ilink)=sepredi pred=pred stderr=sepred
resid=resid student=student;
Run;

 

title scatterplot of residuals by cbi total score;
proc sgplot data=plot1;
scatter x=cbi_total y=student;
run;

Haris
Lapis Lazuli | Level 10

I would not worry about banding as long as there is no systematic increase/decrease in residuals by Linear Predictor.  If your outcome is highly concentrated in some categories, you may want to analyze it as a discrete outcome rather than continuous.

 

Which distribution fit your data better: lognormal or Poisson?  By the looks of your residual plot, you may need to look at other zero-truncated distributions as well for a good match.

 

Plots at Time 1 and Time 2 should be histograms.  That will enable you to see clustering.  Scatterplots are superimposed and you can't see the frequency of responses at each level.
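
A minimal sketch of the histograms by time point, using the data set and variable names from the original post. The bin width of 5 and the single-column layout are arbitrary assumptions chosen for readability on a 0-180 scale.

```sas
/* Sketch only: one histogram of the outcome per time point */
proc sgpanel data=lib.cbi_final_long;
   where time in (1 2);
   /* stack the two panels so the score axes line up */
   panelby time / columns=1;
   histogram cbi_total / binwidth=5;
   colaxis label="CBI total score";
run;
```

Stacking the panels makes it easier to compare where responses pile up at each time point.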

tammy2
Calcite | Level 5

Great, thank you very much for those suggestions. I will look into those distributions and see which one is better.

 

Best,

Tamara 
