tammy2
Calcite | Level 5

Hi All,

 

I was hoping to get some assistance with my SAS code, and I have some questions about the procedure. I am fairly new to SAS and to generalized linear mixed models.

 

Description of Data: 

I am interested in examining whether two groups (variable=gs) differ in scores on a questionnaire measure over time (2 time points). I am also interested in whether the groups differ as a function of the time to expected disease onset (var=EYO). Importantly, not all participants have data for both time points; some have data only for the first time point. Additionally, the groups are nested in families.

 

The scores on the questionnaires can range from 0 to 180. When I plotted the data I found that they are positively skewed, so I decided to try a Poisson distribution to account for the skew.

 

SAS Code (version 9.1)

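/* Poisson GLMM with random intercepts for family and for participant within family */
proc glimmix data=lib.cbi_final_long plots=residualpanel method=laplace;
   where time in (1 2);                            /* analyze time points 1 and 2 only */
   class id time gs (ref="neg") family;
   model cbi_total = time gs EYO
                     time*gs time*EYO gs*EYO gs*time*EYO
                     / dist=poisson link=log solution;
   random intercept / subject=family type=un;      /* family-level intercept */
   random intercept / subject=id(family) type=un;  /* participant within family */
   lsmeans gs / ilink;                             /* also report group means on the data scale */
   store model_1;
run;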

 

Questions:

(1) My code seems to run. It converges, but the log indicates that "at least one element of the gradient is greater than 1e-3". Does my code look appropriate, or are there any outstanding errors?

 

(2) I understand that with proc mixed, I could use the "influence" option to obtain influence diagnostics and check for outliers. Is there something similar I can use with proc GLIMMIX? How can I assess whether there are outliers that should be removed from the analysis?

 

(3) I have seen some repeated-measures examples that include "time" in the last random statement (e.g.  Random time /subject=id (family) type=UN;) to account for the fact that not every participant completed both time points. Is this necessary in the random statement if I am only analyzing time 1 and time 2?

 

(4) I have n=588 participants who completed the questionnaire at time 1 and n=342 participants who completed the questionnaire at time 2. Is it appropriate that the "number of observations read/used" in the SAS output equals N=930 (588 + 342) instead of 1176 (2 x 588)?

 

Thank you very much for your assistance.

 

Best,

Tamara 

 

3 REPLIES
sld
Rhodochrosite | Level 12

Arguably, subjects who do not have data for both times are of little use in assessing differences over time. You need to consider why data are missing at time 1 or 2, and whether the process that induces missingness also induces a bias. In other words, are data missing at random? How many participants have data for both times?
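
One way to answer that last question is to count time points per participant. This is a minimal sketch, assuming the data set and variable names from the original post (lib.cbi_final_long, id, family, time, cbi_total) and assuming that a row with a missing score should not count as an observed time point. The names work.n_times and n_times are hypothetical.

```sas
/* Count the observed time points (1 or 2) for each participant within family */
proc sql;
   create table work.n_times as
   select family, id, count(distinct time) as n_times
   from lib.cbi_final_long
   where time in (1, 2) and cbi_total is not missing
   group by family, id;
quit;

/* Tabulate how many participants contribute one versus two time points */
proc freq data=work.n_times;
   tables n_times;
run;
```

The n_times=2 row of the frequency table is the number of participants who can contribute to a within-subject comparison over time.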

 

Individual subjects (IDs) are essentially blocks for time, and when you have missing data then these blocks are incomplete. SAS mixed procedures will run, but if data are missing for some reason that induces bias then you probably should not fit a naive incomplete block model. 

 

The potential for bias with missing data is the primary problem. A subsequent issue is the implementation of random effects in a model that specifies a Poisson (or other non-normal) distribution. It is more complicated than in the normal distribution case. I refer you to SAS for Mixed Models: Introduction and Basic Applications (available through SAS in electronic form or Amazon as paper) and Generalized Linear Mixed Models: Modern Concepts, Methods and Applications for details. With scores ranging from 0 to 180, a Poisson assumption may not be your best choice, but of course that totally depends on the distributional properties of the data.

 

tammy2
Calcite | Level 5

Thank you very much for your detailed response. I completely understand the potential bias in including individuals with only 1 time point. The participants who do not have time 2 data have not yet been "called back" for a second visit, and hence are missing those data points. I should also clarify that this data set contains a unique cohort, so we are very eager to use all the data we have. I am planning to run a sensitivity analysis with only the participants who completed both time points, to see whether the pattern of results differs from the results with all participants. I am taking into consideration your suggestions and comments regarding bias in our data and will continue to reflect on that throughout the analysis.
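
For the sensitivity analysis, the completers-only subset could be built along these lines. This is only a sketch, assuming the variable names from my original code; work.cbi_completers is a hypothetical data set name, and a row with a missing score is assumed not to count as a completed time point.

```sas
/* Keep only participants with a non-missing score at both time points. */
/* PROC SQL remerges the group count back onto the individual rows, so  */
/* the log shows a note about remerging summary statistics.             */
proc sql;
   create table work.cbi_completers as
   select *
   from lib.cbi_final_long
   where time in (1, 2) and cbi_total is not missing
   group by family, id
   having count(distinct time) = 2;
quit;
```

The same model would then be rerun with data=work.cbi_completers in place of data=lib.cbi_final_long.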

 

From examples I have seen online, my distribution does resemble either a Poisson or a negative binomial distribution. With this distribution, would you recommend any changes to my SAS code, or does it seem accurate? Furthermore, are there any options I can use to assess potential outliers, similar to the influence option for proc mixed?

 

Thank you very much for your assistance!

Best,

Tamara 

 

 

sld
Rhodochrosite | Level 12

I do not know enough about the distributional characteristics of your scores or how they were obtained to offer much advice. I tend to reserve Poisson and negative binomial for actual count data. You might find this link useful as you think about these distributions:  https://www.encyclopediaofmath.org/images/2/2a/Modeling_count_data.pdf

 

If your scores can be thought of as being on a continuous scale, you could ponder a transformation (e.g., log) to accommodate skewness and heterogeneity of variance and then possibly be able to use a normal distribution.
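
A quick way to see whether a log transformation helps is to compare the skewness of the raw and transformed scores. This is a rough sketch using the data set and variable names from the original post; the +1 offset is my assumption, added only because scores of 0 are possible and log(0) is undefined, and work.cbi_log and log_cbi are hypothetical names.

```sas
/* Create a log-transformed score; the +1 offset is an assumption to handle scores of 0 */
data work.cbi_log;
   set lib.cbi_final_long;
   where time in (1, 2);
   log_cbi = log(cbi_total + 1);
run;

/* Compare skewness (Moments table) and histograms before and after the transformation */
proc univariate data=work.cbi_log;
   var cbi_total log_cbi;
   histogram cbi_total log_cbi / normal;
run;
```

If the transformed score is still strongly skewed, or the residual plots from the fitted model still fan out, a different transformation or distribution may be worth considering.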

 

As you note, you can obtain influence statistics using the INFLUENCE option on the MODEL statement in the MIXED procedure. As far as I know, there is nothing similar for the GLIMMIX procedure. In R, the influence.ME package takes a stab at it, although the authors note it is an imperfect effort. 
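
If a transformed score can reasonably be treated as normal, the MIXED route would look roughly like the following. This is a sketch under stated assumptions, not a recommendation for your data: it reuses the fixed and random effects from the original post, and the log(score + 1) response, work.cbi_mixed, and log_cbi are hypothetical.

```sas
/* Assumption: log(score + 1) is an acceptable response; +1 guards against log(0) */
data work.cbi_mixed;
   set lib.cbi_final_long;
   where time in (1, 2);
   log_cbi = log(cbi_total + 1);
run;

/* Normal-theory mixed model with observation-level influence diagnostics */
proc mixed data=work.cbi_mixed method=reml;
   class id time gs family;
   model log_cbi = time gs EYO
                   time*gs time*EYO gs*EYO gs*time*EYO
                   / solution influence;
   random intercept / subject=family;
   random intercept / subject=id(family);
run;
```

By default INFLUENCE deletes one observation at a time and holds the covariance parameters fixed. The EFFECT= suboption removes all observations sharing a level of a classification effect as a set, and ITER= requests iterative updates of the covariance parameters, at a higher computational cost.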

 

I hope this helps.

 
