AB85
Fluorite | Level 6

I am modeling the probability of a child being retained in kindergarten using the ECLS-K 2011 dataset.  The model includes random intercepts and slopes. I'd like to allow the cluster-level residuals to be correlated. Here is my current code:

 

proc glimmix;
   class s2_ID;
   model Retained (event=last) = X2RTHETK1_cwc X2MTHETK1_cwc S2NMRETK_gmc x1ageent_cwc / cl dist=binary link=logit solution;
   random intercept X2RTHETK1_cwc X2MTHETK1_cwc / subject=s2_id;
run;

 

X2RTHETK1_cwc and X2MTHETK1_cwc are reading and math achievement in kindergarten (centered within clusters), S2NMRETK_gmc is the number of students retained the prior year (grand-mean centered), and x1ageent_cwc is the age at kindergarten entry (centered within clusters). 

 

I'm not sure which options I need to include, and the ones I have tried have resulted in errors. I have tried changing the covariance structure to type=un, but I get this error: "Estimated G matrix is not positive definite." I've also tried adding a random _residual_ statement, but I cannot get the model to converge with that approach. Which options make the most sense for what I'm trying to do? I'm using SAS 9.4.

3 REPLIES
sld
Rhodochrosite | Level 12

More detail about your study design would be helpful. 

 

I might assume that a cluster (s2_id?) is a classroom with students nested within. I might assume that X2RTHETK1, X2MTHETK1, and x1ageent are student-level variables (that students are the sampling units for these variables). The sampling unit for S2NMRETK is not clear to me. And my assumptions may be incorrect: I am not familiar with the ECLS-K 2011 dataset.

 

Are your clusters independent? If not, in what way are they dependent?

 

AB85
Fluorite | Level 6

Thanks so much for your response. Please see my answers below:

 

I might assume that a cluster (s2_id?) is a classroom with students nested within.

s2_id is the school.

 

I might assume that X2RTHETK1, X2MTHETK1, and x1ageent are student-level variables (that students are the sampling units for these variables).

Yes, these are student-level variables: reading score, math score, and age at kindergarten entry. 

 

The sampling unit for S2NMRETK is not clear to me. And my assumptions may be incorrect: I am not familiar with the ECLS-K 2011 dataset.

S2NMRETK is a school-level variable that indicates the number retained in kindergarten in the school the prior school year.

 

Are your clusters independent? If not, in what way are they dependent?

Clusters are independent.

sld
Rhodochrosite | Level 12

So, you have students nested within schools. X2RTHETK1, X2MTHETK1, and x1ageent are student-level variables. S2NMRETK  is a school-level variable. I presume Retained is binary.

 

If your clusters (schools) are independent, then there is no need "to allow the cluster-level residuals to be correlated". I think what you may have in mind is allowing nonzero covariances among the random intercept and the random slopes for X2RTHETK1_cwc and X2MTHETK1_cwc, which would be accomplished by:

 

random intercept X2RTHETK1_cwc X2MTHETK1_cwc/ subject=s2_id type=un;

The "Estimated G matrix is not positive definite." message occurs because one or more variances/covariances have been set to zero. This might be because the (co)variance is very small, because the data do not adequately support its estimation, or because the estimation method is less than optimal (binary response data can be problematic). You could try various adjustments to the model, such as using Laplace estimation; see the papers by Kiernan, Tao, and Gibbs (2012) and Tao, Kiernan, and Gibbs (2015) for this and other ideas.
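As a rough sketch of the Laplace suggestion, the model might be rerun along the following lines. The data set name ecls_k is a hypothetical placeholder (the original code does not name one), and the g and gcorr options are added here only to print the estimated G matrix and its correlation form so you can see which variance or covariance is causing trouble. Treat this as a starting point under those assumptions, not a guaranteed fix.

```sas
/* Sketch only: "ecls_k" is a hypothetical data set name; substitute your own. */
proc glimmix data=ecls_k method=laplace;
   class s2_ID;
   model Retained (event=last) = X2RTHETK1_cwc X2MTHETK1_cwc
                                 S2NMRETK_gmc x1ageent_cwc
         / cl dist=binary link=logit solution;
   /* type=un frees the covariances among the random intercept and slopes; */
   /* g and gcorr print the estimated G matrix and its correlation form.   */
   random intercept X2RTHETK1_cwc X2MTHETK1_cwc
         / subject=s2_ID type=un g gcorr;
run;
```

If the message persists, look in the printed G matrix for a variance near zero or a correlation near 1 or -1. Two adjustments that may be worth trying at that point are replacing type=un with type=chol (a Cholesky parameterization of the same unstructured matrix) or dropping the random slope whose variance is essentially zero.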

 

With binary response data, you will not need a random _residual_ statement.

 

If you have not already done so, you would probably find value in reading in detail about multilevel modeling; there is an extensive list of resources here.

 

I hope this helps.

 
