🔒 This topic is solved and locked.
SASBAS
Calcite | Level 5

I have a dataset with a number of identical twin pairs and fraternal twin pairs. I want to examine the relationship between two variables (let's call them INDEPENDENT and DEPENDENT). However, I can't run a normal OLS regression because each twin's dependent variable is correlated with their co-twin's.

The way I have been dealing with this is to use SAS PROC MIXED and include a random intercept defined by twin pair (FAMILYID). Here is my syntax:

proc mixed method=ml covtest noclprint;
  class FAMILYID;
  model DEPENDENT = INDEPENDENT/solution;
  random intercept/sub=FAMILYID type=un gcorr;
run;

However, I've realized that I have a heteroskedasticity problem. The identical twins are likely to be more related to each other than the fraternal twins (variable indicating whether twins are fraternal or identical is called TWINTYPE), and the model doesn't reflect this.

According to SAS documentation for the RANDOM statement: "GRP=effect defines an effect specifying heterogeneity in the covariance structure of G. All observations having the same level of the group effect have the same covariance parameters."

It sounds like this is exactly what I want. However, I am confused about how this relates to the REPEATED statement, and whether I need to use the REPEATED statement with GROUP= instead of RANDOM or in addition to it. Conceptually, what is each of these doing, and are they redundant for my purposes?

1 ACCEPTED SOLUTION (SteveDenham's final reply below)

6 REPLIES
SteveDenham
Jade | Level 19

You will probably have to add another variable that identifies the twin pair as monozygotic (identical) or dizygotic (fraternal). For now, call it twintype. Then the following would work:

proc mixed method=ml covtest noclprint;
  class FAMILYID twintype;
  model DEPENDENT = INDEPENDENT/solution;
  random intercept/sub=FAMILYID type=un group=twintype gcorr;
run;

I am a bit on the fence about specifying type=un here. As far as I can tell, FAMILYID only adds a single variance component to your design: with just a random intercept, G is 1×1, so UN gives the same result as the default VC. I would recommend removing it. The resulting RANDOM statement should give two variance components, one for each level of twintype.

Steve Denham

SASBAS
Calcite | Level 5

Thank you, thank you, for this helpful response!  I do have a follow-up question for clarification, and my apologies if this reveals some misunderstanding on my part as I'm still learning about mixed models.

When I IGNORE the twin type and just include a random intercept reflecting that each person in the study is part of a "family cluster" (as below):

proc mixed method=ml covtest noclprint;
  class FAMILYID;
  model DEPENDENT = INDEPENDENT/solution;
  random intercept/sub=FAMILYID type=un gcorr;
run;


I get a covariance parameter estimate for FAMILYID as well as a residual term. My understanding is that this decomposes the variability into "within-cluster" variability (the residual estimate) and "between-cluster" variability (the FAMILYID estimate). I get the intraclass correlation coefficient (the correlation between two members of the same family) by computing (FAMILYID estimate) divided by (FAMILYID estimate + residual estimate).
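In symbols, the model above assumes roughly the following. This is a minimal sketch. The notation is ours, not SAS output labels: $x_{ij}$ is INDEPENDENT and $y_{ij}$ is DEPENDENT for twin $j$ in family $i$. It assumes the random intercept and the residuals are independent and normal.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_i + e_{ij}, \quad u_i \sim N(0,\sigma^2_u),\; e_{ij} \sim N(0,\sigma^2_e) \\
\operatorname{Var}(y_{ij}) &= \sigma^2_u + \sigma^2_e, \qquad \operatorname{Cov}(y_{i1}, y_{i2}) = \sigma^2_u \\
\mathrm{ICC} &= \frac{\operatorname{Cov}(y_{i1}, y_{i2})}{\operatorname{Var}(y_{ij})} = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}
\end{aligned}
```

Here $\sigma^2_u$ plays the role of the FAMILYID (Intercept) estimate and $\sigma^2_e$ the Residual estimate. The ratio is therefore the correlation between two members of the same family.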


When I use the syntax you suggested


proc mixed method=ml covtest noclprint;
  class FAMILYID TWINTYPE;
  model DEPENDENT = INDEPENDENT/solution;
  random intercept/sub=FAMILYID group=TWINTYPE type=un gcorr;
run;


and look at my covariance parameters, I get one FAMILYID estimate for fraternal twins, one FAMILYID estimate for identical twins, and one residual term.  So -- this means that I have told my model to make different "between-cluster" estimates for identical and fraternal twins -- but isn't it still disregarding twin type when it calculates the residual term?  And if the residual term reflects the "within-cluster" variability, isn't it a problem that it doesn't take into consideration that there is likely to be LESS variability within an identical twin pair on the outcome variable and MORE variability within a fraternal twin pair? 


So I guess my follow-up question is:  is there syntax that would leave me with a between-cluster estimate for identical twins and fraternal twins, and then a residual term for identical twins and fraternal twins?  OR, am I confused as to why this would be needed?


Additional help/clarification much appreciated.  THANKS!

SteveDenham
Jade | Level 19

One way to get separate residual terms would be to add a REPEATED statement (in addition to the current RANDOM statement).

Try adding:

REPEATED / group=twintype; /* May or may not want a subject= in this.  For now, I think not.  */

and see if this gives a heterogeneous error structure for the two twintypes.
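Conceptually, PROC MIXED fits a marginal covariance $V = ZGZ^{\top} + R$. RANDOM builds $G$ (between-pair) and REPEATED builds $R$ (residual), so the two statements are complementary rather than redundant. Below is a sketch for one twin pair $i$ of type $t$ (MZ or DZ). It assumes GROUP=TWINTYPE on both statements and the default TYPE=VC on REPEATED. The symbols are illustrative, not SAS labels.

```latex
\begin{aligned}
V_i &= Z_i G_t Z_i^{\top} + R_{i,t}
     = \sigma^2_{u,t}\,\mathbf{1}\mathbf{1}^{\top} + \sigma^2_{e,t}\, I_2,
     \qquad Z_i = \mathbf{1} = (1, 1)^{\top} \\
    &= \begin{pmatrix}
         \sigma^2_{u,t} + \sigma^2_{e,t} & \sigma^2_{u,t} \\
         \sigma^2_{u,t} & \sigma^2_{u,t} + \sigma^2_{e,t}
       \end{pmatrix}
\end{aligned}
```

GROUP= on RANDOM lets the between-pair variance $\sigma^2_{u,t}$ differ by twin type. GROUP= on REPEATED lets the within-pair residual variance $\sigma^2_{e,t}$ differ as well.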

Steve Denham

SASBAS
Calcite | Level 5

Ah!  Thank you for verifying that the REPEATED statement is where I can manipulate the residual term.

I tried the syntax above, and unfortunately, when I run it, I don't get any tests of fixed effects or covariance parameter estimates or anything. I get a log message that says:

"An infinite likelihood is assumed in iteration 0 because of a nonpositive definite estimated R matrix for Subject 606"

I googled this and it seems like it might be because the "TWINTYPE" variable is coded the same way across each individual?  My dataset is such that each line corresponds to an individual, and then there's a variable that indicates which family they are a part of and another variable that indicates whether their pair is fraternal or identical. 

I tried this syntax:

proc mixed method=ml covtest noclprint;
  class FAMILYID TWINTYPE;
  model DEPENDENT = INDEPENDENT/solution;
  random intercept/sub=FAMILYID group=TWINTYPE type=un gcorr;
  repeated/sub=FAMILYID group=TWINTYPE type=un rcorr;
run;


and the model successfully ran, but it spit out eight different covariance parameter estimates.  Conceptually, I'm having trouble understanding what these would correspond to and how I'd calculate the intraclass correlation coefficient like I did when I just had the random statement to worry about.

Again, many thanks for your reply.
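Here is where the eight estimates plausibly come from. This sketch uses illustrative notation, with $\sigma^2_{u,t}$ for the FAMILYID intercept variance of twin type $t$. With TYPE=UN and SUBJECT=FAMILYID on REPEATED, each twin type gets its own unstructured 2×2 residual matrix, which has 3 parameters. Each twin type also gets one intercept variance.

```latex
\begin{aligned}
R_{i,t} &= \begin{pmatrix} \sigma_{11,t} & \sigma_{12,t} \\ \sigma_{12,t} & \sigma_{22,t} \end{pmatrix},
\qquad
V_i = \sigma^2_{u,t}\,\mathbf{1}\mathbf{1}^{\top} + R_{i,t}
    = \begin{pmatrix}
        \sigma^2_{u,t} + \sigma_{11,t} & \sigma^2_{u,t} + \sigma_{12,t} \\
        \sigma^2_{u,t} + \sigma_{12,t} & \sigma^2_{u,t} + \sigma_{22,t}
      \end{pmatrix} \\
\#\text{params} &= \underbrace{2}_{\sigma^2_{u,\mathrm{MZ}},\ \sigma^2_{u,\mathrm{DZ}}}
                 + \underbrace{2 \times 3}_{\sigma_{11,t},\ \sigma_{12,t},\ \sigma_{22,t}} = 8
\end{aligned}
```

For each twin type, $V_i$ has only three distinct entries, but four parameters feed into them. In particular, $\sigma^2_{u,t}$ and the residual covariance $\sigma_{12,t}$ both describe the same within-pair covariance and cannot be separated. Also, with UN and no repeated effect, "twin 1" and "twin 2" are defined only by row order within a family, which is arbitrary for twins.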

SteveDenham
Jade | Level 19

Change the REPEATED statement by deleting the type=un option. With that option in place, it estimates within-pair residual covariances on top of the random intercept, but in a twin study the random intercept already accounts for that covariance. With that option removed, you should get four covariance parameters:

Fraternal twin variance component

Identical twin variance component

Fraternal twin residual variance

Identical twin residual variance

If you do NOT get this, then please attach a copy of your output, and I will try to figure out what is going on.
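With these four parameters, the intraclass correlation from the single-group model can be computed separately for each twin type. This is a sketch, with $\sigma^2_{u,t}$ as the twin-type-$t$ variance component (FAMILYID) and $\sigma^2_{e,t}$ as the matching residual variance. The notation is illustrative.

```latex
\mathrm{ICC}_t = \frac{\sigma^2_{u,t}}{\sigma^2_{u,t} + \sigma^2_{e,t}},
\qquad t \in \{\mathrm{MZ}, \mathrm{DZ}\}
```

For example, $\mathrm{ICC}_{\mathrm{MZ}}$ uses the identical twin variance component and the identical twin residual variance. $\mathrm{ICC}_{\mathrm{DZ}}$ uses the two fraternal twin estimates.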

Steve Denham

SASBAS
Calcite | Level 5

Yes, this worked!!!!

Thank you so much for your help.  I've searched long and hard on the internet for help with this question.
