Proc Mixed - need help on basic question

12-02-2010 01:15 PM

I am trying to run a simple linear regression, but I have 2 observations for each individual in my sample (one collected on each of two non-consecutive days). I recognize that one option is to take the mean of the two observations and use proc reg; however, I was hoping to pool my data to increase my sample size and then correct for the fact that two observations came from each individual. I understand that proc mixed is an option here, but I am unclear on how to approach this. What I have so far is:

    proc mixed data=new;
      class id;
      model serumHg = fishintake / solution;
      repeated / subject=id;
    run;

Any help would be very much appreciated.

12-02-2010 10:58 PM

Is the number of days between the two observations the same for every subject? If so, you could use code that is only slightly modified from the code you show:

    proc mixed data=new;
      class id;
      model serumHg = fishintake / solution;
      repeated / subject=id type=cs;
    run;

An alternate specification of the MIXED procedure which produces the same result, provided the estimated covariance between the two measures is not negative, is

    proc mixed data=new;
      class id;
      model serumHg = fishintake / solution;
      random intercept / subject=id;
    run;
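
To see why the two specifications agree, it may help to write out the covariance implied by the random intercept model. The symbols below (u_i for the subject effect, e_ij for the residual, and the variances sigma_u^2 and sigma_e^2) are notation introduced here for illustration; they do not appear in the original code.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + e_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\operatorname{Var}(y_{ij}) = \sigma_u^2 + \sigma_e^2,
\qquad
\operatorname{Cov}(y_{i1}, y_{i2}) = \sigma_u^2
```

Both observations on a subject share one variance and one covariance, which is the compound symmetry pattern. Because sigma_u^2 cannot be negative, the random intercept form matches TYPE=CS only when the within-subject covariance is non-negative.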

Both of the above models assume that the residual variance is the same for each of the two measures. If you believe that is not a tenable assumption then you could use the code:

    proc mixed data=new;
      class id;
      model serumHg = fishintake / solution;
      repeated / subject=id type=un;
    run;

As mentioned previously, the above models are appropriate if the number of days between observations is consistent from one subject to the next. If that is not the case, then you might need to employ a spatial covariance structure. (Note that time is the fourth dimension, so spatial structures are appropriate for modeling observations which are more or less distant in time.)

Let me make one more comment. You really do not gain degrees of freedom by using the individual observations rather than the subject means. Using the individual observations can be important if there is some complexity to the residual variance structure, such as a different amount of time between observations. Using the individual observations could also be important if you have period-specific predictors to incorporate into your model. Using the mixed model would also be indicated if you are really interested in understanding components of variance.

From the limited description which you have provided, it is my guess that the model in which you average the two responses per subject and regress those on the (single) predictor variable would be just as good for your needs as the mixed model. But that assumption is based on a guess about how your experiment is conducted based on limited information.
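
The averaging approach described above could be sketched as follows. This is a minimal sketch, assuming the data set new has two rows per id and that fishintake is either constant within a subject or can sensibly be averaged over the two days; the output data set name subj_means is hypothetical.

```sas
/* Sketch only: collapse the two rows per subject to subject means. */
proc means data=new nway noprint;
  class id;
  var serumHg fishintake;
  output out=subj_means mean=;   /* means keep the original variable names */
run;

/* Ordinary regression on one row per subject. */
proc reg data=subj_means;
  model serumHg = fishintake;
run;
quit;
```

The regression then has one observation per subject, so no correction for within-subject correlation is needed.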

12-03-2010 09:59 AM

Thank you so much for your insight, it was really helpful. I think I will now seriously consider taking the mean of my 2 observations. Just to clarify: my two days of data were collected 3-10 days apart, so the gap is not consistent from one subject to the next. In this case, would you recommend a spatial covariance structure?

12-03-2010 01:07 PM

Whether a gap of 3 days versus 10 days makes a difference to the covariance structure of the subject-specific values probably depends on a lot of considerations that I don't have knowledge of. From your model, I see that your predictor variable is fishintake. You appear to be modeling serum mercury in fish based on the amount of food that they have consumed, or else the serum mercury of an animal which feeds on fish, such as river otters.

How much mercury is taken up and expressed in serum probably depends on fish (or river otter) age. If you are studying juveniles, then a difference of 3 days compared to a difference of 10 days could make a substantial difference. But this is just speculation on my part. You should investigate alternative models starting with the compound symmetry model specified previously (alternatively, the random effects model). For a spatial model, you could use code as follows:

    proc mixed data=new;
      class id;
      model serumHg = fishintake / solution;
      repeated / subject=id type=sp(pow)(time);
    run;

where time is a numeric variable holding the measurement date (it should not be listed in the CLASS statement). The compound symmetry and spatial covariance models are not nested, so you cannot formally test which is better using a likelihood ratio test. However, I would note that the covariance structure of the compound symmetry model can be expressed as

$$
\operatorname{Cov}(R_1, R_2) = \begin{pmatrix} V & V\rho \\ V\rho & V \end{pmatrix}
$$

while the spatial covariance structure can be expressed as:

$$
\operatorname{Cov}(R_1, R_2) = \begin{pmatrix} V & V\rho^{d_{12}} \\ V\rho^{d_{12}} & V \end{pmatrix}
$$

where $d_{12}$ is the difference in days between the first and second measurement. The two models are identical except that the spatial model uses the distance between measurements to adjust the covariance between the two measures; that distance is a known quantity, not a parameter to estimate. Both models therefore estimate the same two covariance parameters (V and rho), so whichever of them has the smaller value of -2LL would be the preferred model.
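
As a purely illustrative calculation (rho = 0.9 is an assumed value, not an estimate from these data), the spatial power structure would give the following within-subject correlations for subjects measured 3 and 10 days apart:

```latex
\rho^{d_{12}}\Big|_{d_{12}=3} = 0.9^{3} = 0.729,
\qquad
\rho^{d_{12}}\Big|_{d_{12}=10} = 0.9^{10} \approx 0.349
```

Under compound symmetry, both pairs of observations would instead be assigned the same correlation rho regardless of the gap.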

There are other spatial covariance structures which you could employ as an alternative to the spatial power model specified above. See the REPEATED statement syntax for the MIXED procedure for other spatial covariance structures. Again, for the spatial covariance structures which you might employ (sp(exp), sp(gau), sp(lin), sp(linl), sp(sph)), there will not be a likelihood ratio test that allows selection of the best model. Model selection may be based on established literature on the subject or on which model produces the smallest value for -2LL.
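
One way to put the -2LL values side by side is to capture the fit statistics table from each run. The sketch below assumes the data set new and the numeric time variable used earlier in this reply; the data set names fit_cs, fit_pow and fit_compare and the variable covtype are hypothetical. With the default REML estimation, the relevant row is labelled "-2 Res Log Likelihood".

```sas
/* Sketch only: fit both covariance structures and collect their fit statistics. */
ods output FitStatistics=fit_cs;
proc mixed data=new;
  class id;
  model serumHg = fishintake / solution;
  repeated / subject=id type=cs;
run;

ods output FitStatistics=fit_pow;
proc mixed data=new;
  class id;
  model serumHg = fishintake / solution;
  repeated / subject=id type=sp(pow)(time);
run;

/* Stack the two tables and label which model each row came from. */
data fit_compare;
  length covtype $8;
  set fit_cs(in=from_cs) fit_pow;
  if from_cs then covtype = 'CS';
  else covtype = 'SP(POW)';
run;

proc print data=fit_compare noobs;
  var covtype Descr Value;
run;
```

The comparison is only meaningful when both runs use the same fixed effects and the same estimation method, as they do here.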

12-03-2010 02:58 PM

Thanks very much for your help,

I think I know where I can go from here!

12-03-2010 10:00 AM

If you only have two observations and these are more or less equidistant, I would simplify the problem by either adjusting for the baseline value or analysing the difference from baseline:

    proc glm data=new;
      model serumHg_second_measurement = serumHG_baseline fishintake / solution;
    run;

or

    proc glm data=new;
      model serumHG_difference = fishintake / solution;
    run;

where serumHG_difference = serumHg_second_measurement - serumHG_baseline (final minus baseline).
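
Both models need one row per subject, whereas the original data have two rows per id. A minimal sketch of the reshaping step follows, assuming a numeric time variable (measurement date) identifies which observation came first and that the fishintake value on the second row is the one wanted; the data set names new_sorted and wide are hypothetical.

```sas
/* Sketch only: turn two rows per id into one row with baseline, final and difference. */
proc sort data=new out=new_sorted;
  by id time;
run;

data wide;
  set new_sorted;
  by id;
  retain serumHG_baseline;
  if first.id then serumHG_baseline = serumHg;      /* first measurement */
  if last.id and not first.id then do;              /* second measurement */
    serumHg_second_measurement = serumHg;
    serumHG_difference = serumHg_second_measurement - serumHG_baseline;
    output;                                         /* one row per subject */
  end;
  drop serumHg;
run;
```

Subjects with only one observation are dropped by this step, and the resulting data set wide would be used in place of new in either PROC GLM call.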

Regards,

Juanvte.
