Hello,
Is there a way to properly handle a mixed model with an uneven number of measurements per subject and, more importantly, uneven time intervals between measurements taken at different time points (the dataset contains observations spanning several years)?
I have data from artificial insemination stations on the ejaculate quality of boars. The goal is to determine whether a particular mutation (SNP) affects ejaculate quality. Sample data:
boar_id breed year snp age(days) y(quality) measurement_order
1 A 2020 AA 250 330 1
1 A 2020 AA 265 290 2
2 B 2016 AA 330 400 1
2 B 2016 AA 350 385 2
2 B 2017 AA 365 360 3
The biggest problem I see is that measurement 1 for boar 1 is a completely different time point (date) than measurement 1 for boar 2.
I wanted to try something like this:
proc mixed data=have;
  class boar_id breed year snp measurement_order;
  /* interval = days since the boar's previous measurement (a derived variable) */
  model y = age interval breed year snp measurement_order;
  repeated measurement_order / subject=boar_id(snp) type=SP(POW);
run;
Here, interval would be the number of days since the previous measurement. However, I am not sure whether this solves the problem above.
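A minimal sketch of how the interval variable described above could be derived in a DATA step, assuming the age column is stored in a numeric variable named `age` (the sample header shows it as "age(days)") and that `have_int` is just a hypothetical output name. DIF is called on every row and the first row of each boar is then reset, so intervals never carry over from one boar to the next.

```sas
/* Sort so each boar's measurements are in chronological order */
proc sort data=have;
  by boar_id age;
run;

data have_int;
  set have;
  by boar_id;
  /* DIF(age) = age - LAG(age); called unconditionally so the lag queue stays aligned */
  interval = dif(age);
  /* The first measurement of a boar has no previous measurement */
  if first.boar_id then interval = .;
run;
```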
I also considered giving every unique observation date its own "serial number" (so instead of measurement_order I would use a time point from 1 to n), but then I end up with thousands of levels for the fixed effect of time...
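One commonly used alternative to a time factor with thousands of levels is to give the spatial power structure the actual measurement time as its coordinate, so the within-boar correlation between two measurements is rho raised to the distance between them (here, the difference in age). The sketch below assumes numeric variables `age` (days) and `y` as in the sample data and that `boar_id` values are unique across SNP groups; it is an illustration, not the thread's recommended model.

```sas
proc mixed data=have;
  class boar_id breed year snp;
  /* age enters as a continuous covariate, so no time factor with many levels */
  model y = age breed year snp / solution;
  /* SP(POW)(age): corr(i,j) = rho**|age_i - age_j|, so unequal spacing is used directly */
  repeated / subject=boar_id type=sp(pow)(age);
run;
```

Because the distance is measured in days, the estimate of rho may be very close to 1; rescaling the coordinate (for example, to weeks) changes only the scale of rho, not the fitted correlations.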
So is there a solution?
Here is a link to an example in PROC GLIMMIX that may be useful: https://documentation.sas.com/doc/en/statug/15.2/statug_glimmix_examples09.htm . This example analyzes the body weight of cows measured at 23 unequally spaced time points. In your case, the difficult part will be determining all of the time points observed and inserting missing values. Then you will have to hope that the model converges. That is equivalent to your "serial number" analysis.
An alternative would be a generalized estimating equations (GEE) approach; see PROC GEE Example 49.3, "Weighted GEE for Longitudinal Data That Have Missing Values": https://documentation.sas.com/doc/en/statug/15.2/statug_gee_examples03.htm . This very clever method uses the MISSMODEL statement to model the probability that a measurement is missing. Since your PROC MIXED model is set up to estimate marginal effects, the GEE approach to a marginal model may be more tractable.
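As a rough starting point for the GEE route mentioned above, here is a minimal unweighted sketch of a marginal model with an exchangeable working correlation within boar, assuming the variable names from the sample data (with age stored as `age`). It deliberately omits the MISSMODEL statement: the weighted version in the linked example also needs the missing measurements present as rows, so follow that example for the weighting step.

```sas
proc gee data=have;
  class boar_id breed year snp;
  /* Normal response with identity link gives a linear marginal model */
  model y = age breed year snp / dist=normal link=identity;
  /* Exchangeable working correlation: does not depend on measurement spacing */
  repeated subject=boar_id / type=exch;
run;
```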
SteveDenham
Thank you again, Steve. The example is great and I will definitely look into it.