Hello,
I've run into a bit of an issue with proc mixed and I'm hoping someone here can help. I'm using a repeated measures design to examine differences in a single measure in 5 treatment groups over 14 days. I have a lot of missing data in the study, which is why I'm using proc mixed. This is all well and good. Here's my code:
proc mixed data=OND;
class Mouse Treatment Day;
model Running= Treatment | Day;
repeated Day / subject =Mouse;
lsmeans Treatment | Day / diff;
run;
In general I'm getting good results with this (feel free to critique though). The problem is that while most of my groups have data for all 14 days, one of my groups only has data for 8 days (mortality issues), so when I look at the output from the lsmeans, this group is non-estimable, and the difference estimates involving this group fail.
I thought proc mixed was able to handle missing data, so I'm confused about why it can't deal with this. Does anyone have any thoughts about how I can get the estimates for this group? Am I using the wrong model for this type of data? Please help! Thanks ahead of time.
You can fit models with missing data, but you cannot obtain least squares means for main effects when interaction cells are completely empty (when you have interactions in the model). There are different approaches to handle this. One common way is to fit the 'means model'. Just use the interaction term:
model Running = Treatment*Day;
lsmeans Treatment*Day / diff;
You can write contrast or estimate statements for selected contrasts of interest.
I'm confused. Perhaps I don't understand what you're suggesting. I don't see how this is any different from what I've done. Doesn't the vertical bar ( | ) imply the individual terms as well as their interaction? In other words, isn't writing "Treatment | Day" identical to writing "Treatment Day Treatment*Day"? If my understanding is correct, could you clarify how your suggestion differs from my original coding? I'm just trying to figure this out. Thanks for your time and thoughts.
model Running = Treatment*Day;
only gives Treatment*Day, no main effect of Treatment and no main effect of Day. This is often called a means model, in contrast to an effects model (what you coded). If you have combinations of treatment and day with no observations, you cannot have the main effect lsmeans (they are not estimable, by definition, if you also have an interaction in the model).
Ah! I understand now. Thanks for explaining it. When I look at my results though, I see an effect of treatment but no effect of day or interaction of the two. So, I need to estimate the effect of each of the five treatment groups. If I just use the treatment*day term, can I still estimate the main effect of treatment using "estimate" statements?
Since I have one treatment group that ends the study early, it doesn't seem reasonable to assume my data points are missing at random. The more I look into this, the more it seems like I need some sort of joint model to link the longitudinal quantitative measure with the survival data. Does anyone have any experience with this in SAS? I think I'm getting way out of my league here...
With missing cells, there are different approaches one could take, and no simple guidance for you (even assuming that you have missing at random). There will be detractors and proponents for each possibility. If there is good evidence that you do not have an interaction, then you could just model the main effect(s).
model Running = Treatment Day;
lsmeans Treatment Day / diff;
Results would be biased if you really have an interaction (but fail to detect it because of poor power). If you don't have many missing cells at the end, you could just end the data set at the last time where you have at least one observation in each treatment-time cell. But if this means giving up too many times for some treatments, then this would not be a great approach. One could use multiple imputation to fill in the cells m times, and use the methodology for imputations. Or you could use the full data set and the means model (just treatment*days, no main effect), as I described above. You could use some contrasts for treatment, but this will be tricky to code if you are not used to contrast syntax in mixed. I can't get into this.
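To make the "end the data set at the last complete time" option concrete, here is a minimal sketch. It assumes that Day is numeric and that the short group was observed on days 1 through 8 (the post only says it has 8 days of data); the data set name OND_trunc is made up for illustration.

```sas
/* Sketch only. ASSUMPTIONS: Day is numeric, and day 8 is the last day
   on which every treatment group still has at least one observation. */
data OND_trunc;          /* OND_trunc is a hypothetical name */
  set OND;
  where Day <= 8;        /* drop the days with empty treatment*day cells */
run;

proc mixed data=OND_trunc;
  class Mouse Treatment Day;
  model Running = Treatment | Day;   /* effects model is usable again */
  repeated Day / subject=Mouse;
  lsmeans Treatment / diff;          /* no empty cells left in this subset */
run;
```

This throws away days 9 through 14 for the four complete groups, which is the cost mentioned above.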
Given that the missing data are not random, this raises all kinds of other issues. There is a big literature on this for mixed models, but you don't have any easy solutions. You probably will need to work with a statistician to work this out. I am sure I forgot some other approaches.
I want to echo the suggestion above to use a means model. We have used it quite successfully in modeling repeated measures data where treatments were applied for different lengths of time. You will have to use ESTIMATE or LSMESTIMATE statements to get tests of main effects, but that is actually a good thing, and, in my opinion, much to be preferred to multiple imputation where the data are missing not at random.
Steve Denham
It sounds like the means model is the way to go. I've never used the model in this manner before, so I have a couple of questions. First off, I assume the F value from the fixed effects is no longer relevant. It's giving me garbage anyways. Next, Steve mentioned using LSMESTIMATE, but unless I'm missing something, PROC MIXED can't do this. So, it seems like I'll need to use ESTIMATE. Given the fixed effects are garbage, do I need to specify the degrees of freedom for the ESTIMATE statement? Since I'm only using the interaction factor, I'll need to include all 70 terms in my ESTIMATE statement, correct? What would that make my DF?
Here's what I was considering for my ESTIMATE statement:
ESTIMATE 'Group 1 vs Group 2' intercept 1 Treatment*Day
.07 .07 .07 .07 .07 .07 .07 .07 .07 .07 .07 .07 .07 .07
-.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07 -.07
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0;
Does this seem reasonable? I have 5 treatment groups (so five lines) and I got the .07 from the number of days I have data for (1/14). The 0's may not be necessary in this case, but they help me keep things straight as I create more ESTIMATE statements. I ran this, and it's still giving me "Non-est" for the result. Thoughts? Thanks again for all the help.
I am bothered when you say the F tests for fixed effects are giving you garbage. What do you mean? If you are using the means model as we described, then the F test is for the effect of 'group', where group is the combination of treatment and time. That is, are there differences among the groups? You should be getting realistic numbers for F and the df. If not, you are doing something wrong.
Contrasts can get very tricky with interaction means. You have to be careful about how the levels are sorted, based on the order in the class statement and the values of the codes. You can go very wrong, and I can't tell from here. But if there are not 14 days for each treatment, there would be fewer than the 5*14 values.
All of this assumes that the data you have are unbiased measures of the actual random variable.
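As a rough illustration of the coefficient count: with four groups observed on 14 days and one group on 8 days, there are 4*14 + 8 = 64 nonempty cells, not 70. The sketch below shows how the ESTIMATE statements might look under that layout. It assumes the short group sorts last among the Treatment levels and was observed on days 1 through 8; neither is stated in the thread, so check the parameter order in the SOLUTION output first. The labels are made up.

```sas
proc mixed data=OND;
  class Mouse Treatment Day;
  /* SOLUTION lists the fixed-effect parameters in coefficient order */
  model Running = Treatment*Day / solution;
  repeated Day / subject=Mouse;

  /* Two complete groups, averaged over all 14 days.
     No intercept coefficient: a difference of means sums to zero.
     DIVISOR=14 avoids rounding 1/14 to .07. */
  estimate 'Trt 1 vs Trt 2, days 1-14' Treatment*Day
     1  1  1  1  1  1  1  1  1  1  1  1  1  1
    -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
    / divisor=14 e;

  /* Complete group vs. the short group over the 8 shared days.
     ASSUMPTION: the short group is the 5th Treatment level and has
     days 1-8, so it contributes only 8 coefficients. */
  estimate 'Trt 1 vs Trt 5, days 1-8' Treatment*Day
     1  1  1  1  1  1  1  1  0  0  0  0  0  0
     0  0  0  0  0  0  0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0  0  0  0  0  0  0
    -1 -1 -1 -1 -1 -1 -1 -1
    / divisor=8 e;
run;
```

The E option prints the coefficient vector that was actually used, which makes it easier to spot a nonzero value landing on the wrong cell.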
Here's the code I'm using:
proc mixed data=OND;
class Mouse Treatment Day;
model Running= Treatment*Day;
repeated Day / subject =Mouse;
run;
The Fixed effects results are:
Effect NumDF DenDF F Value Pr > F
Treatment*Day 63 199 2.00 0.0002
I only say they're garbage because they disagreed so intensely with my prior results, which showed p=0.9827 for the interaction. Is this an incorrect statement?
Regarding the ESTIMATES, how do I know if days are being excluded from my model? I assumed that SAS left those cells empty, but that they still existed for the group missing days. Is this not the case? Does it have fewer cells?
These are totally different hypotheses being tested. With
Treat Day Treat*Day,
test of the last term is strictly for the interaction (not for main effects). But if you don't have main effects, the test of
Treat*Day
is for everything (i.e., a test of 'group' where group corresponds to a combination of treat, day, and the interaction). The significant result you now get makes perfect sense. It does not necessarily mean you have an interaction.
I think you should get the book "SAS for Mixed Models, 2nd edition" by Littell et al. (2006).
Ah! That makes sense! Thanks. I'm heading to the library right now. It definitely seems like I need a better background in this, as I have no idea what I'm doing.
In the meantime, I still can't make comparisons between the bad group and the other groups, even using the means model. Using the "/ solution" option on the MODEL statement, I was able to see the order of the components of my ESTIMATE statement. Using this order, I still get non-est for every comparison with this group. I can accurately compare any other groups, but nothing works with this group. Any idea why this may be the case?
If you are putting in the right coefficients in the estimate statement, you should not have any problem at all with getting estimable values (for the 'means' model). I can't tell from your posts where the problem lies, but the advantage of the 'means' model is that you can get results. I am guessing you are putting in nonzeros for the wrong terms.
CONTRAST requires an exact sum to zero for all of the elements, so rounding in the coefficients can make things non-estimable. The ESTIMATE statement is better, and I see your example. However, it appears that the example has elements for missing parts, and so may be the reason for returning 'non-est'. It is one of the reasons I prefer using PROC GLIMMIX to PROC MIXED, and using the LSMESTIMATE statement. Divisors can be specified so that it all works out. Yes, it can be done with the ESTIMATE statement, but the idea in a means model is that you are estimating the least squares mean for each of the cells in treatment*day. From those, you estimate main effects (and differences in levels of the main effects) as linear combinations of those cell means. Rather quickly you will see if you are correctly specifying coefficients to get the comparisons of interest.
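For what it is worth, a sketch of the GLIMMIX version might look like the following. It uses the nonpositional LSMESTIMATE syntax, where each bracket is [coefficient, Treatment level, Day level] and the levels are given by their ordinal position in the class level listing, so empty cells never have to be counted. It assumes the short group is the 5th Treatment level with days 1 through 8 and that a complete group is level 1; both are assumptions for illustration, and the label is made up.

```sas
proc glimmix data=OND;
  class Mouse Treatment Day;
  model Running = Treatment*Day;             /* means model */
  /* R-side effect, playing the role of REPEATED in PROC MIXED */
  random Day / subject=Mouse residual;

  /* Treatment 1 minus Treatment 5, averaged over the 8 shared days.
     ASSUMPTION: level 5 is the short group, observed on days 1-8. */
  lsmestimate Treatment*Day 'Trt 1 vs Trt 5, days 1-8'
    [ 1, 1 1] [ 1, 1 2] [ 1, 1 3] [ 1, 1 4]
    [ 1, 1 5] [ 1, 1 6] [ 1, 1 7] [ 1, 1 8]
    [-1, 5 1] [-1, 5 2] [-1, 5 3] [-1, 5 4]
    [-1, 5 5] [-1, 5 6] [-1, 5 7] [-1, 5 8]
    / divisor=8 e;
run;
```

Because each coefficient names its own cell, a comparison like this does not depend on how many cells precede it in the parameter list.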
Steve Denham