☑ This topic is solved.
Janes1
Calcite | Level 5

Hi everyone,

I have a question about PROC MIXED when some outcome values are missing. Using the same dataset and the same model, the two versions of the code below report the same "Number of Observations Used" but very different parameter estimates. I expected the estimates to be the same. Could anyone help explain why they differ? Thank you in advance!

 

code1:

proc mixed data = mydata;
class id order(ref="0");
model MS=order group stability /s ;
repeated / type=un subject=id r rcorr;
run;

[Screenshots: PROC MIXED output from code 1]

 

 

Code 2 (this code returns the same results as using a WHERE statement in a DATA step to subset mydata, i.e. data mydata; set mydata; where MS ne .; run; proc mixed data=mydata; ...):

proc mixed data = mydata;
class id order(ref="0");
model MS=order group stability /s ;
repeated / type=un subject=id r rcorr;
where MS ne .;
run; 

 
[Screenshots: PROC MIXED output from code 2]

 

 

Accepted Solutions
StatsMan
SAS Super FREQ

You might get the results to agree if you specify your repeated effect on both the CLASS and REPEATED statements. Suppose you have measured subjects over 3 time points, and the first subject has a missing response for their 2nd observation. In your first model, MIXED reads in all 3 observations and creates 2 rows in R for the first subject. Because MIXED sees the missing response on the subject's 2nd observation, those rows in R correspond to the 1st and 3rd observations for the subject. When you remove the missing responses with the WHERE statement, MIXED sees 2 observations for subject 1 (not 3) and again creates 2 rows in R for this subject. But these rows now correspond to the 1st and 2nd observations, not the 1st and 3rd.
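To make the position shift concrete, here is a small sketch that only counts each record's position within its subject, with and without the filter. It does not reproduce what MIXED does internally. The dataset toy, its values, and the variables time and position are made up for illustration.

```sas
/* Hypothetical toy data: subject 1 has a missing response at time 2 */
data toy;
   input id time MS;
   datalines;
1 1 10
1 2 .
1 3 14
2 1 9
2 2 11
2 3 13
;
run;

/* Position within subject when every row is read in */
data pos_all;
   set toy;
   by id;
   if first.id then position = 0;
   position + 1;   /* sum statement: position is retained across rows */
run;

/* Position within subject after dropping missing responses first */
data pos_where;
   set toy;
   where MS ne .;
   by id;
   if first.id then position = 0;
   position + 1;
run;

proc print data=pos_all;   run;
proc print data=pos_where; run;
```

In pos_all, the time 3 record for subject 1 sits at position 3. In pos_where, the same record sits at position 2, which is the mismatch described above.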

 

How to get around all of this? If the variable TIME measures the time point at which the observations are measured, then include TIME on the CLASS statement and before the / on the REPEATED statement. 
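A minimal sketch of that change applied to the model from the question, assuming the dataset contains a variable named time that records the measurement occasion. That variable name is an assumption; the original post does not show one, and if order already plays that role it would be the variable to use instead.

```sas
proc mixed data = mydata;
   /* time is an assumed variable name for the measurement occasion */
   class id time order(ref="0");
   model MS = order group stability / s;
   /* Naming time before the slash tells MIXED which row and column */
   /* of R each observation belongs to, regardless of record order  */
   repeated time / type=un subject=id r rcorr;
run;
```

With the repeated effect specified this way, the fit with and without the WHERE statement should place each observation in the same position of R.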

SteveDenham
Jade | Level 19

This is the difference seen when comparing an "intent to treat (ITT)" analysis to a "per protocol (PP)" analysis. The first (and your first) uses all of the data to create the matrices needed to solve the mixed model equations, while the second (and your second) uses only those subjects for which the response variable is known. In a repeated measures analysis, the PP method implies that records with a measured response but a missing value for any of the covariates (fixed or random) will be included in the analysis.

 

SteveDenham

SteveDenham
Jade | Level 19

What @StatsMan points out is a solution to a kind of missingness that gets overlooked in many cases. This could be even more apparent if a structured covariance matrix were applied: lsmeans and standard errors could be grossly different, rather than slightly different as in the example posted by @Janes1.

 

I wish I could apply more than 1 like to the response - this one is pretty profound in its application.

 

SteveDenham

StatsMan
SAS Super FREQ

Adding the R option to the REPEATED statement will show you the block of R for the first subject. You can specify multiple subject numbers using R=1,2,4,8 to print the block of R for the 1st, 2nd, 4th, and 8th subject. I have debugged lots of issues with mixed models over the years by looking at R (and G and V on the RANDOM statement) to see what kind of crazy covariance structure I have (mis)applied to my data. 
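As a sketch of that value-list form, using the model from the question (the subject numbers 1, 2, 4, and 8 are just the example values mentioned above, and RCORR is given the same list so the correlation blocks are printed for the same subjects):

```sas
proc mixed data = mydata;
   class id order(ref="0");
   model MS = order group stability / s;
   /* r=1,2,4,8 prints the R blocks for the 1st, 2nd, 4th, and 8th subjects */
   repeated / type=un subject=id r=1,2,4,8 rcorr=1,2,4,8;
run;
```

Comparing the printed blocks for a subject with a missing response against one with complete data is a quick way to check whether the rows of R line up with the time points you intended.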
