Solved: proc mixed, same number of observations used, different estimates

Janes1 · Posted 09-11-2022 02:28 PM

Hi everyone,

I have a question about PROC MIXED when some outcome values are missing. Using the same dataset and the same model, the two programs below report the same "Number of Observations Used" but give very different parameter estimates. I expected the estimates to be identical. Could anyone help explain why they differ? Thank you in advance!

Code 1:

proc mixed data = mydata;
class id order(ref="0");
model MS=order group stability /s ;
repeated / type=un subject=id r rcorr;
run;

Code 2 (this returns the same results as first subsetting mydata with a WHERE statement in a DATA step, i.e. data mydata; set mydata; where MS ne .; run; proc mixed data=mydata; ...):

proc mixed data = mydata;
class id order(ref="0");
model MS=order group stability /s ;
repeated / type=un subject=id r rcorr;
where MS ne .;
run;

StatsMan · Posted 09-12-2022 10:37 AM

You might get the results to agree if you specify your repeated effect on both the CLASS and REPEATED statements. Suppose you have measured subjects over 3 time points. For the first subject, you have a missing response for their 2nd observation. In your first model, you read in all 3 observations and MIXED creates 2 rows in R for the first subject. Because MIXED sees the missing response on the subject's 2nd observation, the rows in R will correspond to the 1st and 3rd observations for the subject. When you remove the missing responses with the WHERE statement, MIXED sees 2 observations for subject 1 (not 3) and creates 2 rows in R for this subject. But these rows now correspond to the 1st and 2nd observations, not the 1st and 3rd.
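The scenario above can be reproduced with a small simulated example. This is only a sketch: the dataset toy and the variables id, time, and y are hypothetical and do not come from the original post, and the simulation settings are arbitrary.

```sas
/* Hypothetical toy data: 50 subjects, 3 time points each.            */
/* Subject 1 is forced to have a missing response at time 2;          */
/* roughly 15% of the other responses are set to missing at random.   */
data toy;
  call streaminit(2022);
  do id = 1 to 50;
    u = rand("normal");                 /* subject effect, induces within-subject correlation */
    do time = 1 to 3;
      y = 10 + 2*time + u + rand("normal");
      if (id = 1 and time = 2) or rand("uniform") < 0.15 then y = .;
      output;
    end;
  end;
  drop u;
run;

/* Run A: all rows are read, so the missing responses still mark      */
/* each observation's position within its subject.                    */
proc mixed data = toy;
  class id;
  model y = time / s;
  repeated / type=un subject=id r rcorr;
run;

/* Run B: rows with missing y are filtered out before MIXED sees      */
/* them, so the remaining rows are numbered 1, 2, ... by position.    */
proc mixed data = toy;
  class id;
  model y = time / s;
  repeated / type=un subject=id r rcorr;
  where y ne .;
run;
```

Both runs should report the same "Number of Observations Used". Comparing the R block printed for subject 1 and the covariance parameter estimates between the two runs is one way to see whether the rows have been matched to different time points.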

How to get around all of this? If the variable TIME measures the time point at which the observations are measured, then include TIME on the CLASS statement and before the / on the REPEATED statement.
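Applied to the code from the original question, that suggestion might look like the sketch below. The variable name time is an assumption: the original post does not say which variable in mydata identifies the measurement occasion, so substitute whatever variable plays that role.

```sas
proc mixed data = mydata;
  /* time (hypothetical name) is added to CLASS so it can serve as the repeated effect */
  class id time order(ref="0");
  model MS = order group stability / s;
  /* naming time before the slash ties each row of R to a time point,  */
  /* rather than to the observation's position within the subject      */
  repeated time / type=un subject=id r rcorr;
run;
```

With the repeated effect named, adding or dropping the line where MS ne .; should no longer change which time point each observation is matched to.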

SteveDenham · Posted 09-12-2022 09:04 AM

This is the difference seen when comparing an "intent to treat (ITT)" analysis to a "per protocol (PP)" analysis. The first (and your first) uses all of the data to create the matrices needed to solve the mixed model equations, while the second (and your second) uses only those subjects for which the response variable is known. In a repeated measures analysis, the PP method implies that records with a measured response, but a missing value for any of the covariates (fixed or random), will be included in the analysis.

SteveDenham

SteveDenham · Posted 09-12-2022 10:57 AM

What @StatsMan points out is a solution to a kind of missingness that gets overlooked in many cases. This could be even more apparent if a structured covariance matrix were applied: lsmeans and standard errors could be grossly different, rather than slightly different as in the example posted by @Janes1.

I wish I could apply more than 1 like to the response - this one is pretty profound in its application.

SteveDenham

StatsMan · Posted 09-12-2022 02:07 PM

Adding the R option to the REPEATED statement will show you the block of R for the first subject. You can specify multiple subject numbers using R=1,2,4,8 to print the block of R for the 1st, 2nd, 4th, and 8th subject. I have debugged lots of issues with mixed models over the years by looking at R (and G and V on the RANDOM statement) to see what kind of crazy covariance structure I have (mis)applied to my data.
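As a sketch of that debugging approach, the block below requests the R and correlation blocks for several subjects at once. The subject numbers are just the ones mentioned above, and time is again a hypothetical name for the repeated effect.

```sas
proc mixed data = mydata;
  class id time order(ref="0");          /* time is an assumed variable name */
  model MS = order group stability / s;
  /* R=1,2,4,8 prints the block of R for the 1st, 2nd, 4th, and 8th subjects; */
  /* RCORR= accepts the same kind of list for the correlation form            */
  repeated time / type=un subject=id r=1,2,4,8 rcorr=1,2,4,8;
run;
```

Picking subjects with different missing-data patterns makes it easier to check that each block has the dimension and the entries you expect.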
