Unbalanced data using MIXED, ANOVA, GLM

tadgerviloria · Posted 03-18-2024 05:50 AM

Dear SAS community,

I would like to understand how to use PROC MIXED to fit a repeated measures ANOVA with unbalanced data in a reliable way. I found several posts:

Solved: PROC MIXED vs. ANOVA - SAS Support Communities
1. No code is provided, but the pros and cons of both procedures are discussed.
Microsoft Word - A beginner's example of PROC MIXED- Sarah R Greene.doc (lexjansen.com)
1. Transposing data (wide format) and use of the RANDOM statement
2. Code:
    proc mixed data=mydata.alldata_analysis1;
      class word_type word_length subject;
      model rt = word_type word_length word_type*word_length / ddfm=bw;
      random intercept / sub=subject type=un;
      lsmeans word_type*word_length;
    run;

When I fit an ANOVA model in PROC MIXED, I expect behaviour similar to a classical ANOVA procedure (such as GLM) with respect to unbalanced observations, namely that they are not included in the model. However, when I use PROC MIXED, the unbalanced observations are used.

Thanks in advance

Kind regards

Philippe

PaigeMiller · Posted 03-18-2024 06:58 AM

Do NOT use PROC ANOVA for unbalanced data. It is my understanding that both PROC MIXED and PROC GLM handle unbalanced data properly, and the complete unbalanced data is used in the analysis. I don't know what it means to say the unbalanced observations are not used in the model -- the concept doesn't even make sense to me.

--
Paige Miller

tadgerviloria · Posted 03-18-2024 07:52 AM

Dear Paige

Many thanks for your prompt response.

My understanding is that balanced/unbalanced data is equivalent to complete/incomplete cases.

"A repeated measures ANOVA requires a balanced number of repeated measurements for each experimental unit. Due to this requirement, experimental units with missing measurements are completely excluded from the analysis" (Guidelines for repeated measures statistical analysis approaches with basic science research conside...)

When I said that "the unbalanced observations [are] not being included in the model", I meant the same thing as the guideline paper quoted above, which says "[the] missing measurements are completely excluded from the analysis". This is the behaviour I would expect from any ANOVA procedure. I wonder whether this can be done in PROC MIXED, since the way I am fitting the model keeps the unbalanced observations (incomplete cases) in the model. Please see the code I used below:

proc mixed data=DATA_ANOVA;
  class ID TRT VISIT;
  model chg = TRT TRT*VISIT / solution cl;
  repeated / subject=ID type=AR(1);
run;

Thanks in advance

Philippe

StatsMan · Posted 03-18-2024 08:48 AM

Balanced/Unbalanced data refers to the counts in each cell of your design (same number of observations per treatment group). PROC ANOVA requires balanced data in the design, PROC GLM and PROC MIXED do not.

The data situation you describe is slightly different. Subjects with incomplete repeated measures are not included in PROC GLM. The method of moments used in GLM requires complete data for each subject. Subjects with incomplete data are used in PROC MIXED. Maximum likelihood methods do not require that subjects have observations for all time points. MIXED does, however, allow only one observation per time point for a subject. So, GLM and MIXED will not agree if you have incomplete data on your subjects in a repeated measures analysis.
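
The difference can be seen by fitting the same repeated measures design both ways. The following is a minimal sketch, not a definitive analysis. It assumes a hypothetical wide data set DATA_WIDE with one row per subject and three response columns chg_v1-chg_v3 (the names and the number of visits are assumptions, not from the thread), alongside the long-format DATA_ANOVA posted earlier in this thread. The repeated effect VISIT on the REPEATED statement in PROC MIXED is an addition of this sketch, so that AR(1) spacing stays correct when a subject is missing a visit.

```sas
/* Wide format (hypothetical DATA_WIDE): one row per subject.          */
/* With a REPEATED statement, GLM uses only subjects that have          */
/* nonmissing values for ALL dependent variables (complete cases).      */
proc glm data=data_wide;
  class trt;
  model chg_v1 chg_v2 chg_v3 = trt / nouni;
  repeated visit 3;            /* 3 visits is an assumed design */
run;
quit;

/* Long format (DATA_ANOVA from the thread): one row per subject-visit. */
/* MIXED uses every row with a nonmissing response, so subjects with    */
/* some missing visits still contribute their observed visits.          */
proc mixed data=data_anova;
  class id trt visit;
  model chg = trt trt*visit / solution cl;
  repeated visit / subject=id type=ar(1);  /* VISIT added in this sketch */
run;
```

Comparing the number of observations used, as reported in each procedure's output, shows which subjects GLM dropped and MIXED retained.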

tadgerviloria · Posted 03-18-2024 09:16 AM

Dear StatsMan

Many thanks for your response!
Indeed, that is my experience: PROC MIXED keeps the incomplete cases in a repeated measures ANOVA, but PROC GLM does not. It is nice to know the reason behind the discrepancy (MLE in PROC MIXED, and MoM in PROC GLM).

When you said "PROC ANOVA requires balanced data in the design, PROC GLM and PROC MIXED do not.", were you referring to a pre-post design (with no repeated measures in between)? Why are there differences between PROC GLM and PROC ANOVA in this context?

Do you know what PROC ANOVA is expected to do in a repeated measures ANOVA when incomplete cases are present?

Thanks in advance

Philippe

StatsMan · Posted 03-18-2024 12:56 PM

It is best just to avoid PROC ANOVA. It was meant as a procedure for textbook-type problems. The method behind PROC ANOVA requires balanced data and there is no way to work around that requirement. Use GLM for your modeling needs that use only fixed effects. For models with random effects and/or repeated measures, use PROC MIXED.
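
If the complete-case behaviour of a classical repeated measures ANOVA is still wanted while using PROC MIXED, one option is to drop incomplete subjects before fitting. The following is a minimal sketch. It assumes the long-format DATA_ANOVA and the variables ID, TRT, VISIT and chg from earlier in this thread, and, as an assumption not stated in the thread, three scheduled visits per subject. DATA_COMPLETE is a hypothetical name.

```sas
/* Keep only subjects with a nonmissing chg at every scheduled visit. */
proc sql;
  create table data_complete as
  select *
  from data_anova
  where id in (
    select id
    from data_anova
    where chg is not missing
    group by id
    having count(distinct visit) = 3   /* assumed number of visits */
  );
quit;

/* Same model as posted in the thread, now fitted to complete cases only. */
proc mixed data=data_complete;
  class id trt visit;
  model chg = trt trt*visit / solution cl;
  repeated visit / subject=id type=ar(1);  /* VISIT added in this sketch */
run;
```

Even on the same complete cases, the results need not match PROC GLM exactly, because the two procedures differ in estimation method and in the covariance structure assumed.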

ballardw · Posted 03-18-2024 01:07 PM

From the on-line help for Proc Anova in the Overview section at the start:

Overview: ANOVA Procedure

The ANOVA procedure performs analysis of variance (ANOVA) for balanced data from a wide variety of experimental designs. In analysis of variance, a continuous response variable, known as a dependent variable, is measured under experimental conditions identified by classification variables, known as independent variables. The variation in the response is assumed to be due to effects in the classification, with random error accounting for the remaining variation.

Emphasis added.

Traditional ANOVA in my personal opinion had exactly one advantage: traditional calculations could be done by hand. (And I've done them that way because the computers I had available in the 1970's didn't have appropriate software.)
