Katrina
Calcite | Level 5

Hi, I am trying to run an ANOVA using Proc Glimmix with an unbalanced dataset. About 30% of my data are missing and I think it is causing severe underdispersion. My generalized chi square/DF value is 0.04. 

 

This is my code:

 

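proc glimmix;
  class Coll Gen Rep;
  model SG = Coll | Gen;   /* main effects plus the Coll*Gen interaction */
  random Rep;
  lsmeans Gen / pdiff adjust=tukey;
  /* save the LS-means and their pairwise differences for the macro below */
  ods output lsmeans=mmm diffs=ppp;
run;

/* pdmix800 converts the pairwise differences into letter groupings */
%include 'C:\Users\uqkhodg1\Desktop\School-related\sas-macros\pdmix800.sas';
%pdmix800(ppp, mmm, alpha=0.05, sort=no);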

 

Any suggestions for dealing with the missing data?

Accepted Solutions
plf515
Lapis Lazuli | Level 10

The first step with missing data is to determine (as best you can) whether the data are MCAR, MAR or MNAR.

 

MCAR stands for missing completely at random. This means that there is no particular reason why some data are missing. Maybe the hard disk crashed, or some responses (at random) were lost or something like that.

 

MAR stands for missing at random.  This means that there may be reasons for the missingness, but that you can model those reasons using data that you actually have.

 

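MNAR means missing not at random (also known as nonignorable nonresponse). That's when neither of the above is true: the chance that a value is missing depends on things you did not observe, such as the missing value itself.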

 

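Unfortunately, there's no test that can tell MAR from MNAR using only the data you observed - you have to figure it out, based on logic and what you know. The most the data can do is show evidence against MCAR, for example when missingness clearly differs between groups.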
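
One informal way to look at this is to tabulate missingness against the design factors. The sketch below assumes a hypothetical data set name, mydata (the original post does not name one), and assumes that missing observations are stored as rows with a missing SG value rather than as absent rows. The variable SG_missing is introduced here for illustration only.

```sas
/* Hypothetical data set name: mydata (not given in the original post) */
data miss_check;
  set mydata;
  SG_missing = missing(SG);   /* 1 if SG is missing, 0 otherwise */
run;

/* Does the proportion of missing SG values differ by Coll or by Gen? */
proc freq data=miss_check;
  tables Coll*SG_missing Gen*SG_missing / chisq;
run;
```

If the missing proportion clearly varies across levels of Coll or Gen, MCAR is doubtful. A flat pattern does not prove MCAR, and neither result separates MAR from MNAR.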

 

For MCAR, you don't have to do anything special. Estimates based on the observed data will still be unbiased; the only cost is a loss of power.

 

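For MAR you can use PROC MI and PROC MIANALYZE to do multiple imputation of the missing data. Standard multiple imputation assumes MAR, so for MNAR it has to be combined with a sensitivity analysis (PROC MI has an MNAR statement for this). PROC MI is pretty complicated and the choices aren't always obvious. You may want to consult with an expert.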
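
A rough sketch of what that workflow might look like for the model in the question, under several assumptions: the data set name mydata is hypothetical, only SG has missing values, the number of imputations and the seed are arbitrary, and the imputation model ignores the random Rep effect. It shows the three steps, not a recommended specification.

```sas
/* Step 1: create 20 completed data sets (mydata is a hypothetical name) */
proc mi data=mydata nimpute=20 seed=20457 out=mi_out;
  class Coll Gen;
  fcs reg(SG = Coll Gen Coll*Gen);   /* regression imputation of SG */
  var Coll Gen SG;
run;

/* Step 2: fit the mixed model from the question to each completed data set */
proc glimmix data=mi_out;
  by _Imputation_;
  class Coll Gen Rep;
  model SG = Coll | Gen / solution;
  random Rep;
  ods output ParameterEstimates=gm_parms;
run;

/* Step 3: pool the fixed-effect estimates across the imputations */
proc mianalyze parms(classvar=full)=gm_parms;
  class Coll Gen;
  modeleffects Intercept Coll Gen Coll*Gen;
run;
```

PROC MI writes the completed data sets stacked in one output data set, indexed by _Imputation_, which is why the analysis step uses a BY statement. Pooling LS-means or Tukey-adjusted differences takes more work than pooling parameter estimates, so that part is left out here.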

Katrina
Calcite | Level 5
Great, thanks so much. I think my data are MCAR so it sounds like there is nothing to do. I'm planning on repeating the measurements for the missing data, so hopefully this helps.
