Katrina
Calcite | Level 5

Hi, I am trying to run an ANOVA using PROC GLIMMIX with an unbalanced dataset. About 30% of my data are missing and I think it is causing severe underdispersion. My generalized chi-square/DF value is 0.04.

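This is my code:

/* No DATA= option, so PROC GLIMMIX uses the most recently created data set */
proc glimmix;
  class Coll Gen Rep;
  model SG = Coll | Gen;             /* fixed effects: Coll, Gen and Coll*Gen */
  random Rep;                        /* Rep enters as a random effect */
  lsmeans Gen / pdiff adjust=tukey;
  /* save the LS-means and pairwise differences for the pdmix800 macro */
  ods output lsmeans=mmm diffs=ppp;
run;
%include 'C:\Users\uqkhodg1\Desktop\School-related\sas-macros\pdmix800.sas';
%pdmix800(ppp,mmm,alpha=0.05, sort=no);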

Any suggestions for dealing with the missing data?

1 ACCEPTED SOLUTION

plf515
Lapis Lazuli | Level 10

The first step with missing data is to determine (as best you can) whether the data are MCAR, MAR or MNAR.

MCAR stands for missing completely at random. This means that there is no particular reason why some data are missing: the missingness is unrelated to any values, observed or not. Maybe the hard disk crashed, or some responses were lost at random.

MAR stands for missing at random. This means that there may be reasons for the missingness, but that you can model those reasons using data that you actually have.

MNAR means missing not at random (also known as nonignorable nonresponse). That's when neither of the above is true: the missingness depends on values you did not observe.

Unfortunately, there's no test that settles this from the data alone - you have to figure it out, based on logic and what you know.
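
Even without a formal test, you can at least look at whether missingness lines up with the factors you did record. Here is a minimal sketch, assuming a data set called mydata (a hypothetical name, since the code in the question has no DATA= option) in which the unmeasured units are present as rows with SG set to missing. The names miss_check and SG_missing are also made up for illustration.

```sas
/* Hypothetical data set name: mydata.                        */
/* Assumes unmeasured units appear as rows with SG = . (missing). */
data miss_check;
  set mydata;
  SG_missing = missing(SG);   /* 1 if SG is missing, 0 otherwise */
run;

/* Cross-tabulate the missingness indicator against each design factor */
proc freq data=miss_check;
  tables (Coll Gen Rep)*SG_missing / chisq nocol nopercent;
run;
```

If the missing values are clearly concentrated in particular levels of Coll, Gen or Rep, MCAR is hard to defend. A flat table does not prove MCAR, though, and it says nothing about MAR versus MNAR.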

 

For MCAR, you don't have to do anything. The only issue will be a loss of power; estimates will still be unbiased.

For MAR you can use PROC MI and PROC MIANALYZE to do multiple imputation of the missing data. For MNAR, imputation also needs extra assumptions about why the data are missing, so it is usually paired with a sensitivity analysis. PROC MI is pretty complicated and the choices aren't always obvious. You may want to consult with an expert.
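
To give a rough idea of what the PROC MI / PROC MIANALYZE route looks like for the model in the question, here is a sketch, not a recommended analysis. It assumes a data set called mydata (hypothetical name) where Coll, Gen and Rep are fully observed and only SG has missing values. The number of imputations, the seed, and the data set names mi_out and lsm_by_imp are arbitrary choices for illustration. The imputation model below uses only the main effects of Coll and Gen; it ignores the Coll*Gen interaction and the Rep random effect, which a real analysis would probably need to revisit.

```sas
/* Hypothetical data set name: mydata; Coll, Gen and Rep assumed fully observed */

/* Step 1: create 20 imputed copies of the data (only SG gets imputed) */
proc mi data=mydata nimpute=20 seed=20240 out=mi_out;
  class Coll Gen;
  fcs reg(SG = Coll Gen);   /* simplified imputation model: main effects only */
  var Coll Gen SG;
run;

/* Step 2: fit the model from the question once per imputed data set */
proc glimmix data=mi_out;
  by _Imputation_;
  class Coll Gen Rep;
  model SG = Coll | Gen;
  random Rep;
  lsmeans Gen;
  ods output lsmeans=lsm_by_imp;
run;

/* Step 3: pool the Gen LS-means across imputations */
proc sort data=lsm_by_imp;
  by Gen _Imputation_;
run;

proc mianalyze data=lsm_by_imp;
  by Gen;
  modeleffects Estimate;   /* LS-mean estimate from each imputation */
  stderr StdErr;           /* its standard error */
run;
```

This pools one LS-mean per level of Gen. Pairwise comparisons with a Tukey adjustment, as in the original code, are not handled by this sketch.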

Katrina
Calcite | Level 5
Great, thanks so much. I think my data are MCAR so it sounds like there is nothing to do. I'm planning on repeating the measurements for the missing data, so hopefully this helps.
