Hi:
I am working on a study. Per an agency's requirements, we plan to use multiple imputation for missing values related to the primary endpoint. I have read many materials online and have some ideas, but I am still not sure.
Study information: The study's indication is epilepsy. Each patient is given a diary and is supposed to record how many seizures they experienced each day during the study period (the DB period is ~85 days). As you can imagine, some patients may forget to record their seizures on some days, and some patients may discontinue from the study before Day 85. So there are missing values for seizure counts on some days for some subjects.
My questions:
1. Our missing data type is considered 'missing at random' (MAR), so this procedure (PROC MI) can be used, correct?
2. Since our missing data is in only one variable, i.e., seizure count, I think I should NOT use the methods in the SAS documentation (Imputation Methods, Table 5) with 'monotone', correct?
3. Since our data is 'seizure count', which should follow a Poisson distribution (correct?), not a normal one, I should NOT use methods with 'MCMC', since the MCMC method assumes a multivariate normal distribution (MVN) for the variables, correct?
4. Then I thought I should use FCS, either 'fcs reg' or 'fcs regpmm'. The SAS documentation says: "The predictive mean matching method ensures that imputed values are plausible; it might be more appropriate than the regression method if the normality assumption is violated (Horton and Lipsitz 2001, p. 246)." So I thought I should use 'fcs regpmm'.
I also tried 'fcs reg', but the imputed values are non-integers (numbers with decimals). That does not seem to fit my case: our seizure variable is a count, so it should be an integer.
If I use 'fcs regpmm', the imputed values are integers.
5. If using 'fcs regpmm' is correct for my case, what value of 'k' (the K= option of 'fcs regpmm') should I pick?
Here is the code I use.
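proc mi data = post nimpute = 25 out = post_mi seed = 54321 noprint;
  by subjid;              /* impute separately within each subject */
  var qsdy count;         /* study day, then daily seizure count */
  fcs regpmm (/k = 5);    /* predictive mean matching, 5 closest observations */
run;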
Note: 'qsdy' is the study day variable; it runs from Day 1 to Day 85. 'count' is the seizure count for each day. There are missing values in this variable.
Note: since the imputation is done by subjid, covariates such as age, treatment, etc. are not needed (they do not change within an individual), correct?
If any detailed information is needed for this discussion, please ask me.
Thanks a lot in advance.
Xiaoshu
With one variable that has missingness you can always create a monotone missing data pattern by listing that variable last on the VAR statement.
Since you only have one variable with a missing value, it really doesn't matter whether you use the MONOTONE or FCS statement. But in general, you can always use the FCS statement for any missing data pattern (including monotone). It would be preferred however to use the MONOTONE method whenever possible because it is simpler with respect to the number of imputation models that would need to be run.
PROC MI does not have a specific option for dealing with count (Poisson) data. To work around this, you could use the REGPMM method on a MONOTONE statement. Because predictive mean matching takes each imputed value from the observed values, and your observed counts are integers, only integer values will be selected as imputed values.
As far as the value of K= that you should select, there is not really an ideal value; instead there is a tradeoff. A smaller K= value tends to increase the correlation among the multiple imputations for the missing observation and results in a higher variability of point estimators in repeated sampling. On the other hand, a larger K= value tends to lessen the effect of the imputation model and results in biased estimators. There is a good discussion in Schenker and Taylor (1996) on p. 430.
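As a rough sketch of the suggestion above, using the data set and variable names from the question (post, subjid, qsdy, count), the MONOTONE version might look like the following. The K=5 here is simply carried over from the question as a placeholder, not a recommended value, and the covariates are left at the default (all variables that precede count on the VAR statement):

```sas
proc mi data=post nimpute=25 out=post_mi seed=54321;
  by subjid;                     /* same BY processing as in the question */
  var qsdy count;                /* variable with missing values is listed last */
  monotone regpmm(count / k=5);  /* K=5 copied from the question, not a recommendation */
run;
```

Rerunning with a few different K= values and comparing the final estimates is one way to get a feel for how much the tradeoff described above matters for your data.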
Thanks so much for your reply.
I have a further question. Why would my data be considered monotone? I have only one variable with missing values. Can you please help me out? Should I not use 'fcs'?
That is, in the SAS code, should it be 'monotone regpmm' instead of 'fcs regpmm'?
Thanks a lot!
Xiaoshu
With one variable that has missingness you can always create a monotone missing data pattern by listing that variable last on the VAR statement.
Since you only have one variable with a missing value, it really doesn't matter whether you use the MONOTONE or FCS statement. But in general, you can always use the FCS statement for any missing data pattern (including monotone). It would be preferred however to use the MONOTONE method whenever possible because it is simpler with respect to the number of imputation models that would need to be run.
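If you want to see the pattern for yourself, one option (a minimal sketch, again assuming the data set and variable names from the question) is to run PROC MI with NIMPUTE=0, which displays the Missing Data Patterns table without creating any imputations:

```sas
proc mi data=post nimpute=0;
  var qsdy count;   /* count is listed last; qsdy is assumed to be fully observed */
run;
```

If qsdy has no missing values, the table should show at most two groups: both variables observed, or only count missing. That is a monotone pattern.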
Thank you so much for your reply. I am clear now. The key word is 'last': if I use MONOTONE, I should list the variable with missing values (seizure count) last on the VAR statement.