I'm trying to impute 6 variables using PROC MI and the fully conditional specification (FCS) method.
Two of these variables are attendance rates bounded by 0 and 100. I have used PROC MI with FCS to impute values, which works fine except that some imputed attendance values end up well above 100. I've tried setting MAX = 100, but the algorithm ends up stopping. I then tried predictive mean matching for the attendance variables. However, the dataset is very large, so this takes a very long time. Is there any way to speed up predictive mean matching?
How large is "large"? A large number of observations takes time.
It may help to include the code you are running, both to avoid suggestions of options you are already using and to get hints that are more likely to be applicable.
There are 757,400 observations.
The code below takes about a minute with 50,000 observations. However, the procedure seems to take forever (at least 2 hours) once I go beyond about 70,000 observations, which is only about 1/10 of the data.
PROC MI DATA = HSPSTUID(OBS = 50000) NIMPUTE = 1 OUT = HSPSTU_MI1;
CLASS ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P;
FCS NBITER = 1 REGPMM ( ATTPCTROLG08MI);
FCS NBITER = 1 REG (ELASSCZG08MI);
FCS NBITER = 1 REG (MTHSSCZG07MI);
FCS NBITER = 1 REGPMM ( ATTPCTROLG07MI);
FCS NBITER = 1 REG (ELASSCZG07MI);
FCS NBITER = 1 REG (MTHSSCZG08MI);
VAR ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P ATTPCTROLG08MI ELASSCZG08MI MTHSSCZG07MI ATTPCTROLG07MI ELASSCZG07MI MTHSSCZG08MI;
WHERE IN_MI = 1;
RUN;
None of the variables in the CLASS statement have any missing data. The two attendance variables use REGPMM and the other four continuous variables use REG. I've ordered the FCS statements by the number of missing observations in each variable.
I purposely set NIMPUTE = 1. I know multiple imputation is superior, but we're currently sticking with single imputation. In any case, I imagine multiple imputation would take even longer.
Thanks for your help.
For the two imputed variables that are bounded at 0 and 100, would you feel comfortable imputing with the FCS method and then post-processing those two variables so that they are truncated to a minimum of 0 and a maximum of 100, since those are the limits of what could actually be measured?
SteveDenham
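The post-processing suggested above could be sketched as a DATA step run on the PROC MI output. This is only an illustration, not a tested solution: it assumes the imputed data are in HSPSTU_MI1 (the OUT= data set from the posted code), and the output name HSPSTU_MI1_BOUNDED and the loop index I are hypothetical. It also assumes that truncating at the bounds is acceptable for the analysis.

```sas
/* Sketch: truncate the imputed attendance rates to the range [0, 100]. */
/* HSPSTU_MI1_BOUNDED is a hypothetical output data set name.           */
DATA HSPSTU_MI1_BOUNDED;
    SET HSPSTU_MI1;
    ARRAY ATT{*} ATTPCTROLG08MI ATTPCTROLG07MI;
    DO I = 1 TO DIM(ATT);
        /* Skip missing values: MIN and MAX ignore missing arguments, */
        /* so an unguarded call would turn a missing value into 0.    */
        IF NOT MISSING(ATT{I}) THEN ATT{I} = MIN(MAX(ATT{I}, 0), 100);
    END;
    DROP I;
RUN;
```

Because the truncation happens after imputation, it does not affect the run time of PROC MI itself, but it will pile up values at exactly 0 and 100.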
Is this a duplicate of
https://communities.sas.com/t5/Statistical-Procedures/proc-mi-produces-no-log/m-p/754134
If not, please let us know how this question is different.