topic Re: PROC MI: Predictive Mean Matching is Running slowly on Large Dataset in Statistical Procedures

PROC MI: Predictive Mean Matching is Running slowly on Large Dataset

whs278 — Tue, 13 Jul 2021 20:38:32 GMT

I'm trying to impute 6 variables using PROC MI and the fully conditional specification (FCS) method.

Two of these variables are attendance rates bounded by 0 and 100. Using PROC MI with FCS to impute the values works fine, except that some imputed attendance values end up well above 100. I tried setting MAX = 100, but then the algorithm stops. Next I tried predictive mean matching for the attendance variables. However, the dataset is very large, so this takes a very long time. Is there any way to speed up predictive mean matching?

Re: PROC MI: Predictive Mean Matching is Running slowly on Large Dataset

ballardw — Tue, 13 Jul 2021 20:45:13 GMT

How large is "large"? A large number of observations takes time.

It would help to include the code you are running, so that we don't suggest options you are already using and can offer hints that are more likely to apply.

Re: PROC MI: Predictive Mean Matching is Running slowly on Large Dataset

whs278 — Tue, 13 Jul 2021 21:04:43 GMT

There are 757,400 observations.

The code below takes about a minute with 50,000 observations. However, the procedure seems to take forever (at least 2 hours) once I go beyond about 70,000 observations, which is only about 1/10 of the data.

PROC MI DATA = HSPSTUID(OBS = 50000) NIMPUTE =  1 OUT = HSPSTU_MI1;
	CLASS ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P;
	FCS NBITER = 1 REGPMM ( ATTPCTROLG08MI);
	FCS NBITER = 1 REG (ELASSCZG08MI);
	FCS NBITER = 1 REG (MTHSSCZG07MI);
	FCS NBITER = 1 REGPMM ( ATTPCTROLG07MI);
	FCS NBITER = 1 REG (ELASSCZG07MI);
	FCS NBITER = 1 REG (MTHSSCZG08MI);
	VAR ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P ATTPCTROLG08MI ELASSCZG08MI MTHSSCZG07MI ATTPCTROLG07MI ELASSCZG07MI MTHSSCZG08MI;
	WHERE IN_MI = 1;

RUN;

None of the variables in the CLASS statement have any missing data. The two attendance variables use REGPMM and the other four continuous variables use REG. I've ordered the FCS statements by the number of missing observations in each variable.

I purposely set NIMPUTE = 1. I know multiple imputation is superior, but we're currently sticking with single imputation. In any case, I imagine multiple imputation would take even longer.

Thanks for your help.

Re: PROC MI: Predictive Mean Matching is Running slowly on Large Dataset

SteveDenham — Thu, 15 Jul 2021 12:48:24 GMT

Since both of the imputed variables are truncated at 0 and 100, would you feel comfortable imputing with the FCS method and then post-processing the two variables to enforce a minimum of 0 and a maximum of 100, as those are the limits of what could actually be measured?
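As a rough sketch of that post-processing step, the DATA step below clamps the two attendance variables to the 0 to 100 range. It assumes the imputed output is the HSPSTU_MI1 dataset from the code posted earlier in the thread; the output name HSPSTU_MI1_CLAMPED and the loop index I are made up for illustration. Note that clamping piles the out-of-range imputations up at exactly 0 or 100, so it is worth checking how many values are affected.

```sas
/* Sketch only: clamp imputed attendance rates to [0, 100].        */
/* HSPSTU_MI1_CLAMPED is a hypothetical output dataset name.       */
DATA HSPSTU_MI1_CLAMPED;
	SET HSPSTU_MI1;
	ARRAY ATT{*} ATTPCTROLG08MI ATTPCTROLG07MI;
	DO I = 1 TO DIM(ATT);
		/* MIN and MAX ignore missing arguments, so skip missing values */
		/* explicitly rather than silently turning them into 0.         */
		IF NOT MISSING(ATT{I}) THEN ATT{I} = MIN(MAX(ATT{I}, 0), 100);
	END;
	DROP I;
RUN;
```

Running PROC MEANS with the MIN and MAX statistics on the two variables before and after this step would show whether any values fell outside the bounds.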

SteveDenham

Re: PROC MI: Predictive Mean Matching is Running slowly on Large Dataset

Rick_SAS — Thu, 15 Jul 2021 14:11:00 GMT

Is this a duplicate of

https://communities.sas.com/t5/Statistical-Procedures/proc-mi-produces-no-log/m-p/754134

If not, please let us know how this question is different.