whs278
Quartz | Level 8

I'm trying to impute 6 variables using PROC MI and the fully conditional specification (FCS) method.

 

Two of these variables are attendance rates bounded by 0 and 100. I have used PROC MI with FCS to impute values, which works fine except that some imputed attendance values end up well above 100. I've tried setting MAX = 100, but the algorithm ends up stopping. I then tried predictive mean matching for the attendance variables. However, the dataset is very large, so this takes a very long time. Is there any way to speed up predictive mean matching?

4 REPLIES 4
ballardw
Super User

How large is "large"? A large number of observations takes time.

 

It may help to include the code you are running, both to avoid suggestions of options you are already using and to get hints that are more likely to apply.

whs278
Quartz | Level 8

There are 757,400 observations.

 

The code below takes about a minute with 50,000 observations. However, the procedure seems to take forever (at least 2 hours) once it goes beyond about 70,000 observations, which is only about 1/10 of the data.

 

PROC MI DATA = HSPSTUID(OBS = 50000) NIMPUTE = 1 OUT = HSPSTU_MI1;
	/* Categorical predictors; none of these have missing values */
	CLASS ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P;
	/* Attendance rates use predictive mean matching (REGPMM); the other four use regression (REG) */
	FCS NBITER = 1 REGPMM (ATTPCTROLG08MI);
	FCS NBITER = 1 REG (ELASSCZG08MI);
	FCS NBITER = 1 REG (MTHSSCZG07MI);
	FCS NBITER = 1 REGPMM (ATTPCTROLG07MI);
	FCS NBITER = 1 REG (ELASSCZG07MI);
	FCS NBITER = 1 REG (MTHSSCZG08MI);
	VAR ETHCAT GENCAT OVERAGEG09 REGMTHP65G08P ATTPCTROLG08MI ELASSCZG08MI MTHSSCZG07MI ATTPCTROLG07MI ELASSCZG07MI MTHSSCZG08MI;
	WHERE IN_MI = 1;
RUN;

None of the variables in the CLASS statement have any missing data. The two attendance variables use REGPMM and the other four continuous variables use REG. I've ordered the FCS statements by the number of missing observations in each variable.

 

I purposely set NIMPUTE = 1. I know multiple imputation is superior, but we're currently sticking with single imputation. In any case, I imagine multiple imputation would take even longer.

 

Thanks for your help.  

SteveDenham
Jade | Level 19

Since both of the imputed variables are bounded at 0 and 100, would you feel comfortable imputing with the FCS method and then post-processing the two variables to have a minimum of 0 and a maximum of 100, since those are the limits of what could actually be measured?
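
A minimal sketch of that post-processing step, assuming the imputation has already been run with REG (no PMM) for the attendance variables and written to HSPSTU_MI1 as in the code above. The output data set name HSPSTU_MI1_BOUNDED is hypothetical, chosen only for illustration.

```sas
/* Sketch: clamp imputed attendance rates to the measurable range [0, 100].   */
/* HSPSTU_MI1 is the OUT= data set from the PROC MI call earlier in the       */
/* thread; HSPSTU_MI1_BOUNDED is a hypothetical name for the clamped copy.    */
data hspstu_mi1_bounded;
    set hspstu_mi1;
    array att{2} attpctrolg08mi attpctrolg07mi;
    do i = 1 to dim(att);
        /* MIN and MAX ignore missing arguments, so skip missing values */
        /* rather than silently turning them into 0.                    */
        if not missing(att{i}) then att{i} = min(max(att{i}, 0), 100);
    end;
    drop i;
run;

/* Check that both variables now fall inside the bounds. */
proc means data=hspstu_mi1_bounded n nmiss min max;
    var attpctrolg08mi attpctrolg07mi;
run;
```

One thing to keep in mind with this approach is that clamping piles the out-of-range imputations up at exactly 0 or 100, so it is worth comparing the distribution of the imputed values against the observed ones before relying on it.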

 

SteveDenham
