stodo53

To give some context, I am using multiple imputation (for the first time) to impute missing values in several variables in my dataset (some categorical, some continuous). I am working with complex survey data and have chosen the FCS method in PROC MI because the missing-data pattern is arbitrary.

My problem begins when I try to incorporate an interaction term in the PROC MI imputation step. The interaction I would like to include is between a 4-level categorical variable and a continuous variable. The categorical variable (t1_interruption) has no missing data and does NOT need to be imputed, but the continuous variable (oasis) does require imputation. As you can see in the code below, I originally imputed oasis in the PROC MI step without any interaction, and then in the PROC SURVEYREG step I added the interaction term to the model, which then uses the imputed oasis values. This ran fine. However, I have also read that it is important to include all terms of the analysis model (including interaction terms) in the PROC MI step as well as in the regression step.

So, my questions are: how do I include a categorical*continuous interaction term in the PROC MI step? And if I am only imputing the continuous variable, do I need to include this interaction term in the PROC MI step at all, or is it okay the way I have it right now?

Thanks!

*Proc MI imputation;
proc mi data=combined seed=1180431796 nimpute=10 out=combined_imp_fcs;
 class t1_30grp t1_province t1_10_5cat t1_conchealth t1_poverty_2cat I2 t1_interruption;
 fcs nbiter=10 logistic(t1_conchealth/details) logistic(t1_30grp) discrim(t1_poverty_2cat/classeffects=include) regression(oasis) regression(cesd10_resc);
 var covid_weights t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth t1_30grp oasis 
 cesd10_resc t1_poverty_2cat;
run;

*Proc surveyreg;
proc surveyreg data=combined_imp_fcs;     
weight covid_weights;
by _imputation_ ; 
domain include; 
class t1_30grp (ref='0') t1_province (ref='1') t1_10_5cat (ref='1') t1_conchealth (ref='0') t1_poverty_2cat (ref='0') I2 (ref='1') t1_interruption (ref='4');  
model t1_oasis= t1_province t1_10_5cat I2 t1_interruption mos_ss t1_conchealth t1_30grp oasis cesd10_resc t1_poverty_2cat t1_interruption*oasis / solution;   
ods output parameterestimates = regparms;  
run;
SAS_Rob
SAS Employee

To include interaction terms, you need to specify the imputation model for that variable explicitly; by default, the FCS model includes only main effects. Below is a modified version of an example from the documentation that includes the interaction of a categorical and a continuous variable in the FCS regression model.

data Fish3;
title 'Fish Measurement Data';
input Species $ Length Width @@;
datalines;
Roach 16.2 2.2680 Roach 20.3 2.8217 Roach 21.2 .
Roach . 3.1746 Roach 22.2 3.5742 Roach 22.8 3.3516
Roach 23.1 3.3957 . 23.7 . Roach 24.7 3.7544
Roach 24.3 3.5478 Roach 25.3 . Roach 25.0 3.3250
Roach 25.0 3.8000 Roach 27.2 3.8352 Roach 26.7 3.6312
Roach 26.8 4.1272 Roach 27.9 3.9060 Roach 29.2 4.4968
Roach 30.6 4.7736 Roach 35.0 5.3550 Parkki 16.5 2.3265
Parkki 17.4 . Parkki 19.8 2.6730 Parkki 21.3 2.9181
Parkki 22.4 3.2928 Parkki 23.2 3.2944 Parkki 23.2 3.4104
Parkki 24.1 3.1571 . . 3.6636 Parkki 28.0 4.1440
Parkki 29.0 4.2340 Perch 8.8 1.4080 . 14.7 1.9992
Perch 16.0 2.4320 Perch 17.2 2.6316 Perch 18.5 2.9415
Perch 19.2 3.3216 . 19.4 3.1234 Perch 20.2 .
Perch 20.8 3.0368 Perch 21.0 2.7720 Perch 22.5 3.5550
Perch 22.5 3.3075 Perch 22.5 3.6675 Perch . 3.5340
Perch 23.5 3.4075 Perch 23.5 3.5250 Perch 23.5 3.5250
. 23.5 3.5250 Perch 23.5 3.9950 Perch 24.0 3.6240
Perch 24.0 3.6240 Perch 24.2 3.6300 Perch 24.5 3.6260
Perch 25.0 3.7250 Perch . 3.7230 Perch 25.5 3.8250
Perch . 4.1658 Perch 26.5 3.6835 . 27.0 4.2390
Perch . 4.1440 Perch 28.7 5.1373 . 28.9 4.3350
Perch 28.9 4.3350 Perch 28.9 4.5662 Perch 29.4 4.2042
Perch 30.1 4.6354 Perch 31.6 4.7716 Perch 34.0 6.0180
Perch 36.5 6.3875 Perch 37.3 7.7957 Perch 39.0 .
Perch 38.3 6.7408 Perch . 6.2646 . 39.3 .
Perch 41.4 7.4934 Perch 41.4 6.0030 Perch 41.3 7.3514
Perch 42.3 7.1064 Perch 42.5 7.2250 Perch 42.4 7.4624
Perch 42.5 6.6300 Perch 44.6 6.8684 Perch 45.2 7.2772
Perch 45.5 7.4165 Perch 46.0 8.1420 . 46.6 7.5958
;

proc mi data=Fish3 seed=1305417 nimpute=15 out=outex7;
class Species;
fcs nbiter=10 discrim(Species/details) reg(Width=length species length*species/details);
var Species Length Width;
run;

stodo53

Hi Rob,

Thanks for your reply! I have updated my code (see below), but I have one follow-up question, if you don't mind. For each variable that needs to be imputed, I tried to include the two interaction terms along with all the other variables in the VAR statement, except that I left out any interaction involving the variable being imputed (e.g., oasis*t1_interruption is excluded from fcs reg(oasis=...)), and I also left out the weights variable. After doing this, I received an error message for the one FCS DISCRIM statement: "ERROR: The cross effects cannot be used as covariates in an FCS discriminant method." Therefore, I removed the interaction terms from the FCS DISCRIM statement. So, for FCS DISCRIM, is it necessary to add the "=" sign and list all of the VAR-statement variables I want t1_poverty_2cat to be imputed from? Or could I just write "fcs discrim(t1_poverty_2cat/classeffects=include)"?

Thanks!

*Proc MI imputation;
proc mi data=combined seed=1180431796 nimpute=20 out=combined_imp_fcs;
 class t1_30grp t1_province t1_10_5cat t1_conchealth t1_poverty_2cat I2 t1_interruption;
 fcs logistic(t1_conchealth= t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_30grp oasis 
 cesd10_resc t1_poverty_2cat cesd10_resc*t1_interruption oasis*t1_interruption/details); 
 fcs logistic(t1_30grp= t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth oasis 
 cesd10_resc t1_poverty_2cat cesd10_resc*t1_interruption oasis*t1_interruption); 
 fcs discrim(t1_poverty_2cat=covid_weights t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth t1_30grp oasis 
 cesd10_resc/classeffects=include); 
 fcs regression(oasis=t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth t1_30grp 
 cesd10_resc t1_poverty_2cat cesd10_resc*t1_interruption); 
 fcs regression(cesd10_resc=t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth t1_30grp oasis 
 t1_poverty_2cat oasis*t1_interruption);
 var covid_weights t1_province t1_10_5cat I2 t1_interruption mos_ss t1_cesd t1_oasis t1_conchealth t1_30grp oasis 
 cesd10_resc t1_poverty_2cat;
run;
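For the DISCRIM question above, here is a hedged sketch of the two forms being compared, written as alternatives to the fcs discrim(...) statement in the PROC MI step above (it is a fragment, not a complete step; only one of the two would be used). It assumes, based on a reading of the PROC MI documentation that should be double-checked, that when no "=" list is given an FCS method uses all other VAR-statement variables as covariates, and that for DISCRIM the CLASS variables enter only when CLASSEFFECTS=INCLUDE is specified. One thing worth checking: the explicit DISCRIM list in the code above still contains covid_weights, even though the post says the weight was left out of the imputation models.

```sas
/* Form A (sketch): explicit covariate list, main effects only, because
   DISCRIM rejects cross effects such as oasis*t1_interruption.
   covid_weights is omitted here to match the stated intent of leaving
   the weight out (an assumption about what is wanted). */
fcs discrim(t1_poverty_2cat = t1_province t1_10_5cat I2 t1_interruption mos_ss
            t1_cesd t1_oasis t1_conchealth t1_30grp oasis cesd10_resc
            / classeffects=include details);

/* Form B (sketch): no "=" list. Assumed to default to all other
   VAR-statement variables as covariates, which here would include
   covid_weights unless it is removed from the VAR statement.
   CLASSEFFECTS=INCLUDE lets the CLASS variables among them enter
   the discriminant model. */
fcs discrim(t1_poverty_2cat / classeffects=include details);
```

With DETAILS added, the printed model information for each form could be compared to confirm which covariates were actually used.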

 
