bobeng
Obsidian | Level 7

I have been having trouble trying to understand multiple logistic regression with an imputed data set using the FCS method.

 

I have explained everything in the attached document, including my questions.

SAS_Rob
SAS Employee

It might be helpful to read the section in the MI documentation related to the imputer's model versus the analyst's model.  The fact that you do not include the response variable in the imputation model for either of the variables may be biasing your results.

SAS Help Center: Imputer’s Model Versus Analyst’s Model

 

The WARNING message you are receiving can have a number of possible causes. The first is a problem with the imputation model itself, usually the omission of an important variable (like the response variable). More generally, it occurs when the missing data have no influence on the sampling error of a parameter estimate. There is no fix or adjustment for that, but it does require some further investigation on your part. Possible causes range from a poor imputation model to no real need to impute because the fraction of missing information is very small.

  

If the problem is not fixed and the between-imputation variance is zero, the degrees of freedom are undefined, so you cannot get confidence intervals or p-values. If you are unable to determine the cause, one suggestion is to check whether the within-imputation variances differ only minimally and, if so, to look at the results for that variable from one of the imputations.
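
To see why a zero between-imputation variance leaves the degrees of freedom undefined, here is a sketch of the standard combining rules (Rubin's rules). The symbols are my own notation, not something taken from your output: m is the number of imputations, \hat{Q}_i is the estimate from imputation i, and U_i is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{W} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2

T = \bar{W} + \left(1 + \frac{1}{m}\right) B

\nu = (m-1)\left[\,1 + \frac{\bar{W}}{\left(1 + m^{-1}\right) B}\,\right]^{2}
```

When every imputation gives the same estimate, B = 0, so the ratio inside the brackets divides by zero and \nu cannot be computed. Without \nu there is no t reference distribution, which is why the confidence limits and p-value are missing for that parameter.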

bobeng
Obsidian | Level 7

Hi,

 

I need more clarification on the response variable.

 

My exposure of interest is the "Race_eth2" variable and my outcome is "pref."

 

Any help would be appreciated.

 

BA

SAS_Rob
SAS Employee

By response variable I was referring to the outcome variable pref.  Most statisticians using multiple imputation would argue that it should be included in any imputation models in order to avoid biasing your results.

bobeng
Obsidian | Level 7

Thank you, this makes sense.

bobeng
Obsidian | Level 7

Looking back at my code I see that this variable was included in my imputation.

 

proc mi data=study_ds_MI seed= 347 nimpute=40 out=fcsoutput;
class pref Race_eth2 /*GestWeeks2*/ agegroup PrePriorARVStatusc2 TrimesterFst2 Mat_payor2 ROS_NYC2 tot_supp2 first_sup STI hep; *d_year taken out because it is continuous;
var Pref Race_eth2 d_year /*GestWeeks2*/ agegroup PrePriorARVStatusc2 ROS_NYC2 tot_supp2 first_sup sup_preg STI hep TrimesterFst2 Mat_payor2 ;
fcs logistic ( Mat_payor2= d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg);*variables MAR with missing Mat_Payor;
fcs logistic (TrimesterFst2= d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg STI);*variables MAR with missing TrimesterFst;
fcs regpmm (d_year) plots=trace; *continuous variables are listed here;
run;*127000;

SAS_Rob
SAS Employee

If you look at the two FCS LOGISTIC models, you will see that PREF is not included as a predictor in either one.

fcs logistic ( Mat_payor2= d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg);
fcs logistic (TrimesterFst2= d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg STI);

 

Because you listed PREF on the VAR statement and did not explicitly define the model for d_year, it does look like PREF was used in that model only.

bobeng
Obsidian | Level 7

If I now add pref, is this what it should look like?

 

proc mi data=study_ds_MI seed= 347 nimpute=40 out=fcsoutput;
class pref Race_eth2 /*GestWeeks2*/ agegroup PrePriorARVStatusc2 TrimesterFst2 Mat_payor2 ROS_NYC2 tot_supp2 first_sup STI hep; *d_year taken out because it is continuous;
var Pref Race_eth2 d_year /*GestWeeks2*/ agegroup PrePriorARVStatusc2 ROS_NYC2 tot_supp2 first_sup sup_preg STI hep TrimesterFst2 Mat_payor2 ;
fcs logistic ( Mat_payor2= pref d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg);*variables MAR with missing Mat_Payor;
fcs logistic (TrimesterFst2= pref d_year PrePriorARVStatusc2 ROS_NYC2 first_sup sup_preg STI);*variables MAR with missing TrimesterFst;
fcs regpmm (d_year) plots=trace; *continuous variables are listed here;
run;*127000;

 

Also, it was my understanding that we only include variables that are associated with the missing variable in the fcs logistic statement. Please correct me if I am wrong.

 

Thank you for your continued help.

 

BA

SAS_Rob
SAS Employee

Yes, that is what I was suggesting.

 

Schafer (1997, p. 143) has an extended explanation, but the gist is that, to produce the best imputation distributions, the imputation model should also include variables that are potentially related to the imputed variable and variables that are potentially related to the missingness of the imputed variable.

bobeng
Obsidian | Level 7

I created a missingness indicator (1 = missing, 0 = not missing) for each of the two variables I imputed. I found that pref was associated only with missingness in TrimesterFst2. Should I still include pref in the FCS logistic statement for both?

 

data study_ds_MI; set study_ds;
if Mat_payor2=. then n_miss_Mat_payor=1; else n_miss_Mat_payor=0;
if TrimesterFst2=. then n_miss_TrimesterFst=1; else n_miss_TrimesterFst=0;
run;*3175;

 

proc logistic data = study_ds_MI descending;
model n_miss_Mat_payor = pref;
run;

[Screenshot: PROC LOGISTIC output for the n_miss_Mat_payor model]

 

proc logistic data = study_ds_MI descending;
model n_miss_TrimesterFst = pref;
run;

 

[Screenshot: PROC LOGISTIC output for the n_miss_TrimesterFst model]

 

 

BA

 

SAS_Rob
SAS Employee

To be clear, aren't you assuming that there is some relationship between PREF and both of the variables when you create your analysis model? In other words, when you fit this model:

model pref (event='1')=race_eth2 d_year GestWeeks2 agegroup PrePriorARVStatusc2 TrimesterFst2 Mat_payor2 ROS_NYC2  tot_supp2 first_sup STI hep;

you are assuming that there is some sort of relationship between PREF and each of the independent variables. Your model is an attempt to quantify and, in some sense, qualify that relationship (predictive, linear in the logit, etc.). It would be a contradiction of sorts to start out assuming they have no relationship (in the imputation model) and then test whether there is a relationship (in the analysis model).
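
For reference, the analysis and pooling steps after imputation could be sketched roughly as below. This is a simplified illustration, not your actual program: it assumes the imputed data set is fcsoutput (from your PROC MI step), uses only a few of your predictors, and the data set name lgparms and the param=ref coding are my own choices.

```sas
/* Fit the analysis model separately within each imputed data set. */
proc logistic data=fcsoutput;
   by _Imputation_;
   class Race_eth2 TrimesterFst2 Mat_payor2 / param=ref;  /* coding is an assumption */
   model pref(event='1') = Race_eth2 TrimesterFst2 Mat_payor2 d_year;
   ods output ParameterEstimates=lgparms;   /* lgparms is a hypothetical name */
run;

/* Pool the per-imputation estimates. */
proc mianalyze parms(classvar=classval)=lgparms;
   class Race_eth2 TrimesterFst2 Mat_payor2;
   modeleffects Intercept Race_eth2 TrimesterFst2 Mat_payor2 d_year;
run;
```

The point of the sketch is that the variables appearing in the MODEL statement here (including pref as the response) are the same ones whose relationships the imputation models should allow for.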

bobeng
Obsidian | Level 7

Thank you, this makes sense.

 

BA

bobeng
Obsidian | Level 7

I'm sorry, but I have one more question. I was told that when doing backward logistic regression I could not remove both of the variables with imputed values from the model. When I did remove them, I kept getting a variance of zero. So my question is: with backward-elimination logistic regression, do I have to keep both of the variables with imputed values in the model, even if their p-values are the least significant?

 

BA
