c02584381
Calcite | Level 5

Hi! I'm really stuck... And after searching for an answer for many days now, I'm going to post my first ever question on the SAS boards. I'm going to give a lot of background before I ask my questions. Here we go... 

 

Firstly, I'm using SAS Version 9.4.

 

My logistic regression model consists of a binary outcome (ever vs. never), a binary main predictor ("X1"; also ever vs. never), several binary covariates, one ordered categorical covariate (age), and one non-ordered categorical covariate (race).  (I should note that I originally used a model with age as a continuous predictor, but it did not pass the scale-check as a continuous variable, so I categorized it and everything checked out for it to be categorical.)

 

I began a model building process with a non-imputed dataset and followed Hosmer & Lemeshow's "purposeful selection" process from Applied Logistic Regression, which calls for a likelihood ratio test, not Wald p-values, to determine the significance of interaction terms. During this process I identified a significant interaction with X1 (X1*X5, with non-missing data), which I included in the final model. The final model passed the H&L goodness-of-fit test.

 

There are 4 variables with missing data in my data set, three that have 1-2% missing data each, but one of the binary predictors --X3 -- has 12% missing data. Moreover, a literature review in this field reveals that the outcome is quite often stratified by X3 due to a significant interaction. X3 was not, however, identified as an interaction term in my non-imputed model building process. I have a hunch that the reason it was not identified as an interaction is due to the fact that it is missing so many data points... So I decided to use multiple imputation for the first time! Yay for the unknown! I have read A LOT on the topic, but am still a beginner at it.

 

Many have noted that it is important to include interaction terms in a multiple imputation model, so I included the known interaction (X1*X5, with non-missing data) and the suspected interaction (X1*X3, with missing data) in PROC MI. The "FCS LOGISTIC" option in PROC MI would not allow me to impute an interaction term in the "X1*X3" format (the way interactions are specified in PROC LOGISTIC). So, I created a new variable "X1intX3" by multiplying X1 by X3, taking care to leave missing data where X3 is missing. Syntax below:

 

title1 'imputation phase';

proc mi data = final_mi nimpute = 12 out = mi_in seed = 548476;
class outcome;
class X1;
class X2;
class X3;
class race;
class X4;
class age;
class X5;
class X6;
class X1intX3;
var /*imputed variables*/ X3 race X4 age
/*imputed interaction*/ X1intX3
/*non-imputed variables*/ Outcome X1 X2 X5 X6;
fcs logistic (X3 race X4 age X1intX3 = Outcome X1 X2 X5 X6 X1*X5 / link = logit order = internal);
run;
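For reference, the X1intX3 variable described above could be built with a DATA step along these lines before the imputation phase. This is only a sketch: `final_raw` is a hypothetical name for the dataset before the interaction variable is added, and X1 and X3 are assumed to be numeric and coded 0/1.

```sas
/* Sketch only: final_raw is a hypothetical input dataset name;
   X1 and X3 are assumed to be numeric 0/1 variables. */
data final_mi;
  set final_raw;
  /* Leave the product missing whenever a component is missing,
     so that PROC MI treats it as a value to impute. */
  if missing(X1) or missing(X3) then X1intX3 = .;
  else X1intX3 = X1 * X3;
run;
```

Writing the missing-value check explicitly avoids the "missing values were generated" note that a bare multiplication would leave in the log.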
 
After imputing the data, I performed a logistic regression. Syntax below:
 
title1 'Analysis Phase';
proc logistic data = work.mi_in;
by _imputation_;
class outcome (ref = '0') / param = ref;
class X1 (ref = '0') / param = ref;
class X2 (ref = '0') / param = ref;
class X3 (ref = '0') / param = ref;
class race (ref = '5') / param = ref;
class X4 (ref = '0') / param = ref;
class age (ref = '1') / param = ref;
class X5 (ref = '0') / param = ref;
class X6 (ref = '0') / param = ref;
class X1intX3 (ref = '0') / param = ref;
model outcome (event = '1') = X1 X2 X3 race X4 age X5 X6 X1*X5 X1intX3 / cl lackfit;
ods output parameterestimates = lgsparms oddsratios = lgsodds;
oddsratio BinaryExposureCategory / diff = ref; run;
proc print data = work.lgsparms; run;
proc print data = work.lgsodds; run;
 
Note: In my logistic regression, I used "X1intX3" as the interaction between X1 and X3 because I had to create that variable in order to get PROC MI to impute that interaction term. 
 
My first question: If I were performing the logistic regression with non-imputed data, I could easily get odds ratios stratified by my two interaction terms using the following syntax in PROC LOGISTIC: "oddsratio BinaryExposureCategory / diff = ref;". However, because PROC LOGISTIC doesn't recognize X1intX3 as an interaction between main effects of the model, it does not stratify the odds ratios on X1intX3 (only on X1*X5). How can I stratify on X1intX3 as well? I feel like I am missing something totally obvious!
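To make the problem concrete, here is the arithmetic behind the stratified odds ratios, written out as a sketch. It assumes every binary variable is coded 0/1 with 0 as the reference level, and that X1intX3 really equals X1 times X3 in every row (which is not guaranteed once it has been imputed as its own variable). The beta subscripts are labels introduced here for illustration, not SAS output names.

```latex
\begin{aligned}
\operatorname{logit} P(\text{outcome}=1)
  &= \beta_0 + \beta_{1} X_1 + \beta_{3} X_3 + \beta_{5} X_5
     + \beta_{15}\, X_1 X_5 + \beta_{13}\, X_1 X_3 + \cdots \\
\mathrm{OR}_{X_1}(x_3, x_5)
  &= \exp\!\left(\beta_{1} + \beta_{13}\, x_3 + \beta_{15}\, x_5\right)
\end{aligned}
```

Here the coefficient on X1intX3 plays the role of beta_13. Because X1intX3 enters the model as a separate variable, PROC LOGISTIC has no way to know that its coefficient belongs in that sum, which would explain why the ODDSRATIO statement only varies X5.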
 
After performing the logistic regression, I then pooled using PROC MIANALYZE. Syntax below:
 
title1 'Pooling Phase';
proc mianalyze parms (classvar = classval link = logit) = work.lgsparms; 
class X1 X2 X3 race X4 age X5 X6; 
modeleffects intercept X1 X2 X3 race X4 age X5 X6 X1*X5 X1intX3;
ods output parameterestimates = mian_lgsparms; run;
 
My second question: How do I get pooled odds ratios after PROC MIANALYZE that are also stratified on the two interaction terms? PROC MIANALYZE only gives pooled beta coefficients, and with interaction terms, I cannot simply exponentiate the betas like I could in a model without interaction terms. Furthermore, I found a SAS Global Forum paper that does have a way to get pooled odds ratios, but it does not discuss interaction terms. Again, I feel like I am missing something obvious!
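One possible route, sketched here under assumptions rather than offered as a confirmed solution: compute each stratum-specific log odds ratio with ESTIMATE statements inside every imputation, then pool those estimates and their standard errors in PROC MIANALYZE and exponentiate afterwards. The sketch assumes X1, X3, X5 and X1intX3 are numeric 0/1, so they enter the model as plain numeric terms (which gives the same fit as PARAM=REF with reference level 0). The dataset names `lgsest`, `mian_est` and `pooled_or` and the new variable names are made up for illustration. It also treats X1intX3 as if it were the X1 by X3 product.

```sas
/* Sketch only: X1, X3, X5, X1intX3 assumed numeric 0/1 (not in CLASS). */
proc logistic data = work.mi_in;
  by _imputation_;
  class X2 (ref = '0') race (ref = '5') X4 (ref = '0')
        age (ref = '1') X6 (ref = '0') / param = ref;
  model outcome (event = '1') = X1 X2 X3 race X4 age X5 X6 X1*X5 X1intX3;
  /* Log odds ratio for X1 within each X3-by-X5 stratum */
  estimate 'X1: X3=0, X5=0' X1 1;
  estimate 'X1: X3=1, X5=0' X1 1 X1intX3 1;
  estimate 'X1: X3=0, X5=1' X1 1 X1*X5 1;
  estimate 'X1: X3=1, X5=1' X1 1 X1intX3 1 X1*X5 1;
  ods output Estimates = lgsest;   /* one row per label per imputation */
run;

/* MIANALYZE needs the rows grouped by stratum label */
proc sort data = lgsest;
  by Label _imputation_;
run;

/* Pool each stratum's log odds ratio across imputations */
proc mianalyze data = lgsest;
  by Label;
  modeleffects Estimate;
  stderr StdErr;
  ods output ParameterEstimates = mian_est;
run;

/* Back-transform the pooled log odds ratios and confidence limits */
data pooled_or;
  set mian_est;
  OddsRatio = exp(Estimate);
  LowerCL   = exp(LCLMean);
  UpperCL   = exp(UCLMean);
run;

proc print data = pooled_or noobs;
  var Label OddsRatio LowerCL UpperCL;
run;
```

Pooling on the log scale and exponentiating at the end keeps the combination on the scale where the estimates are closest to normally distributed.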
 
My third question: How do I now go about assessing whether the interaction between X1 and X3 is significant, as found in other peer-reviewed literature? The non-imputed model building process uses an LRT to assess this, but there are no pooled -2 Log L's or Wald chi-square statistics to compare after PROC MIANALYZE.
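For context, the pooled estimate that PROC MIANALYZE reports for each coefficient comes with a t statistic and p-value built from Rubin's combining rules. So a one-parameter term such as X1intX3 does get a pooled Wald-type test, though not a likelihood ratio test. The rules are written out below as a sketch. The symbols are notation introduced here: m is the number of imputations (12 in this case), and for imputation i, \hat{Q}_i is the coefficient estimate and U_i its squared standard error.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2 \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
t = \frac{\bar{Q}}{\sqrt{T}} \sim t_{\nu}, \qquad
\nu = (m-1)\left[1 + \frac{\bar{U}}{\left(1 + \frac{1}{m}\right) B}\right]^{2}
\end{aligned}
```

Whether this Wald-type test is an acceptable substitute for the LRT in the purposeful selection process is a judgment call that this sketch does not settle.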
 
Many thanks to those of you who have read all of this! I will be eternally grateful for a response 🙂 
 
 
Last Note: When I submitted this comment first I got an error that said "Your post has been changed because invalid HTML was found in the message body. The invalid HTML has been removed. Please review the message and submit the message when you are satisfied."... I have no idea what was removed, so I hope that doesn't change anything!
1 REPLY 1
ChuksManuel
Pyrite | Level 9

Hello,

 

Did you finally get an answer to your question 2?

Did you finally stratify your interaction terms?

I'm having the same problem trying to stratify my OR by level of the third variable.

If you found a way, could you please share the syntax?

 

Sincerely,

Manuel
