Iamhumerus77
Calcite | Level 5

Hello SAS Community,

I have a class project that analyzes factors associated with a binary outcome within a CBT intervention program. The dataset includes a categorical exposure variable (cbtmod) with three levels (1,2,3), along with other predictors.

The study aims to answer two main research questions:

  1. What risk factors are associated with the binary outcome among participants, and how do these vary across different CBT modalities, especially with respect to sex differences?
  2. How do perceptions regarding the safety of the CBT intervention differ by modality, what factors contribute to these differences, and how do these vary by sex?

Mainly, I'm interested in the interactions between cbtmod and sex (coded as 0 for males and 1 for females), as well as cbtmod's interactions with all other variables in the model, as advised by my supervisor.

I've structured the PROC LOGISTIC for the fully adjusted model to check for interaction as follows:

proc logistic data=mydata;
    class cbtmod(ref='1') sex(ref='0') / param=ref;
    model outcome(event='1') = cbtmod sex hh_disabilityn pregnant_1_sex Marriage Resstatus cbtmod*sex / lackfit clodds=wald;
    contrast 'cbtmod=2 for males' cbtmod 2 cbtmod*sex 0 / est=exp;
    contrast 'cbtmod=2 for females' cbtmod 2 cbtmod*sex 1 / est=exp;
    contrast 'cbtmod=3 for males' cbtmod 3 cbtmod*sex 0 / est=exp;
    contrast 'cbtmod=3 for females' cbtmod 3 cbtmod*sex 1 / est=exp;
run;

Questions:

  1. Interaction Testing: Is this setup correct for testing the interaction between cbtmod and sex? How should I structure my CONTRAST statements to best explore these interaction effects across different CBT modalities by sex?

  2. Fully Adjusted Model Reporting: In the context of the fully adjusted model that includes interaction terms, what are the essential values or estimates to report for a comprehensive interpretation? Should I report the aOR for the interaction term or only for the exposure? And should the interaction term be included if I am reporting the aOR for the full model?

  3. Handling Zero Cell Counts: For my second research question, I encounter a scenario where, for males (sex=0), with the outcome being 0 and cbtmod=3, there is a cell count of 0. How should this be addressed in the logistic regression analysis? Should I just make a note of it in the limitation section of my report?

  4. Checking Interactions with Other Variables: Given the directive to check for interactions between cbtmod and all other variables, what would be an efficient strategy to approach this without complicating the model unnecessarily?

I value any insights or guidance on refining my approach, particularly regarding CONTRAST statements for interaction testing, strategies for addressing zero cell counts, and managing multiple interactions in logistic regression.

Thank you for your assistance!

dpalmer1
Fluorite | Level 6
Here's code for an example dataset.
proc logistic data = sashelp.heart;
	class Status Smoking_Status(ref = 'Non-smoker') Sex(ref = 'Male') / param = glm;
	model Status(event = 'Dead') = Smoking_Status | Sex Cholesterol Weight / expb;
	estimate 'Female; Heavy (16-25)' intercept 1 Smoking_Status 1 Sex 1 Smoking_Status*Sex 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Male; Heavy (16-25)' intercept 1 Smoking_Status 1 Sex 0 1 Smoking_Status*Sex 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Female; Light (1-5)' intercept 1 Smoking_Status 0 1 Sex 1 Smoking_Status*Sex 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Male; Light (1-5)' intercept 1 Smoking_Status 0 1 Sex 0 1 Smoking_Status*Sex 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Female; Moderate (6-15)' intercept 1 Smoking_Status 0 0 1 Sex 1 Smoking_Status*Sex 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Male; Moderate (6-15)' intercept 1 Smoking_Status 0 0 1 Sex 0 1 Smoking_Status*Sex 0 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Female; Very Heavy (> 25)' intercept 1 Smoking_Status 0 0 0 1 Sex 1 Smoking_Status*Sex 0 0 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Male; Very Heavy (> 25)' intercept 1 Smoking_Status 0 0 0 1 Sex 0 1 Smoking_Status*Sex 0 0 0 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Female; Non-smoker' intercept 1 Smoking_Status 0 0 0 0 1 Sex 1 Smoking_Status*Sex 0 0 0 0 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	estimate 'Male; Non-smoker' intercept 1 Smoking_Status 0 0 0 0 1 Sex 0 1 Smoking_Status*Sex 0 0 0 0 0 0 0 0 0 1 Cholesterol 227.4174412 Weight 153.0866808 / exp ilink e cl;
	lsmeans Smoking_Status * Sex / at means ilink pdiff oddsratio cl;
run;
To write contrasts/estimates, I like to use the "e" option and print out the least squares means to make sure I'm writing them correctly.  Categorical variables will have all levels used with GLM parameterization.  In simpler cases, the estimates use 1s for the levels of interest and 0s otherwise.  For continuous variables, we can make predictions at the mean values of these variables.
proc means data = sashelp.heart;
	var Cholesterol Weight;
run;
This is done with the "at means" option in the lsmeans statement.
 
I don't think that your current code includes any odds ratios.  You can get odds ratios from the logistic regression coefficients with the "expb" option in the model statement.  The above estimate statements give you log-odds estimates and predicted probabilities (ilink option).  See the calculations below for 'Female; Heavy (16-25)'.
data _null_;
	logOdds = -0.7022;
	odds = exp(logOdds);
	prob = exp(logOdds) / (exp(logOdds) + 1);
	put logOdds = odds = prob = ;
run;
Exponentiating the log-odds estimates will give you the odds estimates.  Adding the "pdiff" and "oddsratio" options to the lsmeans statement will give differences in log-odds and odds ratios.  See the calculations below for 'Female; Heavy (16-25)' vs 'Male; Heavy (16-25)'.
data _null_;
	logOddsDiff = -0.7022 - (-0.1660);
	oddsRatio = exp(logOddsDiff);
	put logOddsDiff = oddsRatio = ;
run;
This modified code gives the difference in log-odds and the odds ratio for 'Female; Heavy (16-25) vs Male; Heavy (16-25)', which is simply the difference between the two individual estimates.
proc logistic data = sashelp.heart;
	class Status Smoking_Status(ref = 'Non-smoker') Sex(ref = 'Male') / param = glm;
	model Status(event = 'Dead') = Smoking_Status | Sex Cholesterol Weight / expb;
	estimate 'Female; Heavy (16-25) vs Male; Heavy (16-25)' Sex 1 -1 Smoking_Status*Sex 1 -1 / exp ilink e cl;
	lsmeans Smoking_Status * Sex / at means ilink pdiff oddsratio cl;
run;
Hopefully this helps you to apply it to your own dataset.
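As a rough sketch of how this might carry over to the model in the question, the ODDSRATIO statement in PROC LOGISTIC can produce the odds ratios for cbtmod within each level of sex without hand-written coefficients. The dataset and variable names (mydata, outcome, cbtmod, sex and the covariates) are taken from the original post; their coding and whether the covariates are continuous are assumptions, so treat this as a starting point rather than a finished analysis.

```sas
/* Sketch only: names come from the question; their types are assumed. */
proc logistic data=mydata;
    class cbtmod(ref='1') sex(ref='0') / param=ref;
    model outcome(event='1') = cbtmod sex cbtmod*sex
          hh_disabilityn pregnant_1_sex Marriage Resstatus;
    /* Odds ratios for cbtmod 2 vs 1 and 3 vs 1, separately for each sex */
    oddsratio cbtmod / at(sex=all) diff=ref cl=wald;
    /* Odds ratio for sex 1 vs 0 within each CBT modality */
    oddsratio sex / at(cbtmod=all) diff=ref cl=wald;
run;
```

The Type 3 Analysis of Effects table from this run gives the overall Wald test for cbtmod*sex, which is one way to judge whether the interaction is needed at all.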
 
For #2, I would report the odds ratios corresponding to the interaction since it seems to be what you are interested in.  For #3, look up quasi-complete separation.  For #4, include the interactions justified by the theory and literature.
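For #3, one common way to handle an empty cell is to confirm where it is with a cross-tabulation and then refit the model with Firth's penalized likelihood, which keeps the estimates finite under separation. The sketch below assumes the same names as in the question, with outcome standing in for whichever binary variable is used for the second research question; that substitution is an assumption, not something stated in the thread.

```sas
/* Sketch only: 'outcome' is a placeholder for the second-question outcome. */

/* Locate empty cells: outcome by modality within each sex */
proc freq data=mydata;
    tables sex*cbtmod*outcome / nopercent norow nocol;
run;

/* Firth penalized likelihood with profile-likelihood confidence limits */
proc logistic data=mydata;
    class cbtmod(ref='1') sex(ref='0') / param=ref;
    model outcome(event='1') = cbtmod sex cbtmod*sex / firth clparm=pl;
run;
```

Profile-likelihood limits are requested here because Wald limits tend to behave poorly when a parameter estimate is driven by a sparse or empty cell.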
