I am trying to compare two independent samples that were treated using two protocols (one that is more advanced than the other) to see if there was an impact of the advanced protocol on patients by lowering adverse effects. The outcome variable is a Y/N categorical variable. What is the correct way to perform multivariable analysis to assess if the protocol was independently associated with the adverse effect? How do you go about looking at confounders? Thank you in advance!
You can model the effect of Protocol and the confounders on your 0/1 response variable using logistic regression (PROC LOGISTIC).
Hi Paige, Thanks for your response!
Additionally, I am confused about how to keep adding variables to the model and how to decide which ones are confounders. Could you please shed some light on the approach to take for that as well? I would much appreciate any response!
This happens to be a topic that comes up a lot, and causes many people many problems.
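The general approach is:
proc logistic data=mydata;
    class protocol;
    /* event='1' models the probability of the adverse effect; change it if your outcome is coded differently (e.g. 'Y') */
    model outcome(event='1') = protocol confounder1 confounder2 /* add as many as you would like */;
run;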
but the problem arises when confounder1 is correlated with confounder2, and confounder2 is correlated with confounder3, and so on. Then you have the very hairy and difficult problem of selecting variables. There's a lot that has been written on this subject of selecting variables for modeling when they are correlated, and no universally agreed upon best method. I suggest you do some reading on the subject.
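Before worrying about selection, it can help to see how strongly the candidate confounders are related to each other. Here is a minimal sketch, assuming the candidates are numeric and the outcome is coded 0/1. The names mydata, outcome, and confounder1-confounder3 are placeholders, not anything from your data. The PROC REG step is only a convenient way to get variance inflation factors, which depend on the predictors alone; its regression output should be ignored for a binary outcome.

```sas
/* Sketch only: mydata, outcome, confounder1-confounder3 are placeholder names. */

/* Pairwise rank correlations among the candidate confounders */
proc corr data=mydata spearman;
    var confounder1 confounder2 confounder3;
run;

/* Variance inflation factors and tolerances.
   Only the VIF and TOL columns are of interest here;
   the linear-regression estimates are not meaningful for a 0/1 outcome. */
proc reg data=mydata;
    model outcome = confounder1 confounder2 confounder3 / vif tol;
run;
quit;
```

High correlations or large VIFs would suggest that some of the candidates carry overlapping information. What to do about that is still the judgment call described above.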
Thank you! This is very helpful.
I have a follow-up question to add to this thread. I have checked for associations between my outcome of interest and other variables in the data. Along with that I have also checked the associations between the primary predictor and other variables of interest. Once I gauged which variables I could use in the model, I used a stepwise method to add in variables and checked the model fit. However, my AIC does not seem to be affected no matter how many variables I add to the model. What I did see is that with the addition of some variables, the primary predictor of interest seems to lose significance. I am not sure how to proceed and which model to choose as the final model.
Any feedback will be appreciated!
@rajrao wrote:
I have a follow-up question to add to this thread. I have checked for associations between my outcome of interest and other variables in the data.
Does "associations" mean "correlations", or something else?
Along with that I have also checked the associations between the primary predictor and other variables of interest. Once I gauged which variables I could use in the model, I used a stepwise method to add in variables and checked the model fit. However, my AIC does not seem to be affected no matter how many variables I add to the model. What I did see is that with the addition of some variables, the primary predictor of interest seems to lose significance. I am not sure how to proceed and which model to choose as the final model.
This is a common problem with stepwise selection, and one of the many reasons most statisticians say terrible things about it. Adding variables to a logistic regression model, via stepwise or any other method, causes the regression coefficients to change, and thus the statistical significance of the variables changes too.
Solution: don't do stepwise!!
Use some other method that isn't greatly affected by adding variables into the model, such as Partial Least Squares. SAS has PROC PLS, but that only works for continuous responses; it does not work for binary (logistic) responses. What to do? Complain to SAS that they need to develop a Logistic PLS. Or write your own code based on this article. Or use R, which has a logistic PLS package. Yes, I know none of these are really solutions, but that's all I got.
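To see the coefficient shift described above without running any stepwise selection, one option is to fit the protocol-only model and a model with confounders chosen in advance, then put the two protocol odds ratios side by side. This is a rough sketch, not a recommended selection procedure. It assumes the outcome is coded 0/1, and mydata, outcome, confounder1, and confounder2 are placeholder names.

```sas
/* Sketch only: dataset and variable names are placeholders. */

/* Crude model: protocol alone */
ods output OddsRatios=or_crude;
proc logistic data=mydata;
    class protocol / param=ref;
    model outcome(event='1') = protocol;
run;

/* Adjusted model: protocol plus confounders chosen in advance */
ods output OddsRatios=or_adjusted;
proc logistic data=mydata;
    class protocol / param=ref;
    model outcome(event='1') = protocol confounder1 confounder2;
run;

/* Print both sets of odds ratios and confidence limits for comparison */
title "Crude odds ratios";
proc print data=or_crude noobs;
run;

title "Adjusted odds ratios";
proc print data=or_adjusted noobs;
run;
title;
```

If the protocol odds ratio moves noticeably between the two tables, the added variables are changing the estimate. That is the behavior that makes significance appear and disappear as variables enter the model.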
Thank you for the prompt response! Yes, I meant testing for preliminary associations using chi-square tests, t-tests, Wilcoxon rank-sum tests, and univariate logistic regression.
Your solution of using the R logistic PLS package is very helpful. I will try it out. Thank you very much!
Hi, two more questions: Does it matter that my intercept was not significant even though the effects of the primary predictor and a few other variables in the model were significant? Having a wide confidence interval for the primary predictor is not a good sign, right? For example, an odds ratio of 3.8 (95% CI 1.07-13.39).
Thank you!