I need help interpreting interactions between categorical factors in a logistic regression. I have 5 categorical factors interacting with 5 other categorical factors. The examples I have found so far deal mostly with categorical factors coded as 0 and 1.
I chose the "reference" parameterization method. My problem is how to interpret the interactions with respect to the reference.
I am attaching two files that further explain my question.
I hope you can help me
Regards,
Can you show us your code?
Here is the code:
ods graphics on / imagemap;
proc logistic data=data1;
   class Bact (param=ref ref="A") Sick (param=ref ref="V");
   model Event/Trials = Sick Bact Sick*Bact / expb scale=none link=logit lackfit firth clodds=pl;
   ods output cloddspl=firth;
   id idor;
run;
ods graphics off;
Thank you for your help.
For categorical variables, the explanation in your attached .rtf file doesn't quite fit.
For categorical variables, the estimate column is the coefficient in the logistic regression model for each interaction cell. So when Sick is B and Bact is W, the coefficient in the logistic regression model that is given by the interaction term to predict the values in this cell is 10.2869.
Dear PaigeMiller,
You are suggesting that the basic idea on interpreting categorical interactions expressed in the rtf file is not correct. If that is the case, then I need help to correctly interpret those interactions.
To restate my question:
I can't figure out how to interpret the coefficients for the interactions. Do you have any suggestion on a decent and easy to read (for the non-expert) literature on the topic?
Thank you.
Marcel
I restate my answer
For categorical variables, the estimate column is the coefficient in the logistic regression model for each interaction cell. So when Sick is B and Bact is W, the coefficient in the logistic regression model that is given by the interaction term to predict the values in this cell is 10.2869.
Adding... (because you gave us more information when you posted your code)
Since the reference levels have zero main effects and a zero interaction, the interaction coefficient of 10.2869 is interpreted as being 10.2869 above the reference-level interaction coefficient, which is fixed at zero for A and V.
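To make the reference-cell arithmetic concrete, here is a minimal sketch of how the cell log-odds are assembled under param=ref coding. Only the 10.2869 interaction estimate comes from this thread; the intercept and main-effect values below are invented placeholders:

```sas
/* Hypothetical illustration of reference-cell (param=ref) coding.       */
/* Only b_inter = 10.2869 comes from the posted output; the other        */
/* coefficient values are made-up placeholders.                          */
data cell_logit;
   b0      = -1.50;    /* intercept = log-odds of the reference cell (Bact=A, Sick=V) */
   b_bactW =  0.80;    /* main effect of Bact=W vs A (within Sick=V)                  */
   b_sickB =  0.40;    /* main effect of Sick=B vs V (within Bact=A)                  */
   b_inter = 10.2869;  /* Bact=W * Sick=B interaction coefficient                     */

   logit_AV = b0;                                 /* reference cell      */
   logit_WB = b0 + b_bactW + b_sickB + b_inter;   /* Bact=W, Sick=B cell */

   /* The interaction is a difference of differences in log-odds:
      (logit_WB - logit_AB) - (logit_WV - logit_AV) = b_inter            */
run;
```

In other words, 10.2869 is not the log-odds of the W,B cell by itself; it is how much the Sick=B vs V log-odds difference changes when you move from Bact=A to Bact=W.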
But I think what you really, really want is the LSMEANS for the interactions, not the model coefficients. The LSMEANS are easily interpreted and, as I said, are probably what you really want.
PaigeMiller,
Thank you for your answer. The Excel file also shows the SAS code (abbreviated version), the Chi-square test, and the class level information produced by SAS. It also shows a summary of the success of infections from actual data from my experiment.
I will check the LSMEANS for categorical variables. This seems to be getting more and more complicated. I hope I am not choosing to take the red pill. I am really a novice at logistic regression.
Thank you,
marcel
@marcel wrote:
PaigeMiller,
Thank you for your answer. The Excel file also shows the code (abbreviated version), the Chi-square test, and the class level information produced by SAS. It also shows a summary of the success of infections in the actual data from the experiment I conducted.
Some of us refuse to open Microsoft Office documents because they can be security risks. Better to paste the code directly into your message after clicking on {i}. Or use .txt or .pdf files.
I will check the LSMEANS for categorical variables. This seems to be more and more complicated. I hope I am not choosing to take the red pill. I am really a novice at logistic regression.
It's not more complicated. It's using the right tool. It would be more complicated if you never used LSMEANS and tried to interpret the model coefficients.
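For example, here is a sketch of the kind of LSMEANS call that makes the cells directly interpretable (dataset and variable names are taken from the code posted earlier; note that the LSMEANS statement in PROC LOGISTIC requires the GLM parameterization, so param=ref would have to change):

```sas
proc logistic data=data1;
   /* LSMEANS requires GLM parameterization, not param=ref */
   class Bact Sick / param=glm;
   model Event/Trials = Sick Bact Sick*Bact / firth;
   /* ILINK back-transforms each LS-mean from the logit scale to a
      predicted probability; CL adds confidence limits. */
   lsmeans Bact*Sick / ilink cl;
run;
```

Each row of the LSMeans table is then an estimated event probability for one Bact*Sick cell, which reads much more naturally than a coefficient measured against the reference cell.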
Following PaigeMiller's advice, I looked into the use of lsmeans to explore the interactions in the logistic regression. I am attaching two .rtf files showing my interpretation of the results and some doubts I have about them. The content would be too long to show in a normal message window.
Please, would any of you kindly take a look at my files and provide me with your expert advice?
Regards,
marcel
I don't see any LSMEANS sick*bact; statement in your code.
I followed this note to explore the use of lsmeans.
I used slice to explore the interactions instead of lsmeans. They produce practically the same output, but lsmeans produces 300 pairwise comparisons, and the adjusted p-values are mostly 1s; slice produces smaller sets of pairwise comparisons. However, I can see advantages to using either one.
I think this is better visualized in the attached .rtf file, because I highlighted the comparisons there. Please see the attached file.
Ex.
lsmeans Bact*Sick / diff oddsratio cl adjust=bon;
Bact Sick _Bact _Sick  Estimate  StdErr  z      Pr>|z|  Adj P  Alpha  Lower    Upper     Adj Lwr  Adj Upr  OR      OR Lwr  OR Upr   OR Adj Lwr  OR Adj Upr
E    Z    E     W      -2.9256   1.4712  -1.99  0.0467  1      0.05   -5.8091  -0.04219  -8.4644  2.6131   0.054   0.003   0.959    <0.001      13.641
E    Z    E     Y      -1.1096   1.6488  -0.67  0.501   1      0.05   -4.3412   2.122    -7.317   5.0978   0.33    0.013   8.348    <0.001      163.664
E    Z    E     X      -3.4508   1.4564  -2.37  0.0178  1      0.05   -6.3053  -0.5963   -8.9339  2.0323   0.032   0.002   0.551    <0.001      7.632
E    Z    E     V       2.00E-15 2.0165   0.00  1       1      0.05   -3.9523   3.9523   -7.5918  7.5918   1       0.019   52.054   <0.001      >999.999
E    W    E     Y       1.816    0.9036   2.01  0.0445  1      0.05    0.04492  3.5872   -1.586   5.2181   6.147   1.046   36.131   0.205       184.587
E    W    E     X      -0.5251   0.4681  -1.12  0.262   1      0.05   -1.4427   0.3924   -2.2876  1.2373   0.591   0.236   1.481    0.102       3.446
E    W    E     V       2.9256   1.4712   1.99  0.0467  1      0.05    0.04219  5.8091   -2.6131  8.4644   18.646  1.043   333.317  0.073       >999.999
E    Y    E     X      -2.3412   0.8794  -2.66  0.0078  1      0.05   -4.0647  -0.6176   -5.6519  0.9695   0.096   0.017   0.539    0.004       2.637
E    Y    E     V       1.1096   1.6488   0.67  0.501   1      0.05   -2.122    4.3412   -5.0978  7.317    3.033   0.12    76.798   0.006       >999.999
E    X    E     V       3.4508   1.4564   2.37  0.0178  1      0.05    0.5963   6.3053   -2.0323  8.9339   31.525  1.815   547.45   0.131       >999.999
(Lower/Upper and Adj Lwr/Adj Upr are the unadjusted and Bonferroni-adjusted limits for the estimate on the logit scale; OR Lwr/OR Upr and OR Adj Lwr/OR Adj Upr are the corresponding confidence limits for the odds ratio.)
slice Bact*Sick / sliceby=Bact diff oddsratio cl adjust=bon;
Slice   Sick _Sick  Estimate  StdErr  z      Pr>|z|  Adj P   Alpha  Lower    Upper     Adj Lwr  Adj Upr  OR      OR Lwr  OR Upr   OR Adj Lwr  OR Adj Upr
Bact E  Z    W      -2.9256   1.4712  -1.99  0.0467  0.4674  0.05   -5.8091  -0.04219  -7.0553  1.204    0.054   0.003   0.959    <0.001      3.333
Bact E  Z    Y      -1.1096   1.6488  -0.67  0.501   1       0.05   -4.3412   2.122    -5.7378  3.5186   0.33    0.013   8.348    0.003       33.738
Bact E  Z    X      -3.4508   1.4564  -2.37  0.0178  0.1782  0.05   -6.3053  -0.5963   -7.5389  0.6374   0.032   0.002   0.551    <0.001      1.891
Bact E  Z    V       6.66E-16 2.0165   0.00  1       1       0.05   -3.9523   3.9523   -5.6604  5.6604   1       0.019   52.054   0.003       287.264
Bact E  W    Y       1.816    0.9036   2.01  0.0445  0.4447  0.05    0.04492  3.5872   -0.7205  4.3526   6.147   1.046   36.131   0.486       77.681
Bact E  W    X      -0.5251   0.4681  -1.12  0.262   1       0.05   -1.4427   0.3924   -1.8392  0.7889   0.591   0.236   1.481    0.159       2.201
Bact E  W    V       2.9256   1.4712   1.99  0.0467  0.4674  0.05    0.04219  5.8091   -1.204   7.0553   18.646  1.043   333.317  0.3         >999.999
Bact E  Y    X      -2.3412   0.8794  -2.66  0.0078  0.0776  0.05   -4.0647  -0.6176   -4.8096  0.1273   0.096   0.017   0.539    0.008       1.136
Bact E  Y    V       1.1096   1.6488   0.67  0.501   1       0.05   -2.122    4.3412   -3.5186  5.7378   3.033   0.12    76.798   0.03        310.387
Bact E  X    V       3.4508   1.4564   2.37  0.0178  0.1782  0.05    0.5963   6.3053   -0.6374  7.5389   31.525  1.815   547.45   0.529       >999.999
(Lower/Upper and Adj Lwr/Adj Upr are the unadjusted and adjusted limits for the estimate on the logit scale; OR Lwr/OR Upr and OR Adj Lwr/OR Adj Upr are the corresponding confidence limits for the odds ratio.)
Thank you for your help.
marcel
I agree with @PaigeMiller that the place to start is with lsmeans and an interaction plot. Try
lsmeans bact*sick / plots=meanplot(join cl sliceby=bact);
sld,
That is a beautiful graph. That was one of my headaches, how to plot the logistic interactions in sas. Thank you very much.
For me, it is hard to customize SAS graphs, so I will use the SAS output to customize a graph in Excel.
Regarding lsmeans vs. slice, I see that lsmeans produces pairwise comparisons that slice does not (at least with the SAS code I used for my examples). The thing is that my code for lsmeans produces 300 pairwise comparisons, and the Bonferroni adjustment of the p-values reflects that, with most of the adjusted p-values being 1. The adjusted p-values are lower when using slice than when using lsmeans. If there were a way to make lsmeans compare sets of 10 pairs, it would be helpful.
I am also interested to know whether my interpretation of lsmeans for the main effects and slice for the interactions is correct. Could you please take a look at the .rtf files attached to my previous reply? If you have the time, of course.
Regards,
marcel
You can save just about anything that a procedure creates using ODS OUTPUT. See
https://blogs.sas.com/content/iml/2017/01/09/ods-output-any-statistic.html
and
https://stats.idre.ucla.edu/sas/faq/how-can-i-use-the-sas-output-delivery-system/
For example, you can save the lsmeans to a dataset
ods output lsmeans= my_lsmeans;
lsmeans bact*sick / plots=meanplot(join cl sliceby=bact) ilink cl;
and then export that dataset to CSV or Excel.
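Putting those pieces together, one possible sketch (my_lsmeans and the CSV path are placeholder names, and the model is assumed from the code posted earlier in the thread):

```sas
/* Capture the LSMeans table produced by the next procedure */
ods output lsmeans=my_lsmeans;

proc logistic data=data1;
   class Bact Sick / param=glm;   /* LSMEANS requires GLM parameterization */
   model Event/Trials = Sick Bact Sick*Bact / firth;
   lsmeans Bact*Sick / plots=meanplot(join cl sliceby=Bact) ilink cl;
run;

/* Export the captured dataset for customizing the graph in Excel */
proc export data=my_lsmeans
   outfile="lsmeans.csv"          /* path is a placeholder */
   dbms=csv replace;
run;
```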
I do not recommend using a Type I error adjustment on all pairwise comparisons of interaction means. As you note, when you have 25 means, there are 300 possible pairwise comparisons, many of which you are not interested in because they compare something like B1S1 to B3S4; slices make comparisons for one factor within a given level of another factor so you don't get those "cross-level" comparisons. At most you may be interested in 100 of the 300, but still, if you invoke Type I error adjustment for 100 comparisons, most, if not all, of the p-values will be close to 1. Granted, you won't make any Type I errors, but you won't have any power (i.e., you'll potentially make a lot of Type II errors).
I don't find pairwise comparisons to be very helpful in deciphering interactions because an interaction is a comparison of (pairwise) comparisons. Instead, I estimate a carefully chosen set of contrasts that are meaningful in the context of the study. I might group those contrasts into sensible "families" and control the family-wise Type I error separately for each family. This task, including the multiplicity adjustment!, can be accomplished using the LSMESTIMATE statement; see
https://support.sas.com/resources/papers/proceedings11/351-2011.pdf
The Bonferroni adjustment is excessively conservative (i.e., you lose too much power); I generally use the simulate method and I always specify the random seed for reproducible results. If you need to control Type I error for multiple tests, the goal is to do so while maintaining as much power as possible; not all methods are equally good (and some are quite lousy; some don't even control Type I error appropriately).
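As a sketch of that approach, here is a deliberately reduced 2x2 example (pretend Bact has only levels A and E, and Sick only V and W) so the coefficient list stays short; in the real 5x5 design each LSMESTIMATE row needs 25 coefficients, ordered by Bact level and then Sick level:

```sas
proc logistic data=data1;
   class Bact Sick / param=glm;   /* LSMESTIMATE requires GLM parameterization */
   model Event/Trials = Sick Bact Sick*Bact / firth;
   /* One interaction contrast: does the W-vs-V difference change
      between Bact=A and Bact=E?  Cell order: A*V A*W E*V E*W.
      SIMULATE adjusts the familywise Type I error with more power
      than Bonferroni; SEED makes the adjustment reproducible. */
   lsmestimate Bact*Sick
      "(E,W - E,V) - (A,W - A,V)"  1 -1 -1 1
      / adjust=simulate(seed=20110351) cl;
run;
```

With several such contrasts grouped into a family, the ADJUST= option controls the Type I error for that family only, which is the approach described in the paper linked above.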
I may be able to look at your results within a few days.
sld,
Very insightful observations and thank you for sharing your experience with me. I appreciate it very much.
I was worried about the Bonferroni adjustment for 300 comparisons. Your advice with respect to this issue is timely.
I will run a few LSMESTIMATE contrasts with my data, and I will check the PDF you suggested I read.
Regards,
marcel