marcel
Obsidian | Level 7

I need help interpreting interactions between categorical factors in logistic regression. I have a categorical factor with 5 levels interacting with another categorical factor with 5 levels. The examples I have found so far deal mostly with categorical factors coded as 0 and 1.

 

I chose the parameterization method "reference". My problem is how to interpret the interactions with respect to the reference levels.

 

I am attaching two files that further explain my question.

 

I hope you can help me.

 

Regards,

 


15 REPLIES
marcel
Obsidian | Level 7

Here is the code

 

ods graphics on / imagemap;
proc logistic data=data1;
   class Bact (param=ref ref="A") Sick (param=ref ref="V");   /* reference coding: A and V are the reference levels */
   model Event/Trials = Sick Bact Sick*Bact
         / expb scale=none link=logit lackfit firth clodds=pl;
   ods output cloddspl=firth;   /* save the profile-likelihood CL odds ratios to a dataset */
   id idor;
run;

ods graphics off;

 

 

Thank you for your help

PaigeMiller
Diamond | Level 26

For categorical variables, the words in your attached .rtf file don't quite fit.

 

For categorical variables, the estimate column is the coefficient in the logistic regression model for each interaction cell. So when Sick is B and Bact is W, the interaction coefficient used by the model to predict the values in that cell is 10.2869.

--
Paige Miller
marcel
Obsidian | Level 7

Dear PaigeMiller,

 

You are suggesting that the basic idea about interpreting categorical interactions expressed in the .rtf file is not correct. If that is the case, then I need help to interpret those interactions correctly.

 

To restate my question:

I can't figure out how to interpret the coefficients for the interactions. Do you have any suggestions for decent, easy-to-read (for the non-expert) literature on the topic?

 

Thank you.

 

Marcel

PaigeMiller
Diamond | Level 26

I'll restate my answer:

 

For categorical variables, the estimate column is the coefficient in the logistic regression model for each interaction cell. So when Sick is B and Bact is W, the interaction coefficient used by the model to predict the values in that cell is 10.2869.

Adding... (because you gave us more information when you posted your code)

 

Since the reference levels have their main effects and interaction fixed at zero, the interaction coefficient of 10.2869 is interpreted as being 10.2869 above the reference-level interaction coefficient, which is zero for A and V.
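
As a sketch of the algebra behind that statement (using the example cell above, with Sick=V and Bact=A as the reference levels):

\mathrm{logit}\,p(B,W) = \beta_0 + \beta_{\mathrm{Sick}=B} + \beta_{\mathrm{Bact}=W} + \beta_{\mathrm{Sick \times Bact}=B,W}

\beta_{\mathrm{Sick \times Bact}=B,W} = \bigl[\mathrm{logit}\,p(B,W) - \mathrm{logit}\,p(V,W)\bigr] - \bigl[\mathrm{logit}\,p(B,A) - \mathrm{logit}\,p(V,A)\bigr]

That is, the interaction coefficient is a difference of differences: how much the Sick=B vs. Sick=V log-odds difference changes when Bact moves from the reference level A to W.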

 

But I think what you really, really want is the LSMEANS for the interactions, not the model coefficients. The LSMEANS are easily interpreted and, as I said, are probably what you want.
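
For example (a minimal sketch, assuming the Bact and Sick variables from the code you posted; the ILINK option additionally reports the means back-transformed to the probability scale):

lsmeans Sick*Bact / ilink cl;   /* one LS-mean per Sick*Bact cell, with confidence limits */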

--
Paige Miller
marcel
Obsidian | Level 7

PaigeMiller,

 

Thank you for your answer. The Excel file also shows the SAS code (abbreviated version), the Chi^2 test, and the class level information produced by SAS. It also shows a summary of the success of infections from the actual data of my experiment.

 

I will check the LSMEANS for categorical variables. This seems to be getting more and more complicated. I hope I am not choosing to take the red pill. I am really a novice at logistic regression.

 

Thank you,

 

marcel

PaigeMiller
Diamond | Level 26

@marcel wrote:

PaigeMiller,

 

Thank you for your answer. The Excel file also shows the code (abbreviated version), the Chi^2 test, and the class level information produced by SAS. It also shows a summary of the success of infections in the actual data from the experiment I conducted.




Some of us refuse to open Microsoft Office documents because they can be security risks. Better to paste the code directly into your message after clicking on {i}. Or use .txt or .pdf files.

 

 

I will check the LSMEANS for categorical variables. This seems to be more and more complicated. I hope I am not choosing to take the red pill. I am really a novice at logistic regression.

 

It's not more complicated; it's using the right tool. It would be more complicated if you never used LSMEANS and tried to interpret the model coefficients directly.

--
Paige Miller
marcel
Obsidian | Level 7

Following PaigeMiller's advice, I looked into using LSMEANS to explore the interactions in the logistic regression. I am attaching two .rtf files showing my interpretation of the results and some doubts I have about it. The content would be too long to show in a normal message window.

Please, would any of you kindly take a look at my files and provide me with your expert advice?

Regards,

marcel

PaigeMiller
Diamond | Level 26

I don't see any lsmeans Sick*Bact; statement in your code.

--
Paige Miller
marcel
Obsidian | Level 7

I followed this note to explore the use of lsmeans.

 

I used SLICE to explore the interactions instead of LSMEANS. They produce practically the same output, but LSMEANS produces 300 pairwise comparisons and the adjusted p-values are mostly 1s, while SLICE produces smaller sets of pairwise comparisons. Still, I can see advantages to using either one.

 

I think this is better visualized in the attached .rtf file, because I highlighted the comparisons there. Please see the attached file.

 

Example:

lsmeans Bact*Sick / diff oddsratio cl adjust=bon;

Bact	Sick	_Bact	_Sick	Estimate	Standard Error	z Value	Pr > |z|	Adj P	Alpha	Lower	Upper	Adj Lower	Adj Upper	Odds Ratio	Lower Confidence Limit for Odds Ratio	Upper Confidence Limit for Odds Ratio	Adj Lower Odds Ratio	Adj Upper Odds Ratio
E	Z	E	W	-2.9256	1.4712	-1.99	0.0467	1	0.05	-5.8091	-0.04219	-8.4644	2.6131	0.054	0.003	0.959	<0.001	13.641
E	Z	E	Y	-1.1096	1.6488	-0.67	0.501	1	0.05	-4.3412	2.122	-7.317	5.0978	0.33	0.013	8.348	<0.001	163.664
E	Z	E	X	-3.4508	1.4564	-2.37	0.0178	1	0.05	-6.3053	-0.5963	-8.9339	2.0323	0.032	0.002	0.551	<0.001	7.632
E	Z	E	V	2.00E-15	2.0165	0	1	1	0.05	-3.9523	3.9523	-7.5918	7.5918	1	0.019	52.054	<0.001	>999.999
E	W	E	Y	1.816	0.9036	2.01	0.0445	1	0.05	0.04492	3.5872	-1.586	5.2181	6.147	1.046	36.131	0.205	184.587
E	W	E	X	-0.5251	0.4681	-1.12	0.262	1	0.05	-1.4427	0.3924	-2.2876	1.2373	0.591	0.236	1.481	0.102	3.446
E	W	E	V	2.9256	1.4712	1.99	0.0467	1	0.05	0.04219	5.8091	-2.6131	8.4644	18.646	1.043	333.317	0.073	>999.999
E	Y	E	X	-2.3412	0.8794	-2.66	0.0078	1	0.05	-4.0647	-0.6176	-5.6519	0.9695	0.096	0.017	0.539	0.004	2.637
E	Y	E	V	1.1096	1.6488	0.67	0.501	1	0.05	-2.122	4.3412	-5.0978	7.317	3.033	0.12	76.798	0.006	>999.999
E	X	E	V	3.4508	1.4564	2.37	0.0178	1	0.05	0.5963	6.3053	-2.0323	8.9339	31.525	1.815	547.45	0.131	>999.999

slice Bact*Sick / sliceby=Bact diff oddsratio cl adjust=bon;

 
Slice	Sick	_Sick	Estimate	Standard Error	z Value	Pr > |z|	Adj P	Alpha	Lower	Upper	Adj Lower	Adj Upper	Odds Ratio	Lower Confidence Limit for Odds Ratio	Upper Confidence Limit for Odds Ratio	Adj Lower Odds Ratio	Adj Upper Odds Ratio
Bact E	Z	W	-2.9256	1.4712	-1.99	0.0467	0.4674	0.05	-5.8091	-0.04219	-7.0553	1.204	0.054	0.003	0.959	<0.001	3.333
Bact E	Z	Y	-1.1096	1.6488	-0.67	0.501	1	0.05	-4.3412	2.122	-5.7378	3.5186	0.33	0.013	8.348	0.003	33.738
Bact E	Z	X	-3.4508	1.4564	-2.37	0.0178	0.1782	0.05	-6.3053	-0.5963	-7.5389	0.6374	0.032	0.002	0.551	<0.001	1.891
Bact E	Z	V	6.66E-16	2.0165	0	1	1	0.05	-3.9523	3.9523	-5.6604	5.6604	1	0.019	52.054	0.003	287.264
Bact E	W	Y	1.816	0.9036	2.01	0.0445	0.4447	0.05	0.04492	3.5872	-0.7205	4.3526	6.147	1.046	36.131	0.486	77.681
Bact E	W	X	-0.5251	0.4681	-1.12	0.262	1	0.05	-1.4427	0.3924	-1.8392	0.7889	0.591	0.236	1.481	0.159	2.201
Bact E	W	V	2.9256	1.4712	1.99	0.0467	0.4674	0.05	0.04219	5.8091	-1.204	7.0553	18.646	1.043	333.317	0.3	>999.999
Bact E	Y	X	-2.3412	0.8794	-2.66	0.0078	0.0776	0.05	-4.0647	-0.6176	-4.8096	0.1273	0.096	0.017	0.539	0.008	1.136
Bact E	Y	V	1.1096	1.6488	0.67	0.501	1	0.05	-2.122	4.3412	-3.5186	5.7378	3.033	0.12	76.798	0.03	310.387
Bact E	X	V	3.4508	1.4564	2.37	0.0178	0.1782	0.05	0.5963	6.3053	-0.6374	7.5389	31.525	1.815	547.45	0.529	>999.999


Thank you for your help.

 

marcel

sld
Rhodochrosite | Level 12 (Accepted Solution)

I agree with @PaigeMiller that the place to start is with lsmeans and an interaction plot. Try

 

lsmeans bact*sick / plots=meanplot(join cl sliceby=bact);
marcel
Obsidian | Level 7

sld,

 

That is a beautiful graph. That was one of my headaches: how to plot the logistic interactions in SAS. Thank you very much.

For me, it is hard to customize SAS graphs. I will use the SAS output to customize a graph in Excel.

 

As for LSMEANS vs. SLICE, I see that LSMEANS produces pairwise comparisons that SLICE does not (at least when running the SAS code I used for my examples). The thing is that my LSMEANS code produces 300 pairwise comparisons, and the Bonferroni adjustment of the p-values reflects that, with most of the adjusted p-values being 1. The adjusted p-values are lower when using SLICE than when using LSMEANS. If there were a way to make LSMEANS compare sets of 10 pairs, it would be helpful.

 

I am also interested in knowing whether my interpretation of LSMEANS for the main effects and SLICE for the interactions is correct. Could you please take a look at the .rtf files attached to my previous reply? If you have the time, of course.

 

Regards,

 

marcel

sld
Rhodochrosite | Level 12

You can save just about anything that a procedure creates using ODS OUTPUT. See

 

https://blogs.sas.com/content/iml/2017/01/09/ods-output-any-statistic.html

 

and

 

https://stats.idre.ucla.edu/sas/faq/how-can-i-use-the-sas-output-delivery-system/

 

 

For example, you can save the LS-means to a dataset:

 

ods output lsmeans=my_lsmeans;   /* capture the LSMeans table as a dataset */
lsmeans bact*sick / plots=meanplot(join cl sliceby=bact) ilink cl;

 

and then export that dataset to CSV or Excel.
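
For example, with PROC EXPORT (a sketch; the output path is hypothetical):

proc export data=my_lsmeans
    outfile="/myproject/my_lsmeans.csv"   /* hypothetical path */
    dbms=csv
    replace;
run;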

 

I do not recommend using a Type I error adjustment on all pairwise comparisons of interaction means. As you note, when you have 25 means there are 25 × 24 / 2 = 300 possible pairwise comparisons, many of which you are not interested in because they compare something like B1S1 to B3S4; slices make comparisons for one factor within a given level of the other factor, so you don't get those "cross-level" comparisons. At most you may be interested in 100 of the 300, but even so, if you invoke a Type I error adjustment for 100 comparisons, most if not all of the p-values will be close to 1. Granted, you won't make any Type I errors, but you won't have any power either (i.e., you'll potentially make a lot of Type II errors).

 

I don't find pairwise comparisons very helpful in deciphering interactions, because an interaction is a comparison of (pairwise) comparisons. Instead, I estimate a carefully chosen set of contrasts that are meaningful in the context of the study. I might group those contrasts into sensible "families" and control the family-wise Type I error separately for each family. This task, including the multiplicity adjustment, can be accomplished using the LSMESTIMATE statement; see

 

https://support.sas.com/resources/papers/proceedings11/351-2011.pdf

 

The Bonferroni adjustment is excessively conservative (i.e., you lose too much power); I generally use the simulate method and I always specify the random seed for reproducible results. If you need to control Type I error for multiple tests, the goal is to do so while maintaining as much power as possible; not all methods are equally good (and some are quite lousy; some don't even control Type I error appropriately).
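
To illustrate (a minimal sketch only: the contrast labels are hypothetical, and the level indices in the nonpositional [coefficient, Bact-level Sick-level] terms depend on the sorted order of your CLASS levels, so check the Class Level Information table first):

lsmestimate Bact*Sick
    "Sick W vs V within Bact E" [ 1, 5 2] [-1, 5 1],
    "Sick X vs V within Bact E" [ 1, 5 3] [-1, 5 1]
    / adjust=simulate(seed=20771) cl;   /* fixed seed so the adjustment is reproducible */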

 

I may be able to look at your results within a few days.

 

marcel
Obsidian | Level 7

sld,

 

Very insightful observations, and thank you for sharing your experience with me. I appreciate it very much.

I was worried about the Bonferroni adjustment for 300 comparisons. Your advice with respect to this issue is timely.

 

I will run a few LSMESTIMATE contrasts with my data, and I will read the PDF you suggested.

 

Regards,

 

marcel
