Dear SAS Community,
I am running a PROC LOGISTIC model to compare the percentage of rotting between avocado varieties. Since the outcome variable can only take two values (0 or 100%), I am analyzing it as a binomial variable. I am using the FIRTH option because otherwise I get this warning: "There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist."
title2 'PercStemEndRot: Comparing varieties within Weeks for each Harvest across Season';
proc logistic data=one desc;
class Harvest Variety Wks/param=glm;
model PercStemEndRot=Harvest*Variety*Wks/firth;
slice Harvest*Variety*Wks/sliceby=Harvest*Wks adjust=simulate(seed=1);
run;
When using the FIRTH option, I got this warning: "Ridging has failed to improve the loglikelihood. You may want to use a different ridging technique (RIDGING= option), or switch to using linesearch to reduce the step size (RIDGING=NONE), or specify a new set of initial estimates (INEST= option)."
Is there anything I could do to get around this issue other than dropping independent variables or interactions?
Thank you very much!
First, your model specification is not producing the separate main effects and 2-way interactions. It only contains the 3-way interaction. However, all the degrees of freedom of the main effects and 2-way interactions are included in the 3-way interaction using that specification. So, it is effectively equivalent to writing
model PercStemEndRot=Harvest|Variety|Wks
which is the shorthand way to specify all of the main effects and interactions.
But the bottom line is that you'll need to simplify the model because the model complexity using all the effects (either explicitly or implicitly specified as noted above) is making the data too sparse. I suggest that you start with only the main effects model to see if it is successful:
model PercStemEndRot=Harvest Variety Wks
and then add interactions one at a time as long as the fit succeeds - initially without FIRTH and then adding it if needed.
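The stepwise approach described above could look roughly like the sketch below. It reuses the dataset name (one) and the variable names from the original post; the choice of Harvest*Variety as the first interaction to add is only an assumption for illustration, and any of the two-way interactions could be tried first.

```sas
/* Step 1: main effects only, without FIRTH.                      */
/* Dataset and variable names are taken from the original post.   */
proc logistic data=one descending;
  class Harvest Variety Wks / param=glm;
  model PercStemEndRot = Harvest Variety Wks;
run;

/* Step 2: if step 1 fits cleanly, add one two-way interaction.   */
/* Harvest*Variety is an arbitrary first choice (assumption).     */
proc logistic data=one descending;
  class Harvest Variety Wks / param=glm;
  model PercStemEndRot = Harvest Variety Wks Harvest*Variety;
run;

/* Step 3: add FIRTH only if a separation warning comes back.     */
proc logistic data=one descending;
  class Harvest Variety Wks / param=glm;
  model PercStemEndRot = Harvest Variety Wks Harvest*Variety / firth;
run;
```

If all three two-way interactions turn out to be estimable, the bar notation with a degree limit, Harvest|Variety|Wks @2, is a compact way to request the main effects plus every two-way interaction while leaving out the three-way term.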
What does running a proc freq of this sort tell you:
proc freq data=
tables harvest*wks*variety*PercStemEndRot/cmh;
run;
The CMH option should give you a test for association of variety with PercStemEndRot, after adjusting for harvest and wks. In addition, it should let you know where the zeroes are in your data. Consolidating categories is probably the best way to handle this.
SteveDenham
(I can't believe I am not offering some sort of exact approach to a generalized linear model, but I think this has two advantages - you will know where the zeroes are, and I believe you will still get some useful inferential information).
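To see where the zeroes are without scanning a large crosstabulation by eye, something like the following sketch may help. It assumes the dataset is named one, as in the original post; cellcounts is a hypothetical name for the output dataset.

```sas
/* Count every Harvest x Wks x Variety x outcome combination.       */
/* SPARSE keeps combinations that never occur, with COUNT = 0.      */
/* "cellcounts" is a hypothetical output dataset name.              */
proc freq data=one noprint;
  tables Harvest*Wks*Variety*PercStemEndRot / sparse out=cellcounts;
run;

/* List only the empty cells, which are the likely source of the    */
/* separation and ridging warnings.                                 */
proc print data=cellcounts;
  where count = 0;
run;
```

Harvest-by-week-by-variety groups in which one outcome level never occurs are the natural candidates for consolidating categories.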
Thank you very much StatDave, I will do that.
Thank you Steve! So if the general association is significant, does it mean that there is an effect of Variety on PercStemEndRot?
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic | Alternative Hypothesis | DF | Value | Prob |
---|---|---|---|---|
1 | Nonzero Correlation | 1 | 27.1483 | <.0001 |
2 | Row Mean Scores Differ | 10 | 253.3270 | <.0001 |
3 | General Association | 10 | 253.3270 | <.0001 |
Thanks StatDave