Statistical Procedures

palolix

Dear SAS Community,

I am running PROC LOGISTIC to compare the rotting percentage between avocado varieties. Since the outcome variable can take only two values (0 or 100%), I am analyzing it as a binary response. I am using the FIRTH option because otherwise I get this warning: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.

title2 'PercStemEndRot: Comparing varieties within Weeks for each Harvest across Season';
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest*Variety*Wks / firth;
   slice Harvest*Variety*Wks / sliceby=Harvest*Wks adjust=simulate(seed=1);
run;

When using the FIRTH option I got this warning: Ridging has failed to improve the loglikelihood. You may want to use a different ridging technique (RIDGING= option), or switch to using linesearch to reduce the step size (RIDGING=NONE), or specify a new set of initial estimates (INEST= option).

Is there anything I could do to get around this issue other than removing predictor variables or interactions?

Thank you very much!

StatDave

First, your model specification is not producing the separate main effects and 2-way interactions. It only contains the 3-way interaction. However, all the degrees of freedom of the main effects and 2-way interactions are included in the 3-way interaction using that specification. So, it is effectively equivalent to writing

model PercStemEndRot=Harvest|Variety|Wks;

which is the shorthand way to specify all of the main effects and interactions.

But the bottom line is that you'll need to simplify the model because the model complexity using all the effects (either explicitly or implicitly specified as noted above) is making the data too sparse. I suggest that you start with only the main effects model to see if it is successful:

model PercStemEndRot=Harvest Variety Wks;

and then add interactions one at a time as long as the fit succeeds - initially without FIRTH and then adding it if needed.
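The stepwise build-up described above could look roughly like the following. This is only a sketch: it reuses the data set name one and the options from the original post, and the choice of Variety*Wks as the first interaction to add is arbitrary, not a recommendation.

```sas
/* Step 1: main effects only, without FIRTH. */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks;
run;

/* Step 2: add a single two-way interaction.                      */
/* Variety*Wks is an arbitrary first choice for illustration.     */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks Variety*Wks;
run;

/* Step 3: if the separation warning comes back, refit the same   */
/* model with FIRTH before trying any further interactions.       */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks Variety*Wks / firth;
run;
```

Checking the log after each step shows which added term first triggers the separation or ridging warning.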

SteveDenham

What does running a proc freq of this sort tell you:

proc freq data=one; /* data set name taken from the original post */
   tables harvest*wks*variety*PercStemEndRot / cmh;
run;

The CMH option should give you a test for association of variety with PercStemEndRot, after adjusting for harvest and wks. In addition, it should let you know where the zeroes are in your data. Consolidating categories is probably the best way to handle this.
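One possible way to list the empty cells directly, rather than scanning the printed tables, is sketched below. It assumes the data set one from the original post; the output data set name cellcounts is made up for illustration.

```sas
/* Write every Harvest x Wks x Variety x PercStemEndRot combination */
/* to a data set. SPARSE keeps combinations that never occur, so    */
/* they appear with COUNT = 0 instead of being dropped.             */
proc freq data=one noprint;
   tables Harvest*Wks*Variety*PercStemEndRot / out=cellcounts sparse;
run;

/* Print only the empty cells; these are the candidates for merging. */
proc print data=cellcounts;
   where count = 0;
run;
```

A Harvest by Wks stratum in which a variety has a zero count for one of the two outcome values is where separation is likely to come from.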

SteveDenham

(I can't believe I am not offering some sort of exact approach to a generalized linear model, but I think this has two advantages - you will know where the zeroes are, and I believe you will still get some useful inferential information).

StatDave

Consolidating categories is a good point: having many categories is another way in which the model becomes more complex and the data sparse. So, if any of your variables has a lot of categories, then merging (or eliminating) categories can reduce the sparseness and may ultimately allow the model with all interactions to be fit.
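As a rough illustration of merging categories, the sketch below collapses Wks into two groups and refits the full model. Everything specific here is an assumption: that Wks is numeric, the cutoff of 2, and the names one_merged and WksGroup are all hypothetical. The real grouping should come from where the empty cells actually are.

```sas
/* Hypothetical grouping: assumes Wks is numeric and that a cutoff  */
/* of 2 weeks is meaningful. Both are placeholders for illustration. */
data one_merged;
   set one;
   length WksGroup $8;
   if missing(Wks) then WksGroup = ' ';
   else if Wks <= 2 then WksGroup = 'Early';
   else WksGroup = 'Late';
run;

/* Refit the model with all interactions on the coarser factor. */
proc logistic data=one_merged desc;
   class Harvest Variety WksGroup / param=glm;
   model PercStemEndRot = Harvest|Variety|WksGroup / firth;
run;
```

The same idea applies to Variety or Harvest if one of those has the sparse levels instead.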

palolix

Thank you very much StatDave, I will do that.

palolix

Thank you Steve! So if the general association is significant, it means that there is an effect of Variety on PercStemEndRot?

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic	Alternative Hypothesis	DF	Value	Prob
1	Nonzero Correlation	1	27.1483	<.0001
2	Row Mean Scores Differ	10	253.3270	<.0001
3	General Association	10	253.3270	<.0001

StatDave

Yes, it means that Variety and PercStemEndRot are significantly related after adjusting for Harvest and Wks, which are used as stratifiers.

palolix

Thanks StatDave

WARNING: Ridging has failed to improve the loglikelihood.
