Programming the statistical procedures from SAS
palolix
Pyrite | Level 9

Dear SAS Community,

I am running a PROC LOGISTIC model to compare the rotting percentage between avocado varieties. Since the outcome variable can only take two values (0% or 100%), I am analyzing it as a binary variable. I am using the FIRTH option because otherwise I get this warning: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.

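title2 'PercStemEndRot: Comparing varieties within Weeks for each Harvest across Season';
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   /* Only the 3-way interaction is listed; FIRTH requests Firth's penalized likelihood */
   model PercStemEndRot = Harvest*Variety*Wks / firth;
   /* Compare varieties within each Harvest-by-Wks combination */
   slice Harvest*Variety*Wks / sliceby=Harvest*Wks adjust=simulate(seed=1);
run;
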
When using the FIRTH option, I got this warning: Ridging has failed to improve the loglikelihood. You may want to use a different ridging technique (RIDGING= option), or switch to using linesearch to reduce the step size (RIDGING=NONE), or specify a new set of initial estimates (INEST= option).

Is there anything I could do to get around this issue other than eliminating predictor variables or interactions?

Thank you very much!

Accepted Solutions
StatDave
SAS Super FREQ

First, your model specification is not producing the separate main effects and 2-way interactions. It only contains the 3-way interaction. However, all the degrees of freedom of the main effects and 2-way interactions are included in the 3-way interaction using that specification. So, it is effectively equivalent to writing 

    model PercStemEndRot=Harvest|Variety|Wks;

which is the shorthand way to specify all of the main effects and interactions.
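One way to check this equivalence is to fit the model both ways and display only the convergence status, fit statistics, and global tests. This is a sketch that assumes the data set one and the variable names from the question. On data where both fits converge, the -2 Log L values and the likelihood-ratio degrees of freedom should match; with the separation in these data, the unpenalized fits may not converge, in which case the comparison is not meaningful.

```sas
/* Sketch: fit the 3-way-only specification and the full factorial       */
/* specification, printing only convergence, fit, and global test tables.  */
/* Assumes data set ONE and the variables from the original question.      */
ods select ConvergenceStatus FitStatistics GlobalTests;
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest*Variety*Wks;   /* 3-way effect only */
run;

/* ODS SELECT is reset at the end of each step, so it is repeated here */
ods select ConvergenceStatus FitStatistics GlobalTests;
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest|Variety|Wks;   /* all main effects and interactions */
run;
```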

But the bottom line is that you'll need to simplify the model because the model complexity using all the effects (either explicitly or implicitly specified as noted above) is making the data too sparse. I suggest that you start with only the main effects model to see if it is successful:

    model PercStemEndRot=Harvest Variety Wks;

and then add interactions one at a time as long as the fit succeeds - initially without FIRTH and then adding it if needed.
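A minimal sketch of that model-building sequence, again assuming the data set one from the question. The Variety*Wks interaction is only an illustrative choice for the first interaction to add; the order should follow the comparisons of interest. After each run, check the log and the Model Convergence Status table for separation warnings before adding the next effect.

```sas
/* Step 1: main effects only, fit by ordinary maximum likelihood */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks;
run;

/* Step 2: add one interaction at a time (Variety*Wks is an illustrative  */
/* choice) and check the log for separation or convergence warnings       */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks Variety*Wks;
run;

/* Step 3: only if a model you need shows quasi-complete separation,      */
/* refit it with Firth's penalized likelihood                             */
proc logistic data=one desc;
   class Harvest Variety Wks / param=glm;
   model PercStemEndRot = Harvest Variety Wks Variety*Wks / firth;
run;
```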

SteveDenham
Jade | Level 19

What does running a PROC FREQ of this sort tell you:

proc freq data=one;  /* data set name taken from the original question */
   tables harvest*wks*variety*PercStemEndRot / cmh;
run;

The CMH option should give you a test for association of variety with PercStemEndRot, after adjusting for harvest and wks. In addition, it should let you know where the zeroes are in your data. Consolidating categories is probably the best way to handle this. 
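To list exactly where the zeroes are, a sketch along these lines writes the cell counts to a data set and prints only the empty cells. It assumes the data set one from the question; the output data set name cellcounts is hypothetical. The SPARSE option is what makes PROC FREQ keep empty table cells (COUNT=0) in the OUT= data set; without it, those cells would simply be absent.

```sas
/* Write cell counts to a data set; SPARSE keeps empty cells (COUNT=0). */
/* NOPRINT on the PROC statement suppresses the printed tables only.    */
proc freq data=one noprint;
   tables harvest*wks*variety*PercStemEndRot / sparse out=cellcounts;
run;

/* Show only the combinations that have no observations */
proc print data=cellcounts;
   where count = 0;
run;
```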

SteveDenham

(I can't believe I am not offering some sort of exact approach to a generalized linear model, but I think this has two advantages - you will know where the zeroes are, and I believe you will still get some useful inferential information).

StatDave
SAS Super FREQ
Consolidating categories is a good point... that is another way in which the model becomes more complex and makes the data sparse. So, if any of your variables has a lot of categories, then merging (or eliminating) categories can reduce the sparseness and may ultimately allow the model with all interactions to be fit.
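A hypothetical sketch of merging categories, assuming Wks is numeric. The cutpoint of 4 weeks, the new variable WksGroup, the labels Early and Late, and the data set name two are all placeholders; in practice the grouping should come from the actual Wks levels and from where the zero cells are. The full factorial model is then refit on the coarser grouping.

```sas
/* Hypothetical recode: collapse Wks into two broader groups.        */
/* Assumes Wks is numeric; the cutpoint 4 is a placeholder.          */
data two;
   set one;
   length WksGroup $ 5;
   if missing(Wks) then WksGroup = ' ';   /* keep missing as missing */
   else if Wks <= 4 then WksGroup = 'Early';
   else WksGroup = 'Late';
run;

/* Refit the full factorial model with the merged categories */
proc logistic data=two desc;
   class Harvest Variety WksGroup / param=glm;
   model PercStemEndRot = Harvest|Variety|WksGroup;
run;
```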
palolix
Pyrite | Level 9

Thank you very much StatDave, I will do that.

palolix
Pyrite | Level 9

Thank you, Steve! So if the General Association statistic is significant, does that mean there is an effect of Variety on PercStemEndRot?

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic  Alternative Hypothesis  DF     Value    Prob
    1          Nonzero Correlation      1   27.1483  <.0001
    2          Row Mean Scores Differ  10  253.3270  <.0001
    3          General Association     10  253.3270  <.0001

StatDave
SAS Super FREQ
Yes, it means that Variety and PercStemEndRot are significantly related after adjusting for Harvest and Wks, which are used as stratifiers.
palolix
Pyrite | Level 9

Thanks StatDave
