pvareschi
Quartz | Level 8

Re: Predictive Modeling Using Logistic Regression

Is it correct to say that, for a logistic regression model, the log-likelihood function always has a maximum value and that maximum value is unique (see page 2-8 of course notes)?
In other words, unless a complete separation problem occurs, the iterative algorithm(s) always find a solution and there is no risk of converging on a local optimum rather than a global optimum.

More generally, does the same property apply to any Generalized Linear Model (assuming a distribution from the exponential family)?


4 REPLIES
gcjfernandez
SAS Employee

Re: Predictive Modeling Using Logistic Regression

Is it correct to say that, for a logistic regression model, the log-likelihood function always has a maximum value and that maximum value is unique (see page 2-8 of course notes)?
In other words, unless a complete separation problem occurs, the iterative algorithm(s) always find a solution and there is no risk of converging on a local optimum rather than a global optimum.

More generally, does the same property apply to any Generalized Linear Model (assuming a distribution from the exponential family)?

My response:

Before finalizing your model, always check the SAS log for unexpected error messages related to maximum-likelihood computation problems caused by poor data quality.

pvareschi
Quartz | Level 8

Yes, of course it is always good practice to check the log for error messages.

What I meant with my question was more about the theoretical properties of the log-likelihood function, i.e. whether, for logistic regression (and generalized linear models), it is always a concave function and therefore has a single global maximum.

gcjfernandez
SAS Employee

Because this is a complex statistical problem and the answer to your question is not straightforward, I spent some time reviewing the statistical theory and got feedback from my colleagues at SAS Education before answering your question.

 

Below I give a summary answer and provide links to extended descriptions of this topic.

I hope this clears your doubts.

 

Existence of Maximum Likelihood Estimates in Logistic Regression (PROC LOGISTIC):

https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_details10.htm&docsetV...

The likelihood equation for a logistic regression model does have a finite solution in the absence of complete or quasi-complete separation of the data. However, keep in mind that the existence of a finite solution does not mean that an algorithm will find it exactly on a finite-precision digital computer.

Exceptions:

The likelihood equation for a logistic regression model does not always have a finite solution. Sometimes there is a nonunique maximum on the boundary of the parameter space, at infinity. The existence, finiteness, and uniqueness of maximum likelihood estimates for the logistic regression model depend on the patterns of data points in the observation space.
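To see why separation pushes the maximum to the boundary of the parameter space, here is a minimal numerical sketch (in Python/NumPy rather than SAS, with made-up data): for a completely separated sample, the log-likelihood keeps increasing as the slope grows, so its supremum (zero) is approached but never attained at any finite parameter value.

```python
import numpy as np

def log_sigmoid(z):
    # numerically stable log(sigmoid(z)) = min(z, 0) - log(1 + exp(-|z|))
    return np.minimum(z, 0) - np.log1p(np.exp(-np.abs(z)))

def loglik(beta, x, y):
    """Log-likelihood of a no-intercept logistic model p = sigmoid(beta * x)."""
    eta = beta * x
    return np.sum(y * log_sigmoid(eta) + (1 - y) * log_sigmoid(-eta))

# Completely separated data: every y=1 has x > 0, every y=0 has x < 0
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

vals = [loglik(b, x, y) for b in (1.0, 10.0, 100.0)]
print(vals)  # strictly increasing toward 0: no finite maximizer
```

An iterative fitter applied to such data keeps taking uphill steps forever, which is why PROC LOGISTIC instead detects the separation and warns about it.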

If neither complete nor quasi-complete separation exists in the sample points, there is an overlap of sample points. In this configuration, the maximum likelihood estimates exist and are unique.
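The uniqueness claim rests on global concavity of the logistic log-likelihood: its Hessian is -X'WX with W = diag(p(1-p)), which is negative semidefinite at every parameter value. A small NumPy check on simulated data (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # intercept + 2 covariates

def hessian(beta, X):
    """Hessian of the logistic log-likelihood: -X' diag(p(1-p)) X."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return -(X * (p * (1 - p))[:, None]).T @ X

# Negative semidefinite at any beta, so the log-likelihood is globally
# concave and a stationary point is the unique global maximum.
for beta in (np.zeros(3), np.array([5.0, -3.0, 1.0])):
    print(np.linalg.eigvalsh(hessian(beta, X)).max() <= 1e-10)  # True
```

Because concavity holds everywhere, not just near the solution, there is no risk of Newton-type algorithms settling on a merely local optimum.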

To address the separation issue, you can change your model, specify the FIRTH option to use Firth’s penalized likelihood method, or for small data sets specify an EXACT statement to perform an exact logistic regression.
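The intuition behind the FIRTH option can be sketched in a few lines (Python/SciPy, not SAS's implementation, using the same hypothetical separated one-covariate data): Firth's method maximizes the penalized log-likelihood l(beta) + 0.5*log|X'WX| instead of l(beta), and the penalty keeps the maximizer finite even under complete separation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Completely separated data: ordinary ML pushes beta toward infinity,
# but Firth's penalty 0.5*log|X'WX| keeps the maximizer finite.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def neg_penalized_loglik(beta):
    eta = beta * x
    log_sig = lambda z: np.minimum(z, 0) - np.log1p(np.exp(-np.abs(z)))  # stable log(sigmoid)
    loglik = np.sum(y * log_sig(eta) + (1 - y) * log_sig(-eta))
    p = 1.0 / (1.0 + np.exp(-eta))
    fisher_info = np.sum(p * (1 - p) * x**2)  # X'WX is a scalar with one covariate
    return -(loglik + 0.5 * np.log(fisher_info))

res = minimize_scalar(neg_penalized_loglik, bounds=(-20.0, 20.0), method="bounded")
print(res.x)  # a finite penalized-likelihood estimate
```

The penalty term goes to minus infinity as |beta| grows (the information X'WX shrinks to zero), so the penalized criterion always turns back down and a finite interior maximum exists.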

 

Maximum likelihood fitting in generalized linear models (PROC GENMOD), in the absence of overdispersion and for uncorrelated data: https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_overview07.htm&docsetVe...

 

The GENMOD procedure uses a ridge-stabilized Newton-Raphson algorithm to maximize the log-likelihood function with respect to the regression parameters. By default, the procedure also produces maximum likelihood estimates of the scale parameter as defined in the section Response Probability Distributions for the normal, inverse Gaussian, negative binomial, and gamma distributions.
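As a rough illustration of ridge stabilization (a simplified Python sketch of the idea, not PROC GENMOD's actual algorithm), here it is for the logistic case: each iteration solves (X'WX + lambda*I) step = X'(y - p), so the step stays well-defined even when the information matrix is nearly singular. The data are simulated.

```python
import numpy as np

def newton_logistic(X, y, ridge=1e-4, tol=1e-8, max_iter=50):
    """Ridge-stabilized Newton-Raphson for logistic regression: a simplified
    sketch of the idea, not PROC GENMOD's exact algorithm."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                      # gradient of the log-likelihood
        info = (X * (p * (1 - p))[:, None]).T @ X  # information matrix X'WX
        # The ridge term keeps the linear solve well-posed even when
        # X'WX is nearly singular.
        beta = beta + np.linalg.solve(info + ridge * np.eye(len(beta)), score)
        if np.linalg.norm(score) < tol:
            break
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([0.5, -1.0, 2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

est = newton_logistic(X, y)
print(est)  # close to true_beta for this sample size
```

Because the log-likelihood is concave, any point where the gradient vanishes is the global maximum, so the stabilized iteration and plain Newton-Raphson converge to the same solution on well-behaved data.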

Exception: The function obtained by dividing a log-likelihood function for the binomial or Poisson distribution by a dispersion parameter is not a legitimate log-likelihood function. It is an example of a quasi-likelihood function. Most of the asymptotic theory for log likelihoods also applies to quasi-likelihoods, which justifies computing standard errors and likelihood ratio statistics by using quasi-likelihoods instead of proper log likelihoods. For details on quasi-likelihood functions, see McCullagh and Nelder (1989, Chapter 9); McCullagh (1983); and Hardin and Hilbe (2003).
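A minimal numerical sketch of the quasi-likelihood idea (Python, with hypothetical overdispersed count data; not what PROC GENMOD does internally): fit the Poisson mean model by maximum likelihood as usual, estimate the dispersion phi from the Pearson chi-square divided by its degrees of freedom, and inflate the model-based standard errors by sqrt(phi).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([0.3, 0.5]))
# Overdispersed counts: negative binomial variance is mu + mu^2/size > mu
y = rng.negative_binomial(2, 2.0 / (2.0 + mu_true)).astype(float)

# Poisson ML fit by Newton-Raphson: score X'(y - mu), information X' diag(mu) X
beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)
    info = (X * mu[:, None]).T @ X
    beta = beta + np.linalg.solve(info, score)
    if np.linalg.norm(score) < 1e-8:
        break

# Quasi-likelihood dispersion: Pearson chi-square over residual df
mu = np.exp(X @ beta)
info = (X * mu[:, None]).T @ X
phi = np.sum((y - mu) ** 2 / mu) / (n - 2)
se_scaled = np.sqrt(phi * np.diag(np.linalg.inv(info)))
print(phi)        # > 1 signals overdispersion
print(se_scaled)  # model-based SEs inflated by sqrt(phi)
```

The mean-model estimates are unchanged; only the standard errors are rescaled, which is exactly why quasi-likelihood inference can borrow the asymptotic machinery of proper likelihoods.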

 

pvareschi
Quartz | Level 8

Wow!...I must take my hat off for the effort and time spent on this...thank you...amazing help!