DDK
Obsidian | Level 7

Hello,

Is it possible to assess which distribution fits better using a likelihood ratio test? For example, if you want to assess whether a model fits better under a negative binomial distribution than under a Poisson distribution, can you use the log likelihoods in the 'Fit Statistics' section of the output to perform a test such as the one explained at http://support.sas.com/kb/24/474.html?

The example on that page shows the difference in df between two models, where some variables are removed from one model, which results in a difference in df. But what df should you specify when you want to compare the fit of two distributions? The df then stay almost the same (in several things I tried there was a difference of 0.1 or less).

Thanks in advance for the help.

1 ACCEPTED SOLUTION
lvm
Rhodochrosite | Level 12

I didn't notice in your original post, but it looks like you are using GLIMMIX.  The df used in the PearsonChiSq/df calculation does not involve the scale parameter. The 0.1 or so difference you noticed in the df calculation is just rounding. The proc (also GENMOD) uses the same df for Poisson and NB. The LR test to compare distributions has to be done by hand (or in a data step using ODS output), using df=1. Use -2LL from two runs of the procedure.


8 REPLIES
Rick_SAS
SAS Super FREQ

This does not directly answer your question, but you might find it helpful to read the documentation for the SEVERITY procedure in SAS/ETS software. The SEVERITY procedure fits multiple models to data and provides statistics that you can assess to determine which model you want to use. It provides several likelihood statistics (-2LL, AIC, AICC, BIC) as well as ECDF statistics. It also provides graphical diagnostic plots to accompany the statistics.

lvm
Rhodochrosite | Level 12

You can conduct an LR test based on log-likelihoods if the two distributions are nested (i.e., if one is a special case of the other). For instance, the Poisson is a special case of the negative binomial (as 1/k goes to 0, the negative binomial becomes the Poisson). In this example, the negative binomial has one more parameter than the Poisson (many sources use k as the overdispersion parameter of the negative binomial, but SAS uses scale = 1/k in several procedures). The df for the LR test is 1 because that is the difference in the number of parameters. The LR statistic is -2 times the difference in log-likelihoods (simpler model minus more complex model). Under the null hypothesis (H0: the distribution is the simpler one), the test statistic nominally has a chi-squared distribution. Caution: when the scale parameter must sit on the boundary of its range to give the simpler distribution, the test statistic may have a more complex distribution than a simple chi-squared (with 1 df). That is the case here: the scale parameter ranges from 0 to infinity, and scale = 0 gives the simpler distribution. Many ignore this issue.
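As a rough illustration of the arithmetic described above, here is a minimal Python sketch (standard library only). The two -2LL values are made up for the example and would in practice be read from the fit statistics of two separate runs. The function names are hypothetical. The halved p-value is one commonly used adjustment for the boundary issue (treating the null distribution as a 50:50 mixture of a point mass at 0 and a chi-squared with 1 df); it is shown as an assumption, not as something the procedure reports.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    # With 1 df, X is the square of a standard normal, so
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def lr_test(neg2ll_simple: float, neg2ll_complex: float) -> dict:
    """LR test for nested distributions that differ by one parameter."""
    # LR = (-2LL of simpler model) - (-2LL of more complex model).
    lr = max(neg2ll_simple - neg2ll_complex, 0.0)
    p_naive = chi2_sf_1df(lr)
    # Assumed boundary adjustment: 50:50 mixture of 0 and chi-squared(1).
    p_boundary = 0.5 * p_naive if lr > 0 else 1.0
    return {"lr": lr, "p_naive": p_naive, "p_boundary": p_boundary}

# Hypothetical -2LL values from a Poisson run and a negative binomial run.
result = lr_test(neg2ll_simple=1250.4, neg2ll_complex=1238.1)
print(result)
```

Both runs need to use the same data, the same model terms, and a method that reports a true log-likelihood; otherwise the difference is not a valid LR statistic.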

Be careful with different procedures. Some programs may not give the actual log-likelihood. For instance, many log-likelihoods can be written as a sum of terms, where some terms involve parameters and data, and some terms involve only the data (not the parameters). To be computationally efficient, the terms not involving parameters may not be calculated or displayed. This is fine when one is comparing log-likelihoods all for the same distribution (with the same procedure), but could cause trouble if you are comparing distributions.
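To make the point about dropped terms concrete, here is a small Python sketch using the Poisson log-likelihood, where the data-only term is -log(y!). The counts and the two values of the mean are invented for illustration, and the function name is hypothetical. It is not meant to reproduce what any particular procedure prints.

```python
import math

def poisson_loglik(y, mu, include_constant=True):
    """Poisson log-likelihood for counts y with a common mean mu."""
    total = 0.0
    for yi in y:
        # Term involving the parameter and the data.
        total += yi * math.log(mu) - mu
        if include_constant:
            # Data-only term: -log(y!), computed via lgamma(y + 1).
            total -= math.lgamma(yi + 1)
    return total

counts = [0, 2, 1, 4, 3]  # made-up counts

# Comparing two parameter values within the same distribution:
diff_full = poisson_loglik(counts, 2.0) - poisson_loglik(counts, 1.5)
diff_kernel = (poisson_loglik(counts, 2.0, include_constant=False)
               - poisson_loglik(counts, 1.5, include_constant=False))

# The data-only term shifts the reported value itself:
constant = (poisson_loglik(counts, 2.0)
            - poisson_loglik(counts, 2.0, include_constant=False))

print(diff_full, diff_kernel, constant)
```

The two differences agree, which is why the omission is harmless within one distribution. The shift in the reported value is what breaks a comparison against a different distribution whose log-likelihood has a different data-only term, or has it included.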

 

Be careful within a procedure, too. If you use GLIMMIX (say, with different choices of distributions), make sure you are not using one of the pseudo-likelihood methods (rspl, mspl, ...). You need to be using the actual log-likelihood (method=quad).

DDK
Obsidian | Level 7

Just a quick question. I understand that there is an extra parameter with the negative binomial and therefore the df should be 1. Why is this not reflected in the Pearson Chi-square/DF statistic in the 'Fit statistics for conditional distribution' section? That section reports the Pearson Chi-square and the Pearson Chi-square/DF, so I should be able to calculate the df. Or is this referring to a different df?

lvm
Rhodochrosite | Level 12

I didn't notice in your original post, but it looks like you are using GLIMMIX.  The df used in the PearsonChiSq/df calculation does not involve the scale parameter. The 0.1 or so difference you noticed in the df calculation is just rounding. The proc (also GENMOD) uses the same df for Poisson and NB. The LR test to compare distributions has to be done by hand (or in a data step using ODS output), using df=1. Use -2LL from two runs of the procedure.

DDK
Obsidian | Level 7

Ah, thanks, that clarifies it. What if you have repeated measurements (R-side variance)? The SAS documentation states that this is not supported for method=quad. Is there a way around that in GLIMMIX?

lvm
Rhodochrosite | Level 12

You have to use a G-side covariance structure for the repeated measures with non-normal distributions when you use the quadrature or Laplace estimation methods. The book by Walt Stroup on GLMMs is excellent on this topic (with lots of SAS code available online).

DDK
Obsidian | Level 7

Thanks for all the help. It is clarified now. Will look at your book suggestion.

SteveDenham
Jade | Level 19

The idea of testing for a better fit for a distribution is intriguing, but sounds like a lot of work when comparison of information criteria ought to do the trick on its own.  So long as the data, model and any random statements are the same, and the same link is used (and appropriate) for both distributions, AIC provides an excellent choice for distribution selection, in my experience. 

 

Steve Denham
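For readers who want to see the information-criterion comparison spelled out, here is a minimal Python sketch using AIC = -2LL + 2p, where p is the number of estimated parameters. The -2LL values and parameter counts are hypothetical, and the helper name is made up. As noted earlier in the thread, the comparison only makes sense if both -2LL values are full log-likelihoods from the same data, model, and estimation method.

```python
def aic(neg2ll: float, n_params: int) -> float:
    """Akaike information criterion from -2 log-likelihood."""
    return neg2ll + 2 * n_params

# Hypothetical fits: (-2LL, number of estimated parameters).
# The negative binomial carries one extra parameter (the scale).
fits = {
    "poisson": (1250.4, 3),
    "negbin": (1238.1, 4),
}

aics = {name: aic(neg2ll, p) for name, (neg2ll, p) in fits.items()}
best = min(aics, key=aics.get)  # smaller AIC is preferred

print(aics, best)
```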

