Poisson regression should have a response variable that is a natural number, i.e., a count (e.g., 0, 1, 2, ..., n). Hypothesis: if we include a continuous independent variable in a Poisson regression model, there will probably be only one observation at each exact value of that independent variable (because its values are so precise, e.g., recorded to two or more decimal places). This causes a problem because, under my hypothesis, the response variable cannot take any natural number other than 0 and 1. The result would be a binomial distribution instead of a Poisson distribution.
That is my idea, and I would like to hear yours. Any input is welcome.
Thanks.
@TomHsiung wrote:
This causes a problem because the response variable, given my hypothesis, is impossible to be any natural numbers except 0 and 1.
I do not get this statement / claim. Can you explain further?
Koen
Hello, sir or madam
Thank you for your reply. I think a scenario will help express my idea more clearly.
Maximum likelihood estimation for Poisson regression requires calculating the joint likelihood function. For Poisson regression, observations are grouped by their characteristics, and each group has a likelihood expressed through the probability mass function of the Poisson distribution.
Say we have a model log(λ) = a + bX, where X is the independent variable. Assume we have 100 observations and we define X to represent age group in decades. The observations are then grouped by X (a discrete variable), and each group's sample λ could be an integer other than 0 or 1, which fits the Poisson distribution.
However, if we define X to represent age in seconds (effectively a continuous variable), these 100 observations will very likely fall into 100 separate groups. For each group, the sample λ could only be 0 or 1, which violates the assumption of the Poisson distribution.
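To make the scenario concrete: the Poisson regression log-likelihood can be written as a sum of one Poisson term per observation, log L(a, b) = Σ [ y_i (a + b x_i) − exp(a + b x_i) − log(y_i!) ], so it is defined whether or not any two observations share an X value. The sketch below is plain Python, not SAS PROC GENMOD. It simulates 100 observations with a continuous X (so every X value is distinct) and fits log(λ) = a + bX by Newton-Raphson. The true values a = 0.5 and b = 0.7, the X range, and all function names are illustrative assumptions.

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's multiplication method; adequate for small lambda.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate(n, a, b, seed=1):
    """Hypothetical data: continuous x in [0, 2], count y ~ Poisson(exp(a + b*x))."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ys = [poisson_draw(rng, math.exp(a + b * x)) for x in xs]
    return xs, ys

def log_likelihood(a, b, xs, ys):
    # One Poisson term per observation; no grouping by x is needed.
    return sum(y * (a + b * x) - math.exp(a + b * x) - math.lgamma(y + 1)
               for x, y in zip(xs, ys))

def fit_poisson(xs, ys, iters=50, tol=1e-10):
    """Newton-Raphson for log(lambda) = a + b*x."""
    a, b = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mus = [math.exp(a + b * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        ga = sum(y - m for y, m in zip(ys, mus))
        gb = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Information matrix X'WX with W = diag(mu).
        haa = sum(mus)
        hab = sum(m * x for x, m in zip(xs, mus))
        hbb = sum(m * x * x for x, m in zip(xs, mus))
        det = haa * hbb - hab * hab
        # Newton step: information^-1 times score (explicit 2x2 inverse).
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

xs, ys = simulate(100, a=0.5, b=0.7)
print(len(set(xs)), "distinct X values")
print("observed counts:", sorted(set(ys)))
a_hat, b_hat = fit_poisson(xs, ys)
print(a_hat, b_hat, log_likelihood(a_hat, b_hat, xs, ys))
```

Printing the set of observed counts shows which values of y actually occur when every X is unique, which is one way to check the 0/1 concern empirically.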
A second reply here: dividing the bin size down so that each bin holds either a zero or a one is the first part of proving that the binomial distribution converges to the Poisson as the bin size increases (that is, as the number of trials per bin becomes greater than one). So what you do here is sum the binary likelihood functions within a bin, then optimize over all bins to get a value equivalent to the Poisson. See https://en.wikipedia.org/wiki/Poisson_binomial_distribution for a very good discussion.
SteveDenham
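The convergence mentioned above can be checked numerically with the textbook "law of rare events" setup: split an interval into n bins, each a 0/1 trial with success probability λ/n, so the total count is Binomial(n, λ/n). The sketch below assumes equal success probabilities in every bin (the linked Poisson binomial page covers the unequal case) and an illustrative λ = 3; it reports the largest pointwise gap between the binomial and Poisson probability mass functions as n grows.

```python
import math

def binomial_pmf(k, n, p):
    # math.comb(n, k) is 0 when k > n, so those terms vanish.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(lam, n, kmax=20):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| over k = 0..kmax.
    Each of the n bins is a single 0/1 (Bernoulli) trial."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

for n in (5, 50, 500, 5000):
    print(n, max_gap(3.0, n))
```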
The continuous distribution analog of the Poisson is the gamma distribution. If you are concerned about not meeting Poisson assumptions (discrete response with mean equal to the variance), you should consider shifting your analysis to reflect a gamma distribution.
Stolen from an online source:
There is an interesting relationship between the gamma and Poisson distributions. If X is a gamma(α, β) random variable, where α is an integer, then for any x, P(X ≤ x) = P(Y ≥ α), where Y ∼ Poisson(x/β).
SteveDenham
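The quoted identity can be verified numerically, assuming β is the scale parameter (mean αβ), which is the parameterization under which it holds. The sketch below integrates the gamma density with Simpson's rule, deliberately avoiding the closed-form Erlang CDF (which would make the check circular), and compares the result with the Poisson upper tail. The values α = 3 and β = 2 are arbitrary choices for illustration.

```python
import math

def gamma_pdf(t, alpha, beta):
    # Gamma density with shape alpha and scale beta.
    return t ** (alpha - 1) * math.exp(-t / beta) / (math.gamma(alpha) * beta ** alpha)

def gamma_cdf_numeric(x, alpha, beta, steps=2000):
    # Composite Simpson's rule on [0, x]; steps must be even.
    h = x / steps
    total = gamma_pdf(0.0, alpha, beta) + gamma_pdf(x, alpha, beta)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, alpha, beta)
    return total * h / 3

def poisson_upper_tail(alpha, mean):
    # P(Y >= alpha) = 1 - P(Y <= alpha - 1) for Y ~ Poisson(mean).
    return 1 - sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(alpha))

alpha, beta = 3, 2.0
for x in (1.0, 4.0, 10.0):
    print(x, gamma_cdf_numeric(x, alpha, beta), poisson_upper_tail(alpha, x / beta))
```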