Hi SAS Community,
I am modelling claim frequencies with 20 variables using Proc Genmod and a Poisson distribution. Below is my code:
proc genmod data=sasuser.data1;
  class
    var1 (param=ref)
    var2 (param=ref)
    .
    .
    var20 (param=ref)
  ;
  model claim_freq = var1 var2 ..var20
    / dist=poisson scale=pearson
      link=log
      type1
      type3
      offset=_exp;
run;
My question is whether I should set scale=Pearson, set scale=Deviance, or leave out the scale specification altogether.
What criteria should I use to decide? Should I choose the scaling that yields the largest number of influential variables? Or should I use the Type 1 and Type 3 likelihood ratio analyses to find the best scale?
Thanks for your help!
-Louis
The SCALE= option allows you to add a dispersion parameter to the Poisson distribution, which lets you handle data that show more variability than the Poisson distribution allows. The mean and variance of the Poisson distribution are equal, so if the variance exceeds the mean, the data are overdispersed. Evidence of overdispersion is a "Value/DF" for Pearson or deviance in the "Criteria For Assessing Goodness Of Fit" table that is much larger than 1. One way of dealing with overdispersed data is to add the SCALE=P (or D) option. Another way is to use the GEE method by adding the REPEATED statement. See this note for more details.
Note that you don't have to repeat PARAM=REF for every variable in your CLASS statement. This is easier:
class var1 var2 ... / param=ref;
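The overdispersion check described above could be sketched roughly as follows. This is only an illustration under several assumptions: the macro variable vars is a hypothetical shorthand holding a shortened predictor list, work.fit_stats is a made-up data set name, _exp is assumed to already contain the log of the exposure (which is what OFFSET= expects with a log link), and policy_id is a hypothetical cluster identifier that does not appear in the original post.

```sas
/* Hypothetical shorthand: shortened predictor list, add the remaining variables */
%let vars = var1 var2 var20;

/* Step 1: fit the plain Poisson model with no SCALE= option and save the
   "Criteria For Assessing Goodness Of Fit" table (ODS table name ModelFit) */
ods output ModelFit=work.fit_stats;
proc genmod data=sasuser.data1;
  class &vars / param=ref;
  model claim_freq = &vars
        / dist=poisson link=log
          offset=_exp;          /* assumed to be log(exposure) */
run;

/* Inspect Value/DF for the deviance and Pearson chi-square rows;
   values much larger than 1 suggest overdispersion */
proc print data=work.fit_stats noobs;
run;

/* Step 2: if overdispersion is indicated, refit with a dispersion parameter.
   SCALE=PEARSON could be swapped for SCALE=DEVIANCE. */
proc genmod data=sasuser.data1;
  class &vars / param=ref;
  model claim_freq = &vars
        / dist=poisson link=log
          offset=_exp
          scale=pearson
          type3;
run;

/* Alternative: GEE through the REPEATED statement.
   policy_id is a hypothetical subject identifier and must be a CLASS variable. */
proc genmod data=sasuser.data1;
  class policy_id &vars / param=ref;
  model claim_freq = &vars
        / dist=poisson link=log
          offset=_exp
          type3;
  repeated subject=policy_id / type=exch;
run;
```

With SCALE=PEARSON or SCALE=DEVIANCE the parameter estimates stay the same as in the unscaled fit; only the standard errors and the test statistics are adjusted by the estimated dispersion. The scale choice is therefore better guided by the Value/DF diagnostics than by how many variables end up looking influential.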
I moved your question to the Statistical Procedures community. Even if you are using SAS University Edition, the experts here can probably help you best.
Not sure. I would choose scale=Deviance.