I am using motor insurance data. The claim severity is given in the data set, and I have to model it. The data set is attached to the question. I tried PROC SEVERITY with a gamma distribution:
proc severity data=libish.simulatedwithlog crit=aicc ;
loss severity;
scalemodel logduration / dfmixture =full ;
dist gamma ;
run;
I also fit the usual gamma model with PROC GENMOD:
proc genmod data=libish.dataset2;
class premiumclass age zone;
model severity=premiumclass age /dist=gamma link=log type3;
run;
GENMOD then gives this warning:
WARNING: Some observations with invalid response values have been deleted. The response was less than or equal to zero for the Gamma or Inverse Gaussian distributions or less than zero for the Negative Binomial or Poisson distributions.
How can I conclude whether a gamma model is suitable for this data?
By default, the gamma distribution has a threshold parameter of zero, which means that a random variate from the gamma distribution is always positive. Your data contain three observations for which severity=0. The warning is telling you that those observations are dropped from the model, since they can't possibly come from a gamma-distributed variable.
For a similar question and some responses, see the thread "Zero-Inflated Gamma Model".
The options in that thread include slightly modifying the gamma deviance or changing to a different model. Model options include a zero-inflated gamma model or a Tweedie distribution.
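As a rough sketch of those two steps (not taken from the linked thread): first count how many responses GENMOD would drop, then refit with a Tweedie response, which allows exact zeros. The data set and variable names are the ones from the question. This assumes a SAS/STAT release in which PROC GENMOD accepts DIST=TWEEDIE; older releases may not have it.

```sas
/* How many nonmissing responses are <= 0?                      */
/* These are the rows that DIST=GAMMA deletes with the warning. */
proc sql;
   select count(*)           as n_nonmissing,
          sum(severity <= 0) as n_nonpositive
   from libish.dataset2
   where severity is not missing;
quit;

/* Same effects as the original model, but with a Tweedie       */
/* response so that zero severities stay in the fit.            */
/* Assumption: DIST=TWEEDIE is available in your release.       */
proc genmod data=libish.dataset2;
   class premiumclass age zone;
   model severity = premiumclass age / dist=tweedie link=log type3;
run;
```

If only a handful of rows are zero, as here, it is also worth checking whether those rows are real zero-cost claims or records that do not belong in a severity model at all.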
Thanks Rick
Rick has addressed GENMOD's warning. I will comment on why PROC SEVERITY doesn't throw a similar warning. PROC SEVERITY supports multiple distributions, including your own distributions, so it allows 0 values for the "loss" (response) variable. It takes care of the 0 values in the distribution definition functions. In particular, for the gamma distribution, it uses the following definition of the PDF function (you can see other functions of PROC SEVERITY's predefined gamma distribution here and all model definitions here):
function GAMMA_PDF(x, Theta, Alpha);
   /* Theta : Scale */
   /* Alpha : Shape */
   minVal = 2.220446E-16;
   /* alternatives: MACEPS = 2.220446E-16
                    sqrt(SMALL)= 0.1491668147e-153 */
   if (x < minVal) then do;
      x1 = minVal;
      /* assume exp(-x1/Theta)~1, because x1/Theta is too small */
      p = x1**(Alpha-1) / (gamma(Alpha) * (Theta**Alpha));
   end;
   else
      p = pdf("GAMMA", x, Alpha, Theta);
   return(p);
endsub;
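For reference, the small-value branch in that function can be written out as follows. This is only a restatement of the code, with \theta for Theta (scale), \alpha for Alpha (shape), and \varepsilon for minVal; the numeric value of \ln\varepsilon is worked out from minVal = 2.220446E-16.

```latex
% Gamma density with scale \theta and shape \alpha, for x > 0
f(x;\theta,\alpha) = \frac{x^{\alpha-1}\, e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}

% For x < \varepsilon the code evaluates the density at x_1 = \varepsilon
% and drops the exponential factor, since e^{-\varepsilon/\theta} \approx 1:
f(x;\theta,\alpha) \approx \frac{\varepsilon^{\alpha-1}}{\Gamma(\alpha)\,\theta^{\alpha}},
\qquad x < \varepsilon = 2.220446\times 10^{-16}

% Log-likelihood contribution of one such observation,
% with \ln\varepsilon \approx -36.04:
\ln f \approx (\alpha-1)\ln\varepsilon - \ln\Gamma(\alpha) - \alpha\ln\theta
```

So each 0-valued loss adds a term of roughly -36.04 (\alpha - 1) to the log likelihood, which is one reason a large share of zeros can distort the shape estimate.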
If you do not want this definition, you can always define your own version of the gamma distribution that returns missing PDF and CDF values for 0-valued losses and try fitting it. See the PROC SEVERITY documentation to find out how to define and fit your own distributions.
Now, coming back to your question: with your data that contain 0-valued losses, you will probably get some estimates from PROC SEVERITY, because its standard gamma definition treats 0 values as very small values (=constant('MACEPS')). But you will need to look at the parameter estimates, fit statistics, and plots to see whether it is indeed a good fit. In general, if you have a lot of 0-valued response values, you should use a different distribution. The zero-inflated models mentioned by Rick are one option, but I would also suggest looking at the Tweedie distribution.
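One way to do that comparison, sketched here under a few assumptions, is to fit several candidate distributions in a single PROC SEVERITY step and compare them on the selection criterion. The data set, loss variable, and regressor are the ones from the question. The choice of candidates (LOGN, WEIBULL, STWEEDIE) is only illustrative, and the PRINT= and PLOTS= requests assume a release that supports them. STWEEDIE is used rather than TWEEDIE because SCALEMODEL needs a distribution with a scale parameter.

```sas
/* Fit gamma alongside a few alternatives and compare by AICC.   */
/* Candidate list is illustrative, not a recommendation.         */
proc severity data=libish.simulatedwithlog crit=aicc
              print=all plots=(pp qq);
   loss severity;
   scalemodel logduration / dfmixture=full;
   dist gamma logn weibull stweedie;
run;
```

In the output, the model selection and fit statistics tables show how gamma ranks against the other candidates, and the P-P and Q-Q plots show where it departs from the empirical distribution. If gamma is not clearly competitive, that is evidence against it for these data.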
Hope this helps,
Mahesh
thanks Mahesh