glm model for severity

11-14-2016 05:40 AM

I am using motor insurance data in which the claim severity is given, and I have to model severity from it. The data set is attached to the question. I tried PROC SEVERITY with a gamma distribution:

proc severity data=libish.simulatedwithlog crit=aicc;
   loss severity;
   scalemodel logduration / dfmixture=full;
   dist gamma;
run;

I also fitted the usual gamma model with PROC GENMOD:

proc genmod data=libish.dataset2;
   class premiumclass age zone;
   model severity = premiumclass age / dist=gamma link=log type3;
run;

PROC GENMOD then gives this warning:

WARNING: Some observations with invalid response values have been deleted. The response was less than or equal to zero for the Gamma or Inverse Gaussian distributions or less than zero for the Negative Binomial or Poisson distributions.

How can I conclude whether a gamma model is suitable for this data?

Accepted Solutions

Solution

11-21-2016 06:19 AM

Posted in reply to ishakamboj1230

11-15-2016 01:01 PM

Rick has addressed GENMOD's warning. I will comment on why PROC SEVERITY doesn't throw a similar warning. PROC SEVERITY supports multiple distributions, including your own distributions, so it allows 0 values for the "loss" (response) variable. It takes care of the 0 values in the distribution definition functions. In particular, for the gamma distribution, it uses the following definition of the PDF function (you can see other functions of PROC SEVERITY's predefined gamma distribution here and all model definitions here):

function GAMMA_PDF(x, Theta, Alpha);
   /* Theta : Scale */
   /* Alpha : Shape */
   minVal = 2.220446E-16;
   /* alternatives: MACEPS      = 2.220446E-16
                    sqrt(SMALL) = 0.1491668147e-153 */
   if (x < minVal) then do;
      x1 = minVal;
      /* assume exp(-x1/Theta)~1, because x1/Theta is too small */
      p = x1**(Alpha-1) / (gamma(Alpha) * (Theta**Alpha));
   end;
   else
      p = pdf("GAMMA", x, Alpha, Theta);
   return(p);
endsub;

If you do not want this definition, you can always define your own version of the gamma distribution that returns missing PDF and CDF values for 0-valued losses and try fitting it. See the PROC SEVERITY documentation to find out how to define and fit your own distributions.
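As a rough illustration of that idea, the sketch below defines a gamma variant with PROC FCMP that returns missing PDF and CDF values for non-positive losses. The distribution name POSGAMMA and the function library WORK.MYSEVDIST are hypothetical names chosen for this example, not anything shipped with SAS. No POSGAMMA_PARMINIT subroutine is defined, so PROC SEVERITY would fall back on its default initial parameter values, which may make convergence harder. How the procedure treats the observations with missing PDF values should be checked in the log.

```sas
/* Hypothetical custom distribution POSGAMMA: a gamma that refuses losses <= 0 */
proc fcmp outlib=work.mysevdist.models;
   /* Theta : Scale (listed first, so it can be used with SCALEMODEL) */
   /* Alpha : Shape */
   function POSGAMMA_PDF(x, Theta, Alpha);
      if (x <= 0) then return(.);          /* missing PDF for 0-valued losses */
      return(pdf("GAMMA", x, Alpha, Theta));
   endsub;

   function POSGAMMA_CDF(x, Theta, Alpha);
      if (x <= 0) then return(.);          /* missing CDF for 0-valued losses */
      return(cdf("GAMMA", x, Alpha, Theta));
   endsub;
quit;

/* Make the hypothetical function library visible to PROC SEVERITY */
options cmplib=(work.mysevdist);

proc severity data=libish.simulatedwithlog crit=aicc;
   loss severity;
   scalemodel logduration / dfmixture=full;
   dist posgamma;
run;
```

Comparing these estimates with those from the predefined gamma gives a sense of how much the 0-valued losses influence the fit.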

Now, coming back to your question: because your data contains 0-valued losses, you will probably get some estimates from PROC SEVERITY, since its standard gamma definition treats 0 values as very small values (=constant('MACEPS')). However, you will need to look at the parameter estimates, fit statistics, and plots to see whether it is indeed a good fit. In general, if you have a lot of 0-valued response values, you should use a different distribution. The zero-inflated models mentioned by Rick are one option, but I would also suggest looking at the Tweedie distribution.
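One way to act on this, sketched here under the assumption that the data set and variable names from the question are unchanged, is to fit the gamma and a scaled Tweedie (the predefined STWEEDIE distribution) in the same PROC SEVERITY step and compare them. PRINT=ALL and PLOTS=ALL request the full set of tables and diagnostic plots, and CRIT=AICC selects the best model by corrected AIC.

```sas
ods graphics on;

/* Fit two candidate distributions and compare fit statistics and plots */
proc severity data=libish.simulatedwithlog crit=aicc print=all plots=all;
   loss severity;
   scalemodel logduration / dfmixture=full;
   dist gamma stweedie;   /* STWEEDIE has a scale parameter, so SCALEMODEL applies */
run;
```

The table of fit statistics across models and the CDF and P-P plots are the places to look when judging whether the gamma is adequate.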

Hope this helps,

Mahesh

All Replies

Posted in reply to ishakamboj1230

11-15-2016 09:00 AM

By default, the gamma distribution has a threshold parameter of zero, which means that a random variate from the gamma distribution will always be positive. In your data, you have three observations for which severity=0. The warning is telling you that those observations are dropped from the model, since they can't possibly come from a gamma-distributed variable.
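To see which observations trigger the warning, a quick check like the following sketch can help. It assumes the data set and response variable from the question (libish.dataset2 and severity). Missing values are excluded explicitly because SAS treats a missing value as smaller than any number.

```sas
/* Count the responses that a gamma model cannot accommodate */
proc sql;
   select count(*) as n_nonpositive
   from libish.dataset2
   where severity is not missing and severity <= 0;
quit;

/* List those rows for inspection */
proc print data=libish.dataset2;
   where severity is not missing and severity <= 0;
run;
```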

For a similar question and some responses, see the thread "Zero-Inflated Gamma Model".

The options in that thread include slightly modifying the gamma deviance or changing to a different model. Model options include a zero-inflated gamma model or a Tweedie distribution.
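As a minimal sketch of the Tweedie option, the GENMOD call from the question can be rerun with a different DIST= value. This assumes a SAS/STAT release in which PROC GENMOD supports DIST=TWEEDIE, and it keeps the data set, CLASS variables, and effects from the original post unchanged.

```sas
/* Tweedie GLM: allows exact zeros in the response, unlike the gamma */
proc genmod data=libish.dataset2;
   class premiumclass age zone;
   model severity = premiumclass age / dist=tweedie link=log type3;
run;
```

With this distribution the zero-severity observations should stay in the model instead of being deleted, so the warning about invalid response values should no longer apply to them.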

Posted in reply to Rick_SAS

11-21-2016 04:14 AM

Thanks Rick

Posted in reply to MaheshJoshi

11-21-2016 04:14 AM

thanks Mahesh