ishakamboj1230
Obsidian | Level 7

 

I am using motor insurance data in which the claim severity is given, and I have to model severity based on it. The data set is attached to the question. I tried PROC SEVERITY with a gamma distribution:

 

    proc severity data=libish.simulatedwithlog crit=aicc;
        loss severity;
        scalemodel logduration / dfmixture=full;
        dist gamma;
    run;

 

I also fitted the usual gamma model with PROC GENMOD:

 

    proc genmod data=libish.dataset2;
        class premiumclass age zone;
        model severity = premiumclass age / dist=gamma link=log type3;
    run;

PROC GENMOD then issues this message: WARNING: Some observations with invalid response values have been deleted. The response was less than or equal to zero for the Gamma or Inverse Gaussian distributions or less than zero for the Negative Binomial or Poisson distributions.

 

How can I determine whether a gamma model is suitable for this data?

1 ACCEPTED SOLUTION
MaheshJoshi
SAS Employee

Rick has addressed GENMOD's warning. I will comment on why PROC SEVERITY doesn't throw a similar warning. PROC SEVERITY supports multiple distributions, including your own distributions, so it allows 0 values for the "loss" (response) variable. It takes care of the 0 values in the distribution definition functions. In particular, for the gamma distribution, it uses the following definition of the PDF function (the other functions of PROC SEVERITY's predefined gamma distribution and all model definitions are listed in the PROC SEVERITY documentation):

    function GAMMA_PDF(x, Theta, Alpha);
        /* Theta : Scale */
        /* Alpha : Shape */
        minVal = 2.220446E-16;      /* alternatives:
                                       MACEPS     = 2.220446E-16
                                       sqrt(SMALL)= 0.1491668147e-153 */
        if (x < minVal) then do;
            x1 = minVal;
            /* assume exp(-x1/Theta)~1, because x1/Theta is too small */
            p = x1**(Alpha-1) / (gamma(Alpha) * (Theta**Alpha));
        end;
        else
            p = pdf("GAMMA", x, Alpha, Theta);
        return(p);
    endsub;

If you do not want this definition, you can always define your own version of the gamma distribution that returns missing PDF and CDF values for 0-valued losses and try fitting it. See the PROC SEVERITY documentation to find out how to define and fit your own distributions.
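As a rough sketch of what such a user-defined distribution might look like, the PROC FCMP step below defines a gamma variant that returns missing PDF and CDF values for non-positive losses. The distribution name STRICTGAMMA, the library work.mydist, and the INIT= starting values are hypothetical choices made for illustration. No parameter initialization subroutine is defined here, so starting values are supplied explicitly, and whether the fit converges when 0-valued losses produce missing likelihood contributions has to be checked in the log.

```sas
/* Hypothetical user-defined distribution; names and values are assumptions */
proc fcmp outlib=work.mydist.models;
    function STRICTGAMMA_PDF(x, Theta, Alpha);
        /* Theta : Scale */
        /* Alpha : Shape */
        if (x <= 0) then return(.);   /* no density for non-positive losses */
        return(pdf("GAMMA", x, Alpha, Theta));
    endsub;

    function STRICTGAMMA_CDF(x, Theta, Alpha);
        if (x <= 0) then return(.);   /* missing CDF for non-positive losses */
        return(cdf("GAMMA", x, Alpha, Theta));
    endsub;
quit;

/* Make the compiled functions visible to PROC SEVERITY */
options cmplib=(work.mydist);

proc severity data=libish.simulatedwithlog crit=aicc;
    loss severity;
    /* Starting values are placeholders; pick ones on the scale of your losses */
    dist strictgamma(init=(Theta=1000 Alpha=1));
run;
```

Theta is listed first because PROC SEVERITY expects the scale parameter to be the first parameter if a SCALEMODEL statement is added later.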

 

Now, coming back to your question: because your data contains 0-valued losses, you will probably get some estimates from PROC SEVERITY, since its standard gamma definition treats 0 values as very small values (equal to constant('MACEPS')). You will still need to look at the parameter estimates, fit statistics, and plots to see whether it is indeed a good fit. In general, if you have a lot of 0-valued responses, you should use a different distribution. The zero-inflated models mentioned by Rick are one option, but I would also suggest looking at the Tweedie distribution.
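One way to examine the estimates, fit statistics, and plots is to fit several candidate distributions in a single PROC SEVERITY step and compare them. The sketch below reuses the data set and variables from the question; the choice of LOGN and WEIBULL as comparison candidates is only an assumption for illustration, and PLOTS=ALL requires ODS Graphics to be enabled.

```sas
ods graphics on;

/* Fit several predefined distributions and compare them by AICC */
proc severity data=libish.simulatedwithlog crit=aicc print=all plots=all;
    loss severity;
    scalemodel logduration / dfmixture=full;
    dist gamma logn weibull;   /* comparison candidates are an assumption */
run;
```

The model selection table ranks the candidates by the CRIT= statistic, and the CDF and P-P plots show how closely each fitted distribution tracks the empirical distribution. A gamma fit that ranks poorly or departs visibly from the empirical CDF is a sign that another distribution is more suitable.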

 

Hope this helps,

Mahesh


4 REPLIES 4
Rick_SAS
SAS Super FREQ

By default, the gamma distribution has a threshold parameter of zero, which means that a random variate from the gamma distribution is always strictly positive. In your data, there are three observations for which severity=0. The warning is telling you that those observations are dropped from the model, since they can't possibly come from a gamma-distributed variable.

 

For a similar question and some responses, see the thread  "Zero-Inflated Gamma Model".

The options in that thread include slightly modifying the gamma deviance or changing to a different model. Model options include a zero-inflated gamma model or a Tweedie distribution. 
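To make the options above concrete, here is a minimal sketch that first counts the problematic responses and then fits both a gamma model on the strictly positive severities and a Tweedie model on all observations. It reuses the data set and variables from the question. DIST=TWEEDIE is only available in more recent SAS/STAT releases, so treat that step as an assumption about your installation.

```sas
/* How many responses would the gamma model drop? */
proc sql;
    select count(*)                                       as n_obs,
           sum(severity = 0)                              as n_zero,
           sum(severity < 0 and not missing(severity))    as n_negative,
           sum(missing(severity))                         as n_missing
    from libish.dataset2;
quit;

/* Gamma model restricted explicitly to positive severities */
proc genmod data=libish.dataset2;
    where severity > 0;
    class premiumclass age zone;
    model severity = premiumclass age / dist=gamma link=log type3;
run;

/* Tweedie model, which can accommodate exact zeros */
proc genmod data=libish.dataset2;
    class premiumclass age zone;
    model severity = premiumclass age / dist=tweedie link=log type3;
run;
```

The WHERE statement gives the same estimation sample that GENMOD would use after deleting the invalid responses, but it makes the exclusion explicit and suppresses the warning. With only a handful of zeros, the positive-only gamma fit may be adequate; with many zeros, the Tweedie or a zero-inflated model is the more natural choice.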

ishakamboj1230
Obsidian | Level 7

Thanks Rick

ishakamboj1230
Obsidian | Level 7

thanks Mahesh
