gabon

Dear all,

 

I have a dataset of yearly historical data containing independent variables plus three dependent variables: claim occurrence (y/n), claim frequency (number of claims during that year, if any occurred) and average amount per claim (if any occurred). Claims occurred in only a small share of cases (6-7%). According to my internet research, claim frequency usually follows a Poisson distribution and claim amount a gamma distribution. However, this does not seem to hold for my data. I used the HP GLM node in Enterprise Miner 14.1 with several options: Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial for claim frequency, and a gamma distribution for claim amount. In both cases I used a log link function. I also tried interaction and polynomial terms, as well as different selection procedures (backward and stepwise).

 

The resulting models do not seem good at all. Those predicting claim frequency always predict a number close to zero (the highest is around 0.30; how can a model even predict decimal values when the dependent variable is supposed to be an integer?), and those predicting claim amount are far off, with predictions that are on average lower than the real values.
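A note on the decimal predictions: a count GLM with a log link returns the expected number of claims (the mean of the fitted distribution), not an integer count, so with claims in only 6-7% of cases, values close to zero are what one would expect. The minimal Python sketch below assumes a plain Poisson distribution and uses hypothetical helper names (`poisson_pmf`, `count_probabilities`); it only shows how a predicted mean such as 0.07 or 0.30 maps to probabilities of 0, 1, 2 and 3+ claims. A mean of 0.07 corresponds to roughly a 93% chance of zero claims, and a mean of 0.30 to roughly 74%.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def count_probabilities(mean, max_count=3):
    """Return [P(0), P(1), ..., P(max_count - 1), P(>= max_count)] for a predicted mean."""
    probs = [poisson_pmf(k, mean) for k in range(max_count)]
    probs.append(1.0 - sum(probs))  # remaining mass: max_count or more claims
    return probs


if __name__ == "__main__":
    # 0.07 is close to the overall claim rate in the post; 0.30 is the highest prediction mentioned.
    for predicted_mean in (0.07, 0.30):
        probs = count_probabilities(predicted_mean)
        labels = ["0", "1", "2", "3+"]
        summary = ", ".join(f"P({label})={p:.4f}" for label, p in zip(labels, probs))
        print(f"predicted mean {predicted_mean:.2f}: {summary}")
```

Read this way, a prediction of 0.30 is not a failed attempt at predicting an integer; it describes a policy whose most likely outcome is still zero claims.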

 

Could you please help me find what I am missing? Should I undersample the no-claim cases to increase the share of claims before fitting the models? Am I setting something wrong in the HP GLM node? Should I abandon GLMs altogether and try different predictive models? I understand that I could use classification models to predict claim occurrence, but I have no idea what other models could be used to predict claim frequency (a number between 0 and 3) and claim amount. When I ignore the zero values, the remaining amounts have a log-normal distribution; could I leverage that somehow? With the zero values included, I can't identify a suitable distribution (it is not gamma, as it usually is).
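One way the log-normal observation could be used is a two-part setup: model claim occurrence separately, then model the amount only on records with a claim. The sketch below is a minimal illustration in Python, not an Enterprise Miner recipe. The amounts are made-up numbers, the function names are hypothetical, and it assumes the positive amounts really are log-normal. It also shows that back-transforming the mean of the logs with a bare `exp` gives a value below the log-normal mean `exp(mu + sigma^2 / 2)`, which is one possible reason for predictions that are too low on average.

```python
import math
import statistics


def lognormal_mean(amounts):
    """Mean of a log-normal fitted to the positive amounts: exp(mu + sigma^2 / 2)."""
    logs = [math.log(a) for a in amounts if a > 0]
    mu = statistics.fmean(logs)
    sigma2 = statistics.variance(logs)  # sample variance of the log amounts
    return math.exp(mu + sigma2 / 2.0)


def naive_backtransform(amounts):
    """exp(mean of logs): the geometric mean, which is lower than the log-normal mean."""
    logs = [math.log(a) for a in amounts if a > 0]
    return math.exp(statistics.fmean(logs))


def expected_cost(p_claim, mean_count_given_claim, mean_amount_per_claim):
    """Expected yearly cost = P(claim) * E[number of claims | claim] * E[amount per claim]."""
    return p_claim * mean_count_given_claim * mean_amount_per_claim


if __name__ == "__main__":
    # Hypothetical average claim amounts for records where a claim occurred.
    amounts = [500.0, 1200.0, 800.0, 3000.0, 650.0]
    severity = lognormal_mean(amounts)
    print(f"naive exp(mean log): {naive_backtransform(amounts):.2f}")
    print(f"log-normal mean:     {severity:.2f}")
    # 0.065 reflects the 6-7% claim rate from the post; 1.2 claims per claimant is assumed.
    print(f"expected cost:       {expected_cost(0.065, 1.2, severity):.2f}")
```

In a real model, each of the three factors would come from its own fitted model with covariates (for example a classifier for occurrence and a regression on the log amount), rather than from overall averages as in this sketch.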

 

Any input would be much appreciated, thanks a lot 🙂

