SAS Data Science

gabon
Calcite | Level 5

Dear all,

 

I have a dataset of yearly historical data containing independent variables along with claim occurrence (y/n), claim frequency (number of claims during that year, if any occurred) and average amount per claim (if any occurred). These last three are the dependent variables. Claims occurred in only a small share of cases (6-7%). According to my internet research, claim frequency usually follows a Poisson distribution and claim amount a gamma distribution. However, this does not seem to be my case: I tried the HP GLM node in Enterprise Miner 14.1 with several options (Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial for claim frequency, and gamma for claim amount). In both cases I used a log link function. I also tried interaction and polynomial terms, as well as different selection procedures (backward and stepwise).

 

The resulting models do not seem good at all. Those predicting claim frequency always predict a number close to zero (the highest is around 0.30; how can the model even predict decimal values when the dependent variable is supposed to be an integer?), and those predicting claim amount are far off, with predictions that are on average lower than the real values.
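
On the decimal predictions: a count GLM with a log link predicts the expected number of claims (a rate), not an integer count, so small fractional values are what one would expect when only 6-7% of records have a claim. The following is a minimal Python sketch of that idea, not the HP GLM node's implementation; the intercept, coefficient and input value are made up purely for illustration.

```python
import math

def predicted_mean(intercept, coef, x):
    """Log link: the model predicts mu = exp(linear predictor), always > 0."""
    return math.exp(intercept + coef * x)

def poisson_pmf(k, mu):
    """Probability of observing exactly k claims when the expected count is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical coefficients, chosen only so that the mean is small.
mu = predicted_mean(intercept=-2.7, coef=0.3, x=1.0)

# The fractional mean still implies a distribution over integer counts 0..3.
probs = [poisson_pmf(k, mu) for k in range(4)]

print(f"predicted mean count: {mu:.4f}")
for k, p in enumerate(probs):
    print(f"P(claims = {k}) = {p:.4f}")
```

Read this way, a prediction of 0.30 is an expected count of 0.30 claims per year, and the probability of each integer outcome can be recovered from it if needed.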

 

Could you please help me find what I am missing? Should I undersample in order to increase the share of claims before fitting the models? Am I setting something wrong in the HP GLM node? Should I drop the GLMs altogether and try different predictive models? I understand that I could use classification models to predict claim occurrence, but I have no idea what other models could be used to predict claim frequency (a number between 0 and 3) and claim amount. When I ignore the zero values, the remaining amounts have a log-normal distribution; could I leverage that somehow? Otherwise, I can't identify a proper distribution when the zero values are included; it is not gamma, as it usually is.
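
One way the log-normal observation could be used is a two-part approach: model claim occurrence separately, model the logarithm of the non-zero amounts, and combine the two. The sketch below is only an illustration under stated assumptions, not a recommendation from the thread: the amounts and the 6.5% claim probability are made up, and the function names are hypothetical. It also shows that simply exponentiating the mean of the logs understates the average amount, because the mean of a log-normal variable is exp(mu + sigma^2 / 2).

```python
import math
import statistics

def lognormal_mean(log_amounts):
    """Mean of a log-normal variable estimated from its logged values."""
    mu = statistics.fmean(log_amounts)
    sigma2 = statistics.pvariance(log_amounts, mu)
    return math.exp(mu + sigma2 / 2.0)

def expected_amount(p_claim, log_amounts):
    """Two-part estimate: P(claim) times E[amount | claim occurred]."""
    return p_claim * lognormal_mean(log_amounts)

# Made-up non-zero claim amounts, for illustration only.
amounts = [800.0, 1200.0, 450.0, 3000.0, 950.0]
logs = [math.log(a) for a in amounts]

naive = math.exp(statistics.fmean(logs))  # back-transform without correction
corrected = lognormal_mean(logs)          # includes the sigma^2 / 2 term
overall = expected_amount(0.065, logs)    # assumed 6.5% claim probability

print(f"naive back-transform:     {naive:.2f}")
print(f"corrected log-normal mean: {corrected:.2f}")
print(f"expected amount incl. zeros: {overall:.2f}")
```

In a real model, the claim probability and the log-scale mean would each come from a fitted model (for example a classifier and a regression on the logged amounts) rather than from fixed numbers as here.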

 

Any input would be much appreciated, thanks a lot 🙂
