
GLMs in SAS Dynamic Actuarial Modeling Solution

Started ‎03-14-2024 by
Modified ‎03-14-2024 by

Introduction

 

Insurance policies provide coverage for losses that policyholders incur as a result of unforeseen events. These events arise sporadically over time and must take place while the policy is active to be eligible for coverage.

 

To establish a fair premium rate for a policy, actuaries need to quantify the stochastic elements inherent in the underlying claims process. This entails developing suitable probability models to assess both the frequency and magnitude of claims.

 

Random Variables, Probability Functions and Statistical Distributions

 

When dealing with a random phenomenon represented by a probability space, the focus often shifts towards specific numerical representations of outcomes within the sample space rather than the actual outcomes themselves. For instance, when viewing an insurance claim as the outcome of a random event, actuaries typically prioritize the financial value associated with the claim. Alternatively, they might emphasize the frequency of claims over the duration of the policy period.

 

The majority of random variables encountered in insurance modeling can generally be categorized into two main types – discrete (like claim frequency) or continuous (like claim severity). However, actuaries also encounter mixed variables, which incorporate elements of both discrete and continuous variables – for example, insurance payouts subjected to a deductible.
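To make the idea of a mixed variable concrete, here is a minimal Python sketch of payouts subject to a deductible. The exponential loss distribution, the mean loss of 1,000, the deductible of 500, and the helper name payout_after_deductible are all illustrative assumptions, not values from any real portfolio.

```python
import random


def payout_after_deductible(loss: float, deductible: float) -> float:
    """Insurer's payout for one loss: nothing below the deductible, the excess above it."""
    return max(loss - deductible, 0.0)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
deductible = 500.0       # hypothetical policy deductible
mean_loss = 1000.0       # hypothetical mean ground-up loss

# Continuous severity: exponential ground-up losses (expovariate takes the rate 1/mean).
losses = [rng.expovariate(1.0 / mean_loss) for _ in range(10_000)]
payouts = [payout_after_deductible(loss, deductible) for loss in losses]

# Discrete part: a point mass at zero. Continuous part: the positive payouts.
share_zero = sum(1 for p in payouts if p == 0.0) / len(payouts)
positive_payouts = [p for p in payouts if p > 0.0]

print(f"Share of payouts exactly zero: {share_zero:.3f}")
print(f"Mean of positive payouts: {sum(positive_payouts) / len(positive_payouts):.1f}")
```

Under these assumptions the share of zero payouts should be close to 1 - exp(-0.5), roughly 0.39. A sizeable probability sits on the single value zero while the rest of the distribution is continuous, which is what makes the payout a mixed variable.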

 

Given the nature of a random variable, we can also define mathematical functions that describe the likelihood of its different outcomes in a given random process. These functions are known as probability functions. They assign a probability (or, for continuous variables, a probability density) to each possible outcome of the random variable, indicating how likely it is to occur.

 

These probability functions are called probability density functions (pdf) for continuous random variables and probability mass functions (pmf) for discrete random variables.
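The difference between a pmf and a pdf can be sketched in a few lines of Python. The Poisson and gamma distributions are used here only as familiar examples, and the parameter values are hypothetical. A pmf sums to one over the possible counts, while a pdf integrates to one over the possible amounts.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k claims when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of a gamma-distributed claim amount at x > 0."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale**shape)


# Discrete case: probabilities of 0, 1, 2, ... claims add up to one.
total_mass = sum(poisson_pmf(k, 0.3) for k in range(50))

# Continuous case: the density integrates to one (midpoint rule on [0, 50000]).
step = 10.0
total_area = sum(gamma_pdf((i + 0.5) * step, 2.0, 1000.0) * step for i in range(5000))

print(f"Sum of pmf values: {total_mass:.6f}")
print(f"Integral of pdf:   {total_area:.6f}")
```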

 

This brings us to the notion of a probability distribution or statistical distribution, which can be visualized as a comprehensive representation of the possible outcomes of a random variable along with their corresponding probabilities, derived from the previously outlined probability functions.

 

Significance of Probability Distributions or Statistical Distributions for Actuaries

 

Actuaries use probability distributions to model and analyze risks in a variety of contexts. For instance, in the insurance industry, actuaries use probability distributions to model the likelihood of different events, such as accidents or natural disasters, and to determine the appropriate premiums for insurance policies. In finance, actuaries use probability distributions to model the likelihood of different investment outcomes and to determine the appropriate levels of risk for different portfolios.

 

One of the key challenges of insurance modeling is to choose the most appropriate statistical or probability distribution to represent the frequency and severity of insurance claims. Different types of insurance claims may have different characteristics and patterns, such as the occurrence rate, the claim size distribution, and the presence of outliers or extreme values. Therefore, different statistical distributions may be more suitable for different types of insurance claims.

 

For insurance risk management, prudent choices about statistical distributions can help to measure and monitor the expected losses and the variability of losses for a given portfolio of policies. Appropriate assumptions about probability distributions can also help to control the risks by adjusting the policy terms, such as the deductible, the limit, and the coinsurance. These results can also help to determine the optimal reinsurance strategy, such as the type, the level, and the cost of reinsurance.

 

To achieve this, insurance companies rely on sophisticated modeling techniques that can handle the complexities and nuances of insurance data. One such powerful tool is the Generalized Linear Model (GLM).

 

Generalized Linear Models in Insurance Modeling

 

Suppose we want to develop pricing models for insurance products, that is, models that can predict quantifiable amounts of insurance risk. The most commonly used measures of risk are claim frequency and claim severity.

 

Once we have determined the nature of the variable measuring risk and the factors that contribute to such risks, the next step involves establishing a structured framework to capture the relationship between the two.

 

For example, if we assume that there is a linear relationship (in terms of parameters) between our measure of risk (let’s denote this as Y) and a set of contributing factors or predictors (collectively denoted as X), and if the measure of risk (Y) follows a normal distribution, then we can estimate the relationship between risk and its contributing factors using a linear regression model:

 

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

 

Note: Linear regression is not commonly used for modeling insurance data. The reference here is primarily intended to conceptualize the modeling framework.

 

The primary objective of such a model is to estimate the unknown parameters, β, from present and past data using a suitable estimation technique such as least squares or maximum likelihood. Once the parameters (βs) are estimated, we can use them along with the chosen modeling framework to predict the values of Y.
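The estimate-then-predict workflow can be sketched in Python for the simplest case of one predictor, where least squares has a closed form. The data values and the function name fit_simple_ols are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (beta0, beta1) for the model y = beta0 + beta1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical observations of a predictor X and a risk measure Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_simple_ols(x, y)
prediction = beta0 + beta1 * 6.0  # predicted Y for a new observation with X = 6

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, prediction at X=6: {prediction:.2f}")
```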

 

The preliminary choice of model relies on prior understanding of the characteristics and structure of claims data. The commonly used measures of risk in claims data do not conform to normal distributions. Typically, claim frequency is represented using discrete probability distributions on the non-negative integers, given that the count of claims is discrete and cannot be negative. Claim severity, on the other hand, is commonly modeled with continuous distributions on positive values that are right-skewed and heavy-tailed.

 

GLMs offer a more flexible approach compared to traditional linear regression models. They can accommodate non-normal distributions and handle different types of responses. This makes them particularly suitable for insurance modeling, where variables often exhibit skewed or categorical behavior.

 

GLMs deviate from linear regression modeling in three significant aspects:

 

  1. The distribution of the response variable is drawn from the exponential family. Consequently, the response distribution does not necessarily adhere to normality and can explicitly manifest non-normal characteristics.
  2. The relationship between the transformed mean of the response and the explanatory variables is linear (in terms of the parameters).
  3. The variance of the response variable is a function of the response variable's expected value. This allows for non-constant variance of the response variable (heteroscedasticity), as opposed to the homoscedasticity assumption of linear regression.

 

GLMs can be represented with the following equation:

 

$$g(E[Y]) = X\beta = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

 

Where g( ) is the link function that establishes the linear relation between the transformed mean of the response E[Y] and the explanatory variables (X). Furthermore, the link function g( ) must be differentiable and strictly monotonic such that the inverse g⁻¹( ) exists, and

 

$$E[Y] = g^{-1}(X\beta)$$

 

The above equation implies that the expected response E[Y] can be a nonlinear function of the linear combination of explanatory variables (X); that is, g( ) can be a nonlinear function.
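A small Python sketch can show what the link function does in practice. It assumes a log link, which is a common choice for claim frequency but is only one possible link, and the coefficient values are hypothetical.

```python
import math


def linear_predictor(beta, x):
    """Linear combination of the explanatory variables, with x[0] = 1 for the intercept."""
    return sum(b * xi for b, xi in zip(beta, x))


def log_link(mu):
    """g(mu) = log(mu): maps a positive mean onto the whole real line."""
    return math.log(mu)


def inverse_log_link(eta):
    """Inverse link: maps the linear predictor back to a positive mean."""
    return math.exp(eta)


beta = [-2.0, 1.0]  # hypothetical intercept and slope

# Equal steps in the predictor X give equal steps in the linear predictor...
etas = [linear_predictor(beta, [1.0, x]) for x in (0.0, 1.0, 2.0)]
# ...but unequal steps in the mean E[Y], because the inverse link is nonlinear.
means = [inverse_log_link(eta) for eta in etas]

print("linear predictor:", etas)
print("expected value:  ", [round(m, 4) for m in means])
```

The linear predictor rises by the same amount at each step, while the mean is multiplied by the same factor at each step. The log link also guarantees that the predicted mean stays positive.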

 

The distributions frequently employed by actuaries often share a similar structure, which allows them to be classified into the exponential family. This characteristic has facilitated the development of a unified analytical framework known as Generalized Linear Models. For GLMs, the response variable is assumed to have a probability density (or mass) function from the exponential family, given by:

 

$$f(Y; \theta, \Phi) = \exp\left\{ \frac{Y\theta - b(\theta)}{a(\Phi)} + C(Y, \Phi) \right\}$$

 

Here θ is the parameter of interest (also called the canonical or natural parameter) and Φ is the dispersion parameter. The functions b(θ), a(Φ), and C(Y, Φ) determine the type of distribution.
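A short worked example may help here. Assuming the usual exponential-family form f(Y; θ, Φ) = exp{(Yθ − b(θ))/a(Φ) + C(Y, Φ)}, the Poisson distribution with mean μ can be rewritten so that each component can be read off directly:

```latex
\begin{aligned}
f(Y; \mu) &= \frac{e^{-\mu}\,\mu^{Y}}{Y!}
           = \exp\left\{ Y \log\mu - \mu - \log Y! \right\} \\
\theta &= \log\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\Phi) = 1, \qquad C(Y, \Phi) = -\log Y! \\
E[Y] &= b'(\theta) = e^{\theta} = \mu, \qquad
\operatorname{Var}(Y) = a(\Phi)\, b''(\theta) = e^{\theta} = \mu
\end{aligned}
```

The last line shows the variance as a function of the mean, which is the third point in which GLMs differ from linear regression. For the Poisson distribution the two are equal.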

 

GLM Distribution Functions in SAS Dynamic Actuarial Modeling Ratemaking Node

 

Ratemaking involves the intricate task of establishing appropriate rates or premiums for individual insurance customers. Unlike conventional methods which may lack statistical sophistication, the Ratemaking node employs GLMs.

 

The Ratemaking node is specifically engineered for constructing GLMs tailored for insurance applications. It serves as a dedicated tool for actuarial pricing, offering a range of distributions from the exponential family suitable for various target variables. These variables often include frequency, severity, and pure premium, with the selection of distributions tailored to the nature of the target variables.

 

Claim Severity/Pure Premium Distributions

 

Distribution | Range | Requirements / Data Type
Burr | Nonnegative real values | Well-suited for modeling extreme values as a heavy-tailed distribution; effective for modeling insurance claim amounts that occur extremely infrequently.
Exponential | Nonnegative real values | Special case of the gamma distribution; suited for modeling thin-tailed distributions.
Gamma | Nonnegative real values | Most suited for modeling insurance severity; default choice for severity modeling in the Ratemaking node.
Generalized Pareto | Positive real values | Suitable for modeling extreme claim severity amounts; focuses on upper-tail values beyond a threshold value.
Inverse Gaussian | Nonnegative real values | A mixed discrete-continuous model, with a probability mass at zero and an Inverse Gaussian continuous component; suitable for modeling insurance claim sizes, including zero claims.
Lognormal | Positive real values | Suitable in situations (like fire, automobile collision) where the individual claim values can increase almost without limit but cannot fall below zero, with most of the values near the lower limit.
Pareto | Nonnegative real values | A special case of the Generalized Pareto distribution (σ = Threshold). Suitable for modeling extreme claim severity values; default choice for modeling pure premium in the Ratemaking node.
Scaled Tweedie | Nonnegative real values | Suitable for modeling zero-inflated insurance claim data; uses a scale parameter, which is where the regression effects enter the model.
Weibull | Nonnegative real values | Suitable for left-truncated data; the threshold value is set by the deductible, so claims smaller than the deductible are not recorded in the data.

 

05_DAM_PurePremiumED.png


 

06_DAM_SeverityED.png

 

For claim severity and pure premium modeling, the Ratemaking node allows multiple distributions to be specified, and various model selection criteria are available for choosing the champion model. The default distribution is Gamma for claim severity modeling and Scaled Tweedie for pure premium modeling.
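The idea of choosing a champion among candidate distributions can be sketched in Python. This is not the Ratemaking node's implementation. It assumes AIC as the selection criterion, fits only two candidates (exponential and lognormal) that have closed-form maximum likelihood estimates, ignores regression effects, and uses made-up claim amounts.

```python
import math


def exponential_loglik(claims):
    """Maximized log-likelihood of an exponential fit (MLE rate = 1 / sample mean)."""
    n = len(claims)
    rate = n / sum(claims)
    return n * math.log(rate) - rate * sum(claims)


def lognormal_loglik(claims):
    """Maximized log-likelihood of a lognormal fit (MLE of mean and variance on the log scale)."""
    n = len(claims)
    logs = [math.log(c) for c in claims]
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * n


def aic(loglik, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical right-skewed claim amounts.
claims = [120.0, 250.0, 310.0, 480.0, 650.0, 900.0, 1500.0, 2400.0, 5200.0, 11000.0]

aic_by_distribution = {
    "Exponential": aic(exponential_loglik(claims), n_params=1),
    "Lognormal": aic(lognormal_loglik(claims), n_params=2),
}
champion = min(aic_by_distribution, key=aic_by_distribution.get)

for name, value in aic_by_distribution.items():
    print(f"{name}: AIC = {value:.2f}")
print("Champion under AIC:", champion)
```

AIC penalizes the extra parameter of the lognormal, so the richer distribution wins only if it improves the fit enough to justify it.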

 

Claim Frequency Distributions

 

Distribution | Range | Requirements / Data Type
Poisson | Nonnegative integers | Suitable for modeling the count of events occurring within a fixed time interval.
Conway-Maxwell-Poisson (CMP) | Nonnegative integers | Suitable for modeling the count of events when claim frequency has a variance that is greater than the mean.
Negative Binomial (NB) | Nonnegative integers | Suitable for modeling the count of events when the mean is treated as a random variable.
Zero-inflated | Nonnegative integers | Suitable for modeling zero-inflated count events; provides for a lower average premium for insurance customers with less risk, as they are considered to have a high probability of making zero claims.

 

07_DAM_FrequencyED.png

 

For claim frequency, the Ratemaking node does not allow multiple distributions to be specified (unlike claim severity or pure premium modeling). The default distribution is Poisson.
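To illustrate what fitting a Poisson frequency model involves, here is a minimal Python sketch of a Poisson GLM with a log link, one binary rating factor, and Newton-Raphson estimation. It is a teaching sketch under stated assumptions, not the algorithm used by the Ratemaking node. The data, the young_driver factor, and the function name fit_poisson_glm are hypothetical.

```python
import math


def fit_poisson_glm(x, y, iterations=25):
    """Fit log(E[Y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the overall mean claim count
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood with respect to (b0, b1).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2x2, symmetric), with Poisson variance equal to mu.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + inverse(information) * score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# Hypothetical policies: indicator for a young driver and observed claim counts.
young_driver = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
claim_count = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1]

b0, b1 = fit_poisson_glm(young_driver, claim_count)
print(f"Expected claims, other drivers: {math.exp(b0):.4f}")
print(f"Expected claims, young drivers: {math.exp(b0 + b1):.4f}")
```

With a single binary factor the fitted means coincide with the observed average claim counts in each group, which gives a quick way to check the fit.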

 

Interpretability of GLM Models

 

The transparent structure and diagnostic capabilities of GLMs contribute to their interpretability. This makes them a valuable tool for communicating with stakeholders.

 

  1. Parameter Interpretation: The coefficients in GLMs represent the relationship between the predictor variables and the response variable.
  2. Predictor Importance: GLMs allow for the assessment of predictor importance through the examination of coefficient magnitudes and significance levels.
  3. Model Diagnostics: GLMs offer various diagnostic tools such as residual analysis, leverage plots, and influence measures, which facilitate the evaluation of model fit and the identification of influential data points.
  4. Model Assumptions: GLMs come with well-defined assumptions, such as the linearity of predictors (in relation to the link function) and the independence of observations.
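As an illustration of parameter interpretation, the sketch below assumes a frequency GLM with a log link, where exponentiated coefficients act as multiplicative relativities. The coefficient values and factor names are hypothetical and do not come from any fitted model.

```python
import math

# Hypothetical fitted coefficients of a log-link claim frequency model.
coefficients = {"Intercept": -2.30, "young_driver": 0.45, "urban": 0.20}

# With a log link, exp(coefficient) is the factor by which expected frequency is multiplied.
base_frequency = math.exp(coefficients["Intercept"])
relativities = {
    name: math.exp(value)
    for name, value in coefficients.items()
    if name != "Intercept"
}

# Expected frequency for a young driver in an urban area.
expected_frequency = base_frequency * relativities["young_driver"] * relativities["urban"]

print(f"Base frequency: {base_frequency:.4f}")
for name, value in relativities.items():
    print(f"Relativity for {name}: {value:.3f}")
print(f"Expected frequency (young, urban): {expected_frequency:.4f}")
```

A relativity above 1 raises the expected frequency relative to the base level, and one below 1 lowers it. This multiplicative reading is often what makes log-link GLMs easy to explain to stakeholders.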

 

Challenges and Limitations of using GLMs

 

One hurdle lies in assuming a linear relationship between the predictors and the transformed mean of the response variable. This assumption may not hold in all cases, necessitating either transformations of the data or the adoption of more adaptable modeling techniques.

 

Another issue is selecting the correct distribution family and link function. An erroneous pairing can result in biased estimates. Insurers must carefully evaluate the attributes of the response variable and choose the distribution family and link function that best align with its characteristics. This is somewhat mitigated in SAS Dynamic Actuarial Modeling (SAS DAM), which provides the option to compare various distributions and select the one that best fits the given data (for severity and pure premium models only).

 

Furthermore, GLMs might encounter challenges related to overdispersion, where the variance of the response variable exceeds what is expected under the assumed distribution. If overdispersion is ignored, standard errors tend to be underestimated, which overstates the statistical significance of the predictors.
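One common way to screen for overdispersion in a Poisson model is the Pearson dispersion statistic, which is the Pearson chi-square divided by the residual degrees of freedom. The Python sketch below assumes an intercept-only Poisson model and made-up claim counts. Values well above 1 suggest overdispersion.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual degrees of freedom, using Poisson variance V(mu) = mu."""
    chi_square = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi_square / (len(y) - n_params)


# Hypothetical claim counts: mostly zeros with a few large values.
claim_count = [0, 0, 0, 0, 0, 0, 0, 0, 5, 7]

# Intercept-only Poisson model: every fitted mean equals the sample mean.
fitted_mean = sum(claim_count) / len(claim_count)
fitted = [fitted_mean] * len(claim_count)

dispersion = pearson_dispersion(claim_count, fitted, n_params=1)
print(f"Pearson dispersion: {dispersion:.3f}")
```

A statistic far above 1, as with these counts, would point toward a distribution that allows extra variance, such as the negative binomial listed among the claim frequency distributions.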

 

Conclusion

 

As the insurance landscape undergoes continual transformation, the utilization of GLMs in insurance modeling is likely to expand. Prospective avenues may entail integrating intricate interactions and non-linear associations, integrating external data sources for enhanced predictive efficacy, and crafting models that are adept at handling high-dimensional and unstructured data.

 

Additional Information

 

For more information on SAS Dynamic Actuarial Modeling visit the software information page here.

 

For more information on curated learning paths on SAS Solutions and SAS Viya, visit the SAS Training page. You can also browse the catalog of SAS courses here.

 

 

Find more articles from SAS Global Enablement and Learning here.

