
From Black Boxes to Transparent Insights: SAS Credit Scoring

Started 03-13-2025 | Modified 03-13-2025

Traditionally, the financial industry has used conventional credit scoring methods, such as logistic regression, to evaluate borrowers' creditworthiness. While these methods have been fundamental for lenders, regulators, and consumers, they often fall short in capturing intricate patterns within large-scale, modern datasets. In response, machine learning (ML) models have emerged, offering superior predictive capabilities and adaptability. However, regulatory concerns have slowed their adoption. Recently, improvements in model transparency and fairness have led regulators to adopt a more receptive approach toward these advanced techniques.

 

The Evolution of Credit Scoring and Machine Learning

Conventional credit scoring models utilize predefined variables like income, debt-to-income ratio, and payment history, typically analyzed through linear or logistic regression techniques. These models are valued for their transparency and ease of interpretation, aligning with regulatory standards. However, their ability to detect nonlinear patterns and integrate alternative data sources is limited, which constrains their predictive effectiveness.

 

Emergence of Machine Learning Models

Machine learning leverages advanced algorithms to process vast and complex datasets, revealing subtle patterns that traditional techniques often miss. Techniques such as decision trees, gradient boosting machines, and neural networks are gaining prominence for their ability to predict credit risk with exceptional precision. By integrating diverse data sources—including transactional records, social media activity, and alternative financial behaviors—ML models offer a more holistic and insightful evaluation of creditworthiness.

 

Despite these advantages, a significant challenge remains: the inherent complexity of ML models can obscure their decision-making processes, leading to their characterization as black-box systems.

 

Challenges with Machine Learning Adoption

  1. The "Black-Box" Problem in Machine Learning

    The limited interpretability of early ML models presented a major obstacle to gaining regulatory approval. Financial institutions and regulators prioritize transparency to uphold accountability, identify biases, and preserve consumer trust. Unlike traditional models, which offer clear and easily understood outcomes, ML models often rely on intricate computations that can be challenging to explain in plain terms.

  2. Risks of Bias and Discrimination

    Biases present in training data can carry over into ML models, resulting in potentially discriminatory outcomes. For example, if historical data embeds systemic inequalities, ML algorithms may unintentionally reinforce those patterns. Such outcomes raise concerns under fair lending regulations (for example, the Equal Credit Opportunity Act (ECOA) and the Fair Housing Act in the US), prompting heightened regulatory scrutiny.

  3. Regulatory Compliance and Auditability

    Financial regulators emphasize the importance of transparency, auditability, and compliance with established standards. Early ML models often fell short of these expectations due to their complexity, which made it difficult to perform thorough validations, interpret results clearly, and demonstrate adherence to consumer protection regulations.

 

The Shift Towards Transparency

  1. Advancements in Explainable AI (XAI)

    The emergence of Explainable AI (XAI) has been pivotal in enhancing transparency. Methods such as SHAP (Shapley Additive Explanations), LIME (Local Interpretable Model-Agnostic Explanations), and counterfactual analysis offer clear insights into ML model decision-making.

    SHAP values measure each input's impact on a model's prediction. In a credit scoring model, SHAP can illustrate the extent to which factors like income or credit history influenced a borrower’s credit score.

    LIME simplifies complex models by creating local, interpretable approximations around individual predictions. For example, in a credit denial case, LIME can highlight the key factors—such as income or payment history—that influenced the outcome, enabling institutions to provide clear and transparent justifications.

    Counterfactual reasoning explores the question: “What changes would lead to a different outcome?” In credit scoring, this could mean identifying specific adjustments—such as lowering the debt-to-income ratio—that might increase a borrower's approval chances. This approach not only promotes transparency but also provides consumers with practical, actionable insights.

  2. Ethical AI Frameworks

    International frameworks, including the European Union’s AI Act and the Federal Reserve’s model risk management guidelines, set clear benchmarks for the ethical deployment of AI in financial services, helping regulators gain confidence in the adoption of ML technologies.

  3. Validation and Monitoring Practices

    The implementation of robust model validation and testing frameworks has eased regulatory concerns. Techniques such as stress testing, scenario analysis, and fairness audits have become standard practices to ensure that ML models uphold accuracy, fairness, and reliability.
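The counterfactual reasoning described above can be sketched in a few lines. The scoring function, its weights, and the 0.45 cutoff below are all hypothetical and purely for illustration, not SAS's actual model:

```python
import math

def score(income, dti, late_payments):
    """Toy logistic credit score: higher means more likely to be 'good'."""
    z = 0.00005 * income - 4.0 * dti - 0.8 * late_payments + 1.0
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual_dti(income, dti, late_payments, cutoff=0.45):
    """Lower DTI one percentage point at a time until the score clears the cutoff."""
    for cents in range(int(round(dti * 100)), -1, -1):
        candidate = cents / 100.0
        if score(income, candidate, late_payments) >= cutoff:
            return candidate
    return None  # no DTI reduction alone flips the decision

applicant = dict(income=40_000, dti=0.55, late_payments=2)
print(round(score(**applicant), 3))     # 0.31 -- below the 0.45 cutoff, denied
print(counterfactual_dti(**applicant))  # 0.4  -- the DTI that earns approval
```

The returned value is exactly the kind of actionable insight the paragraph above describes: "reduce your debt-to-income ratio to 0.40 and the application would clear the cutoff."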

 

SAS Viya Scorecard Node and Black-Box Model Integration

The Scorecard Node (included as part of SAS Risk Modeling add-on for SAS Visual Data Mining and Machine Learning) in SAS Viya is a dedicated tool for developing, validating, and deploying credit scoring models, combining traditional statistical techniques with modern machine learning approaches. It allows users to create interpretable scorecards using methods such as Weight of Evidence (WOE) and Information Value (IV), which are essential for credit risk assessment.
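As a minimal illustration of how WOE and IV are computed for one binned characteristic (the bin counts below are made up, not SAS output):

```python
import math

# (bin label, good count, bad count) -- hypothetical AGE bins
bins = [("<25", 100, 60), ("25-40", 300, 90), ("40+", 600, 50)]

total_good = sum(good for _, good, _ in bins)  # 1000
total_bad = sum(bad for _, _, bad in bins)     # 200

# WOE_i = ln(share of Goods in bin i / share of Bads in bin i);
# IV sums (share_good - share_bad) * WOE_i over all bins.
iv = 0.0
for label, good, bad in bins:
    share_good = good / total_good
    share_bad = bad / total_bad
    woe = math.log(share_good / share_bad)
    iv += (share_good - share_bad) * woe
    print(f"{label:>6}: WOE = {woe:+.3f}")
print(f"IV = {iv:.3f}")  # ~0.587: very strong by the usual rule of thumb
```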

 

In addition to traditional methods, the Scorecard Node in SAS Viya supports the integration of black-box models such as gradient boosting, random forests, neural networks, and support vector machines.

 

It also includes Bayesian Networks and Decision Trees among the black-box model offerings, even though these are not usually considered uninterpretable: both provide interpretable representations of the relationships between variables (probabilistic ones, in the case of Bayesian Networks).

 

However, Bayesian Networks use probabilistic relationships and prior knowledge to assess credit risk, and their complex reasoning process makes it difficult to pinpoint how specific input variables influence the final credit score. This lack of clarity poses a challenge for meeting regulatory requirements, which mandate clear and simple explanations for credit denials.

 

For regulators, credit scoring models must remain reliable across different economic conditions. Decision Trees, especially when grown deep, can be highly sensitive to economic changes, making them less stable during downturns or crises.

 

To address interpretability challenges, the Scorecard node in SAS Viya employs a logistic regression model approximation of the black-box model. It then generates a scorecard based on this approximation, providing a clear and interpretable representation of the black-box model’s predictions.

 

Enabling Black-Box Models in SAS Credit Scoring Node

Enabling black-box models in the SAS credit scoring node involves selecting the Use a black-box model check box. This property is not enabled by default.

 

01_SoumitraDas_bl01_2025_12_EnableBlackBox.png

When a black-box model is defined, it generates predictions for the target variable based on specified parameters, such as the classification cutoff in a gradient boosting model. The original dataset, now enriched with the predicted target values from the black-box model, is then used as input for the logistic regression model.

 

The original target values are replaced with predictions from the black-box model, and then the logistic regression is executed on these new target values to generate the scorecard and related reports.
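The target-replacement workflow can be sketched as follows. The "black box" below is a stand-in rule, and the logistic fit is plain gradient descent on two hypothetical WOE-transformed inputs; this is an illustration of the idea, not SAS's implementation:

```python
import math
import random

random.seed(0)
# Two WOE-transformed inputs per applicant (hypothetical data).
X = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(500)]

def black_box(x):
    """Stand-in for a fitted gradient boosting model: returns a 0/1 class."""
    return 1 if (x[0] + 0.5 * x[1] + 0.3 * x[0] * x[1]) > 0 else 0

# Step 1: replace the original target with the black-box predictions.
y_hat = [black_box(x) for x in X]

# Step 2: fit a logistic regression on the black-box labels.
w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(1000):
    gw = [0.0, 0.0]
    gb = 0.0
    for x, y in zip(X, y_hat):
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        err = p - y
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    w = [w[i] - lr * gw[i] / len(X) for i in range(2)]
    b -= lr * gb / len(X)

# How often the interpretable surrogate agrees with the black box:
agree = sum(
    (w[0] * x[0] + w[1] * x[1] + b > 0) == (y == 1) for x, y in zip(X, y_hat)
) / len(X)
print(f"surrogate coefficients: {w[0]:.2f}, {w[1]:.2f}, agreement = {agree:.0%}")
```

The surrogate's coefficients can then be read in the usual scorecard terms, while its predictions track the black box closely wherever the black-box decision boundary is approximately linear.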

 

Note that the results from the Scorecard node reflect the black-box model’s predictions rather than the logistic regression model itself, as the logistic regression now serves as an explainer for the black-box model.

 

Importantly, the input variables for the logistic regression are not limited to those used in the black-box model, allowing it to process the black-box scored data without added constraints or difficulties.

02_SoumitraDas_bl01_2025_13_ListBlackBox-300x211.png

 

We can choose which black-box model to use from the list of six available model types. Gradient Boosting is the default model type.

 

 

 

 

Employing Black-Box Models to Create a PD Application Scorecard

The RM_ACCEPTS data is used to generate an application scorecard. In fact, it is one of the primary datasets for building PD (Probability of Default) scorecards at the origination stage because it contains information collected during the loan application process.

 

The task involved developing a PD scorecard using four different black-box models—Gradient Boosting, Random Forest, Neural Network, and SVM—and comparing their performance against a scorecard generated from a traditional logistic regression model (renamed as No Black-Box Scorecard in the diagram below).

 

03_SoumitraDas_bl01_2025_01_MLScorePip.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

The model comparison results with the default settings for all black-box models are displayed below.

 

04_SoumitraDas_bl01_2025_14_ModelComparison.png

 

The results show that the Random Forest model emerged as the champion based on the default Kolmogorov-Smirnov (KS) statistic selection criterion. All models were executed with their default node settings on SAS Viya, and stepwise selection was applied to all logistic regression approximations of the black-box models. To ensure consistency in model settings, the standard logistic regression (No Black-Box Scorecard) also utilized stepwise input selection.

 

Model Interpretability

The logistic regression approximations, based on predicted values of the black-box models, offer a crucial interpretive framework to satisfy regulatory requirements.

 

05_SoumitraDas_bl01_2025_11_ParameterEst.png

 

WOE values are defined as

 

06_SoumitraDas_bl01_2025_17_WOE.png

where

 

07_SoumitraDas_bl01_2025_18_PrGoodBad.png
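Written out in the conventional scorecard notation (this is the standard definition, which the formulas in the images above should match):

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{\Pr(\mathrm{Good})_i}{\Pr(\mathrm{Bad})_i}\right),
\qquad
\Pr(\mathrm{Good})_i = \frac{\text{number of Good customers in bin } i}{\text{total number of Good customers}},
\qquad
\Pr(\mathrm{Bad})_i = \frac{\text{number of Bad customers in bin } i}{\text{total number of Bad customers}}.
```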

 

As observed, the definitions of Good and Bad are derived from the target variable. A positive WOE implies that the proportion of Good customers is higher than the proportion of Bad customers, and vice versa.

 

WOE serves as a measure of risk, where positive values indicate a lower probability of default. Additionally, in a logistic regression model, the coefficients (β) determine the strength of this relationship. A negative β implies that each unit increase in WOE multiplies the odds of default by e^β, a factor less than 1, so the odds fall; a positive β means that each unit increase in WOE raises the odds of default by a factor of e^β.
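A quick numeric illustration of the e^β interpretation (the β value and the starting odds below are made up):

```python
import math

beta = -0.7  # hypothetical coefficient on a WOE-transformed input

# Each one-unit WOE increase multiplies the odds of default by e^beta.
odds_factor = math.exp(beta)
print(f"odds multiplier per unit of WOE: {odds_factor:.3f}")  # 0.497

# Example: odds of default 0.2/0.8 = 0.25 before a one-unit WOE increase.
odds_before = 0.2 / 0.8
odds_after = odds_before * odds_factor
p_after = odds_after / (1 + odds_after)
print(f"default probability falls from 0.200 to {p_after:.3f}")  # 0.110
```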

 

Partial Dependence (PD) Plot

A PD plot (Partial Dependence plot) is a visualization technique that helps show how a particular feature affects, on average, the predictions of a model (in this case, the champion Random Forest model).

 

The PD plot below depicts the effect of the WOE transformed AGE input (WOE_AGE) on the predicted target, holding all other effects at their average values.

 

08_SoumitraDas_bl01_2025_06_PDPlot.png
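The averaging behind a PD plot can be sketched as follows, using a hypothetical scoring function and data in place of the champion Random Forest:

```python
import math
import random

random.seed(1)
# Hypothetical scored population: (WOE_AGE, WOE_INCOME) pairs.
rows = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]

def model(woe_age, woe_income):
    """Hypothetical scoring function standing in for the champion model."""
    z = -1.2 * woe_age - 0.6 * woe_income + 0.4 * woe_age * woe_income
    return 1 / (1 + math.exp(-z))  # predicted probability of the event

def partial_dependence(grid):
    """For each grid value, fix WOE_AGE for every row and average predictions."""
    return [
        (v, sum(model(v, income) for _, income in rows) / len(rows))
        for v in grid
    ]

for v, avg in partial_dependence([-2, -1, 0, 1, 2]):
    print(f"WOE_AGE = {v:+d}: mean prediction = {avg:.3f}")
```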

 

PD and ICE Overlay Plot

An ICE plot (Individual Conditional Expectation plot) is an extension of the Partial Dependence (PD) plot that provides a more granular, instance-level interpretation of how a feature influences a model’s predictions. Unlike PD plots, which show the average effect, ICE plots display the impact on individual data points.

 

The plot below illustrates how WOE_AGE influences the predicted target for six randomly selected observations, while holding the other effects constant at their values for each observation.

 

09_SoumitraDas_bl01_2025_07_ICEPlot.png
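The per-observation sweep behind an ICE plot is the same grid loop with the averaging removed, again with a hypothetical model and six made-up observations:

```python
import math
import random

random.seed(2)
# Six randomly drawn observations: (WOE_AGE, WOE_INCOME).
rows = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(6)]

def model(woe_age, woe_income):
    """Hypothetical scoring function standing in for the champion model."""
    z = -1.2 * woe_age - 0.6 * woe_income + 0.4 * woe_age * woe_income
    return 1 / (1 + math.exp(-z))

grid = [-2, -1, 0, 1, 2]
# One curve per observation: sweep WOE_AGE, hold that row's WOE_INCOME fixed.
ice_curves = [[model(v, income) for v in grid] for _, income in rows]

for i, curve in enumerate(ice_curves):
    print(f"obs {i}: " + "  ".join(f"{p:.2f}" for p in curve))
```

Curves with visibly different shapes or levels are exactly what reveals interactions that the averaged PD curve would hide.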

 

LIME Explanations

A LIME (Local Interpretable Model-agnostic Explanations) plot is used to explain the predictions of a machine learning model by approximating it with an interpretable local surrogate linear regression model.

 

The LIME plot below illustrates the regression coefficients of the input features chosen by the LASSO technique in the local surrogate linear regression model, which approximates the predicted probability of event '1' for the target GB at the individual observation level.

 

10_SoumitraDas_bl01_2025_08_LIMEPlot.png

 

A positive estimate implies that a higher input value increases the predicted probability of the event. Conversely, the negative WOE_INCOME estimate of -0.141457553 reduces the predicted probability of event 1 for the target by 0.029 relative to an observation with a different WOE_INCOME value.
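The LIME idea can be sketched in miniature: perturb around one observation, weight the samples by their proximity to it, and fit a weighted linear surrogate. A single feature keeps the weighted least-squares fit closed-form; the black-box function below is a stand-in, not the article's model:

```python
import math
import random

random.seed(3)

def black_box(woe_income):
    """Stand-in black box, nonlinear in its single input."""
    return 1 / (1 + math.exp(-(1.5 * woe_income - 0.8 * woe_income ** 2)))

x0 = 0.5  # the observation whose prediction we want to explain

# Perturb around x0 and weight each sample by its proximity to x0.
samples = [x0 + random.gauss(0, 0.5) for _ in range(300)]
weights = [math.exp(-((x - x0) ** 2) / 0.1) for x in samples]
preds = [black_box(x) for x in samples]

# Weighted least squares for slope and intercept (closed form).
sw = sum(weights)
mx = sum(w * x for w, x in zip(weights, samples)) / sw
my = sum(w * y for w, y in zip(weights, preds)) / sw
slope = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, samples, preds)) \
    / sum(w * (x - mx) ** 2 for w, x in zip(weights, samples))
intercept = my - slope * mx

print(f"local effect of WOE_INCOME near {x0}: slope = {slope:+.3f}")
```

The slope of the surrogate is the local coefficient that a LIME chart reports for this feature at this observation.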

 

HyperSHAP Values

SHAP (SHapley Additive exPlanations) assigns a value to each feature or input, displaying how much it contributes to the predicted target for a specific observation. For each individual observation, an input's Shapley value represents its contribution to the predicted probability of the target event 1 (or 0, if that is the event of interest). The inputs in the chart are ranked by importance based on their absolute HyperSHAP values.

 

11_SoumitraDas_bl01_2025_09_SHAPPlot.png
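Exact Shapley values can be computed directly for a tiny model by averaging each input's marginal contribution over all orderings, with "missing" inputs held at a baseline. This brute-force sketch (with a made-up model and applicant) illustrates the definition; it is not SAS's HyperSHAP algorithm:

```python
import math
from itertools import permutations

FEATURES = ["woe_income", "woe_age", "woe_history"]
BASELINE = {"woe_income": 0.0, "woe_age": 0.0, "woe_history": 0.0}

def model(v):
    """Hypothetical scoring model: predicted probability of default."""
    z = -0.9 * v["woe_income"] - 1.1 * v["woe_age"] - 0.5 * v["woe_history"]
    return 1 / (1 + math.exp(-z))

def shapley(instance):
    """Average each feature's marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        v = dict(BASELINE)
        prev = model(v)
        for f in order:
            v[f] = instance[f]  # reveal this feature's true value
            cur = model(v)
            contrib[f] += cur - prev
            prev = cur
    return {f: c / len(orderings) for f, c in contrib.items()}

applicant = {"woe_income": 1.2, "woe_age": -0.4, "woe_history": 0.8}
phi = shapley(applicant)
for f, val in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{f:>12}: {val:+.4f}")

# Efficiency property: the contributions sum to f(x) - f(baseline).
print(sum(phi.values()), model(applicant) - model(BASELINE))
```

Ranking the features by the absolute value of these contributions is exactly how the inputs in the chart above are ordered.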

 

Additional Information

For more information on SAS Risk Modeling, visit the software information page here. For information on SAS Viya Machine Learning with Risk Modeling, visit the software information page here.

 

For more information on curated learnings paths on SAS Solutions and SAS Viya, visit the SAS Training page. You can also browse the catalog of SAS courses here.

 

 

Find more articles from SAS Global Enablement and Learning here.
