In today's post, we'll finish our assessment of a logistic regression model built in SAS Viya by examining lift and ROC charts. In my previous post in this series, we began our assessment of a logistic model in SAS Visual Statistics by examining the confusion matrix, the misclassification plot and the cutoff plot. We want to cap off this discussion by returning to the set of outputs from that logistic regression model known as assessment plots and focusing on lift and ROC charts. We will continue to focus on the part of the AI and Analytics Lifecycle that involves developing and interpreting robust models. Specifically, let's examine the remaining pieces of output from the logistic regression model that was built using variable annuity (insurance product) data.
Let's keep in mind that the business challenge is trying to identify customers who are likely to respond to a variable annuity marketing campaign and make a purchase. It's very likely during the AI and Analytics Lifecycle that we will build more than just a logistic regression model. In fact, in my next post, we will build a decision tree with the same annuity data. With multiple models, how will we decide which is the "best" model? This is where model assessment comes into the picture, including lift and ROC charts. These assessment charts and others will help us evaluate model performance and select the best of the competing models to meet our business goals.
Let's begin by examining the lift chart. First, a little history on lift charts. Lift charts were developed as a practical tool for evaluating predictive models, especially in fields like direct marketing, where businesses needed to identify high-value customers for targeted campaigns. Originating in the 1980s and 1990s, they addressed the need to visualize how well models could prioritize likely responders compared to random selection. Traditional metrics like accuracy were available, but accuracy alone doesn't capture this concept of prioritization. Lift charts plot the improvement of a model over random selection, helping to show the added value of targeting the top segments of the model's predictions. As data mining and machine learning advanced, lift charts became a standard tool in model evaluation. They offer clear insights into a model's effectiveness in various applications and have the benefit of being a visualization tool. To summarize, the lift chart is a graphical representation of the advantage (or lift) of using a predictive model to improve upon the target response versus not using a model at all.
To create a lift chart of our annuity results, we rank-order our customers based on their likelihood of a positive outcome or purchase. Next, we divide the ranked data into equal-sized groups; in our chart we will use percentiles. For each group, the cumulative percentage of actual positive outcomes is calculated and compared to the baseline percentage expected from random selection. Finally, we chart the cumulative percentages, with the x-axis representing the percentage of the population targeted and the y-axis showing the cumulative lift. This allows us to see how effectively the model outperforms random selection. And this makes sense as an assessment statistic because if our model cannot outperform finding purchasers at random, then why bother using the model?
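The steps above can be sketched in a few lines of code. This is a minimal illustration in Python rather than anything produced by SAS Visual Statistics, and the scores and outcomes are made up for demonstration, not taken from the annuity data:

```python
def cumulative_lift(scores, outcomes, depths):
    """Cumulative lift at each depth (fraction of the ranked population)."""
    # Rank customers from highest to lowest predicted likelihood.
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda p: -p[0])]
    overall_rate = sum(ranked) / len(ranked)       # baseline (random) response rate
    lifts = []
    for d in depths:
        k = max(1, round(d * len(ranked)))         # size of the top-d group
        top_rate = sum(ranked[:k]) / k             # response rate in that group
        lifts.append(top_rate / overall_rate)      # ratio versus random selection
    return lifts

# Ten hypothetical customers: predicted purchase probability and actual outcome.
scores   = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1,    1,    1,    0,    1,    0,    0,    0,    0,    0]

print(cumulative_lift(scores, outcomes, [0.1, 0.2, 0.5, 1.0]))
# -> [2.5, 2.5, 2.0, 1.0]
```

Notice that at 100% depth the lift always falls back to 1.0, which is exactly the baseline behavior described below: targeting everyone is no better than random selection.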
Let's examine the lift chart from our logistic regression model.
The baseline model is not actually plotted on the chart, but it is easy to visualize. It is the horizontal line that would lie flat along the x-axis and represents a cumulative lift value of 1.0 for all percentiles. That baseline model reflects the behavior we would see if we just went in and randomly guessed who the purchasers were without the help of a model. In other words, if we were to target 10% of the population, we would expect to find 10% of the purchasers. The lift value of 1 serves as a benchmark and reflects no advantage from using a predictive model like a logistic regression. The blue line represents the performance of our logistic regression model. Higher lift (especially at the lower percentiles) is better. We could actually compare the lift line of one model to the lift line of another model on the same chart. The model with the higher lift would be the better performing model. We will discuss exactly why in just a minute. The other line plotted here is the yellow line, and it represents the best model achievable, or a perfect classifier. It can be useful to us because it shows the performance possible if we had a stronger model. Think of it as the upper limit of where the blue line could reach at each of the percentiles.
Let's focus in on just one piece of the lift chart to further explain how this plot may be used.
If we mouse over the blue line at the 5th percentile, we get information about the model performance. The logistic regression model has a lift of approximately 2.25 at this percentile. Another way to think about this is as follows: if we were to contact the top 5 percent of our customers, we are over twice as likely to reach a responder versus just picking customers at random. Not bad at all! If we were to compare our logistic regression model against another model that had a higher lift at the 5th percentile, we would consider the other model to be the better performer. Since the data has been rank-ordered by likelihood of responding, the lower percentiles are more meaningful than the higher percentiles. At this point it should be clear that, all other things being equal, higher lift is preferable. One final note on lift charts: since lift calculations do not depend at all on a model's cutoff value, lift charts are unaffected by a change in the cutoff value.
Now let's focus on the ROC or receiver operating characteristic chart. The history of ROC charts is pretty fascinating. These charts were originally developed during World War II (in the 1940s) to identify true signals (versus noise) in radar data. This allowed radars to be designed to detect enemy aircraft, ships, or missiles against the real-world background noise of clouds, birds, or other non-threatening objects. The ROC curve provides a method to evaluate the trade-off between true positive rates (correctly identifying a signal) and false positive rates (incorrectly identifying noise as a signal). A ROC analysis can help optimize a radar's sensitivity to maximize real threat detection and minimize false alarms. Later, in the 1960s and 1970s, ROC charts were adapted for the medical field. As an example, diagnostic tests could be evaluated on their ability to correctly detect disease while avoiding false alarms. In the 1980s and beyond, the ROC curve became a well-used tool for evaluating binary classification models.
Since the ROC or receiver operating characteristic chart is a plot of the True Positive Rate against the False Positive Rate, you can read this previous post to review the basic definitions of true and false positive counts as well as true and false negative counts. The True Positive Rate (TPR) is defined as the number of True Positives divided by the total of both the True Positives and the False Negatives. TPR is also known as sensitivity or recall. You can think of it as the proportion of actual positives that were correctly identified by the model. The False Positive Rate or FPR is defined by the number of False Positives divided by the total of both the False Positives and the True Negatives. It is also known as 1 - specificity, where specificity is also known as the True Negative Rate. Think of FPR as the proportion of actual negatives that were incorrectly classified as positives.
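The two rate definitions above are simple ratios of confusion-matrix counts. Here is a small sketch in Python with made-up counts (not the annuity model's actual confusion matrix) to make the formulas concrete:

```python
def true_positive_rate(tp, fn):
    """TPR (sensitivity/recall): actual positives correctly identified."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """FPR (1 - specificity): actual negatives incorrectly flagged as positive."""
    return fp / (fp + tn)

# Hypothetical counts: 40 purchasers caught, 10 missed;
# 30 non-purchasers incorrectly flagged, 120 correctly rejected.
print(true_positive_rate(40, 10))    # -> 0.8 (80% of purchasers found)
print(false_positive_rate(30, 120))  # -> 0.2 (20% of non-purchasers flagged)
```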
To create a ROC chart of our annuity results, we calculate both the TPR and the FPR at each cutoff value (over the entire range from 0 to 1). The False Positive Rate is plotted on the x-axis and the True Positive Rate is plotted on the y-axis for each cutoff value. This typically results in a curve starting at (0,0) and ending at (1,1). Let's go ahead and look at the ROC chart from the logistic regression.
The blue line represents the performance of our logistic regression model. You can think of the ROC chart as a representation of how well our model is avoiding misclassifications for the "events" or, with our data, purchasers. The "bigger" the curve, the "better" the model. In the ideal world, the curve would stretch out to reach the upper left-hand corner of the graph at (0,1). In fact, the curve for a perfect classifier would rise vertically from (0,0) to the top left corner at (0,1) and then continue as a horizontal line to the top right corner at (1,1), completely filling the upper left portion of the chart and representing a TPR of 1 and an FPR of 0 across the cutoff values. The diagonal dashed line plotted from (0,0) to (1,1) represents a random classifier and should be considered the baseline comparison. A nice summary statistic that is used to assess a model's performance is the AUC or Area Under the Curve. The AUC (also known as the c-statistic or concordance statistic for binary models) typically ranges from 0.5 (indicating random guessing) to 1.0 (indicating the perfect classifier). Thus, for AUC, higher is better.
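To make the construction concrete, here is a minimal sketch in Python of sweeping the cutoff, collecting (FPR, TPR) points, and computing AUC with the trapezoidal rule. The scores and labels are illustrative only, not output from the logistic regression model:

```python
def roc_points(scores, labels):
    """(FPR, TPR) at each cutoff, ordered so the curve runs (0,0) -> (1,1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]                              # strictest possible cutoff
    for cut in sorted(set(scores), reverse=True):      # relax the cutoff step by step
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the (FPR, TPR) curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Ten hypothetical predictions and their actual outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,    0,   1,   0,   0,   0]

pts = roc_points(scores, labels)
print(round(auc(pts), 3))   # well above the 0.5 of a random classifier
```

Sweeping every distinct score as a cutoff is what makes the finished chart independent of any single chosen cutoff, a point we return to below.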
You also might have noticed a dashed blue vertical line reaching from the baseline model to the model's curve.
This vertical line represents the maximum vertical distance between the baseline curve and the model's ROC curve (in our case, the logistic regression). This distance is Youden's Index, and it corresponds to the optimal balance between sensitivity and specificity. From this one point on the curve, we are getting two good pieces of information. First, we can use Youden's statistic (not unlike the Kolmogorov-Smirnov statistic) to compare models; of course, higher is better with this index. Second, we are getting a suggested cutoff value. In this example, a cutoff of 0.32 is where the model achieves the best trade-off between correctly identifying positive cases (high TPR) and correctly rejecting negative cases (high TNR). Just in case you were wondering, the KS statistic and Youden's Index both measure a model's ability to distinguish between two classes, but they are slightly different and focus on different aspects of the model's performance. Youden's is calculated from the sensitivity and specificity, while KS is calculated from the cumulative distribution functions of the predicted scores. One final note on the ROC chart is that (like the lift chart) it is unaffected by a change in the cutoff value, because it already contains the entire range of cutoff values.
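Since Youden's Index is just the largest TPR minus FPR gap over all cutoffs (equivalently, sensitivity + specificity - 1), it is straightforward to sketch. The data below is made up for illustration, so both the index and the suggested cutoff differ from the 0.32 reported by the model above:

```python
def youden_index(scores, labels):
    """Return (J, cutoff) where J = max over cutoffs of TPR - FPR."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        j = tp / pos - fp / neg          # sensitivity + specificity - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut

# Hypothetical predictions and outcomes (same illustrative style as above).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,    0,   1,   0,   0,   0]

j, cut = youden_index(scores, labels)
print(round(j, 2), cut)
```

The returned cutoff is the score at which the vertical gap between the model's curve and the diagonal is widest, which is exactly the point the dashed vertical line marks on the chart.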
We’ve done a great job today of discussing the remaining two of the five assessment plots that are available for a logistic regression model built in SAS Visual Statistics. And that completes our examination of building and interpreting a logistic regression model in SAS Viya. In my next post we’ll finish up with investigating categorical targets and look at the decision tree model. We'll use the same annuity data to keep things consistent and learn all about the commonly used and easily interpretable decision tree.
Find more articles from SAS Global Enablement and Learning here.