
SAS for DIY Champion-Challenger Customer Recommendation Systems

Started ‎11-16-2022
Modified ‎11-16-2022

In part one of this SAS for Customer Recommendation Systems article series, we took an introductory tour of Do-It-For-Me (DIFM) and Do-It-Yourself (DIY) recommendation analysis use cases applied in martech leveraging SAS Customer Intelligence 360 and SAS Visual Data Mining and Machine Learning on SAS Viya.

 

Helping users of your brand's owned digital properties find items of interest is useful in almost any situation. In part two of this article series, we will:

 

  • Demystify how champion-challenger DIY recommendation analysis elevates support of personalized marketing.
  • Provide details of the DIY analytical techniques (algorithms) generally available in SAS that can be applied to recommendations.
  • Transparently demonstrate how SAS users can perform DIY recommendation analysis and scoring for customer experience orchestration.

 

Image 1: Champion-Challenger Algorithms for DIY Recommendation Analysis

 

Demystifying Champion-Challenger Do-It-Yourself (DIY) Recommendation Analysis For Martech

 

In general, it is good practice to develop multiple AI models that support the same task. The reason is simple: if one model fails or its performance degrades over time, another model can take over. For those unfamiliar with this approach, the champion model is the best model chosen from a pool of candidates. In the machine learning ecosystem, this is often referred to as the champion-challenger approach, where the champion model is the one that currently performs best on the AI task at hand. Before identifying the champion model, users can evaluate the structure, performance, and resilience of the candidate models, leveraging challenger models to test the strength of the champion.

 

The champion model is the model that typically runs in production and is continuously challenged by the challenger models. As soon as the champion model fails or one of the challenger models defeats the champion model, the current champion model can be quickly replaced, and the continuity of the AI system can be guaranteed.
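The selection-and-replacement logic described above can be sketched in a few lines. This is a hedged illustration, not SAS functionality: the model names, the AUC values, and the choice of AUC as the comparison metric are all invented for the example.

```python
# Hedged sketch of a champion-challenger harness; the model names, AUC
# values, and use of AUC as the comparison metric are illustrative only.

def pick_champion(candidates):
    """Return the (name, score) pair with the highest score."""
    return max(candidates.items(), key=lambda kv: kv[1])

def challenge(champion, challenger):
    """Replace the champion only if the challenger scores strictly higher."""
    return challenger if challenger[1] > champion[1] else champion

# Candidate models scored on a holdout set.
candidates = {"factorization_machine": 0.81, "bpr": 0.84, "dtos": 0.83}
champion = pick_champion(candidates)            # ("bpr", 0.84)

# A retrained challenger arrives; the current champion survives unless beaten.
champion = challenge(champion, ("dtos_v2", 0.86))
print(champion)  # ('dtos_v2', 0.86)
```

In production the scores would come from re-evaluating each model on fresh holdout data, so the comparison re-runs on a schedule rather than once.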

 

Recommendation engines help brands gain valuable insights hidden within massive data. In contrast to traditional marketing, which generally relies on intuition, this fact-based method provides businesses with solutions that are not mere assumptions. Recommendation engines help analyze and predict whether a particular user would prefer a product, based on that user’s profile and historical information. The ROI (return on investment) of recommendation engines within martech is frequently observed in improved cart values, consumer engagement and customer retention.

 

What does the difference between the various types of recommender algorithms look like when it comes to metrics? There are several metrics for evaluating the performance of models in a champion-challenger context. In the demo video below, we will use the following metrics to make comparisons:

 

  1. Area Under The Curve (AUC): AUC measures the likelihood that a random relevant item is ranked higher than a random irrelevant item. The higher this likelihood, the higher the AUC score and the better the recommendation system.
  2. Hit Rate (HR): The hit rate is simply the fraction of users in the test (or validation) modeling data for whom the correct item is included in the generated recommendation list (the top 10, for example).
  3. Mean Reciprocal Rank (MRR): MRR calculates the average of the reciprocal ranks given to the relevant items. If the relevant items are ranked higher (closer to the top of the list), their reciprocal ranks are larger, leading to a higher metric score, as desired. Essentially, the idea behind evaluating a recommendation system is to take the ranks given to the relevant items and translate them into a single number indicating how good or bad those ranks are.
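The three metrics can be computed on toy data as follows. The ranked lists below are invented for illustration; ranks are 1-based positions in a user's recommendation list, so a smaller rank number means a higher placement.

```python
# Hedged sketch of the three evaluation metrics on made-up rank data.

def auc(relevant_ranks, irrelevant_ranks):
    """Fraction of (relevant, irrelevant) pairs in which the relevant item
    is ranked higher (smaller rank number) than the irrelevant one."""
    pairs = [(r, i) for r in relevant_ranks for i in irrelevant_ranks]
    return sum(r < i for r, i in pairs) / len(pairs)

def hit_rate(first_relevant_ranks, k=10):
    """Fraction of users whose correct item appears in the top-k list."""
    return sum(rank <= k for rank in first_relevant_ranks) / len(first_relevant_ranks)

def mrr(first_relevant_ranks):
    """Mean of the reciprocal rank of each user's first relevant item."""
    return sum(1.0 / rank for rank in first_relevant_ranks) / len(first_relevant_ranks)

# One user's list: relevant items sit at ranks 1 and 3, irrelevant at 2, 4, 5.
print(auc([1, 3], [2, 4, 5]))   # 5 of 6 pairs ordered correctly -> 0.8333...
# First relevant rank for each of four users.
ranks = [1, 2, 12, 5]
print(hit_rate(ranks, k=10))    # 3 of 4 users hit the top 10 -> 0.75
print(mrr(ranks))               # (1 + 1/2 + 1/12 + 1/5) / 4
```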

 

Please keep in mind that comparing the performance of recommendation models is not limited to these three metrics.

 

Recommendation Algorithm Candidates

 

The demo will exemplify three algorithmic approaches: factorization machines (FMs), Bayesian personalized ranking (BPR), and data translation with optimal step-size (DTOS). Given that we covered FMs earlier, let's briefly describe the others:

 

Bayesian Personalized Ranking (BPR)

 

BPR is an algorithm for creating personalized recommendations of items for users on the basis of the users’ implicit feedback (such as web/mobile clicks or purchase history). BPR is a common method designed specifically to optimize recommendation ranking, and it has shown superior performance compared to other standard analysis techniques that are widely used to analyze explicit feedback. In the implicit-feedback scenario, an observation or event from an instrumented website or mobile app using SAS Customer Intelligence 360 captures the required input data, which simply consists of a user (or customer) and an item (or product/service).

 

Among the available recommendation methods, collaborative filtering, matrix factorization, and factorization machines have been shown to be effective approaches, and many researchers and brands have focused on these methods. For example, a factorization machine model is a general factorization model that considers both latent and auxiliary features, and it includes and mimics many basic collaborative filtering methods under various scenarios. Although factorization machines have shown good performance in both model prediction and computational complexity, the majority of factorization machine methods are designed for data that contains explicit feedback, whereas only a limited number of approaches have been proposed for data that contains implicit feedback. An alternative is to model the likelihood of a ranking between items using a new optimization criterion, Bayesian personalized ranking (BPR), for analyzing implicit feedback with item features, which can have a significant influence on model performance.
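To make the pairwise idea concrete, here is a minimal NumPy sketch of a single BPR stochastic gradient step on a (user, preferred item, non-observed item) triple. This is the generic textbook formulation, not the SAS implementation; the dimensions, learning rate, and regularization values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
U = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
V = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One stochastic gradient ascent step on ln sigmoid(x_ui - x_uj)
    for a (user u, preferred item i, non-observed item j) triple."""
    x_uij = U[u] @ V[i] - U[u] @ V[j]
    g = 1.0 / (1.0 + np.exp(x_uij))            # sigmoid(-x_uij)
    u_f, i_f, j_f = U[u].copy(), V[i].copy(), V[j].copy()
    U[u] += lr * (g * (i_f - j_f) - reg * u_f)
    V[i] += lr * (g * u_f - reg * i_f)
    V[j] += lr * (-g * u_f - reg * j_f)

# Implicit feedback: user 0 clicked item 1 but never interacted with item 4.
before = U[0] @ V[1] - U[0] @ V[4]
for _ in range(50):
    bpr_step(u=0, i=1, j=4)
after = U[0] @ V[1] - U[0] @ V[4]
print(before, after)  # the preference gap widens with training
```

Note that only the sign of the score gap matters for ranking; BPR never tries to predict a rating value, which is why it suits implicit feedback.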

 

Data Translation With Optimal Step Size (DTOS)

 

From the perspective of customers, a recommender provides personalized recommendations by helping users find interesting items (products, movies, music, etc.). From the perspective of products, a recommender performs targeted advertising by identifying potential users who would be interested in a particular item. The information about users, items, and user-item interactions constitutes the data used to achieve the goal of recommenders. Among these three types of information, user-item interactions are essential. Recommenders that employ user-item interactions alone, without requiring information about the users or items, are based on collaborative filtering. Typically, each user rates only a fraction of items and each item receives ratings from only a fraction of users, yielding an incomplete data matrix with only a fraction of entries observed. In this matrix formulation, the goal of recommenders, specifically collaborative filtering, becomes predicting the missing entries so as to locate interesting items or potential users. A major bottleneck of classical approaches is their reliance on singular value decomposition (SVD), which limits their use in large-scale applications.

 

An alternative approach to collaborative filtering is matrix factorization (MF), which models the user-item interactions as a product of two factor matrices. Each user or item is represented by a vector, and a rating entry is represented by the inner product of two vectors. These vectors can be considered as a feature representation of the users and items. As they are not observed, but rather are inferred from user-item interactions, these vectors are commonly referred to as latent features or factors. Moreover, the latent features of all users and all items may be inferred simultaneously, making it possible to incorporate the benefit of multitask learning (MTL). By the principle of MTL, the feature vector of each user is not only influenced by its own rating history, but also by the rating histories of other users, with the extent of influence dictated by the similarity between users. For this reason, a user may discover new interesting items from the rating histories of its peers who share similar interests, with the similarity identified from all users’ rating histories.
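As a toy illustration of the inner-product model, suppose each latent dimension loosely corresponds to a taste trait. The vectors and their interpretation below are invented; in practice the latent features are learned from interaction data, not assigned by hand.

```python
import numpy as np

# Invented latent vectors: each dimension loosely a "taste" trait.
user_vec = np.array([0.9, 0.1])   # user's affinity for each latent trait
item_vec = np.array([4.5, 1.0])   # item's loading on each latent trait

# A predicted rating entry is the inner product of the two vectors.
predicted_rating = user_vec @ item_vec
print(predicted_rating)  # 0.9 * 4.5 + 0.1 * 1.0 = 4.15 (up to floating point)
```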

 

A widely adopted algorithm for learning MF models is Alternating Least Squares (ALS), which updates the two factor matrices alternately, keeping one fixed while updating the other. Given one matrix, ALS optimizes the other by solving a least squares (LS) problem for each user or item. As the LS solution is optimal, ALS can improve the learning objective aggressively in each iteration, leading to convergence in a small number of iterations. However, different users may have rated different items and, similarly, different items may have been rated by different users; thus, this leads to high computational cost in each iteration of ALS.
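A minimal NumPy sketch of one ALS sweep on a toy ratings matrix may help make this concrete. The data, the rank, and the regularization weight below are invented for illustration and do not reflect the SAS implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy ratings matrix; NaN marks an unobserved (user, item) entry.
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0]])
mask = ~np.isnan(R)
k, reg = 2, 0.1
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

def als_sweep():
    """One ALS sweep: a regularized least-squares solve per user with V
    fixed, then per item with U fixed, using observed entries only."""
    for u in range(R.shape[0]):
        obs = mask[u]
        U[u] = np.linalg.solve(V[obs].T @ V[obs] + reg * np.eye(k),
                               V[obs].T @ R[u, obs])
    for i in range(R.shape[1]):
        obs = mask[:, i]
        V[i] = np.linalg.solve(U[obs].T @ U[obs] + reg * np.eye(k),
                               U[obs].T @ R[obs, i])

def rmse():
    """Root mean squared error over the observed entries."""
    return np.sqrt(np.mean(((U @ V.T) - R)[mask] ** 2))

start = rmse()
for _ in range(10):
    als_sweep()
print(start, rmse())  # training error drops sharply on this toy matrix
```

The per-user solve illustrates exactly the cost issue described above: because each user has observed a different subset of items, the normal-equations matrix must be rebuilt for every user in every sweep.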

 

This issue can be addressed with the softImpute-ALS algorithm, but sub-optimal results in applying this method have led SAS to introduce a new algorithm, termed Data Translation with Optimal Step-size (DTOS), to alleviate these drawbacks. As the name indicates, DTOS first performs data augmentation (or translation), equivalent to the imputation step of softImpute-ALS. However, DTOS goes one step further to construct a set of solutions, with the softImpute-ALS solution included in the set as a special element. The solutions are parameterized by a scalar that plays the role of the step-size in gradient descent. DTOS optimizes this step-size to find the solution that maximizes the original objective. The optimization guarantees a larger improvement of the original objective than softImpute-ALS achieves, helping to alleviate the issue of slow progress per iteration and thus to speed up convergence. Thanks to the quadratic objective, the optimal step-size can be obtained in closed form, and its calculation does not introduce significant additional computational cost; thus, DTOS has almost the same per-iteration computational complexity as softImpute-ALS. With its low cost per iteration and more aggressive improvement of the learning objective, DTOS blends the advantages of softImpute-ALS with those of ALS and is expected to achieve a high performance-to-cost ratio.
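DTOS itself is proprietary to SAS, but its key ingredient, a closed-form optimal step-size for a quadratic objective, can be illustrated generically. The sketch below minimizes a toy least-squares objective by stepping along the negative gradient with the exact minimizing step-size; the matrix and vector are random stand-ins, not recommendation data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))   # random stand-in for the quadratic problem data
b = rng.normal(size=8)

def f(x):
    """Toy quadratic learning objective: 0.5 * ||A @ x - b||^2."""
    r = A @ x - b
    return 0.5 * (r @ r)

def optimal_step(x, d):
    """Closed-form step-size minimizing f(x + a * d) along direction d:
    a* = -(A x - b) . (A d) / ||A d||^2."""
    r = A @ x - b
    Ad = A @ d
    return -(r @ Ad) / (Ad @ Ad)

x = np.zeros(3)
for _ in range(200):
    d = -(A.T @ (A @ x - b))          # descent direction (negative gradient)
    x = x + optimal_step(x, d) * d
print(f(x))  # approaches the least-squares minimum
```

Because the objective is quadratic in the step-size, each line search costs only two matrix-vector products, which mirrors the point made above that DTOS keeps essentially the same per-iteration cost as softImpute-ALS.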

 

In other words, DTOS is a fast algorithm for training recommender systems on implicit feedback. Users can leverage the DTOS action in SAS as a distributed, multithreaded implementation. The recommender model that the DTOS algorithm trains is represented by matrix factorization with partially defined factors (MF-PDF), a model that generalizes matrix factorization (MF) to include predefined factors (PDF) of users and/or items.

 

Chapter 3: Champion-Challenger Do-It-Yourself (DIY) Recommendation Analysis Demo

 

The video below provides a SAS software demonstration of how to perform champion-challenger DIY recommendation analysis, scoring and customer experience orchestration. For readers who have not viewed the Chapter 1 and 2 demos, they are available here.

 

 

The use cases for recommendation systems are expanding every day, across the entire martech industry. We look forward to what the future brings in our development process as we enable technology users to access the most recent SAS analytical developments. Learn more about how SAS can be applied to customer analytics, journey personalization and integrated marketing here.
