Watch this Ask the Expert session to learn about the analytics process model.
Join Professor Bart Baesens as he discusses the key requirements of a good analytical model including accuracy, interpretability, profitability, operational efficiency and compliance. He’ll also cover how to boost the performance of analytical models and review emerging challenges in analytics. During this webinar you will learn:
The key requirements of a good analytical model.
How to boost the performance of analytical models.
The emerging challenges in analytics.
The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.
Q&A
For the Call Detail Record, is it fair to score customers in terms of their network?
I think it is very important to ask for customers’ consent first, so that they are aware of what data is being collected and how it is used. Do note that, in this case, a privacy concern can also create a new opportunity, such as financial inclusion.
If interpretability is important, can you please discuss the usefulness of neural networks?
If you want to use neural networks, you can make them interpretable in various ways; we cover this in our Fraud Analytics Course. Partial Dependence plots and ICE (Individual Conditional Expectation) plots show the impact of one particular variable on the target, as modeled by the neural network, on a variable-by-variable basis. You can also use Shapley values, which quantify how much each individual variable contributes to a particular output of the network. This gives insight into how the network reasons and which patterns it has captured. Together, these techniques allow you to open up the neural network black box.
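The sketch below illustrates these three ideas in plain Python under stated assumptions. The `model` function is a hypothetical stand-in for a trained neural network with hand-picked weights. ICE curves vary one input over a grid while holding the other inputs of each observation fixed. The partial dependence curve is the average of the ICE curves. The Shapley values are computed exactly by enumerating all coalitions, with absent features set to a baseline value, which is one of several possible value-function choices. Exact enumeration is only feasible for a handful of features, so practical tools approximate it.

```python
import math
from itertools import combinations


def model(row):
    """Hypothetical stand-in for a trained neural network: two tanh hidden units,
    sigmoid output, hand-picked weights (illustration only)."""
    h1 = math.tanh(0.8 * row[0] - 0.5 * row[1])
    h2 = math.tanh(-0.3 * row[0] + 1.2 * row[1])
    return 1.0 / (1.0 + math.exp(-(1.5 * h1 + 0.7 * h2 - 0.2)))


def ice_curves(predict, rows, feature, grid):
    """ICE: one curve per observation, varying `feature` over `grid` with the other inputs fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """PDP: the pointwise average of the ICE curves."""
    return [sum(curve[k] for curve in curves) / len(curves) for k in range(len(curves[0]))]


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all coalitions. Features outside a
    coalition are set to their `baseline` value (one simple value-function choice)."""
    n = len(x)

    def value(coalition):
        return predict([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


rows = [[0.5, -1.0], [1.0, 0.0], [-0.5, 1.5], [2.0, 0.5]]  # hypothetical inputs
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = ice_curves(model, rows, feature=0, grid=grid)
pdp = partial_dependence(curves)

x, baseline = [1.0, 0.5], [0.0, 0.0]
phi = shapley_values(model, x, baseline)
print("PDP for feature 0:", [round(v, 3) for v in pdp])
print("Shapley values:", [round(v, 3) for v in phi])
```

As a sanity check, the Shapley values add up to `model(x) - model(baseline)`, which is the efficiency property of Shapley values.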
Did any of the books you wrote talk extensively about feature engineering? If not, can you recommend a good book on this topic?
My book on Fraud Analytics has a whole chapter on how you can featurize a social network. It also has information on how to model the effect of time on an output.
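This is not the book’s own code. It is a minimal sketch of one common way to featurize a social network: deriving first-order neighborhood features for each node, namely its degree, how many of its neighbors are known fraudsters, and what fraction of its neighborhood that represents. The node names, edges, and labels are hypothetical.

```python
from collections import defaultdict


def neighborhood_features(edges, fraud_labels):
    """First-order network features per node: degree, number of neighbors known
    to be fraudulent, and the fraudulent fraction of the neighborhood."""
    neighbors = defaultdict(set)
    for a, b in edges:  # undirected edges
        neighbors[a].add(b)
        neighbors[b].add(a)
    features = {}
    for node, nbrs in neighbors.items():
        n_fraud = sum(1 for v in nbrs if fraud_labels.get(v) == 1)
        features[node] = {
            "degree": len(nbrs),
            "fraud_neighbors": n_fraud,
            "fraud_fraction": n_fraud / len(nbrs),
        }
    return features


# Hypothetical network: nodes could be customers, edges shared attributes or calls.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
fraud_labels = {"B": 1, "D": 1}  # nodes not listed are treated as non-fraudulent/unknown
features = neighborhood_features(edges, fraud_labels)
for node in sorted(features):
    print(node, features[node])
```

These per-node values can then be joined to the usual customer-level variables as extra model inputs.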
What are the steps you usually take to define which transformation should be applied for each feature?
You can start off with a simple model such as logistic regression. That gives you a benchmark, and then you look at a more complex model; I usually use XGBoost. I compare the two in terms of profit and AUC. If the difference is negligible, we ignore it. If it is not (usually a gap of more than 3% in AUC), we try to bridge the gap between the logistic regression model and XGBoost by applying Yeo-Johnson transformations to each of the variables individually, keeping the transformations as limited as possible.
How do you choose lambda in the Yeo-Johnson transformation?
Split your data into three sets: training, validation, and test. Specify a grid of lambda values from -5 to +5. For each lambda, build the model with the Yeo-Johnson transformation on your training set and measure the AUC (area under the ROC curve) or profit on the validation set. Select the lambda that gives the best AUC or profit on the validation set. Then build the final model on the training and validation sets combined, and evaluate its performance on the independent test set.
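A minimal sketch of this procedure in plain Python. Several details are assumptions, not part of the answer: synthetic data with two hypothetical features, of which only the first is transformed; a hand-rolled gradient-descent logistic regression; an integer lambda grid; and AUC as the selection criterion (a profit measure would slot into the same place). The `yeo_johnson` function follows the standard definition of the transformation.

```python
import math
import random


def yeo_johnson(x, lam):
    """Yeo-Johnson power transformation of a single value."""
    if x >= 0:
        return math.log1p(x) if abs(lam) < 1e-12 else ((x + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-x)
    return -((1 - x) ** (2 - lam) - 1) / (2 - lam)


def auc(scores, labels):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prepare(rows, lam, stats=None):
    """Transform feature 0 with Yeo-Johnson, then standardize both features.
    Means/standard deviations come from the fitting data and are reused for evaluation."""
    X = [[yeo_johnson(r[0], lam), r[1]] for r in rows]
    if stats is None:
        stats = []
        for j in range(2):
            col = [x[j] for x in X]
            mu = sum(col) / len(col)
            sd = math.sqrt(sum((c - mu) ** 2 for c in col) / len(col)) or 1.0
            stats.append((mu, sd))
    return [[(x[j] - mu) / sd for j, (mu, sd) in enumerate(stats)] for x in X], stats


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Plain batch gradient-descent logistic regression; w[-1] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def fit_and_auc(fit_set, eval_set, lam):
    """Fit on `fit_set` with a given lambda and return the AUC on `eval_set`."""
    X_fit, stats = prepare([r for r, _ in fit_set], lam)
    w = fit_logistic(X_fit, [t for _, t in fit_set])
    X_eval, _ = prepare([r for r, _ in eval_set], lam, stats)
    scores = [sum(a * b for a, b in zip(w, x)) + w[-1] for x in X_eval]
    return auc(scores, [t for _, t in eval_set])


# Hypothetical data: a saturating effect of x1 plus a linear effect of x2.
rng = random.Random(42)
data = []
for _ in range(300):
    x1, x2 = rng.expovariate(0.5), rng.gauss(0.0, 1.0)
    p = sigmoid(1.5 * math.log1p(x1) - 2.0 + x2)
    data.append(((x1, x2), 1 if rng.random() < p else 0))
train, valid, test = data[:150], data[150:225], data[225:]

grid = [float(lam) for lam in range(-5, 6)]  # -5, -4, ..., 5
best_lam = max(grid, key=lambda lam: fit_and_auc(train, valid, lam))  # choose lambda on validation
test_auc = fit_and_auc(train + valid, test, best_lam)  # refit on train + validation, score on test
print("best lambda:", best_lam, "test AUC:", round(test_auc, 3))
```

With several variables, the same loop would be run per variable (or over a joint grid), which is where keeping the number of transformed variables small pays off.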
Do you suggest that systems be designed to meet security requirements in locales where the organization may not yet be conducting business in case it does so in the future?
It’s very important to anticipate, especially in fraud, where fraudsters are trying to outsmart your model. You should continually backtest your models so you can check their performance. When you see performance start to degrade significantly, you can rebuild the model or tweak it to capture the new patterns, for example using incremental learning facilities.
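The answer does not name a specific backtesting measure. As one commonly used stability check, chosen here for illustration, the sketch below computes the population stability index (PSI) between the score distribution at model development and the distribution in a later period, binned at the development deciles. The scores and the alert threshold are hypothetical; tracking a performance measure such as AUC per period would complement it.

```python
import bisect
import math
import random


def decile_edges(reference_scores):
    """Nine interior cut points at the deciles of the reference (development) scores."""
    ranked = sorted(reference_scores)
    return [ranked[len(ranked) * k // 10] for k in range(1, 10)]


def bin_counts(scores, edges):
    """Count how many scores fall into each bin defined by the sorted cut points."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        counts[bisect.bisect_right(edges, s)] += 1
    return counts


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index: sum over bins of (a - e) * ln(a / e),
    where e and a are the bin shares in the reference and current samples."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe, pa = max(e / e_total, eps), max(a / a_total, eps)
        value += (pa - pe) * math.log(pa / pe)
    return value


ALERT_THRESHOLD = 0.25  # hypothetical tolerance; set it according to your own policy

rng = random.Random(1)
development = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # scores at model build time
stable = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # later period, same behavior
shifted = [rng.gauss(0.5, 1.0) for _ in range(1000)]  # later period, behavior has drifted

edges = decile_edges(development)
reference = bin_counts(development, edges)
psi_stable = psi(reference, bin_counts(stable, edges))
psi_shifted = psi(reference, bin_counts(shifted, edges))
for name, value in [("stable", psi_stable), ("shifted", psi_shifted)]:
    action = "consider rebuilding or tweaking the model" if value > ALERT_THRESHOLD else "ok"
    print(f"{name}: PSI = {value:.3f} -> {action}")
```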
How can we automate feature engineering using analytical methods?
Think about deep learning. A neural network has an input layer, an output layer, and one or more hidden layers. The hidden layers essentially perform automatic feature engineering, because they squeeze and transform the inputs in such a way that predicting the output is optimized. The downside is that the extracted features are harder to understand, because they are hidden unit activation values.
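To make this concrete, here is a toy sketch that is not taken from the webinar: a 2-3-1 network trained by plain gradient descent on XOR, a problem where neither raw input is predictive on its own. The layer sizes, learning rate, number of epochs, and seed are all hypothetical choices. After training, `hidden_features` returns the activation values the network has constructed for itself. They help the prediction, but they carry no obvious business meaning.

```python
import math
import random

rng = random.Random(0)
HIDDEN = 3  # hypothetical number of hidden units
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
b2 = 0.0

# XOR: not linearly separable in the raw inputs.
data = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]


def hidden_features(x):
    """Hidden-layer activations: the features the network engineers for itself."""
    return [math.tanh(row[0] * x[0] + row[1] * x[1] + b) for row, b in zip(W1, b1)]


def predict(x):
    h = hidden_features(x)
    z = sum(w * hj for w, hj in zip(W2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z)), h


def log_loss():
    total = 0.0
    for x, y in data:
        p, _ = predict(x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


initial_loss = log_loss()
lr = 0.5
for _ in range(3000):
    gW1 = [[0.0, 0.0] for _ in range(HIDDEN)]
    gb1, gW2, gb2 = [0.0] * HIDDEN, [0.0] * HIDDEN, 0.0
    for x, y in data:
        p, h = predict(x)
        d_out = p - y  # gradient of the log loss w.r.t. the output pre-activation
        gb2 += d_out
        for j in range(HIDDEN):
            gW2[j] += d_out * h[j]
            d_h = d_out * W2[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh
            gb1[j] += d_h
            gW1[j][0] += d_h * x[0]
            gW1[j][1] += d_h * x[1]
    n = len(data)
    for j in range(HIDDEN):
        W2[j] -= lr * gW2[j] / n
        b1[j] -= lr * gb1[j] / n
        W1[j][0] -= lr * gW1[j][0] / n
        W1[j][1] -= lr * gW1[j][1] / n
    b2 -= lr * gb2 / n

final_loss = log_loss()
for x, y in data:
    p, h = predict(x)
    print(x, "target", y, "hidden features", [round(v, 2) for v in h], "p =", round(p, 3))
```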
In my world, senior executives are apprehensive about analytics/models, and even more apprehensive when analytics/models become more complex. How do you make these stakeholders comfortable with some of these newer techniques?
Trust and education are very important: you will not use what you don’t understand. Many organizations are introducing C-level analytics executives to build firm-wide trust in analytics. I would start by steering your decisions with logistic regression or decision trees, which everyone understands, rather than with a complex deep learning neural network. The complex methods can certainly be helpful, but you must establish firm-wide trust and education before you use them.
How does variable transformation, such as the Yeo-Johnson and Box-Cox transformations, affect interpretability?
It can contribute in a positive way to interpretability because it can model exponential and saturation effects of variables on your output.
Since the beginning of the pandemic, customer behavior has changed significantly. How are you balancing historic data with the challenges of this ever-changing behavior?
It depends on the type of model. I’ll elaborate on credit risk modeling, which has three levels. Level 0 is the data that feeds the model. Level 1 is the discrimination, which separates the risky from the non-risky obligors. Level 2 is the calibration, which gives you the probabilities of default. I anticipate the biggest changes will take place at the calibration level, based on regulatory or expert input.
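As one illustration of a change confined to level 2, the sketch below uses a technique chosen here for illustration, not one the answer prescribes. It shifts all rating-grade PDs by a common constant on the logit scale, so that the portfolio-weighted average PD matches a new target central tendency, such as one set by expert or regulatory judgment. Because every grade receives the same shift, the ranking of the grades (level 1) is unchanged. The grades, PDs, exposure weights, and target are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(grade_pds, grade_weights, target_pd, tol=1e-12):
    """Shift every grade's PD by the same constant on the logit scale so that the
    portfolio-weighted average PD equals `target_pd`. Because the shift is common
    to all grades, their ranking (the discrimination level) is unchanged."""
    total_weight = sum(grade_weights)

    def average_pd(shift):
        return sum(w * expit(logit(p) + shift) for p, w in zip(grade_pds, grade_weights)) / total_weight

    lo, hi = -20.0, 20.0  # bracket for the shift; average_pd increases with the shift
    while hi - lo > tol:  # bisection
        mid = (lo + hi) / 2.0
        if average_pd(mid) < target_pd:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [expit(logit(p) + shift) for p in grade_pds], shift


# Hypothetical rating grades (best to worst), their PDs and exposure shares.
grade_pds = [0.005, 0.01, 0.02, 0.05, 0.12]
grade_weights = [30, 25, 20, 15, 10]
new_pds, shift = recalibrate(grade_pds, grade_weights, target_pd=0.04)
print("logit shift:", round(shift, 4))
print("recalibrated PDs:", [round(p, 4) for p in new_pds])
```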
Does call duration play any role in calculating expected loss?
No, we didn’t find that it played a role; only the connections to defaulters and non-defaulters in the call network did.
Is it best if Data Scientists and similar roles have advanced degrees for expertise and credibility, even if that closes the field out to those who are unable to obtain such education?
I do think education is indeed helpful for data science and similar activities. If I may suggest, the Analytics: Putting It All to Work course might be of interest to you; see www.sas.com/emea/bb.
How does Yeo-Johnson transformation compare to using GAM?
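Applying a Yeo-Johnson transformation to each variable individually gives a form of GAM (generalized additive model), in which the effect of each variable is captured by its own parametric power transformation.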
What is your advice on imbalanced outcome variables? What is your preferred method?
This is also discussed in the Fraud Analytics Course. I use SMOTE (Synthetic Minority Oversampling Technique). It generates synthetic minority observations by interpolating between existing minority observations and their nearest minority neighbors. Boosting the number of minority observations this way can improve the performance of your machine learning model, because it helps the model discriminate better between the minority class and the majority class.
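A minimal, dependency-free sketch of the SMOTE idea follows. It is a simplified variant in which the base observation is drawn at random rather than iterating over every minority case. The data, `k`, and seed are hypothetical. Oversampling is applied to the training data only, so that validation and test sets keep the real class distribution.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies on the line segment between a
    randomly chosen minority observation and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class (e.g. fraud) observations with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2), (1.8, 2.0), (2.2, 2.8)]
new_points = smote(minority, n_synthetic=10, k=3)
for point in new_points:
    print(tuple(round(v, 3) for v in point))
```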
Are your courses going to be available in the SAS Learning Subscription?
Yes, the courses will be available in the SAS Learning Subscription. A free 30-day trial is available.
Does SAS have a package for XGBoost or other boosting methods?
Yes, it’s implemented in SAS Enterprise Miner and SAS Viya.
Is it the right time to include the impact of climate change in the credit risk modeling? For example, agriculture and the automobile industry are being impacted due to climate change. If climate change is the new normal, should credit risk modeling include this?
Yes, definitely; very good question. The BIS (Bank for International Settlements) recently issued some guidelines on this; see their website.
What is the role of ethics in data analytics? What about when analytics is used to manipulate people by determining their fears and biases?
Obviously, this should not be done. Ethics is becoming really important in analytics these days. Unfair discrimination and manipulative customer targeting based on analytics should be avoided. There is a plethora of literature on how to make analytical models more compliant with ethical guidelines.
What is the difference between knowledge and wisdom?
By definition, knowledge is the facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject. Wisdom is the soundness of an action or decision with regard to the application of experience, knowledge, and good judgment.
Has SAS integrated ProfTree and Prof into its PROCs?
Not at this moment, but this could change in the future, as SAS is continuously enhancing its products.
How relevant is deep knowledge of the mathematical formulas? Or could you do these analyses with only basic math knowledge but more applied knowledge?
Honestly, I do think some math knowledge is needed before you embark on any complex analytical modeling exercise. It is dangerous to steer your critical business processes with techniques you don’t fully understand.
Do you recommend first checking how the predictive model performs without feature engineering and then comparing it with a model that includes the new features? Or should we try a model with the engineered features on the first attempt?
Yes, definitely kick off without feature engineering first.
How can we define the profit metric for fraud detection?
See my forthcoming publication as follows:
Höppner S., Baesens B., Verbeke W., Verdonck T., Instance-Dependent Cost-Sensitive Learning for Detecting Transfer Fraud, submitted for publication.
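The paper’s exact formulation is not reproduced here. As a hypothetical sketch of what an instance-dependent cost (and hence profit) metric can look like, assume that every alert costs a fixed investigation cost and that a missed fraud costs its full transaction amount. Under those assumptions, flagging a transaction is worthwhile when its expected loss (fraud probability times amount) exceeds the investigation cost. The profit of the model can then be expressed as the cost saved relative to flagging nothing. All numbers below are made up.

```python
def flag_transactions(fraud_probs, amounts, investigation_cost):
    """Flag a transaction when its expected loss (probability x amount) exceeds
    the cost of investigating it (hypothetical instance-dependent decision rule)."""
    return [p * amt > investigation_cost for p, amt in zip(fraud_probs, amounts)]


def total_cost(labels, flagged, amounts, investigation_cost):
    """Assumed cost model: every alert costs a fixed investigation cost;
    every fraud that is not flagged costs its full amount."""
    cost = 0.0
    for y, f, amt in zip(labels, flagged, amounts):
        if f:
            cost += investigation_cost
        elif y == 1:
            cost += amt
    return cost


labels = [1, 0, 1, 0, 0]  # 1 = fraud (hypothetical)
amounts = [500.0, 20.0, 40.0, 1000.0, 60.0]
fraud_probs = [0.6, 0.1, 0.3, 0.04, 0.2]  # model output (hypothetical)
c = 50.0  # hypothetical cost per investigation

flagged = flag_transactions(fraud_probs, amounts, c)
cost = total_cost(labels, flagged, amounts, c)
no_model_cost = total_cost(labels, [False] * len(labels), amounts, c)
savings = no_model_cost - cost  # "profit" of the model relative to flagging nothing
print(flagged, cost, savings)
```

Because the costs depend on each transaction’s amount, two models with the same AUC can produce very different savings.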
If I have an unsupervised model, can I use feature engineering? How do I measure the performance change?
Yes, you can. One way is multivariate feature engineering using, for example, principal component analysis, t-SNE, or UMAP.
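For the principal component analysis option, a dependency-free sketch is shown below; t-SNE and UMAP are not shown. It finds the first principal direction by power iteration on the sample covariance matrix and returns each observation’s score along it, which can serve as a new engineered feature. The data are hypothetical, and in practice a library implementation would normally be used.

```python
import math


def first_principal_component(rows, iters=200):
    """Return the first principal direction (unit vector, via power iteration on the
    sample covariance matrix) and the corresponding score for every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical data: two strongly correlated inputs collapse into one new feature.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, pc1 = first_principal_component(rows)
print("direction:", [round(x, 3) for x in direction])
print("PC1 feature:", [round(s, 3) for s in pc1])
```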
What is your preference, lift or AUC?
For credit risk modeling, AUC; for response modeling, churn prediction, and fraud analytics, lift.
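Both metrics are easy to compute by hand, as the sketch below shows with hypothetical scores and labels. AUC summarizes how well the model ranks positives above negatives across the whole population. Lift at a cut-off measures how concentrated the targets are among the highest-scored cases, relative to the overall target rate. The 20% cut-off here is an arbitrary choice. This is what matters when only the top of the list is contacted or investigated.

```python
import math


def auc(scores, labels):
    """AUC: probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(scores, labels, fraction):
    """Lift: target rate among the top `fraction` highest-scored cases divided by the
    overall target rate."""
    n_top = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical model scores
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # 1 = churner / fraud / responder
print("AUC:", round(auc(scores, labels), 3))  # ranking quality over the whole population
print("lift@20%:", round(lift_at(scores, labels, 0.2), 3))  # concentration in the top 20%
```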
Recommended Resources
Data Science & Analytics @ LIRIS, KU Leuven
Course: Analytics: Putting It All to Work
Course: Credit Risk Modeling
Course: Fraud Detection Using Supervised, Unsupervised, and Social Network Analytics
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides and recordings from other SAS Ask the Expert webinars.