Data scientists, data miners and other analysts who use machine learning and predictive analytics to address business problems often face a dilemma: there is much more work to do after the time allocated to the project runs out. Because building predictive models is still largely an art rather than a science, more opportunities to test hypotheses, experiment with data and try different learning algorithms lead to greater confidence in the output. With a factory modeling approach, analysts can automate the science so that there is more time for the art.
Data scientists often want to experiment with different levels of granularity in the data. Will my predictive models become more accurate if I introduce strata in my data? In other words, will models built for predefined segments of my population perform better than a single generic model for the entire population? We call this stratified modeling, and it often turns out that stratified models do perform better.
However, if your data has a large number of segments, or hierarchical segments, the number of models can grow quickly. Consider a marketing campaign with three customer value segments (low, medium, high) and 20 different regions (countries, states, etc.). Combining the two segmentations would create 3 × 20 = 60 models. Now add different learning algorithms (decision trees, random forests, support vector machines, etc.) or even different data preparation strategies (variable transformation, feature generation, feature selection), and the number of experiments to run grows very fast.
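To make the arithmetic concrete, here is a small Python sketch that enumerates the experiment grid. The region names, the three algorithms and the three data preparation strategies are hypothetical placeholders chosen for illustration; only the 3 value segments and 20 regions come from the example above.

```python
from itertools import product

# Segmentation from the example: 3 value tiers x 20 regions (region names are placeholders).
value_segments = ["low", "medium", "high"]
regions = [f"region_{i:02d}" for i in range(1, 21)]

# Hypothetical modeling choices per segment.
algorithms = ["decision_tree", "random_forest", "svm"]
prep_strategies = ["raw", "transformed", "transformed_and_selected"]

# Every (value tier, region) pair is one segment that needs its own model.
segments = list(product(value_segments, regions))

# Every segment is crossed with every algorithm and every preparation strategy.
experiments = list(product(segments, algorithms, prep_strategies))

print(len(segments))     # 60
print(len(experiments))  # 60 * 3 * 3 = 540
```

Adding just one more algorithm or one more preparation strategy multiplies the total again, which is why running these experiments by hand quickly becomes impractical.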
This is where SAS Factory Miner comes in.
Imagine you have an application that lets you quickly set up experiments based on your selection of data segments and machine learning techniques, and then sends these experiments off to run in parallel in a scalable environment. The resulting reports let you quickly identify the winner of the model tournament as well as any problem areas, so that you can model by exception and focus only on the challenging, high-value or problematic segments.
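The key idea behind running the experiments in parallel is that each (segment, pipeline) pair is an independent job. The following minimal sketch illustrates that with Python's standard `concurrent.futures` module; it is not how SAS Factory Miner schedules work, and `run_experiment`, the segment keys and the pipeline names are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_experiment(segment, pipeline):
    """Hypothetical stand-in for training one pipeline on one segment.

    A real implementation would fit the pipeline on the segment's training
    data and return its validation metric; here it only echoes its inputs.
    """
    return {"segment": segment, "pipeline": pipeline, "status": "done"}

segments = ["low", "medium", "high"]              # hypothetical segment keys
pipelines = ["decision_tree", "random_forest"]    # hypothetical pipeline names

# Each (segment, pipeline) pair is independent, so the jobs can run concurrently.
jobs = list(product(segments, pipelines))

# A thread pool keeps the sketch simple; CPU-bound training would normally use
# separate processes or a cluster scheduler instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the submitted jobs.
    results = list(pool.map(lambda job: run_experiment(*job), jobs))

print(len(results))  # 6
```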
The report below is an example of a modeling tournament created with 150 segments and five modeling pipelines per segment. Based on pre-selected performance indicators, the models are ranked by their performance on validation data. The champion model is identified for each segment, and we can see that different modeling strategies prevail in different segments.
SAS Factory Miner: Stratified Model Performance Report
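The tournament logic (rank the pipelines within each segment by a validation metric, crown a champion, then flag weak segments) can be sketched in a few lines of Python. This is a minimal illustration, not SAS Factory Miner's implementation. It assumes each result is a hypothetical (segment, pipeline, validation score) tuple and that the metric is higher-is-better, such as AUC. The scores below are made-up placeholders, not real model output.

```python
from collections import defaultdict

# Hypothetical validation results: (segment, pipeline, validation score).
# Illustrative placeholders only; assumes a higher-is-better metric such as AUC.
results = [
    ("segment_A", "decision_tree", 0.71),
    ("segment_A", "random_forest", 0.78),
    ("segment_B", "decision_tree", 0.66),
    ("segment_B", "random_forest", 0.62),
]

def pick_champions(results):
    """Return {segment: (best_pipeline, best_score)} for each segment."""
    by_segment = defaultdict(list)
    for segment, pipeline, score in results:
        by_segment[segment].append((score, pipeline))
    champions = {}
    for segment, entries in by_segment.items():
        best_score, best_pipeline = max(entries)  # highest validation score wins
        champions[segment] = (best_pipeline, best_score)
    return champions

def flag_underperformers(champions, threshold):
    """Return segments whose champion still scores below the threshold."""
    return sorted(seg for seg, (_, score) in champions.items() if score < threshold)

champions = pick_champions(results)
print(champions)  # {'segment_A': ('random_forest', 0.78), 'segment_B': ('decision_tree', 0.66)}
print(flag_underperformers(champions, threshold=0.70))  # ['segment_B']
```

Note that different pipelines win in different segments, and only the flagged segments need a closer look.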
You can set thresholds to identify underperforming models quickly. As a data scientist or data miner, of course, I want to improve the models that the automated process did not get right. I want to open the box and fine-tune the process where it is not working. SAS Factory Miner lets me drill into each model template. A model template is a sequence of components that lead from the raw data to the model results. Data transformation, imputation, feature generation and feature selection tasks often precede the actual learning algorithm. And many data scientists find that the secret sauce of a successful predictive model lies more in the data transformation steps than in the learning algorithm itself.
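Conceptually, a model template is an ordered chain of steps, each consuming the output of the previous one. The following pure-Python sketch shows that idea under stated assumptions: the step functions, the column names (`income`, `spend`, `spend_ratio`) and the sample rows are hypothetical, and the final learning step is omitted. It is not SAS Factory Miner's template format.

```python
from statistics import mean

def impute_mean(rows, column="income"):
    """Imputation: replace missing values in `column` with the mean of observed values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def add_ratio_feature(rows):
    """Feature generation: derive a spend-to-income ratio (hypothetical columns)."""
    return [{**r, "spend_ratio": r["spend"] / r["income"]} for r in rows]

def select_features(rows, keep=("spend_ratio",)):
    """Feature selection: keep only the listed columns."""
    return [{k: r[k] for k in keep} for r in rows]

# A hypothetical template: an ordered list of named steps.
template = [
    ("impute", impute_mean),
    ("generate", add_ratio_feature),
    ("select", select_features),
]

def run_template(template, rows):
    """Apply each step in order; the output would then feed a learning algorithm."""
    for _name, step in template:
        rows = step(rows)
    return rows

data = [
    {"income": 100.0, "spend": 20.0},
    {"income": None, "spend": 30.0},
    {"income": 200.0, "spend": 50.0},
]
print(run_template(template, data))
# [{'spend_ratio': 0.2}, {'spend_ratio': 0.2}, {'spend_ratio': 0.25}]
```

Because the template is just a list of steps, fine-tuning a problem segment means swapping one entry (for example, a different imputation method) while leaving the rest of the chain untouched.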
So being able to customize every step of the default modeling process and save my winning settings as a customized template is a huge productivity boost. I can also share my favorite templates with colleagues to make them more productive, and even use these templates to guide analytically curious colleagues through their first steps into the world of machine learning. Before I know it, I'm expanding the analytical resources in my organization.
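Saving and sharing a template essentially means persisting its step settings so that they can be re-applied elsewhere. Here is a hedged sketch that assumes a template can be described as plain configuration. The schema, step names, parameters and file name are hypothetical and are not SAS Factory Miner's storage format. The sketch shows a JSON round trip using Python's standard `json` module.

```python
import json
import os
import tempfile

# Hypothetical template description: step types and their settings.
# Storing settings rather than fitted models keeps the template reusable on any segment.
my_template = {
    "name": "high_value_template_v2",
    "steps": [
        {"type": "impute", "method": "median"},
        {"type": "transform", "method": "log", "columns": ["income"]},
        {"type": "select", "max_features": 20},
        {"type": "learner", "algorithm": "random_forest", "n_trees": 200},
    ],
}

def save_template(template, path):
    """Write the template settings to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2)

def load_template(path):
    """Read template settings back, e.g. on a colleague's machine."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "high_value_template_v2.json")
    save_template(my_template, path)
    shared = load_template(path)

print(shared == my_template)  # True
```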
Now I've created hundreds of models with the click of a button. I evaluated them, found the underperforming models and fine-tuned them, all in a matter of hours rather than days or weeks. What do I do with all these models? Right, I go and talk to IT about implementing the models that will select the right targets for our next personalized customer marketing campaigns. We all know that this will take some time.
Or I can push my portfolio of winning models (we call them champions) out to the SAS Enterprise Decision Management environment (e.g., SAS Model Manager or SAS Decision Manager), where they become available for designing and executing operational decision flows. That is, we can now design decision flows that call a specific analytical model (the right model for the right segment) whenever a customer calls the call center, visits a branch or identifies herself on a company web page.
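"The right model for the right segment" amounts to a lookup from a customer's segment key to that segment's champion. The following minimal Python sketch shows such routing; it is not the SAS Decision Manager API. The segment keys, the customer fields and both scoring functions are hypothetical placeholders. The fallback to a default model is an assumed design choice for segments without a dedicated champion.

```python
def score_high_value_north(customer):
    """Hypothetical champion for the (high, north) segment; placeholder logic."""
    return 0.8 if customer["recent_purchases"] > 5 else 0.4

def score_default(customer):
    """Hypothetical fallback used when no segment-specific champion exists."""
    return 0.5

# Hypothetical champion registry: segment key -> published champion model.
champions = {
    ("high", "north"): score_high_value_north,
}

def segment_of(customer):
    """Derive the segment key from the customer record (hypothetical fields)."""
    return (customer["value_tier"], customer["region"])

def score_customer(customer):
    """Route the request to the champion for the customer's segment."""
    model = champions.get(segment_of(customer), score_default)
    return model(customer)

print(score_customer({"value_tier": "high", "region": "north", "recent_purchases": 7}))  # 0.8
print(score_customer({"value_tier": "low", "region": "south", "recent_purchases": 1}))   # 0.5
```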
And next month, or even next week, I can do all this again, because SAS Factory Miner stores all the assets needed to retrain my model portfolio on new data. Just provide the refreshed data and push the button. All your model flow selections run on the new data, and you can easily assess whether the new data has affected your model portfolio and whether you need to replace the production champion models.
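Deciding whether to replace a production champion is essentially a champion/challenger comparison on the refreshed data. This hedged sketch assumes that both the current production champion and the retrained challenger have been scored on the same refreshed validation data with a higher-is-better metric. The `min_gain` tolerance is a hypothetical safeguard so that small random differences do not trigger a swap, and the scores are illustrative placeholders.

```python
def should_replace(production_score, challenger_score, min_gain=0.01):
    """Promote the retrained model only if it beats production by at least min_gain."""
    return challenger_score >= production_score + min_gain

def review_portfolio(production, retrained, min_gain=0.01):
    """Return the segments whose production champion should be replaced."""
    return sorted(
        seg
        for seg, prod_score in production.items()
        if seg in retrained and should_replace(prod_score, retrained[seg], min_gain)
    )

# Illustrative placeholder scores, both measured on the same refreshed validation data.
production = {"A": 0.75, "B": 0.70, "C": 0.68}
retrained = {"A": 0.755, "B": 0.74, "C": 0.66}

print(review_portfolio(production, retrained))  # ['B']
```

Segment A improves only marginally and segment C gets worse, so only segment B's champion would be swapped out.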
So, to summarize, with SAS Factory Miner you can:
See SAS Factory Miner in action by watching this demo. What do you think?