Model Studio for SAS Enterprise Miner Users: Part 1

The purpose of this blog is to compare SAS Enterprise Miner with SAS Model Studio. It is part 1 of a multi-part series to help current SAS Enterprise Miner users transition to Model Studio. SAS Model Studio, a point-and-click interface for building machine learning models in SAS Viya, has been out for several years, so why blog about transitioning to it now? Well, the simple fact is that many SAS customers are still making the move from SAS 9 (the foundation for SAS Enterprise Miner) to SAS Viya. So, comparing the two products is as relevant today as it was when Model Studio was first released.

Since I plan on releasing a few blogs on this topic, I’d like to start at a high level here and then drill into specifics in following posts. SAS Enterprise Miner is SAS’s flagship data mining tool and was designed around SAS 9 technologies. What’s the “engine under the hood” for SAS Enterprise Miner? For the most part, it uses SAS 9 procedures, but it also comes with special procedures that were developed specifically for SAS Enterprise Miner to work with large data sets. It is a point-and-click interface that can be installed and run on a desktop, but its ideal use is at the enterprise level, where it is a server-based application. The main idea behind SAS Enterprise Miner as a data mining tool is efficiency. SAS Enterprise Miner saves the analyst time and frustration by automatically taking care of tasks that would typically be done manually when writing code. SAS Enterprise Miner is centered primarily on its supervised modeling capabilities, but it also has tools for data preparation, feature engineering, unsupervised modeling, model deployment, and more.

For SAS Viya, there are three main areas for analysts to work: Data Mining and Machine Learning, Forecasting, and Text Analytics. SAS Model Studio (henceforth, for brevity, Model Studio) is a point-and-click interface for working in these three areas. The “engine under the hood” for Model Studio is SAS Viya procedures and the CAS server. CAS stands for Cloud Analytic Services and is the in-memory engine for SAS Viya. It works with distributed data using a massively parallel processing architecture. Like all Viya products, Model Studio is a web-based application, so it is accessed through a web browser at a specific URL. Model Studio running on Viya 4.0, the latest release of Viya, can be installed on-site at an enterprise using its own hardware, it can run in the cloud, or it can be hosted by a third party such as AWS. Thus, Model Studio is more modern and flexible than SAS Enterprise Miner in terms of its hosting options. Model Studio has the same goal as SAS Enterprise Miner: providing efficiency to the user through a point-and-click interface. Model Studio’s focus is on building models in the three areas stated above, but as with SAS Enterprise Miner, it also has capabilities in data preparation, feature engineering, unsupervised modeling, and model deployment.

Getting Started

Let’s start by comparing the first thing a user sees when they access each tool. For SAS Enterprise Miner, after logging in, the user sees a Welcome screen. (Side note: Logging in is only required for server-based installations of SAS Enterprise Miner, not the desktop version.) Here’s what the Welcome screen looks like:

From the welcome screen users can perform actions such as creating a new project, accessing all existing or recently accessed projects, and even accessing SAS Enterprise Miner's help documentation. If the user chooses the option “Open Projects…” from the initial Welcome screen, a list of all existing projects is then shown:

To open a project, select it and click OK. I’ll address creating a new project later.

When a user accesses Model Studio, what they initially see is quite different from SAS Enterprise Miner. In Model Studio, the user first lands on a Projects page. The Projects page automatically shows all current projects that exist in Model Studio and provides the option to create a new project. (Model Studio requires a login, but the login happens immediately upon accessing SAS Viya, which takes users to SAS Drive. Model Studio is then accessed from SAS Drive.) Here’s what the Projects page looks like for an installation where several projects already exist:

So, unlike in SAS Enterprise Miner, the projects themselves are shown immediately and are directly accessible. The project type (Data Mining and Machine Learning, Forecasting, or Text Analytics) can also be seen for each existing project. Users can also create new projects from the Projects page. If no projects have been created yet, there are none to show, and the main option is simply to create a new project.

Organizational Hierarchy

Now let’s get into what I like to think of as the organizational hierarchy of each tool. What I mean by this is the top-down structure of the main components used in an analysis and how they help an analyst organize their work. What are the levels of organization, from the highest down to the smallest? SAS Enterprise Miner and Model Studio are somewhat similar in this regard. As may be obvious from the discussion above, the highest level of organization for SAS Enterprise Miner is a project. A SAS Enterprise Miner project contains one or more data sources. More details on data sources will come in a future blog. A project also contains one or more diagrams. Diagrams are the level of organization in SAS Enterprise Miner where the work is actually done. A Project Panel within a project allows the user to navigate through what the project contains, in other words, through the organizational hierarchy. Here’s a look at the Project Panel for a SAS Enterprise Miner project. It shows a project named Advanced Predictive Models and that the project contains several data sources (Claim, Fraud, Organics, etc.) and diagrams (Advanced Models, Claim, Donation Analysis, etc.).

Projects can also be as simple as a single data source and a single diagram. The user creates what’s known as a process flow in a diagram, where analytical tools are connected according to what the analysis needs to do. The analytical tools are known as nodes, and they’ll also be covered in detail in a future blog. So, the node is really the smallest level of the analysis. It represents a step performed in the analysis, which is typically some action taken on the data. A node can be used to explore data, partition data, create new inputs, build supervised or unsupervised models, compare models, score new data with a model, and much, much more. Here’s a typical process flow contained in a SAS Enterprise Miner diagram. The process flow starts with a data node, includes some basic data preparation, builds four predictive models, and performs model comparison:

For Model Studio, the highest level of organization is also the project. When we drill into a Model Studio project, we do begin to see some differences from SAS Enterprise Miner. For Model Studio it is better to think of the project being linked to a data set, rather than the data being contained within the project. Model Studio works with data that is in-memory. Whereas in SAS Enterprise Miner you add data sources after the project is created, for Model Studio you have to link the project to an in-memory table during the process of creating the project. In fact, you cannot even create a project without linking it to a data source. There is no project panel in the same way there is in SAS Enterprise Miner. You navigate around the organizational hierarchy in a Model Studio project by using a series of tabs across the top of the project. Each tab takes you to a different place, or view, within the project. The tabs that you see are different depending on the type of Model Studio project. For Data Mining and Machine Learning projects there are four tabs: Data, Pipelines, Pipeline Comparison, and Insights. For Text Analytics projects there are two tabs: Data and Pipelines. And for Forecasting projects there are five tabs: Data, Pipelines, Pipeline Comparison, Overrides, and Insights. Here are the tabs for a Data Mining and Machine Learning project:

No matter its type, each project must be linked to a data source and must contain at least one pipeline, where the analytic activities take place. The Data tab is very different from anything in SAS Enterprise Miner. Keep in mind that in Model Studio only a single data source applies to a project. Thus, more information about the data set and the variables it contains can be displayed, and changed, directly from the Model Studio project. On the Data tab, all columns included in the in-memory table are displayed with their associated metadata. In Model Studio, part of the variable metadata includes rules that may be assigned to each variable for various data preparation activities, such as methods of imputation and types of transformations. I’ll cover this in more detail in another blog. The next tab on the interface is the Pipelines tab. Pipelines are structured flows of analytic actions. They are essentially like the diagrams and process flows in SAS Enterprise Miner. Pipelines are where the work is done in Model Studio. Just as a SAS Enterprise Miner project can contain one or several diagrams, a Model Studio project can contain one or several pipelines. Here is a typical pipeline for a Data Mining and Machine Learning project. It performs actions similar to those in the SAS Enterprise Miner diagram above.

In Model Studio, pipelines are built vertically. In SAS Enterprise Miner, the user can decide whether to build the process flow horizontally (the default) or vertically.
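To summarize the two hierarchies side by side, here is a minimal Python sketch. It is only a mental model, not SAS code: every class, field, and function name below is hypothetical and was made up for this illustration, and the Model Studio project name and table name in the usage lines are placeholders. The Enterprise Miner example reuses the project, data source, and diagram names from the Project Panel described above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes that mirror the hierarchy described in this post.
# They are NOT part of any SAS API.


@dataclass
class Diagram:
    name: str
    nodes: List[str] = field(default_factory=list)  # steps of the process flow


@dataclass
class EnterpriseMinerProject:
    name: str
    data_sources: List[str] = field(default_factory=list)  # added after creation
    diagrams: List[Diagram] = field(default_factory=list)


@dataclass
class Pipeline:
    name: str
    nodes: List[str] = field(default_factory=list)


@dataclass
class ModelStudioProject:
    name: str
    project_type: str
    data_table: str  # the single in-memory table linked at creation time
    pipelines: List[Pipeline] = field(default_factory=list)


# Tabs shown across the top of a Model Studio project, by project type.
TABS_BY_PROJECT_TYPE = {
    "Data Mining and Machine Learning": [
        "Data", "Pipelines", "Pipeline Comparison", "Insights"],
    "Text Analytics": ["Data", "Pipelines"],
    "Forecasting": [
        "Data", "Pipelines", "Pipeline Comparison", "Overrides", "Insights"],
}


def tabs_for(project: ModelStudioProject) -> List[str]:
    """Return the tabs a project of this type would show."""
    return TABS_BY_PROJECT_TYPE[project.project_type]


# Names taken from the Project Panel example in this post.
em_project = EnterpriseMinerProject(
    name="Advanced Predictive Models",
    data_sources=["Claim", "Fraud", "Organics"],
    diagrams=[Diagram("Advanced Models"), Diagram("Claim"),
              Diagram("Donation Analysis")],
)

# Placeholder project name and table name, for illustration only.
ms_project = ModelStudioProject(
    name="Demo Project",
    project_type="Data Mining and Machine Learning",
    data_table="DEMO_TABLE",
    pipelines=[Pipeline("Pipeline 1")],
)
print(tabs_for(ms_project))
```

The main structural difference shows up in the constructors: the Enterprise Miner project can start with no data sources and collect several later, while the Model Studio project cannot be constructed without its one data table.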

For Data Mining and Machine Learning projects, the Pipeline Comparison tab is another feature in Model Studio that SAS Enterprise Miner does not have. When multiple models are built within a single pipeline, they are compared in a model comparison node (see the image above), just as multiple models can be compared within the same diagram in SAS Enterprise Miner. However, Model Studio can also compare models across pipelines. The pipeline comparison allows for a sort of “champion of champions” comparison. Each pipeline may have its own champion model, but the Pipeline Comparison tab allows for an overall project champion. The Pipeline Comparison tab automatically compares each pipeline’s champion model, but the user can also add challenger models from any pipelines they choose. Here’s a partial view of the pipeline comparison window for a Model Studio project containing two pipelines, where a logistic regression model has been declared the project champion based on the KS statistic:
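To make the “champion of champions” idea concrete, here is a minimal Python sketch of the selection logic. It is an illustration only and not how Model Studio is implemented: the class and function names are hypothetical, the model names other than logistic regression and all KS values are made up, and it assumes that a larger KS statistic means a better model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Model:
    name: str
    ks: float  # KS statistic (illustrative values); larger is assumed better


def champion(models: Iterable[Model]) -> Model:
    """Pick the model with the highest KS statistic."""
    return max(models, key=lambda m: m.ks)


def project_champion(pipelines: Dict[str, List[Model]],
                     challengers: Iterable[Model] = ()) -> Model:
    """Compare each pipeline's champion, plus any user-added challengers."""
    pipeline_champions = [champion(models) for models in pipelines.values()]
    return champion(pipeline_champions + list(challengers))


# Two pipelines with made-up models and KS values.
pipelines = {
    "Pipeline 1": [Model("Logistic Regression", 0.52),
                   Model("Decision Tree", 0.47)],
    "Pipeline 2": [Model("Gradient Boosting", 0.50),
                   Model("Neural Network", 0.49)],
}

best = project_champion(pipelines)
print(best.name)
```

Passing a list as `challengers` mimics the user adding extra models to the comparison; they compete with the pipeline champions on the same statistic.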

For Forecasting projects, the Overrides tab provides a way for the analyst to apply their business knowledge to change the forecasted value for a certain time point that has been predicted by a model. The Overrides tab is only available for Forecasting projects.

Finally, the Insights tab provides a summary of the entire project. The Insights tab is only available for Data Mining and Machine Learning projects and Forecasting projects. It summarizes the data, identifies and summarizes the overall champion model (selected from the Pipeline Comparison tab), and provides other information such as the most important variables across all models built in the project. The results of the Insights tab (and of any node within a pipeline, for that matter) can also be saved to a PDF for reporting purposes.

SAS Enterprise Miner has something close to Insights, but it is not automatic, and it only summarizes a single process flow within a diagram rather than the entire project. The tool for providing this process flow summary is the Reporter node. The Reporter node lists and summarizes each node in the process flow. It creates a summary document as either a PDF or an RTF file.

Creating a New Project

Before we wrap up this initial blog in the series, let me compare how projects are created in each tool. SAS Enterprise Miner takes a wizard approach to creating a project. By “wizard” I’m referring to a process where one window is followed by the next as steps are performed to complete the task. A new project can be created from the initial Welcome screen or, from anywhere in the interface, by using the File pull-down menu at the top (see below) or a shortcut button.

Once the initial request to create a new project is performed, SAS Enterprise Miner opens a Create New Project window.

The wizard for creating a new project goes through a four-step process, assuming a full server-based installation. (It is a two-step process for the desktop version.) The multi-step wizard asks the user to provide information such as the server where SAS Enterprise Miner processing will take place, the name of the project, and where the project files will be saved. The wizard also provides a summary of the new project information in the final step.

For Model Studio, a project can only be created from the initial Projects page. Rather than a step-by-step wizard, Model Studio opens a single New Project window where the user enters required information into several fields.

Required fields are indicated with a red asterisk. These fields are for the project Name, project Type (Data Mining and Machine Learning, Text Analytics, or Forecasting, depending on licensing), and Data. Optional fields are for pipeline Template and project Description. Advanced project options are also available with a single click, and I’ll cover these in an upcoming blog.
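As a rough illustration of the required-versus-optional split, here is a small Python sketch that checks a filled-in form the way the New Project window conceptually does. This is an assumption-laden sketch, not Model Studio code: the function name, the dictionary representation of the form, and the wording of the messages are all hypothetical.

```python
from typing import Dict, List

# Field names as they appear in the New Project window described above.
REQUIRED_FIELDS = ("Name", "Type", "Data")
OPTIONAL_FIELDS = ("Template", "Description")
PROJECT_TYPES = (
    "Data Mining and Machine Learning",
    "Text Analytics",
    "Forecasting",
)


def validate_new_project(form: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the form is complete.

    Hypothetical helper: the checks and messages are illustrative only.
    """
    problems = [
        f"{name} is required"
        for name in REQUIRED_FIELDS
        if not form.get(name, "").strip()
    ]
    project_type = form.get("Type", "").strip()
    if project_type and project_type not in PROJECT_TYPES:
        problems.append(f"Unknown project type: {project_type}")
    return problems


# A form with no Data entry fails; Template and Description may be omitted.
incomplete = {"Name": "Demo Project", "Type": "Forecasting"}
complete = {"Name": "Demo Project", "Type": "Forecasting", "Data": "DEMO_TABLE"}
print(validate_new_project(incomplete))
print(validate_new_project(complete))
```

The point of the sketch is the Data entry: unlike in SAS Enterprise Miner, a Model Studio project cannot be created until a table has been chosen.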

The next blog in this series will focus on how each tool looks at and uses data. I’ll discuss how data sources are created in SAS Enterprise Miner, the benefits of the Data tab in a Model Studio project, and some of the advanced project settings in Model Studio. Plans for other future blogs in this series include but are not limited to: creating a process flow in SAS Enterprise Miner versus creating a pipeline in Model Studio, importing SAS Enterprise Miner diagrams into Model Studio pipelines via batch code or score code, automation methods in each application, and a comparison of all supervised modeling methods available between the two tools.

Additional Resources concerning Model Studio

Training:

Machine Learning using SAS Viya: https://learn.sas.com/course/view.php?id=343

YouTube videos:

Build Models with SAS Model Studio/SAS Viya Quick Start Tutorial: https://www.youtube.com/watch?v=CtuBWJW46Zk

SAS Tutorial/End to End Model Building and Machine Learning in SAS Viya: https://www.youtube.com/watch?v=8Ey06q9FHyM

Blogs on Model Studio and Data Mining and Machine Learning:

An Introduction to Machine Learning in SAS Model Studio: https://communities.sas.com/t5/SAS-Communities-Library/An-Introduction-to-Machine-Learning-in-SAS-Mo...

SAS Visual Data Mining and Machine Learning: Getting Started: https://communities.sas.com/t5/Ask-the-Expert/SAS-Visual-Data-Mining-and-Machine-Learning-VDMML-Gett...