
Model Studio for Enterprise Miner Users: Part 1

Started ‎02-13-2024 by
Modified ‎02-13-2024 by

 

The purpose of this blog is to compare SAS Enterprise Miner with SAS Model Studio. It is part 1 of a multi-part series to help current Enterprise Miner users transition to Model Studio. SAS Model Studio, a point-and-click interface for building machine learning models in SAS Viya, has been out for several years, so why blog about transitioning to it now? The simple fact is that many SAS customers are still making the move from SAS 9 (the foundation for Enterprise Miner) to SAS Viya. So, comparing the two products is as relevant today as it was when Model Studio was first released.

 

Since I plan on releasing a few blogs on this topic, I’d like to start at a high level here and then drill into specifics in the following posts. SAS Enterprise Miner (henceforth, for brevity, E-Miner) is SAS’s flagship data mining tool and was designed around SAS 9 technologies. What’s the “engine under the hood” for E-Miner? For the most part, it uses SAS 9 procedures, but it also comes with special procedures that were developed specifically for E-Miner to work with large data sets. It is a point-and-click interface that can be installed and run on a desktop, but its ideal use is at the enterprise level, where it is a server-based application. The main idea behind E-Miner as a data mining tool is efficiency. E-Miner saves the analyst time and frustration by automatically taking care of tasks that would typically be done manually when writing code. E-Miner is centered primarily on its supervised modeling capabilities, but it also has tools for data preparation, feature engineering, unsupervised modeling, model deployment, and more.

 

For SAS Viya, there are three main areas for analysts to work in: Data Mining and Machine Learning, Forecasting, and Text Analytics. Model Studio (henceforth, for brevity, M-Studio) is a point-and-click interface for working in these three areas. The “engine under the hood” for M-Studio is SAS Viya procedures and the CAS server. CAS stands for Cloud Analytic Services and is the in-memory engine for SAS Viya. It works with distributed data using a massively parallel processing architecture. Like all Viya products, M-Studio is a web-based application, so it is accessed through a web browser at a specific URL. M-Studio running on Viya 4.0, the latest release of Viya, can be installed on-site at an enterprise on its own hardware, run in the cloud, or be hosted by a third party such as AWS. Thus, M-Studio is more modern and flexible than E-Miner in terms of its hosting options. M-Studio has the same goal as E-Miner: providing efficiency to the user through a point-and-click interface. M-Studio’s focus is on building models in the three areas stated above, but as with E-Miner, it also has capabilities in data preparation, feature engineering, unsupervised modeling, and model deployment.

 

Getting Started

 

Let’s start by comparing the first thing a user sees when they access each tool. For E-Miner, after logging in, the user sees a Welcome screen. (Side note: Logging in is only required for server-based installations of E-Miner, not the desktop version.) Here’s what the Welcome screen looks like:

 

JT_1_EM_welcome_page_151.png


 

From the welcome screen users can perform actions such as creating a new project, accessing all existing or recently accessed projects, and even accessing E-Miner’s help documentation. If the user chooses the option “Open Projects…” from the initial Welcome screen, a list of all existing projects is then shown:

 

JT_2_EM_open_a_project.png

 

To open a project, select it and click OK. I’ll address creating a new project later.

 

When a user accesses M-Studio, what they initially see is quite different from E-Miner. In M-Studio, the user lands on a Projects page. The Projects page automatically shows all projects that currently exist in M-Studio and provides the option to create a new project. (M-Studio requires a login, but the login happens immediately upon accessing SAS Viya, which takes users to SAS Drive. M-Studio is then accessed from SAS Drive.) Here’s what the Projects page looks like for an installation where several projects already exist:

 

JT_3_MS_proj_page.png

 

So, unlike E-Miner, projects themselves are shown immediately and are directly accessible. The project type (Data Mining and Machine Learning, Forecasting, or Text Analytics) can also be seen for each existing project. Users can also create new projects from the Projects page. If no projects have been created yet, there are none to show, and the main option is simply to create a new project.

 

Organizational Hierarchy

 

Now let’s get into what I like to think of as the organizational hierarchy of each tool. What I mean by this is the top-down structure of the main components used in an analysis and how they help an analyst organize their work. What are the levels of organization, from the highest down to the smallest? E-Miner and M-Studio are somewhat similar in this regard. As may be obvious from the above, the highest level of organization for E-Miner is a project. Contained within an E-Miner project are one or more data sources. More details on data sources will come in a future blog. Also contained within a project are one or more diagrams. Diagrams are the level of organization for E-Miner where the work is actually done. A Project Panel exists within a project that allows the user to navigate through what the project contains, in other words, to navigate through the organizational hierarchy. Here’s a look at the Project Panel for an E-Miner project. It shows a project named Advanced Predictive Models that contains several data sources (Claim, Fraud, Organics, etc.) and diagrams (Advanced Models, Claim, Donation Analysis, etc.).

 

JT_4_EM_project_panel.png

 

Projects can also be as simple as a single data source and a single diagram. Within a diagram, the user creates what’s known as a process flow, in which analytical tools are connected according to what the analysis requires. The analytical tools are known as nodes, and they’ll also be covered in detail in a future blog. So, the node is really the smallest level of the analysis. It represents a step performed in the analysis, which is typically some action taken on the data. A node can be used to explore data, partition data, create new inputs, build supervised or unsupervised models, compare models, score new data with a model, and much, much more. Here’s a typical process flow contained in an E-Miner diagram. The process flow starts with a data node, includes some basic data preparation, builds four predictive models, and performs model comparison:

 

JT_5_EM_diagram.png

 

For M-Studio, the highest level of organization is also the project. When we drill into an M-Studio project, we do begin to see some differences from E-Miner. For M-Studio it is better to think of the project being linked to a data set, rather than the data being contained within the project. M-Studio works with data that is in-memory. Whereas in E-Miner you add data sources after the project is created, for M-Studio you have to link the project to an in-memory table during the process of creating the project. In fact, you cannot even create a project without linking it to a data source. There is no project panel in the same way there is in E-Miner. You navigate around the organizational hierarchy in an M-Studio project by using a series of tabs across the top of the project. Each tab takes you to a different place, or view, within the project. The tabs that you see are different depending on the type of M-Studio project. For Data Mining and Machine Learning projects there are four tabs: Data, Pipelines, Pipeline Comparison, and Insights. For Text Analytics projects there are two tabs: Data and Pipelines. And for Forecasting projects there are five tabs: Data, Pipelines, Pipeline Comparison, Overrides, and Insights. Here are the tabs for a Data Mining and Machine Learning project:

 

JT_6_MS_DMML_tabs.png
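
For readers who think in code, the tabs listed above for each project type can be summarized as a simple lookup table. This is only an illustrative sketch in Python: the tab and project-type names come from this blog, but the dictionary and the two helper functions are hypothetical and are not part of any SAS API.

```python
# Tab names per project type, as described in this blog.
# PROJECT_TABS, tabs_for, and types_with_tab are hypothetical names
# used for illustration; they are not part of Model Studio.
PROJECT_TABS = {
    "Data Mining and Machine Learning": [
        "Data", "Pipelines", "Pipeline Comparison", "Insights",
    ],
    "Text Analytics": ["Data", "Pipelines"],
    "Forecasting": [
        "Data", "Pipelines", "Pipeline Comparison", "Overrides", "Insights",
    ],
}


def tabs_for(project_type):
    """Return the tabs shown for the given project type."""
    return PROJECT_TABS[project_type]


def types_with_tab(tab):
    """Return the project types (sorted by name) that show the given tab."""
    return sorted(t for t, tabs in PROJECT_TABS.items() if tab in tabs)


print(tabs_for("Text Analytics"))
print(types_with_tab("Overrides"))
```

The lookup makes two points from the text easy to see: every project type has Data and Pipelines tabs, and only Forecasting projects have an Overrides tab.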

 

No matter the type of project, each must be linked to a data source and consist of at least one pipeline where analytic activities take place. The Data tab is definitely something very different from E-Miner. Keep in mind that for M-Studio only a single data source applies to a project. Thus, more information about the data set and the variables it contains can be displayed, and changed, directly from within the M-Studio project. On the Data tab, all columns included in the in-memory table are displayed with their associated metadata. In M-Studio, part of the variable metadata includes rules that may be assigned to each variable for various data preparation activities, such as methods of imputation and types of transformations. I’ll cover this in more detail in another blog. The next tab on the interface is the Pipelines tab. Pipelines are structured flows of analytic actions. They are essentially like the diagrams and process flows in E-Miner. Pipelines are where the work is done in M-Studio. Just as an E-Miner project can contain one or several diagrams, an M-Studio project can contain one or several pipelines. Here is a typical pipeline for a Data Mining and Machine Learning project. It performs actions similar to those in the E-Miner diagram above.

 

JT_7_MS_DMML_pipeline.jpg

 

In M-Studio, pipelines are built vertically. In E-Miner, the user can decide whether to build the process flow horizontally (the default) or vertically.
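
To make the two organizational hierarchies concrete, here is a rough sketch of them as Python data classes. It is purely a mental model, not how either product is implemented. The class names, the node names, and the table name PUBLIC.ORGANICS are assumptions made up for illustration; the project, data source, and diagram names are the ones shown in the Project Panel earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """A diagram (E-Miner) or a pipeline (M-Studio): a named set of nodes."""
    name: str
    nodes: list = field(default_factory=list)


@dataclass
class EMinerProject:
    """E-Miner: data sources and diagrams are contained in the project."""
    name: str
    data_sources: list = field(default_factory=list)  # added after creation
    diagrams: list = field(default_factory=list)


@dataclass
class ModelStudioProject:
    """M-Studio: the project is linked to exactly one in-memory table."""
    name: str
    table: str  # required when the project is created
    pipelines: list = field(default_factory=list)

    def __post_init__(self):
        if not self.table:
            raise ValueError("a project must be linked to a data source")


# E-Miner: create the project first, then add data sources and diagrams.
em = EMinerProject("Advanced Predictive Models")
em.data_sources.extend(["Claim", "Fraud", "Organics"])
# Node names below are hypothetical placeholders.
em.diagrams.append(Flow("Advanced Models", ["Data", "Partition", "Model Comparison"]))

# M-Studio: the table must be supplied up front (hypothetical table name).
ms = ModelStudioProject("Organics Models", table="PUBLIC.ORGANICS")
ms.pipelines.append(Flow("Pipeline 1", ["Data", "Model Comparison"]))
```

The one structural difference the sketch highlights is that the E-Miner project can exist with no data at all, while the M-Studio project refuses to be created without a table.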

 

For Data Mining and Machine Learning projects, the Pipeline Comparison tab is another feature in M-Studio that E-Miner does not have. When multiple models are built within a single pipeline, they are compared in a model comparison node (see the image above), just as multiple models can be compared within the same diagram in E-Miner. However, M-Studio also has the ability to compare models across pipelines. The pipeline comparison allows for a sort of “champion of champions” comparison. Each pipeline may have its own champion model, but the Pipeline Comparison tab allows for an overall project champion. The Pipeline Comparison tab automatically compares each pipeline’s champion model, but the user can also add challenger models from any pipeline they choose. Here’s a partial view of the pipeline comparison window for an M-Studio project containing two pipelines, where a logistic regression model has been declared the project champion based on the KS statistic:

 

JT_8_MS_pipe_compare.png
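
The “champion of champions” idea can be sketched in a few lines of Python. This is not how M-Studio computes anything internally; it only illustrates the two-stage selection described above, assuming that a larger KS statistic is better. The pipeline names, model names, and KS values are all made up for the example, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    pipeline: str
    model: str
    ks: float  # KS statistic; assumed here that larger is better


def pipeline_champions(results):
    """Stage 1: pick the best model within each pipeline."""
    champions = {}
    for r in results:
        best = champions.get(r.pipeline)
        if best is None or r.ks > best.ks:
            champions[r.pipeline] = r
    return champions


def project_champion(results, challengers=()):
    """Stage 2: compare pipeline champions, plus any added challengers."""
    candidates = list(pipeline_champions(results).values()) + list(challengers)
    return max(candidates, key=lambda r: r.ks)


# Made-up results for a project with two pipelines.
results = [
    ModelResult("Pipeline 1", "Logistic Regression", 0.52),
    ModelResult("Pipeline 1", "Decision Tree", 0.47),
    ModelResult("Pipeline 2", "Gradient Boosting", 0.50),
    ModelResult("Pipeline 2", "Neural Network", 0.44),
]

champion = project_champion(results)
print(champion.pipeline, champion.model)
```

The optional challengers argument mirrors the ability to add non-champion models from any pipeline to the comparison.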

 

For Forecasting projects, the Overrides tab provides a way for the analyst to apply their business knowledge to change the forecasted value for a certain time point that has been predicted by a model. The Overrides tab is only available for Forecasting projects.

 

Finally, the Insights tab provides a summary of the entire project. The Insights tab is only available for Data Mining and Machine Learning and Forecasting projects. It summarizes the data, states and summarizes the overall champion model (selected from the Pipeline Comparison tab), and provides other information such as the most important variables across all models built in the project. The results of the Insights tab (and of any node within a pipeline, for that matter) can also be saved to a PDF for reporting purposes.

 

JT_9_MS_insights.png

 

E-Miner has something close to Insights, but it is not automatic, and it only summarizes a single process flow within a diagram rather than the entire project. The tool for providing this process flow summary is the Reporter node. The Reporter node states and summarizes each node in the process flow. It creates a summary document as either a PDF or an RTF file.

 

Creating a New Project

 

Before we wrap up this initial blog in this series, let me compare how projects are created in each tool. E-Miner takes a wizard approach to creating a project. By “wizard” I’m referring to a process where one window is followed by the next as steps are performed to complete the task. A new project can be created from the initial Welcome screen or from the interface in general using the File pull-down menu at the top (see below) or a short-cut button (  JT_10_EM_shortcut_button_new.png ).

 

JT_11_EM_new_proj_pulldown_menu.png

 

Once the initial request to create a new project is performed, E-Miner opens a Create New Project window.

 

JT_12_EM_project_wizard.png

 

The wizard for creating a new project goes through a four-step process, assuming a full server-based installation. (It is a two-step process for the desktop version.) The multi-step wizard asks the user to provide information such as the server where E-Miner processing will take place, the name of the project, and where the project files will be saved. The wizard also provides a summary of the new project information in the final step.

 

For M-Studio, a project can only be created from the initial Projects page. Rather than a step-by-step wizard, M-Studio opens a single New Project window where the user enters required information into several fields.

 

JT_13_MS_new_project.png

 

Required fields are indicated with a red asterisk. These fields are for the project Name, project Type (Data Mining and Machine Learning, Text Analytics, or Forecasting, depending on licensing), and Data. Optional fields are for pipeline Template and project Description. Advanced project options are also available with a single click, and I’ll cover these in an upcoming blog.
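
As a rough illustration of the required-versus-optional split in the New Project window, here is a small Python sketch that checks a set of field values the way a form might. The field names follow the window described above, but the functions, the lowercase keys, and the example values (including the table name PUBLIC.SALES) are hypothetical and not part of Model Studio.

```python
# Fields as described for the New Project window. The lowercase keys and
# the helper functions are hypothetical, for illustration only.
REQUIRED_FIELDS = ("name", "type", "data")
OPTIONAL_FIELDS = ("template", "description")
PROJECT_TYPES = (
    "Data Mining and Machine Learning",
    "Text Analytics",
    "Forecasting",
)


def missing_required(form):
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(form.get(f) or "").strip()]


def validate_new_project(form):
    """Return a list of problems; an empty list means the form is complete."""
    problems = [f"missing required field: {f}" for f in missing_required(form)]
    project_type = form.get("type")
    if project_type and project_type not in PROJECT_TYPES:
        problems.append(f"unknown project type: {project_type}")
    return problems


# A complete form: optional fields may simply be left out.
complete = {"name": "Churn", "type": "Forecasting", "data": "PUBLIC.SALES"}
print(validate_new_project(complete))

# Leaving out Data fails, matching the rule that a project cannot be
# created without being linked to a data source.
print(validate_new_project({"name": "Churn", "type": "Forecasting"}))
```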

 

The next blog in this series will focus on how each tool looks at and uses data. I’ll discuss how data sources are created in E-Miner, the benefits of the Data tab in an M-Studio project, and some of the advanced project settings in M-Studio. Plans for other future blogs in this series include but are not limited to: creating a process flow in E-Miner versus creating a pipeline in M-Studio, importing E-Miner diagrams into M-Studio pipelines via batch code or score code, automation methods in each application, and a comparison of all supervised modeling methods available between the two tools.

 

Additional Resources concerning Model Studio

 

Training:

Machine Learning using SAS Viya: https://learn.sas.com/course/view.php?id=343

 

YouTube videos:

Build Models with SAS Model Studio/SAS Viya Quick Start Tutorial: https://www.youtube.com/watch?v=CtuBWJW46Zk

SAS Tutorial/End to End Model Building and Machine Learning in SAS Viya: https://www.youtube.com/watch?v=8Ey06q9FHyM

 

Blogs on Model Studio and Data Mining and Machine Learning:

An Introduction to Machine Learning in SAS Model Studio: https://communities.sas.com/t5/SAS-Communities-Library/An-Introduction-to-Machine-Learning-in-SAS-Mo...

SAS Visual Data Mining and Machine Learning: Getting Started: https://communities.sas.com/t5/Ask-the-Expert/SAS-Visual-Data-Mining-and-Machine-Learning-VDMML-Gett...

Version history
Last update:
‎02-13-2024 02:49 PM
