Model Studio for SAS Enterprise Miner Users: Part 5, Building Models…Let’s get physical!

This post will focus on the differences and similarities between model building in SAS Enterprise Miner and Model Studio. So why in the title am I saying we’re gonna get physical? Is it because I love the Olivia Newton-John song, “Physical”? Although that may be true, that’s not why I chose that title. In this post, I’m going to cover the physical aspect of how models are built in SAS Enterprise Miner compared to Model Studio. I’m not going to get into technical details of the models, such as how to choose hyperparameters or even which type of model to build, but rather compare the mechanics of building models in these two predictive modeling workhorses. The physical actions the analyst takes, not only to build models but to perform any analysis, are far more different than similar between the two tools. Compared to SAS Enterprise Miner, Model Studio makes building a pipeline, which is the workflow that contains models, much easier (at least in the opinion of this data scientist!). Easier mechanics allow the data scientist to focus on the analysis and minimize potential frustrations with the interface. Although the process in Model Studio is easier, if you are making the move from SAS Enterprise Miner, you’ll have a big advantage knowing ahead of time what the differences are.

Before we get into it, I need to remind you that some terminology and features that I refer to for both tools may have been covered earlier in this series. So, if you haven’t checked out my earlier posts yet, I suggest you take a minute to do that first:

Model Studio for SAS Enterprise Miner Users: Part 1

Model Studio for SAS Enterprise Miner Users: Part 2, Data

Model Studio for SAS Enterprise Miner Users: Part 3, Let’s get philosophical

Model Studio for SAS Enterprise Miner Users: Part 4, Partitioning Data

Caveat:

Allow me to make one statement out of the gate for posterity. The physical act of “building models” in both tools is the same as doing any type of analysis. Meaning, whether I am building a model, doing data exploration, performing a clustering analysis, or doing data preprocessing, the actions I’m about to discuss are the same. In fact, I’ve mentioned that there are three modeling applications available for Model Studio and thus three types of Model Studio projects that can be built (depending of course on licensing): Data Mining and Machine Learning, Text Analytics, and Forecasting. The physical act of doing an analysis for each type of project is exactly the same. This all said, as we move forward, I’ll generically talk about “building models” because that is the main purpose of each tool. Just keep in mind that nothing in the physical process changes if you are doing some other type of analysis. And, for Model Studio, my examples will be based on building a predictive model for a Data Mining and Machine Learning project.

SAS Enterprise Miner:

Drag-and-Drop:

Physically building models in SAS Enterprise Miner can be thought of as “drag-and-drop”. Although we’ll discuss some “point-and-click” ways to build models, we’ll start with what we initially teach in our classes and what I think most SAS Enterprise Miner users probably do. Of course, before building a model or doing any type of analysis, a project, diagram, and data source must all be created first. All analyses start with data, so the first node brought into a diagram is almost always the data source node. To bring the data source node into a diagram, point to the data source name in the project panel with your cursor, left click on it, keep the mouse button depressed, then move your cursor into the diagram. If you have successfully “grabbed” the data source, the mouse arrow shows an additional plus sign in a square to indicate the data source is selected and can be moved. Put the cursor in the diagram where you want to place the data node and release the mouse button. Diagram flows in SAS Enterprise Miner can be built horizontally or vertically in much the same way. I’ll discuss horizontally built diagrams but will also let you know later how to change from one to the other.

The screen shot below shows what has been done to this point. A project, data source, and diagram have all been created. The data source, commsdata, has been placed into a diagram called blog. Although we won’t really be analyzing the data, it comes from a telecommunications company that is trying to predict customer churn. Let’s say we want to explore the data with the StatExplore node, which is found above the Explore tab in the SEMMA tools palette. Click on the Explore tab to activate it. The StatExplore node is third from the right.

Place the StatExplore node in the diagram the same way the data source was brought in. Left click on the node icon above the Explore tab, keep the mouse button depressed, drag the node into the diagram where you want to place it, and then release the mouse button. When you have correctly “grabbed” the node, the cursor shows the same plus sign in a square that was shown when the data source was grabbed. To build a horizontal flow, place the node to the right of the data source. Technically you can place the node anywhere in the diagram, but I suggest placing it somewhat close to (but not right next to) the data node, especially if you know you will be adding more nodes to the diagram. Leave a little space between the nodes because this will help when you want to connect them.

Connecting the COMMSDATA node to the StatExplore node is almost like a manual dexterity test. Until you get used to it, be patient with getting it to work! Point your mouse cursor to the right-hand edge of the data source node. When SAS Enterprise Miner is ready to make a connection, two visual indicators appear (only one of which I can show you here because of limitations with my screen capture application): the mouse cursor arrow changes to a pencil and a small grey rectangle protrudes from the first node. Notice the grey rectangle below:

Once SAS Enterprise Miner shows you both visual indicators, left click with your mouse and keep the button depressed so that you can grab an arrow out of the first node to connect to the second node.

Unfortunately, I cannot show you that the mouse cursor has changed to a pencil in the screen capture but notice the arrow coming from the grey rectangle in the data source node. Bring the arrow into the left-hand edge of the node you want to connect to, in this case, StatExplore, and when the arrow is pointing to the correct edge, release the mouse button.

Congratulations!! You’ve connected nodes and now have a process flow. Now the data (including metadata) flows from the COMMSDATA data source node into the StatExplore node. When the diagram is run, the data are processed by the second node to produce summary statistics. Although I will not do it here (keep in mind this post covers the physical aspect of how to build models), there are a few ways to run the StatExplore node. One way is to right click on the node and select Run. Another way is to click on the node you want to run and then open the Actions menu at the top of the interface and select Run.

Yet another option is to select the desired node and then click the Run short-cut button found just below the drop-down menus.

Adding additional steps to the analysis is done by adding more nodes to the diagram.

Nodes do not have to be connected in a perfect line either; the diagram can branch out. Suppose in the diagram we’ve built up to this point, we want to next partition the data. We do not need to connect the Data Partition node to the StatExplore node (although we could). The StatExplore node can be like a “dead end” where no nodes need to be connected after it. That’s because the StatExplore node is not altering or changing the data in any way. There’s nothing the node “does” to the data that requires it to be connected directly to other nodes. But that’s the same reason we could connect another node to it. The data would simply flow through that node and if we chose to connect Data Partition to it, the unaltered data would simply pass through StatExplore and flow into the Data Partition node. Keep in mind SAS Enterprise Miner is completely flexible in how diagrams are built, giving full control to the analyst. Below is what the process flow looks like when I connect the data node to the Data Partition node in a new branch. (The Data Partition node is found above the Sample tab in the SEMMA palette.)

The process flow, and thus the analysis, continues from there. The diagram is ready for more data preprocessing, or a supervised modeling node can be added if we’re ready to build a model.

There may be a reason to disconnect nodes. First, left click on the connection you want to remove. SAS Enterprise Miner indicates what connection you have selected by placing a yellow square at the start and end of the connection arrow. The properties panel even provides information about what connection is selected by stating the From and To nodes properties.

Once the connection is selected, right click on the selected arrow and a Delete option appears.

Click Delete and the connection is removed. I won’t delete the connection in the current diagram. Nodes themselves can also be deleted with a right click on the desired node and then selecting Delete.
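
For readers who think in code, the free-form nature of a SAS Enterprise Miner diagram can be pictured as a plain directed graph in which the analyst makes every connection by hand. The Python below is only a toy sketch of that idea. It is not SAS code and not an Enterprise Miner API, and the Diagram class and its method names are made up for illustration.

```python
class Diagram:
    """Toy model of a free-form process flow: nodes plus directed connections."""

    def __init__(self):
        self.nodes = []        # node names, in the order they were dropped in
        self.connections = []  # (from_node, to_node) pairs

    def add_node(self, name):
        # Dropping a node into the diagram does not connect it to anything.
        self.nodes.append(name)

    def connect(self, from_node, to_node):
        # The analyst decides every connection; nothing is automatic.
        if from_node not in self.nodes or to_node not in self.nodes:
            raise ValueError("both nodes must already be in the diagram")
        self.connections.append((from_node, to_node))

    def disconnect(self, from_node, to_node):
        # Mirrors selecting a connection arrow and choosing Delete.
        self.connections.remove((from_node, to_node))

    def successors(self, name):
        # Nodes that receive data directly from `name`.
        return [dst for src, dst in self.connections if src == name]


diagram = Diagram()
for name in ["COMMSDATA", "StatExplore", "Data Partition"]:
    diagram.add_node(name)

# Two branches leave the data source; StatExplore is left as a "dead end".
diagram.connect("COMMSDATA", "StatExplore")
diagram.connect("COMMSDATA", "Data Partition")

print(diagram.successors("COMMSDATA"))    # ['StatExplore', 'Data Partition']
print(diagram.successors("StatExplore"))  # []
```

Nothing in this sketch stops you from connecting StatExplore to Data Partition instead, which is the flexibility described above.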

Alternative methods:

Before comparing and contrasting with Model Studio, let me mention a few alternate ways to build models. In SAS Enterprise Miner, if you can do something one way, it can probably be done three or four other ways as well! (I may be exaggerating a tiny bit, but SAS Enterprise Miner as well as Model Studio do typically give you alternate ways to perform actions.) Once a user is used to the drag-and-drop method, I think it is the most efficient and fastest way to build a process flow. However, I’ll mention a few point-and-click methods. One starts from the Actions menu at the top of the interface. Clicking Actions > Add Node opens a menu that shows all available tabs in the SEMMA tools palette. If you are ready to add a model to the diagram, click Actions > Add Node > Model and a list of all model nodes from the Model tab is shown.

Click on the desired model and SAS Enterprise Miner adds the node to the diagram. SAS Enterprise Miner may not drop the node exactly where you want it, so simply click on it and drag it to where you want it to go.

You can also connect and disconnect nodes using the Actions menu. I’ve added a Gradient Boosting node and moved it to where I want it to be. To connect Data Partition into it, I first select the Data Partition node, then I select Actions > Connect Nodes. A pop-up appears that shows what nodes the Data Partition node can connect to. Be careful, as the list includes even nodes you may not want to connect.

Select the desired node and click OK.

You can also perform these same point-and-click actions from within the diagram itself, meaning, you don’t need to use the Actions pull-down menu. To add a node, right click in any blank area of the diagram workspace and select Add Node.

To connect nodes, select a node, right click on it, then select Connect Nodes from the bottom of the menu that appears.

One final note. I mentioned earlier that workflows can be built horizontally or vertically. If you are still using SAS Enterprise Miner to build models, this is something you may want to play with to see which you like best. To change the flow from horizontal to vertical, right click in any blank area of the diagram workspace, and select Layout.

To change to the vertical arrangement if you started to build horizontally, select Vertically.

If your process flow starts to get too big and you feel like the nodes are not neatly arranged, you can use this same Layout feature and Enterprise Miner will arrange your nodes in a neater order.

Model Studio:

Drag-and-Drop:

In my earlier post on partitioning data, I said that you need to have a totally different mindset when you think of partitioning data as you move from SAS Enterprise Miner to Model Studio. “Having a totally different mindset” may be too strong a statement when we think of building models in Model Studio compared to SAS Enterprise Miner, but I do want you to be prepared for a lot of differences. Model Studio pipelines are similar to SAS Enterprise Miner diagrams, but there are fewer decisions and less flexibility left to the user in terms of how the pipeline flows are built. Less flexibility may not sound like a good thing, but in this case, I think it allows the analyst to focus more on the task at hand, the analysis! The analyst need not be bogged down with things like deciding where to place nodes or passing the manual dexterity test of connecting nodes. For example, in SAS Enterprise Miner you can build diagram flows horizontally or vertically. In Model Studio, only vertical flows can be constructed. Let’s get into it!

To make the transition to Model Studio easiest, of the handful of ways there are to build models, I’ll describe the drag-and-drop method first, as that’s what I started with for SAS Enterprise Miner. Just as in the example above, to get to the point of the screen shot below, a project has been created and the project uses the same data, commsdata. This is a Data Mining and Machine Learning (DMML) project to align most closely with SAS Enterprise Miner functionality. We are on the pipelines tab and based on how the project was created, a blank pipeline template is being used. The blank pipeline includes only a data node. Suppose we first want to explore the data using the Data Exploration node. For the drag-and-drop method, the first action is to expand the Nodes pane by clicking the nodes short-cut button which is to the left of the Pipeline 1 tab.

The menu of areas of application for DMML projects is shown. It includes Data Mining Preprocessing, Supervised Learning, Postprocessing, and Miscellaneous.

The Data Exploration node is found in the Miscellaneous group, so expand that portion of the menu.

Left click on the Data Exploration node and with the mouse button depressed, drag the node into the pipeline and hover it over the Data node.

Although I cannot show it due to limitations of my screen capture application, Model Studio shows visual indicators in terms of where you can and cannot drop the node. When the Data Exploration node is hovered over the Data node, the mouse cursor shows a plus sign, indicating the node can be added there. (The plus sign is the same image shown in SAS Enterprise Miner when a node has been successfully grabbed.) Otherwise, the mouse cursor shows a red “do-not-enter” sign (red circle with a diagonal line through it). So unlike SAS Enterprise Miner, Model Studio helps guide you to where the node can be placed. You cannot drop the node just anywhere in the pipeline, like you can in a SAS Enterprise Miner diagram. With the Data Exploration node hovering over the data node (and the mouse cursor showing the plus sign) release the mouse button. Model Studio places the node into a “swim lane” in the process flow and automatically connects the two nodes.

Even though I will not do it (again, I want to focus on building models), to run the Data Exploration node you can click the Run pipeline button. This of course runs the entire pipeline, so if additional nodes have been added, all of them will run. To run the pipeline up to and including the desired node, open the node menu by either clicking the 3 vertical dots shown within the node on the right or right clicking the node. Then click Run.
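
The difference between running the whole pipeline and running “up to and including” one node can be sketched in a few lines of Python. This is a toy illustration, not Model Studio code. The parent_of dictionary and the nodes_to_run function are hypothetical names, the sketch assumes each node has exactly one parent, and the Imputation branch is included only to show a branch that gets skipped.

```python
# Hypothetical pipeline: each node maps to its parent (None for the Data node).
parent_of = {
    "Data": None,
    "Data Exploration": "Data",
    "Imputation": "Data",  # a second branch, here only to show it is skipped
}


def nodes_to_run(node, parent_of):
    """Return the path from the Data node down to `node`, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_of[node]
    return list(reversed(path))


# Running the Data Exploration node touches only its own path.
print(nodes_to_run("Data Exploration", parent_of))  # ['Data', 'Data Exploration']

# Running the whole pipeline touches every node.
print(list(parent_of))  # ['Data', 'Data Exploration', 'Imputation']
```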

Adding additional nodes to the analysis is done in a similar way, but keep in mind that Model Studio helps you by letting you know where you can and cannot place nodes. Suppose we want to add an Imputation node. The flexibility of SAS Enterprise Miner means that you can add an Imputation node directly after a StatExplore node and connect StatExplore into Imputation. This is not the case in Model Studio. In the Nodes pane, the Imputation node is found in the Data Mining Preprocessing group. With the Nodes pane and the Data Mining Preprocessing group expanded, drag the Imputation node into the pipeline and hover it over the Data node. (The mouse cursor will show a plus sign.)

Release the mouse button.

Model Studio adds the Imputation node in a new swim lane and automatically connects the Data node to it. If you hovered the Imputation node over the Data Exploration node, the mouse cursor would show the red do-not-enter sign, indicating that is not a valid place to drop the node. For Model Studio, the Data Exploration node is one of the few that are like a dead-end node; no nodes can be added after it.

You cannot delete connections as you can in SAS Enterprise Miner, but you can delete nodes. Select the node you want to delete. Open the node menu (either right-click on the node or click the 3 vertical dots within the node) and select Delete.

Alternate methods:

Just as SAS Enterprise Miner had a point-and-click method of adding nodes in addition to the drag-and-drop method, so does Model Studio. This consistency between the two tools is nice for those moving to Model Studio from SAS Enterprise Miner. Whatever method the analyst preferred for building models in SAS Enterprise Miner, there’s likely something similar in Model Studio. In SAS Enterprise Miner, the point-and-click method was based on, for example, the Actions pull-down menu at the top of the interface; in Model Studio, it uses the node menu, which is opened from within nodes.

Let’s add a Gradient Boosting model after our Imputation node. Right click on or click the 3 vertical dots within the Imputation node to reveal the node menu. Expand Add child node. (We’ll discuss the Add parent node option in a bit.)

Of the 4 areas of application shown once Add child node is expanded, notice that the Postprocessing group is greyed out. The Postprocessing group (which contains only the Ensemble node) is not available when a user tries to add a child node after a Data Mining Preprocessing node. This makes sense as an Ensemble model must follow a Supervised Learning model. Model Studio does not allow you to do something that does not make analytical sense. Pretty smart and user friendly, huh?!
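
The placement rules described so far can be written down as a small check. The Python below is a toy sketch under stated assumptions, not Model Studio’s actual logic. The GROUP_OF table lists only nodes named in this post, the can_add_child function is hypothetical, and any combination the post does not mention is simply assumed to be allowed.

```python
# Node groups as named in the post. Only nodes the post mentions are listed.
GROUP_OF = {
    "Imputation": "Data Mining Preprocessing",
    "Variable Selection": "Data Mining Preprocessing",
    "Gradient Boosting": "Supervised Learning",
    "Ensemble": "Postprocessing",
    "Data Exploration": "Miscellaneous",
}


def can_add_child(parent, child):
    """Toy rule check covering only the rules stated in the post."""
    # Data Exploration is a dead end: nothing can be added after it.
    if parent == "Data Exploration":
        return False
    # A Postprocessing node (Ensemble) must follow a Supervised Learning model.
    if GROUP_OF[child] == "Postprocessing":
        return GROUP_OF.get(parent) == "Supervised Learning"
    # Assumption: everything else is allowed.
    return True


print(can_add_child("Imputation", "Ensemble"))           # False (greyed out)
print(can_add_child("Imputation", "Gradient Boosting"))  # True
print(can_add_child("Gradient Boosting", "Ensemble"))    # True
print(can_add_child("Data Exploration", "Imputation"))   # False (do-not-enter)
```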

Expand Supervised Learning and select Gradient Boosting.

A Gradient Boosting node is added in the same swim lane as the Imputation node and a Model Comparison node is automatically added at the bottom of the pipeline. Also, the Imputation node is automatically connected to the Gradient Boosting node and the Gradient Boosting node is automatically connected to the Model Comparison node. (Side note: Nodes are colored according to what group they belong to.) Continue the analysis by continuing to add desired nodes.

Allow me to circle back to the Add parent node option on the node menu. To illustrate how it works, let’s suppose in our pipeline we want to do a bit more data preprocessing before we build our Gradient Boosting model. Let’s say we want to add a Variable Selection node after imputation but before the model. Let’s see what happens when I add a Variable Selection node as a child node to Imputation. Open the node menu on the Imputation node and select Add child node > Data Mining Preprocessing > Variable Selection.

Notice the Variable Selection node is added to a new swim lane rather than between the two existing nodes. Because the Imputation node already has a child (Gradient Boosting), the new child node branches off into its own swim lane. But I want to add Variable Selection between Imputation and Gradient Boosting! This is where Add parent node is used. Delete the current Variable Selection node. Open the node menu on the Gradient Boosting node and expand Add parent node.

First, notice that the Supervised Learning and Postprocessing groups are not available. Again, Model Studio will not allow you to add nodes that do not make analytical sense. Expand Data Mining Preprocessing and select Variable Selection.

Now the Variable Selection node is placed within the current swim lane, between the Imputation and Gradient Boosting nodes. So, the Add parent node feature is the method that is used to place a node between nodes, within a current swim lane.
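
The difference between Add child node and Add parent node can be sketched as two operations on a simple tree. This Python is a toy model only, not Model Studio code. The Pipeline class and its method names are made up, it assumes one parent per node, and it leaves out the automatically added Model Comparison node.

```python
class Pipeline:
    """Toy pipeline: every node has one parent; the Data node is the root."""

    def __init__(self):
        self.parent_of = {"Data": None}

    def add_child(self, parent, name):
        # The new node hangs below `parent`, next to any existing children.
        self.parent_of[name] = parent

    def add_parent(self, node, name):
        # The new node is spliced in between `node` and its current parent.
        self.parent_of[name] = self.parent_of[node]
        self.parent_of[node] = name

    def children_of(self, node):
        return [n for n, p in self.parent_of.items() if p == node]

    def path_to(self, node):
        # Walk up to the root, then reverse to read top to bottom.
        path = []
        while node is not None:
            path.append(node)
            node = self.parent_of[node]
        return path[::-1]


def base_pipeline():
    p = Pipeline()
    p.add_child("Data", "Imputation")
    p.add_child("Imputation", "Gradient Boosting")
    return p


# Add child node: Variable Selection becomes a second branch below Imputation.
branched = base_pipeline()
branched.add_child("Imputation", "Variable Selection")
print(branched.children_of("Imputation"))
# ['Gradient Boosting', 'Variable Selection']

# Add parent node: Variable Selection lands between Imputation and the model.
inline = base_pipeline()
inline.add_parent("Gradient Boosting", "Variable Selection")
print(inline.path_to("Gradient Boosting"))
# ['Data', 'Imputation', 'Variable Selection', 'Gradient Boosting']
```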

I think there is one final comment to make before wrapping this one up. Seasoned SAS Enterprise Miner users may know that SAS Enterprise Miner allows a supervised model to connect to another supervised model within a diagram. This would be done, for example, if we wanted to use the inputs selected by one model (say, a decision tree) and pass those selected inputs on to another model (say, a neural network). Model Studio does not allow one supervised model to connect to another. The reason is that the Variable Selection node itself allows certain supervised modeling algorithms to be used to select inputs. Thus, if we wanted a decision tree to select inputs for a neural network in Model Studio, we would use a Variable Selection node with the decision tree option turned on and connect that node to a neural network node to pass along selected inputs.

SAS Enterprise Miner and Model Studio are both predictive modeling workhorses. Keep in mind, however, that they can each also be used to perform other types of analysis aside from just building supervised models. SAS Enterprise Miner is more flexible when it comes to building models, but the physical act of doing so may take a bit of practice. Model Studio actually helps the analyst build a workflow by taking mundane decisions, such as where to place nodes or how to connect them, off the analyst’s plate. Your transition from SAS Enterprise Miner into the modern and exciting world of Model Studio will be easier if you understand the similarities and differences between how you physically build models in each. I hope this post has been helpful! Now, all that’s left to be said on this topic, as Olivia Newton-John surely would agree, is “Let’s get physical!!”

Finally, I’ll tease you with some future topics you’ll likely see in this ongoing series. In no particular order, in future posts I plan on covering topics such as automation capabilities of the tools, how each tool handles feature creation and feature selection, and the model interpretability and bias assessment features of Model Studio. If there’s a topic you’d like me to cover, please leave a comment about it below.

Find more articles from SAS Global Enablement and Learning here.