
Interactively Building a Decision Tree in Model Studio Pipeline


The Decision Tree node in a Model Studio pipeline can build decision tree models autonomously or interactively. To build a model autonomously, simply run the Decision Tree node. Building a model interactively, however, gives you more control and can be more informative. It is helpful when you are first learning to create decision trees, and remains useful even when you are an experienced predictive modeler.

 

After you run a Decision Tree node in a Model Studio pipeline, you can open the decision tree and modify the splitting and pruning logic of its nodes. The splitting logic that you can modify varies, depending on whether you are modifying a category predictor (class input) or a measure predictor (continuous input).

 

Interactively Editing a Decision Tree

 

To edit the split for a node, right-click the Decision Tree node and select Open. Switch to the Stack view from the top right corner of the window. Stack view displays the objects as if they were in a slide deck: only one object is displayed at a time, and a control bar lets you move between objects. By default, the Decision Tree object is open.

 

01_SS_DeTree.png

 


 

Right-click the node in the decision tree that you want to split and select Edit split for node. The Split Node window appears.

 

Decision tree models involve recursive partitioning of the training data in an attempt to isolate concentrations of cases with identical target values. Regardless of the type of predictor that you select, you can edit rules when splitting a node.

 

Under Variable, select the variable that you want to use for splitting. Variables are sorted by -Log(p), or logworth. The logworth measures how well a variable divides the data: as the logworth increases, the partition better isolates cases with identical target values. You can choose to split the data on any of the listed inputs, and you can also edit the split point if you have additional information.

 

02_SS_Logworth.png
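To make the logworth ranking concrete, here is a minimal Python sketch that computes logworth as -log10 of the p-value of a Pearson chi-squared test of independence between a class input and a class target. This illustrates the statistic itself, not Model Studio's implementation, and the sample table is made up.

```python
import numpy as np
from scipy.stats import chi2_contingency

def logworth(contingency_table):
    """Return -log10(p) for a Pearson chi-squared test of independence.

    Rows are the levels of the candidate input; columns are the target
    classes. A higher logworth means the candidate split does a better
    job of isolating cases with identical target values.
    """
    _, p_value, _, _ = chi2_contingency(contingency_table)
    return -np.log10(p_value)

# Illustrative counts: two input levels (rows) by a binary target (columns).
table = np.array([[80, 20],
                  [30, 70]])
print(f"logworth = {logworth(table):.2f}")
```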

 

Under Branches, specify the splitting criteria for each branch that stems from the node. You can enter the values that you want to use for your splitting criteria directly in the Values field.

 

03_SS_EditNode.png

 

You can also add branches and specify the splitting criteria for each of those branches. To add branches to your splitting criteria, click Add new branch. You can clear all of the branches, or reset the branches to reflect the best split, by clicking the Actions menu (the three vertical dots icon) and selecting either Set to best split or Clear all branches. Any changes that you make at this step override the value that is defined for Maximum number of branches in the options pane for the Decision Tree node.
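Conceptually, the branch rules that you edit here map input values to branches. As a hedged sketch (the dictionary layout and the variable and value names are illustrative, not Model Studio internals), a split rule for a class input might look like this in Python:

```python
# Hypothetical representation of the branch rules edited in the
# Split Node window: each branch lists the input values it receives.
split_rule = {
    "variable": "REASON",            # illustrative input name
    "branches": [
        {"values": {"DebtCon"}},     # branch 1
        {"values": {"HomeImp"}},     # branch 2
    ],
}

def assign_branch(row, rule):
    """Route a record to the branch whose value set contains its input value."""
    value = row[rule["variable"]]
    for i, branch in enumerate(rule["branches"]):
        if value in branch["values"]:
            return i
    return None  # unmatched values are handled by the missing-value strategy

print(assign_branch({"REASON": "HomeImp"}, split_rule))  # -> 1
```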

 

Under Missing values, specify how to treat missing values when splitting the node.

 

04_SS_Missing.png
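The options in this window determine which branch receives cases whose splitting variable is missing. One common strategy, illustrated in the Python sketch below, is to route such records to a designated branch; the function name and the default choice here are illustrative, not Model Studio's option names.

```python
def route(value, branch_values, missing_branch=0):
    """Assign a value to a branch index; missing values (None here) go
    to a designated branch, one common missing-value strategy."""
    if value is None:                  # treat None as missing
        return missing_branch
    for i, values in enumerate(branch_values):
        if value in values:
            return i
    return missing_branch              # route unseen values like missing

branches = [{"DebtCon"}, {"HomeImp"}]
print(route("HomeImp", branches))      # -> 1
print(route(None, branches))           # -> 0 (missing routed to branch 0)
```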

 

When you are done editing the splitting rules, click OK to apply your changes to the decision tree. You can undo or redo edits that were made to your decision tree by clicking the Undo or Redo icons in the upper right corner of the window.

 

Interactively Pruning and Training a Decision Tree

 

To prune your decision tree to a node, right-click the node that you want to prune to and select Prune to node. The Prune to node option removes all nodes beneath the selected node and turns that node into a leaf node, resulting in a simpler tree.

 

05_SS_PruneTrain.png
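Conceptually, Prune to node discards everything below the selected node and makes it a leaf. Here is a minimal Python sketch of that operation on a toy tree structure (the Node class is illustrative, not Model Studio's internal representation):

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []   # empty children => leaf node

def prune_to(node):
    """Remove everything below `node`, turning it into a leaf."""
    node.children = []

root = Node("root", [Node("left", [Node("ll"), Node("lr")]), Node("right")])
prune_to(root.children[0])               # "left" becomes a leaf
print([c.name for c in root.children[0].children])  # -> []
```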

 

If you want to undo pruning, you have two options: you can split the node one level, or you can train more than one level beyond the leaf node.

 

For the former, select Split from node. The Split Node window appears, where you specify the splitting rules. You can select the variable that is used to split the node one level and edit the split point.

 

For the latter, select the Train from node option. The Train Node window appears, where you select the variables to use in splitting. Variables are again sorted in descending order by their logworth at that node.

 

06_SS_TrainNode.png

 

In the Train Node window, all of the variables are selected by default. However, you can deselect variables in the list based on business requirements or other considerations. You can also change the Maximum depth of subtree.

 

Interactive Editor

 

When you close the interactive editor, you are prompted to save your changes. To save your changes, click Save and Run. If you do not want to save your changes, click Don’t Save. To continue editing your decision tree, click Cancel.

 

You cannot copy a Decision Tree node while it is in interactive mode. To copy a Decision Tree node, right-click the node and select Enable Properties. When the Enable Properties window appears, select Yes. You can now copy the Decision Tree node. Note that when you enable properties after interactively editing a decision tree, the changes that you made to the decision tree are lost.

During an interactive editing session, the Decision Tree node is locked so that other users cannot simultaneously make edits in interactive mode. The node is also locked when a second user who has access to the project attempts to open the editor while it is in use, or when the same user attempts to open the editor in a different tab or browser. If you encounter an error when you try to save your work, you are returned to the interactive editor. By default, an interactive editing session lasts 30 minutes.

 

Concluding Remarks

 

In a Model Studio pipeline, you can interactively edit the splitting variable, the split point, or both in one or more Decision Tree nodes (one node at a time). You can also interactively control how missing values are allocated when a node is split. Additionally, you can build an entire tree interactively by first pruning it back to the root node and then using the Split from node or Train from node options. The tree is then created by splitting the source data, which constitutes the root node of the tree, into subsets, which constitute the child nodes. The splitting is based on a set of splitting rules based on classification features (Shai and Shai 2014). This process is repeated on each derived subset in a recursive manner. The recursion is complete when all cases in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees (Quinlan 1986) is an example of a greedy algorithm, which relies on heuristic problem solving by making a locally optimal choice at each node. By making these locally optimal choices, you arrive at an approximately optimal solution globally.
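As a hedged illustration of this top-down, greedy process, the following toy Python implementation performs recursive partitioning with a Gini-based split search. Model Studio uses its own split criteria (such as logworth), so this is a sketch of the general algorithm, not the product's code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Greedily pick the (feature, value) binary split with the lowest
    weighted child impurity at this node: the locally optimal choice."""
    best, best_score = None, gini(labels)    # must improve on the parent
    for f in features:
        for v in {r[f] for r in rows}:
            left = [i for i, r in enumerate(rows) if r[f] == v]
            right = [i for i in range(len(rows)) if i not in left]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if score < best_score:
                best, best_score = (f, v), score
    return best

def grow(rows, labels, features, depth=0, max_depth=3):
    """Recursive partitioning: stop when the node is pure, the depth
    limit is reached, or no split adds value to the predictions."""
    split = None if depth >= max_depth else best_split(rows, labels, features)
    if gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, v = split
    left = [i for i, r in enumerate(rows) if r[f] == v]
    right = [i for i in range(len(rows)) if i not in left]
    return {"split": (f, v),
            "eq": grow([rows[i] for i in left], [labels[i] for i in left],
                       features, depth + 1, max_depth),
            "ne": grow([rows[i] for i in right], [labels[i] for i in right],
                       features, depth + 1, max_depth)}

# Illustrative data: a perfectly separable toy problem.
rows = [{"color": "red"}, {"color": "red"}, {"color": "blue"}, {"color": "blue"}]
labels = ["yes", "yes", "no", "no"]
print(grow(rows, labels, features=["color"]))
# -> {'split': ('color', 'red'), 'eq': 'yes', 'ne': 'no'}
```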

 

If you're just getting started with Model Studio, check out Build Models with SAS Model Studio | SAS Viya Quick Start Tutorial. It guides you through the process of training and deploying machine learning models using Model Studio pipelines.

 

 

Find more articles from SAS Global Enablement and Learning here.
