Tip: Customize SAS® Enterprise Miner™ Node Results

by SAS Employee rayIII on 04-15-2014 02:59 PM - edited on 10-06-2015 02:34 PM by Community Manager

The interactive tables and charts displayed in Enterprise Miner nodes are designed by data mining experts to summarize modeling results in an easy-to-digest format. But there are several reasons why you may want to customize the output:

 

  • To enhance or highlight information in a chart
  • To gain a different perspective on your results
  • To prepare results for presentation or publication
  • Personal taste

 

I will walk through customization of interactive charts with four examples – from the simplest to the more advanced:

 

  1. Make cosmetic changes to default node output
  2. Recast default node output as a new plot type
  3. Generate a plot from a summary data table
  4. Use a SAS Code node to restructure a node table and plot it using the %em_report macro

 

 

Example 1:  Make Cosmetic Changes

 

Not surprisingly, the simplest changes to make are those having to do with chart labels and titles. For example, take the segment plot—a pie chart of segment frequencies—produced by the HP Cluster node:

image001.png

 

 

Each slice shows segment number as a label by default. You can easily add segment sizes and percentages to the slice labels.

 

Steps:

 

1. Run a cluster analysis using the HP Cluster node.

 

2. In the HP Cluster node results window, select the pie chart and right click on it to view available options.  (You can also view available options using the pull-down Edit menu.)

image003.png

 

3. Choose Graph Properties.

 

This opens the Properties – Pie dialog box.

 

4. In the Pie panel, select the Value and Percentage checkboxes.

 

image004.png

 

5. Press Apply.

 

Segment sizes and percentages appear within each slice.

 

 

image005.png
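If you also want the same sizes and percentages as numbers outside the Results window, a minimal sketch in Base SAS follows. The dataset name segments, its columns, and the counts are hypothetical and are not taken from the HP Cluster run shown above.

```sas
/* Hypothetical segment frequencies; not taken from the article's run */
data segments;
   input cluster freq;
   datalines;
1 50
2 38
3 62
;
run;

/* WEIGHT treats freq as the count for each segment;      */
/* OUT= saves the COUNT and PERCENT columns to a dataset  */
proc freq data=segments;
   tables cluster / nocum out=segment_pct;
   weight freq;
run;
```

The output dataset segment_pct holds one row per segment with its count and its percentage of the total, which are the two quantities the Value and Percentage checkboxes add to the pie labels.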

 

 

You could easily make other cosmetic changes to the chart, such as adding a title, using the Properties dialog box. Note, however, that changes made to the default charts are only temporary: if you close the Results window and reopen it, the segment plot will revert to its original form, without your customized labels. Sometimes that’s all you need, for example when you just want to take a quick screenshot to share with a colleague. But if you want your changes to persist when you close and reopen your node results, you’ll need to create a new chart rather than modify the default pie chart. I’ll illustrate that in the next example.

 

Example 2:  Recast a Chart

 

Although adding labels makes it easier to compare narrow slices, detailed slice labels can sometimes look crowded or busy. Besides, bar charts can facilitate comparisons when there are small slices. So let’s recast the pie chart as a horizontal bar chart like this one:

 

image007.png

 

 

 

Steps:

 

1. Select the pie chart and press the Plot (image008.png) button or choose Plot from the View menu.

 

This opens the Select Chart Type dialog:

image009.png

 

                                                          

By choosing Plot, we are creating a new chart rather than editing the original. Both the original pie chart and the new chart will be visible the next time we open results for the HP Cluster node. The new chart uses the same underlying data that were used to create the original pie chart.

 

2. Choose Bar, then the Horizontal Bar option:

image010.png

 

3. Click Next.

 

4. Assign chart roles as follows: Cluster = Category; Frequency = Response.

image011.png

 

5. Press Finish.

 

A horizontal bar chart is produced:

 

image012.png

 

Now all we need to do is sort the bars in descending order of frequency.

 

6. Right click on the bar chart and choose Data Options.

 

This opens the Data Options dialog:

image013.png

 

                                                        

7. In the Sorting tab, choose Response: Ascending.

 

image015.png

 

 

8. Press OK to dismiss the dialog.

 

Now the bars (segments) are sorted nicely in order of segment size:

image007.png
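For comparison, a similar sorted horizontal bar chart can be sketched in code with PROC SGPLOT. This is not what the Results window runs internally; it is a Base SAS analogue, and the dataset segments and its counts are hypothetical.

```sas
/* Hypothetical segment frequencies; not taken from the article's run */
data segments;
   input cluster freq;
   datalines;
1 50
2 38
3 62
;
run;

/* Horizontal bars, one per segment, ordered by descending frequency */
proc sgplot data=segments;
   hbar cluster / response=freq categoryorder=respdesc datalabel;
   xaxis label="Frequency";
   yaxis label="Segment";
run;
```

Here cluster plays the Category role and freq plays the Response role, mirroring the role assignment in step 4.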

 

 

Example 3:  Create a Plot from a Data Table

 

Some node results are summarized in data tables without an accompanying chart. For example, the Cluster Centroids data table in the HP Cluster node:

 

image017.png

 

You can construct custom plots from these raw tables of results, for example, a profile plot of the cluster centroids:

 

image019.png

 

By the way, this type of plot may not be suitable if you have a large number of inputs or they are measured on different scales. For this particular analysis I’ve used Fisher's Iris Data, whose four interval inputs (flower attributes) are all measured in millimeters. In addition, I've chosen the HP Cluster option to standardize the inputs.

 

Steps:

 

1. Select the Cluster Centroids table and choose Plot.

 

This opens the Select a Chart Type dialog.

 

 

2. Choose Line.

 

image021.png

 

 

3. Press Next.

 

4. In the Select Chart Roles step, define roles as shown below:

 

image022.png

 

 

Note that I’ve selected the Allow Multiple Role Assignment checkbox. If it were left unchecked, you wouldn’t be able to assign more than one Y role.

 

5. Press Finish.

 

 

The profile plot is produced:

 

image023.png

 

 

 

Examining the chart, you can easily compare the clusters. For example, relative to the other clusters, Cluster 2 has short, narrow petals, and short but wide sepals. Cluster 3 has relatively large petals and sepals.
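A comparable profile plot can be sketched in Base SAS. The centroid values below are hypothetical standardized numbers chosen only to resemble the pattern described above; the dataset and variable names are assumptions and do not correspond to the actual HP Cluster output table.

```sas
/* Hypothetical standardized centroids: one row per cluster, one column per input */
data centroids;
   input cluster petal_length petal_width sepal_length sepal_width;
   datalines;
1  0.3  0.2  0.1 -0.6
2 -1.3 -1.2 -1.0  0.8
3  1.0  1.1  1.1  0.1
;
run;

/* Reshape to one row per cluster and input so each cluster becomes one line */
proc transpose data=centroids
               out=centroids_long(rename=(col1=value))
               name=attribute;
   by cluster;
   var petal_length petal_width sepal_length sepal_width;
run;

/* One line per cluster across the four inputs */
proc sgplot data=centroids_long;
   series x=attribute y=value / group=cluster markers;
   xaxis label="Input";
   yaxis label="Standardized centroid value";
run;
```

Assigning several Y roles in the interactive dialog and grouping a long table by cluster in code are two routes to the same kind of picture.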

 

Example 4: Use SAS code to restructure summary data and plot it

 

You may encounter cases where you want to plot results data that aren’t quite ready to plot. In these cases you can restructure the data before plotting it. For example, imagine that you have run a flow to produce a tree:

 

image025.png

 

 

The HP Tree node produces a table of fit statistics by partition:

 

image026.png

 

Suppose you want to create a chart to compare the partitions on each fit statistic. You can use the lattice feature to plot the fit statistics as a series of stacked charts:

 

image028.png

 

In order to create this plot, we first need to restructure the data so the train, validation, and test values are arranged in a single column, rather than in a separate column per partition. While you cannot restructure the data in Results view, you can do it with a SAS Code node.
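To make the reshaping concrete, here is a minimal sketch on a small hypothetical table. The dataset names wide and long and the numbers are invented for illustration; only the column layout (one column per partition) mirrors the fit statistics table.

```sas
/* Hypothetical fit statistics in wide form: one column per partition */
data wide;
   length stat $8 label $32;
   infile datalines dsd;
   input stat $ label $ train validate test;
   datalines;
_ASE_,Average Squared Error,0.10,0.12,0.13
_MISC_,Misclassification Rate,0.20,0.22,0.25
;
run;

/* BY-group processing in PROC TRANSPOSE requires sorted input */
proc sort data=wide;
   by label stat;
run;

/* Each partition column becomes a row; the values land in a single column, Value1 */
proc transpose data=wide out=long prefix=Value name=Statistic;
   by label stat;
   var train validate test;
run;

proc print data=long noobs;
run;
```

The result has three rows per fit statistic (train, validate, test), with the partition name in Statistic and the value in Value1, which is the shape a lattice of bar charts needs.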

 

Steps:

 

1. Attach a SAS Code node to your flow.

 

image029.png

 

2. Copy and paste the following SAS code into your code node:

 

 

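/* Sort the HP Tree fit statistics so PROC TRANSPOSE can process them by statistic */
proc sort data=&em_lib..hptree_emreportfit out=treeFitStats;
   by label stat;
run;

/* Stack the Train, Validate, and Test columns into a single column (Value1) */
proc transpose data=treeFitStats
               out=FitStatsT
               prefix=Value
               name=Statistic
               label=Label;
   by Label Stat;
   var Train Validate Test;
run;

/* Register a data key for the restructured table */
%em_register(type=data,key=FitStatsT);

/* Write the registered dataset and set the column labels for display */
data &em_user_FitStatsT;
   set FitStatsT;
   label statistic="Partition" value1 = "";
run;

/* Lattice of bar charts: one panel per fit statistic, partitions on the X axis */
%em_report(
   key=FitStatsT,
   viewtype=Lattice,
   latticeType=bar,
   latticeY=Label,
   x=Statistic,
   y=Value1,
   autodisplay=Y
);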

 

 

 

3. Run the node.

 

When you examine results for the SAS Code node, you should see a plot like the following:

 

image031.png

 

 

Since the individual plots have a common X axis but different Y scales (Max Absolute Error, Misclassification rate, etc.), let’s interactively turn off wrapping so each plot appears in its own row: 

 

4. Right click on the plot and choose Graph Properties.

 

5. In the Properties – Lattice dialog, uncheck Wrap.

 

6. Press OK.

 

The various fit statistics are now stacked in a single column with a common X (partition) axis.

 

image028.png

 

This plot is essentially the graphical analogue of the original data table, which displayed each statistic in a separate row.

 

Alternatively, we might have turned off display of Y-axis values and ticks in Graph Properties to encourage focus on the relative bar heights without reference to their actual scales:

 

image033.png

 

By the way, if you select the lattice chart and click the Table icon (image035.png) you can view the restructured data that were used to create the plot:

 

image036.png

 

 

 

More About the SAS Code Used In This Example

 

The code you pasted into the SAS Code node does four things:

 

1. Sort and transpose the fit statistics:

 

This step uses PROC SORT and PROC TRANSPOSE to create a new, restructured dataset called FitStatsT.

 

proc sort data=&em_lib..hptree_emreportfit out=treeFitStats;
   by label stat;
run;

proc transpose data=treeFitStats
               out=FitStatsT
               prefix=Value
               name=Statistic
               label=Label;
   by Label Stat;
   var Train Validate Test;
run;

 

The built-in &em_lib macro variable points to the location of the fit statistics dataset on the SAS server. But what about the name of the dataset?

 

Like most Enterprise Miner nodes, the HP Tree node produces several datasets. Each dataset name uses the node ID as a prefix (in this case, “HPTree”). The rest of the dataset name usually has some correspondence to the title used in node results. The dataset used in this example is named “hptree_emreportfit.” I found the dataset by running PROC CONTENTS with “data=&em_lib.._all_” (not shown here) and examining the output.
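The exact PROC CONTENTS call is not given in this tip, so the following is only a minimal sketch of such a listing. It assumes the code runs inside a SAS Code node, where Enterprise Miner defines &em_lib; the NODS option is my addition to keep the output to a directory of dataset names.

```sas
/* Run inside a SAS Code node, where &em_lib is defined by Enterprise Miner. */
/* _ALL_ lists every member of the library; NODS suppresses the per-dataset  */
/* detail so only the directory of names is printed.                         */
proc contents data=&em_lib.._all_ nods;
run;
```

In the resulting listing, look for names that begin with the ID of the node you are interested in.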

 

2. Register the restructured data using the built-in %em_register macro:

 

%em_register(type=data,key=FitStatsT);

 

3. Use Data step code to create the dataset to plot:

 

Since we used the KEY value of FitStatsT in the previous step, the dataset must be assigned the name &em_user_FitStatsT.

 

data &em_user_FitStatsT;
   set FitStatsT;
   label statistic="Partition" value1 = "";
run;

 

4. Use the built-in %em_report macro to create a Lattice Bar chart that displays partition value on the X axis and uses fit statistic label as the lattice variable.

 

Here it is important to use the same KEY (FitStatsT) that was registered in step 2 and used to name the dataset in step 3.

 

%em_report(
   key=FitStatsT,
   viewtype=Lattice,
   latticeType=bar,
   latticeY=Label,
   x=Statistic,
   y=Value1,
   autodisplay=Y
);

 

 

For more on using the built-in %em_register and %em_report macros, see the earlier tip, Create Graphs in a SAS Code Node Using %em_report.

 

Summary

 

We’ve seen how to customize interactive results produced by Enterprise Miner nodes, from making cosmetic edits to creating new charts that persist in node results. We’ve also seen how to use the SAS Code node to manipulate node output datasets so they conform to plotting requirements.

 

Hopefully this tip will help you start tailoring node results to your needs. We’ve only scratched the surface, though. To learn more about the rich set of chart customization options offered in Enterprise Miner, I suggest opening node results for your favorite supervised or unsupervised modeling node and exploring the options listed under Graph Properties and Data Options.

 

Keep in mind that you can also create graphs interactively using the Graph Explore node. The difference is that the Graph Explore node uses input data, whereas most other nodes present summaries of results.
