Tip: Spectral Clustering in SAS® Enterprise Miner™ Using Open Source Integration Node

Started 10-14-2014, modified 10-06-2015

Introduction

Are you looking for a way to incorporate your R code into SAS® Enterprise Miner™ (EM)? The Open Source Integration node is what you need. This node not only offers a bridge between EM and R, but also gives EM users access to a wider range of statistical learning methods.

 

In the example below, I will demonstrate how to use the Open Source Integration node to run a spectral clustering method within EM.

 

Load Data

The data in this example can be downloaded from a public website. The file is tab-delimited and has no name (header) row. To import it, follow these steps.

 

  1. Download the data set from http://cs.joensuu.fi/sipu/datasets/jain.txt to your local machine.
  2. Drag a File Import node from the Sample tab to your diagram workspace.
  3. Enter TAB into the Delimiter property.
  4. Set the Name Row property to No.
  5. Click the ellipsis (...) button to the right of the Import File property.
  6. Select the My Computer option.
  7. Click the Browse button to locate your downloaded file, and then click the OK button.

[Figure: File Import node properties (import_property_marked.png)]

Use the File Import node to load data that is stored in a common delimited text format, such as CSV or TSV. If your data is already a SAS data set, import it as a standard EM data source instead.
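
If you want to check the file outside EM first, the same kind of import can be sketched in plain R. This is only an illustration: the three sample rows and their three-column layout are made up so that the block runs on its own, and path is a hypothetical placeholder for wherever you saved the downloaded file.

```r
# Write a tiny tab-delimited, header-less file so the sketch is self-contained.
# These rows are made up for illustration; they are not taken from jain.txt.
path <- tempfile(fileext = ".txt")
writeLines(c("1.0\t10.5\t1", "2.5\t12.0\t1", "30.0\t5.5\t2"), path)

# read.delim() uses a tab as the separator; header = FALSE mirrors Name Row = No
raw <- read.delim(path, header = FALSE)

# Without a header row, R names the columns V1, V2, ...
str(raw)

# The clustering code in this tip expects the first two columns to be numeric
stopifnot(is.numeric(raw[[1]]), is.numeric(raw[[2]]))
```

To try this on the real file, point path at your local copy instead of the temporary file.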

 

Set Up Open Source Integration Node

  1. Drag an Open Source Integration node from the Utility tab to your diagram workspace.
  2. Connect the File Import node to the Open Source Integration node.
  3. Set the Training Mode property to Unsupervised and the Output Mode property to None. The other output modes (PMML and Merge) allow variables created in R to be used by subsequent nodes in a workflow. For details on these modes, see the SAS Enterprise Miner 13.2 Reference Help.
  4. Click the ellipsis (...) button to the right of the Code Editor property to open the Code Editor window.
  5. Enter the following code into the Code Editor.
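    # Load kernlab, which provides the specc spectral clustering function
    library('kernlab')
    # Cluster the first two (numeric) columns into two groups
    sc <- specc(as.matrix(&EMR_IMPORT_DATA[1:2]), centers=2)
    # Save a scatter plot colored by cluster membership
    png("EMR_SPECC.png")
    plot(&EMR_IMPORT_DATA[1:2], col=sc)
    dev.off()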
  6. To compare spectral clustering with K-means, follow the previous steps to create another Open Source Integration node, and enter the following code into its Code Editor.
    # Run K-means with two clusters on the same two columns
    m <- kmeans(&EMR_IMPORT_DATA[1:2], 2)
    # Save a scatter plot colored by the K-means cluster assignment
    png("EMR_KMEANS.png")
    plot(&EMR_IMPORT_DATA[1:2], col=m$cluster)
    dev.off()

 

In the first Open Source Integration node, we load the kernlab library in R and run its spectral clustering function, specc, on the first two columns of the data (that is, &EMR_IMPORT_DATA[1:2]). These columns must be numeric. We set the number of clusters to two and draw a scatter plot in which each point is colored by its cluster membership. The figure is saved as "EMR_SPECC.png".

 

In the second Open Source Integration node, we call the kmeans function on the same two columns. To make the comparison with spectral clustering direct, we again set the number of clusters to two. The output figure is saved as "EMR_KMEANS.png".
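
To experiment with the two calls outside EM, a rough stand-alone sketch follows. It assumes an ordinary data frame (named df here, a hypothetical stand-in for the data frame that &EMR_IMPORT_DATA refers to inside the node) and uses synthetic two-crescent points rather than the downloaded data. The seed and the nstart value are illustrative choices, not settings taken from the nodes above.

```r
# Stand-alone sketch: the same two clustering calls on an ordinary data frame.
set.seed(42)  # both methods use random starting values

# Synthetic two-crescent data (illustrative only, not the jain.txt data)
n  <- 100
t1 <- runif(n, 0, pi)
t2 <- runif(n, 0, pi)
df <- data.frame(
  x = c(cos(t1), 1 - cos(t2)) + rnorm(2 * n, sd = 0.05),
  y = c(sin(t1), 0.5 - sin(t2)) + rnorm(2 * n, sd = 0.05)
)

# K-means from base R; nstart > 1 keeps the best of several random starts
km <- kmeans(df[1:2], centers = 2, nstart = 10)
print(table(km$cluster))

# Spectral clustering needs the kernlab package, so run it only if installed
if (requireNamespace("kernlab", quietly = TRUE)) {
  sc <- kernlab::specc(as.matrix(df[1:2]), centers = 2)
  # The cluster assignments are stored in the data part of the specc object
  print(table(sc@.Data))
}
```

Because both methods start from random values, the cluster numbers (1 or 2) can swap between runs unless a seed is set, so compare groupings rather than raw cluster ids.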

 

Below is the complete diagram.

[Figure: complete process flow diagram (diagram_clustering.png)]

Run and Get Results

  1. Right-click each of the Open Source Integration nodes and select Run. In the Confirmation window, click Yes. After the node has successfully run, click Results in the Run Status window.
  2. To view the output figure, click View > SAS Results > Train Graphs.

 

Below are the two output figures.

[Figures: train_spectral.png (left), train_kmeans.png (right)]

As the figures show, spectral clustering (left) separates the two groups in this data set more accurately than K-means (right).
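
The visual comparison can be backed by a number when reference labels are available. The sketch below is one simple way to do that for the two-cluster case. The function name two_cluster_accuracy and the toy vectors are hypothetical, and the sketch assumes that a vector of true group labels exists for the same rows.

```r
# Fraction of points whose cluster agrees with a two-group reference labelling.
# Cluster ids are arbitrary, so both possible matchings are scored and the
# better one is returned.
two_cluster_accuracy <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  p  <- as.integer(factor(pred))   # recode cluster ids as 1, 2
  tr <- as.integer(factor(truth))  # recode reference labels as 1, 2
  stopifnot(max(p) <= 2, max(tr) <= 2)
  direct <- mean(p == tr)          # agreement with ids taken as given
  max(direct, 1 - direct)          # 1 - direct is agreement with ids swapped
}

# Toy example (made-up labels): 5 of 6 points agree once the ids are swapped
truth <- c(1, 1, 1, 2, 2, 2)
pred  <- c(2, 2, 1, 1, 1, 1)
acc   <- two_cluster_accuracy(pred, truth)
print(acc)
```

Inside the two nodes, the pred argument would be the cluster vector from specc or m$cluster from kmeans.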

 

Summary

To sum up, the Open Source Integration node lets users integrate R code into Enterprise Miner workflows. For more details about the node, refer to the Help documentation in SAS® Enterprise Miner™.
