Are you looking for a way to incorporate your R code into SAS® Enterprise Miner™ (EM)? The Open Source Integration node is what you need. This node not only offers a bridge between EM and R, but also empowers EM users to access to a wider range of statistical learning methods.
In the example below, I will demonstrate how to use the Open Source Integration node to run a spectral clustering method within EM.
The data in this example can be downloaded from a public website. It is available in the tab-delimited format without a name (header) row. The steps of importing the example data are as follows.
Use the File Import node to load data sets that are stored in common delimited text formats, such as csv, tsv, etc. If your data is in SAS data set format, you should import it as a standard EM data source.
In the first Open Source Integration node, we load the kernlab library in R and run the spectral clustering function specc on the first and second columns of the data (i.e. &EMR_IMPORT_DATA[1:2]). Note that these columns should be numeric columns. We specify the number of clusters to be two and plot the result in a scatter plot where data points are colored based on their cluster membership. We save the output figure to ''EMR_SPECC.png''.
In the second Open Source Integration node, we call the kmeans function. To see the difference between spectral clustering and K-means, we also set the number of result clusters to be two. The file name of the output figure is ''EMR_KMEANS.png''.
Below is the complete diagram.
Below are the two output figures.
As shown in the figures, spectral clustering (on the left) performs better on this data set than K-Means (on the right) in terms of accuracy.
To sum up, the Open Source Integration node enables users to integrate R code into Enterprise Miner workflows. For more details about the node, please refer to the help document in SAS® Enterprise Miner™.