We’re smarter together. Learn from this collection of community knowledge and add your expertise.

How to Use the Link Analysis Node as a Clustering Tool in SAS® Enterprise Miner™

by New User tailee2 on ‎06-17-2014 01:19 PM - edited on ‎10-06-2015 01:52 PM by Community Manager (2,715 Views)

Do you know you can do a clustering task with the Link Analysis node in SAS® Enterprise Miner™?

 

Using a dataset with the train/raw role, the Link Analysis node provides a cluster analysis. The node uses a community detection algorithm widely used in social network analysis for the clustering task. There are two big advantages of using the Link Analysis node as a clustering tool:

 

1) You can see the relationships between variables and levels of variables through the network constellation plot.

2) The number of clusters is determined automatically based on the community resolution.

 

The following famous clustering example (Fisher’s Iris data) shows an outstanding result of clustering obtained using the Link Analysis node. Here is the process flow:

 

TipPicture1.png

 

Since the species are the actual clusters, we do not use the species variable in the link analysis. So the variable settings at IDS node are as follows:

 

Data set role:

TipPicture2.png

 

Variable settings:

TipPicture3.png

All interval variables are binned automatically in the Link Analysis node before the clustering.

 

The link analysis in this example uses all the default properties. As you would expect from the Iris dataset (which contains data for three Iris species), the result shows three clusters:

 

 

TipPicture4.png

 

The following (network) constellation plot shows which variables and which levels of each variable affect the resulting clusters.

 

TipPicture5.png

 

Node colors represent item clusters which are communities, as in social network analysis. Those communities are used to create the clusters among observations. So the items (variable and variable levels) will determine the resulting (observation) clusters. The node sizes represent weights for each community. The link thickness shows the transaction count which the items occurs together.

 

By adding the segment profile node after link analysis node, you can assess and interpret the clustering result. For this example, you need to set the Use option of the species variable to 'Yes' in the variable editor of the segment profile node, to get more details related to the true segments.

 

TipPicture6.png

 

TipPicture7.png

 

From the class variable statistics (result >> view >> summary statistics >> class variables), you can calculate the miscalculation rate which is (11+6)/150=0.11. A performance comparison of clustering with k-mean clustering shows the link analysis clustering performs well.

TipPicture71.png

Finally, since the link analysis node produces score codes as a result of clustering, you can score new data sets by using the score node as follows.

 

TipPicture8.png

As you have seen the example, the link analysis node provides an alternative clustering tool with automatic selection of the number of clusters. Unlike other standard clustering tools, it also explains how the variable and variable levels contribute to the formation of clusters.

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.