This tip is part of Learn by Example with SAS® Enterprise Miner™ Templates where a new data mining topic is introduced and explained with one or more example SAS Enterprise Miner process flow diagrams.
The topic discussed here is clustering, a technique that works on unlabeled data (data with no target variable) and is therefore a form of unsupervised learning. Data is often unlabeled because labeling it can be expensive, time-consuming, or both, so this technique can be widely applied.
So what is clustering? In its simplest form, the goal of this technique is to create clusters (aka groups or segments) of observations so that within-cluster variability is minimized and between-cluster variability is maximized. In the end, all observations are divided into clusters so that every observation belongs to exactly one cluster.
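To make the idea concrete outside of SAS Enterprise Miner, here is a minimal sketch of k-means, one common clustering algorithm, in plain Python. It is only an illustration and is not meant to show how the Cluster node works internally. The function names (kmeans, within_cluster_ss), the toy data, and the evenly spaced choice of starting centroids are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iterations=100):
    """Assign every point to exactly one of k clusters (hypothetical helper).

    Starting centroids are taken at evenly spaced positions in the input
    list. This is a simplification chosen to keep the example deterministic.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Total within-cluster variability (sum of squared distances)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data (made up for illustration): two visibly separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

labels, centroids = kmeans(data, k=2)
print("cluster labels:", labels)
print("within-cluster sum of squares:", within_cluster_ss(data, labels, centroids))
```

Each observation receives exactly one label, and splitting the toy data into two clusters gives a much smaller within-cluster sum of squares than treating it as a single cluster, which is the "minimize within-cluster variability" goal described above.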
To get started with clustering using SAS Enterprise Miner, download the process flow diagrams (XML files) and the accompanying PDF documentation for the following two examples from the GitHub repository at https://github.com/sassoftware/dm-flow/tree/master/Clustering
1. ClusterNodeExplore: A simple example that shows how the Cluster and Segment Profile nodes can be used to explore data.
2. ClusterNodePredict: An advanced example that uses the Cluster node as part of a regression modeling flow to demonstrate one of the ways it can be used to improve the prediction accuracy of the model.
To run these examples, refer to the README file that is part of the GitHub repository at https://github.com/sassoftware/dm-flow. Please note that these examples were tested with SAS Enterprise Miner 13.2.