Noelblanc
Calcite | Level 5

Hello,

 

Is it possible to run a canonical discriminant analysis in SAS ENTERPRISE 13.2? I want to visualize the cluster groups in SAS EMINER. How can I do this?

What is the difference between the Cluster node and the HP Cluster node? Can I run a k-means clustering with the Cluster node?

1 ACCEPTED SOLUTION: the reply by husseinmazaar (Quartz | Level 8) on the Automatic setting of the Cluster node, shown in full in the thread below.
13 REPLIES
WendyCzika
SAS Employee

The Cluster node performs hierarchical clustering using PROC CLUSTER (see the SAS/STAT documentation for more details on that procedure), while the HP Cluster node performs k-means clustering.  Here is a helpful tip that can provide you more information on that:

https://communities.sas.com/t5/SAS-Communities-Library/Tip-K-means-clustering-in-SAS-comparing-PROC-...

 

Hope that helps!

Wendy

Noelblanc
Calcite | Level 5

Thanks, but the SAS Eminer Reference Help says that the Cluster node performs clustering with seeds. I do not see where it says that the node performs hierarchical clustering, and in addition the results of that node do not look like the results of PROC CLUSTER in SAS/STAT. Also a question: does this node always perform hierarchical clustering when we select the User-specified setting in Specification Method?

Noelblanc
Calcite | Level 5

https://communities.sas.com/t5/SAS-Communities-Library/Tip-Guidelines-for-Choosing-a-Clustering-Meth...

 

In this article, the author says in the conclusion: "After the number of clusters is determined, the clusters are obtained using a k-means algorithm." I am confused.

WendyCzika
SAS Employee

Sorry, I was incorrect about the hierarchical clustering above. The Cluster node performs k-means clustering. When the Automatic setting is used, the optimum number of clusters is first determined by making a preliminary clustering pass; k-means is then performed with that number of clusters.

Noelblanc
Calcite | Level 5

Ok... and which node performs a hierarchical clustering ?

husseinmazaar
Quartz | Level 8

In the Cluster node, when you choose the Automatic option.

Here is the detailed explanation from the Cluster node's help in SAS Enterprise Miner:

 

  • The Automatic setting (default) configures SAS Enterprise Miner to automatically determine the optimum number of clusters to create.
    • When the Automatic setting is selected, the value in the Maximum Number of Clusters property in the Number of Clusters section is not used to set the maximum number of clusters. Instead, SAS Enterprise Miner first makes a preliminary clustering pass, beginning with the number of clusters that is specified as the Preliminary Maximum value in the Selection Criterion properties.

      After the preliminary pass completes, the multivariate means of the clusters are used as inputs for a second pass that uses agglomerative, hierarchical algorithms to combine and reduce the number of clusters. Then, the smallest number of clusters that meets all four of the following criteria is selected.

 

I hope this is what you want.

Thanks

Noelblanc
Calcite | Level 5

Thanks, but which clustering method is performed after the number of clusters is selected?

I have a project in which I must perform hierarchical clustering in SAS EMINER, but I can't find a node that does this.

DougWielenga
SAS Employee

Noelblanc, 

 

SAS Enterprise Miner was designed for data mining (extremely large) data sets for which many classical analytical approaches (including hierarchical clustering) are often not practical.  There is therefore no node that automatically performs hierarchical clustering.   The cluster node itself identifies n seeds using the FASTCLUS procedure based on the maximum number of clusters requested and then hierarchically clusters those seeds themselves to identify the number of clusters based on the criterion selected in the node.  It then runs FASTCLUS with the chosen number of clusters to create the output.  

If you wish to visualize clusters in SAS Enterprise Miner, you would be best off using the Segment Profile node which can take the output of the Cluster node.  The Segment Profile node allows you to create a decision tree based on cluster membership so that you can identify which factors might classify an observation into one cluster vs. another.  
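As a rough illustration of that idea (not the Segment Profile node's actual algorithm, which this thread does not describe), the Python sketch below scores each input by its best single split on cluster membership, i.e. a one-level decision tree. The Gini impurity measure, the function names `gini`, `best_split` and `rank_variables`, and the toy `age`/`visits` data are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of cluster labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Best threshold on one variable, returned as (impurity_drop, threshold)."""
    parent = gini(labels)
    best = (0.0, None)
    cuts = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if parent - child > best[0]:
            best = (parent - child, t)
    return best


def rank_variables(rows, names, labels):
    """Rank inputs by how well one split separates the clusters."""
    scores = []
    for idx, name in enumerate(names):
        drop, threshold = best_split([r[idx] for r in rows], labels)
        scores.append((name, round(drop, 3), threshold))
    return sorted(scores, key=lambda s: -s[1])


# Hypothetical data: two inputs per observation plus a cluster label.
rows = [(25, 1.0), (30, 9.0), (28, 5.0), (60, 2.0), (65, 8.0), (62, 4.0)]
labels = [1, 1, 1, 2, 2, 2]
ranking = rank_variables(rows, ["age", "visits"], labels)
print(ranking)
```

In this toy data `age` separates the two clusters perfectly with one split, while `visits` barely helps, so `age` ranks first. That ranking is the kind of information a profile of cluster membership is meant to surface.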

 

If you still wish to perform hierarchical clustering, you could write code to call the CLUSTER procedure in order to generate a hierarchical cluster analysis.  You would be limited in how you could visualize those results as the SAS Code node is not a complete replacement for the SAS Display Manager System and therefore has limitations on certain types of output.  For example, code utilizing the Output Delivery System won't always run successfully when running SAS Enterprise Miner.  

SAS_ASS
Obsidian | Level 7

So what method is exactly used in this preliminary cluster pass?

This is of importance, I guess.

DougWielenga
SAS Employee

SAS_ASS wrote: "So what method is exactly used in this preliminary cluster pass? This is of importance, I guess."

To clarify, the hierarchical clustering is done only on the cluster seeds initially generated by the FASTCLUS procedure. There is no hierarchical clustering of the entire data set. The initial cluster seeds (using the maximum number of clusters of interest) are clustered hierarchically, reducing the number of seeds to submit to FASTCLUS by one at each step. This generates a different clustering solution for each number of clusters, from the maximum considered down to the smallest. You can then choose the clustering solution based on any of the criteria provided and/or interpretability and/or usefulness.
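The "one fewer seed at each step" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the CLUSTER procedure does internally: the function name `ward_merge_history` is made up, the seeds and counts are toy values, and weighting each seed by the number of observations assigned to it is an assumption. The merge cost used is the standard Ward criterion, i.e. the increase in within-cluster sum of squares caused by joining two clusters.

```python
def ward_merge_history(seeds, counts, min_clusters=2):
    """Merge seeds pairwise (Ward's criterion) and record each step.

    seeds  : list of centroid coordinates (tuples of floats)
    counts : number of observations assigned to each seed
    Returns a list of (n_clusters_after_merge, merge_cost) tuples.
    """
    clusters = [(list(s), n) for s, n in zip(seeds, counts)]
    history = []
    while len(clusters) > min_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                dist2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Ward cost: increase in within-cluster sum of squares.
                cost = ni * nj / (ni + nj) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters[i] = (merged, ni + nj)
        del clusters[j]  # j > i, so index i is unaffected
        history.append((len(clusters), cost))
    return history


# Toy seeds: two tight pairs and one isolated seed, 10 observations each.
seeds = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
counts = [10, 10, 10, 10, 10]
history = ward_merge_history(seeds, counts)
for n_clusters, cost in history:
    print(n_clusters, round(cost, 2))
# prints:
# 4 5.0
# 3 5.0
# 2 1000.0
```

The jump in merge cost between 3 and 2 clusters is the kind of signal a selection criterion looks for: going below 3 clusters forces two clearly distinct groups together.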

 

SAS Enterprise Miner provides the option to use AVERAGE, CENTROID, or WARD when doing this hierarchical step.   

 

Hope this helps!


Cordially,

Doug

 

  

 

 

SAS_ASS
Obsidian | Level 7

Hey, thanks for your answer. So, for my understanding:

1) hierarchical clustering based on the maximum number of clusters
2) submitting this solution to FASTCLUS

Is this right?
How can I see this in the SAS code?

Thanks and kind regards,
Laura

DougWielenga
SAS Employee

Laura wrote:
"1) hierarchical clustering based on maximum number of clusters
2) submitting this solution to FASTCLUS
Is this right?
How can I see this in the SAS code?"

 

If you look at the options in the Cluster node, you will see the following settings in the Selection Criterion section:

 

Clustering Method:   Ward (default) or you can change to Average or Centroid  

Preliminary Maximum:   50 (default)

Minimum: 2 (default)

 

SAS Enterprise Miner identifies initial seeds using the DMVQ procedure (which can perform Vector Quantization and k-means clustering).  Note: This initial step was performed by FASTCLUS in early versions of SAS Enterprise Miner prior to the introduction of the DMVQ procedure.  The DMVQ procedure provides k seeds to the CLUSTER procedure based on the Preliminary Maximum setting (50 by default).  These seeds (50 in my example) are clustered hierarchically by the CLUSTER procedure in order to identify candidate solutions based on the Cubic Clustering Criterion (CCC).   The seeds and the associated statistics from this step are written to the 

 

         <project folder> / Workspaces / <workspace folder> / <node id>_CLUSSEED.sas7bdat

 

data set.  The _CCC_ variable in this data set contains the computed CCC value for various steps in the hierarchical clustering of the initial cluster seeds.   Candidates for the optimum number of clusters based on this hierarchical step are identified and then the DMVQ procedure runs again to obtain a direct (k-means) cluster analysis of the training data itself based on the number of seeds chosen by the hierarchical step.   You can see the results of this in the Output window of the Cluster node where it shows output from the CLUSTER procedure including the Eigenvalues of the Covariance Matrix, the Cluster History, and the Candidates for Optimum Number of Clusters. 
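To make the order of the three steps concrete, here is a minimal, self-contained Python sketch of the same outline: a preliminary k-means pass, a hierarchical merge of the resulting seeds, and a final k-means pass started from the merged seeds. It is not the DMVQ or CLUSTER code. The function names `nearest`, `kmeans`, `merge_seeds` and `two_stage_cluster` are hypothetical, the initial seeds are simply evenly spaced data points, and the final number of clusters is passed in by hand (`n_final`) instead of being chosen from the CCC.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centers[k])))


def kmeans(data, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in data]
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == k]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def merge_seeds(centers, counts, n_final):
    """Ward-style agglomeration of weighted seeds down to n_final seeds."""
    clusters = [(list(c), n) for c, n in zip(centers, counts) if n > 0]
    while len(clusters) > n_final:
        cost, i, j = min(
            (ni * nj / (ni + nj) * sum((a - b) ** 2 for a, b in zip(ci, cj)), i, j)
            for i, (ci, ni) in enumerate(clusters)
            for j, (cj, nj) in enumerate(clusters)
            if i < j)
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters[i] = (merged, ni + nj)
        del clusters[j]  # j > i, so index i is unaffected
    return [c for c, _ in clusters]


def two_stage_cluster(data, preliminary_max, n_final):
    # Step 1: preliminary k-means pass with many seeds.
    step = max(1, len(data) // preliminary_max)
    seeds = data[::step][:preliminary_max]
    prelim_centers, prelim_labels = kmeans(data, seeds)
    counts = [prelim_labels.count(k) for k in range(len(prelim_centers))]
    # Step 2: hierarchical clustering of the seeds only, not of the data.
    start = merge_seeds(prelim_centers, counts, n_final)
    # Step 3: final k-means pass on the data, started from the merged seeds.
    return kmeans(data, start)


# Toy data: three well-separated 3x3 grids of points.
data = [(x + dx, y + dy)
        for (x, y) in [(0, 0), (10, 0), (5, 10)]
        for dx in (0.0, 0.5, 1.0)
        for dy in (0.0, 0.5, 1.0)]
centers, labels = two_stage_cluster(data, preliminary_max=6, n_final=3)
print(centers)
```

The point of the design is that the expensive hierarchical step only ever sees the handful of seeds (6 here, 50 by default in the node), never the full training data.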

 

In order to see the actual code that is running, you will need to add some options to your Project Start Code requesting SAS Enterprise Miner to print the logic from the macros which are running to perform these steps.  Specifically, you can get a great deal more detail in the Log if you add the following statement to the Project Start Code:

 

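/*** BEGIN SAS CODE ***/

/* MPRINT: write the SAS statements generated by macros to the Log */
/* SOURCE: write the submitted source statements to the Log        */
/* MLOGIC: trace macro execution (macro flow and decisions)        */
options mprint source mlogic;

/*** END SAS CODE ***/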

 

Remember that SAS Enterprise Miner handles ordinal and nominal inputs as well as interval inputs so there is more that is happening but this is the basic outline of how the process works.   

 

I hope this helps!


Cordially,

Doug

 

SAS_ASS
Obsidian | Level 7

Thanks for your answer!

You helped me a lot!

Is there any documentation of the DMVQ procedure? I couldn't find any.

Unfortunately I have to write my master thesis about all this. Kind of hard without formulas and sufficient documentation by SAS 😄

Maybe you can answer me one more question about the CCC.

The greater the better... I know.

But what if it is just monotonically increasing, so that the more clusters there are, the greater it is? That seems like nonsense to me, or at least quite unreliable.

Are other statistics more important, like the R² or the cluster distance?

 

Thank you! 🙂
