Re: SAS EM 12.3 Cluster Node - Cluster Distance Plot

r_sethi2001 · Posted 06-19-2016 01:01 PM

I am working on Chapter 5 of Randy Collica's book, Customer Segmentation and Clustering Using SAS Enterprise Miner. The attached zip file contains: 1) Customers dataset (100000 customer records) 2) Word doc with images of two distance plots 3) XML process flow

1) Please start with the Word doc. One distance plot does not have any circles drawn around the centroids, whereas the other has the circles drawn automatically. I need to find out why no circles are drawn automatically in the first plot.

2) I am enclosing the XML diagram, which has two cluster nodes. One is named Copied Cluster Node (copied from the sample code on the author's website); the other node was created by me. My node is the one with the problem I need help with.

3) Cluster node property (SCORE): Hide Original Variables. How do I change the Yes to No? In the Transform node 4 variables were transformed and I want to see the original variables.

would be very grateful for the help thanks

PS: The customers dataset could not be uploaded due to the file size limit. Please let me know if it is necessary; perhaps the file size problem can be worked around via Google Docs.

r_sethi2001 · Posted 06-20-2016 07:39 AM

update on my previous mail dated Jun 19th 2016

A truncated version of the customer file is now uploaded. The original customer dataset had 100000 customer records; the dataset customers2 now has 70000 records.

trust this helps

thanks and best regards

rayIII · Posted 06-20-2016 09:34 AM

Regarding the cluster plots, the first solution is one-dimensional; that is, all of the variation between the clusters can be explained by a single latent variable. Thus circles, which are two-dimensional, aren't needed to describe the within-cluster variation, as there is little or no variance in dimension 2. (Caveat: I haven't looked at the code. This is just my best guess as to what's going on here.)
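
To illustrate the idea outside of EM, here is a rough Python sketch. It assumes the distance plot positions the centroids with something like classical multidimensional scaling, which has not been confirmed for EM 12.3; the function name and both sets of centroids are made up for the example. It shows that when the centroids differ along only one variable, the second plot dimension carries essentially no variance.

```python
import numpy as np

def mds_eigenvalues(centroids):
    """Classical (Torgerson) MDS eigenvalues for a set of cluster centroids."""
    c = np.asarray(centroids, dtype=float)
    # Squared Euclidean distances between every pair of centroids.
    diff = c[:, None, :] - c[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    n = len(c)
    # Double-centre the squared distance matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # Largest eigenvalue first; each one is the variance along one plot axis.
    return np.sort(np.linalg.eigvalsh(b))[::-1]

# Hypothetical centroids: clusters differ only along the first variable.
one_dim = [[0.0, 1.0], [2.0, 1.0], [5.0, 1.0], [9.0, 1.0]]
# Hypothetical centroids: clusters differ along both variables.
two_dim = [[0.0, 0.0], [2.0, 3.0], [5.0, 1.0], [9.0, 4.0]]

ev_one = mds_eigenvalues(one_dim)
ev_two = mds_eigenvalues(two_dim)
print("one-dimensional solution, first two eigenvalues:", ev_one[:2])
print("two-dimensional solution, first two eigenvalues:", ev_two[:2])
```

In the first case the second eigenvalue is zero up to rounding error, so there is nothing for a second axis to show; in the second case both are clearly positive.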

r_sethi2001 · Posted 06-20-2016 12:38 PM

Thanks Ray, let me look at the system again tomorrow morning and come back to you. You have given me a line to investigate further.
In Randy Collica's solution (same dataset, as per his XML diagram), the cluster node generates a solution where two variables together differentiate the clusters. In the XML attachment I copy-pasted the cluster node from the textbook solution, and with it I get a cluster plot with circles. The data source, transform and filter nodes are common to both.
I am also struggling to set Hide Original Variables to NO; the default value is YES.
thanks once again for the support.

rayIII · Posted 06-20-2016 01:19 PM

Hi.

Regarding the original variables, you should be able to see them (along with your cluster scores) if you select your Cluster node then press the Exported Data button in node properties. You don't need to do anything special.

The Hide Original Variables option is enabled only when Scoring Imputation is in effect (i.e., Scoring Imputation Method = Seed of Nearest Cluster). In that case, the original untransformed variables are hidden by default. If you have chosen to impute, then NO will include both sets of variables (original and transformed) when you look at the exported data.
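
As a conceptual illustration only (this is not EM's code, and the function and its `hide_original` flag are hypothetical stand-ins for the node property), here is a small Python sketch of the behaviour: missing values are filled in from the seed of the nearest cluster, and the flag decides whether the untouched columns come along in the output.

```python
import numpy as np

def impute_from_nearest_seed(x, seeds, hide_original=True):
    """Fill missing values from the nearest cluster seed (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    seeds = np.asarray(seeds, dtype=float)
    imputed = x.copy()
    for i, row in enumerate(x):
        known = ~np.isnan(row)
        # Squared distance to each seed, using only the non-missing variables.
        dist = ((seeds[:, known] - row[known]) ** 2).sum(axis=1)
        nearest = seeds[np.argmin(dist)]
        # Replace each missing value with the nearest seed's value.
        imputed[i, ~known] = nearest[~known]
    if hide_original:
        return {"imputed": imputed}
    # With the flag off, both sets of variables are returned side by side.
    return {"original": x, "imputed": imputed}

# Hypothetical data: two cluster seeds and two customers, one with a missing value.
seeds = [[0.0, 0.0], [10.0, 10.0]]
customers = [[9.0, np.nan], [1.0, 2.0]]

hidden = impute_from_nearest_seed(customers, seeds)
shown = impute_from_nearest_seed(customers, seeds, hide_original=False)
print(sorted(hidden.keys()))
print(sorted(shown.keys()))
```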

Does that make sense?

Ray
