r_sethi2001
Calcite | Level 5

I am working on Chapter 5 of Randy Collica's book, Customer Segmentation and Clustering Using SAS Enterprise Miner. The attached zip file contains: 1) the Customers dataset (100,000 customer records), 2) a Word document with images of two distance plots, and 3) the XML process flow.

 

1) Please start with the Word document. One distance plot has no circles drawn around the centroids, whereas the other has the circles drawn automatically. I need to find out why no circles are drawn automatically in the first plot.

 

2) I am enclosing the XML diagram, which has two cluster nodes. One is named Copied Cluster Node (copied from the sample code on the author's website); the other node was created by me. My node has a problem that I need help with.

 

3) Cluster node property (SCORE): Hide Original Variables. How do I change the Yes to No? In the Transform node, four variables were transformed, and I want to see the original variables.

 

 

I would be very grateful for the help. Thanks.

 

PS: The Customers dataset could not be uploaded because of the file size limit. Please let me know if it is needed; perhaps the file size problem can be worked around via Google Docs.

4 REPLIES
r_sethi2001
Calcite | Level 5

update on my previous mail dated Jun 19th 2016

 

A truncated version of the customer file is now uploaded. The original customer dataset had 100,000 customer records; the new dataset, customers2, has 70,000 records.

 

trust this helps

 

thanks and best regards

rayIII
SAS Employee

Regarding the cluster plots, the first solution is one-dimensional; that is, all of the variation between the clusters can be explained by a single latent variable. Thus circles, which are two-dimensional, aren't needed to describe the within-cluster variation, as there is little or no variance in dimension 2. (Caveat: I haven't looked at the code. This is just my best guess as to what's going on here.)
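
The "one dimension versus two" idea can be illustrated with a small Python sketch. This is not how Enterprise Miner draws its distance plot (that code is not shown in this thread); it only assumes made-up two-variable centroids and a hypothetical helper, `axis_variance_shares`, that reports how much of the spread between centroids falls on each principal axis. When the centroids sit on a single line, the second axis carries essentially nothing.

```python
import math

def axis_variance_shares(centroids):
    """Share of between-centroid spread on each principal axis (2-D input).

    Hypothetical helper for illustration only; uses the closed-form
    eigenvalues of the 2x2 scatter matrix of the centroids.
    """
    n = len(centroids)
    mean_x = sum(x for x, _ in centroids) / n
    mean_y = sum(y for _, y in centroids) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in centroids)
    syy = sum((y - mean_y) ** 2 for _, y in centroids)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in centroids)
    trace = sxx + syy
    if trace == 0:
        return (0.0, 0.0)  # all centroids coincide
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    first = trace / 2 + root
    second = trace / 2 - root
    return (first / trace, second / trace)

# Made-up centroids lying on one line: a one-dimensional solution
collinear = [(0, 0), (1, 2), (2, 4), (4, 8)]
# Made-up centroids spread over the plane: a two-dimensional solution
spread = [(0, 0), (1, 0), (0, 1), (1, 1)]

shares_collinear = axis_variance_shares(collinear)
shares_spread = axis_variance_shares(spread)
print(shares_collinear, shares_spread)
```

If the second share is close to zero for the node that shows no circles, that would be consistent with the guess above; comparing the input variables used by the two cluster nodes is a reasonable next check.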

r_sethi2001
Calcite | Level 5
Thanks Ray, let me look at the system again tomorrow morning and come back to you. You have given me a line to investigate further.
In Randy Collica's solution (for the same dataset, as per his XML diagram), the cluster node generates a solution where two variables together differentiate the clusters. In the XML attachment I copied and pasted the cluster node from the textbook solution, and I get a cluster plot with circles. The data source node, transform node, and filter nodes are all common to both.
I am also struggling to set Hide Original Variables to NO, the default value is YES.
thanks once again for the support.
rayIII
SAS Employee

Hi.

 

Regarding the original variables, you should be able to see them (along with your cluster scores) if you select your Cluster node then press the Exported Data button in node properties.  You don't need to do anything special. 

 

The Hide Original Variables option is enabled only when Scoring Imputation is in effect (i.e., Scoring Imputation Method = Seed of Nearest Cluster). In that case, the original untransformed variables are hidden by default. If you have chosen to impute, then NO will include two sets of variables (original, transformed) when you look at the exported data.
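
As a rough illustration of the behavior described above, here is a minimal Python sketch of imputing from the seed of the nearest cluster. It is not Enterprise Miner code: the function `impute_from_nearest_seed`, the `hide_original` flag, the `IMP_` prefix, and the sample values are all hypothetical, and the distance is assumed to be Euclidean over the non-missing variables.

```python
import math

def impute_from_nearest_seed(row, seeds, hide_original=True):
    """Fill missing (None) values in `row` from the nearest cluster seed.

    Hypothetical sketch: distance is Euclidean over the variables that
    are present in `row`; the output column naming is an assumption.
    """
    def distance(seed):
        return math.sqrt(sum((value - seed[name]) ** 2
                             for name, value in row.items()
                             if value is not None))

    nearest = min(seeds, key=distance)
    imputed = {name: (nearest[name] if value is None else value)
               for name, value in row.items()}
    if hide_original:
        # Only the imputed set of variables is exported
        return imputed
    # Both sets are exported: originals plus prefixed imputed copies
    out = dict(row)
    out.update({"IMP_" + name: value for name, value in imputed.items()})
    return out

# Made-up cluster seeds and one customer record with a missing value
seeds = [{"income": 10.0, "age": 30.0}, {"income": 55.0, "age": 45.0}]
row = {"income": 50.0, "age": None}

hidden = impute_from_nearest_seed(row, seeds, hide_original=True)
both = impute_from_nearest_seed(row, seeds, hide_original=False)
print(hidden)
print(both)
```

With the flag set to hide, only one set of variables comes back; with it turned off, the record carries both the original and the imputed values side by side.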

 

Does that make sense? 

 

Ray
