I am working on Chapter 5 of Randy Collica's book, Customer Segmentation and Clustering Using SAS Enterprise Miner. The attached zip file contains:
1) the Customers data set (100,000 customer records)
2) a Word document with images of two distance plots
3) the XML process flow diagram
1) Please start with the Word document. One distance plot has no circles drawn around the centroids, whereas the other has the circles drawn automatically. I need to find out why no circles are drawn automatically in the first plot.
2) I am enclosing the XML diagram, which contains two Cluster nodes. One is named Copied Cluster Node (sample code from the author's website); the other node was created by me. My node has the problem I need help with.
3) Cluster node property (Score): Hide Original Variables. How do I change Yes to No? In the Transform Variables node, four variables were transformed, and I want to see the original variables.
I would be very grateful for your help. Thanks.
PS: The Customers data set could not be uploaded because of the file size limit. Please let me know if it is needed; perhaps the file size problem can be worked around via Google Docs.
Update on my previous post of June 19, 2016:
A truncated version of the customer file is now uploaded. The original Customers data set had 100,000 customer records; the new data set, customers2, has 70,000 records.
I trust this helps.
Thanks and best regards
Regarding the cluster plots, the first solution is one-dimensional; that is, all of the variation between the clusters can be explained by a single latent variable. Circles, which are two-dimensional, therefore aren't needed to describe the within-cluster variation, because there is little or no variance in dimension 2. (Caveat: I haven't looked at the code. This is just my best guess as to what's going on here.)
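The one-dimensional explanation can be checked numerically. Below is a minimal Python sketch, assuming (not confirmed from the Enterprise Miner documentation) that the distance plot places clusters by classical multidimensional scaling of the distances between cluster centroids. The centroid values are hypothetical, made up for illustration. When all centroids fall on one line, the second dimension explains essentially none of the variance, which matches the situation described above.

```python
import numpy as np


def classical_mds(points, k=2):
    """Embed points in k dimensions with classical (Torgerson) MDS.

    Returns the k-dimensional coordinates and, for every dimension,
    the share of total (non-negative) variance it explains.
    """
    # Pairwise squared Euclidean distances between centroids.
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Double-center the squared-distance matrix: B = -1/2 * J D2 J.
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # Eigen-decompose B and sort eigenvalues from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    coords = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    share = eigvals / eigvals.sum()
    return coords, share


# Hypothetical centroids in a 4-variable input space (not from the thread).
# Case 1: all centroids lie on one line, so one latent direction separates them.
direction = np.array([1.0, 2.0, 0.5, -1.0])
collinear = np.array([t * direction for t in range(4)])
# Case 2: centroids at the corners of a square, so two directions are needed.
spread = np.array([[0.0, 0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0, 0.0]])

_, share_line = classical_mds(collinear)
_, share_square = classical_mds(spread)
print("collinear centroids, dim 1 / dim 2 share:", share_line[:2].round(3))
print("square centroids,    dim 1 / dim 2 share:", share_square[:2].round(3))
```

With the collinear centroids the second share is effectively zero, so a second plot axis carries no information, which would be consistent with the first plot showing no circles. The square layout splits the variance evenly between the two dimensions.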
Hi.
Regarding the original variables, you should be able to see them (along with your cluster scores) if you select your Cluster node and then click the Exported Data button in the node properties. You don't need to do anything special.
The Hide Original Variables option is enabled only when scoring imputation is in effect (that is, when Scoring Imputation Method = Seed of Nearest Cluster). In that case, the original untransformed variables are hidden by default. If you have chosen to impute, setting the option to No includes both sets of variables (original and transformed) when you look at the exported data.
Does that make sense?
Ray