Hello,
I am facing an issue with PROC FASTCLUS. I want 5 clusters, and I can get them without trouble. The problem is this: if I run the same procedure on two different datasets that actually contain the same data, the clusters are numbered differently, although the clusters themselves behave the same.
For example:
A cluster that was numbered 2 in the first run is numbered 3 in the second run. Profiling shows that cluster 2 in the first run is equivalent to cluster 3 in the second run.
Does anyone have an idea how I can preserve the cluster numbers?
Thanks in advance
Ehsan
If the clustering is really the same, then you can do the following:
1. From the first run, use the OUTSTAT= option to output the centers. Call the centers CA_1, CA_2, ..., CA_k.
2. From the second run, use the OUTSTAT= option to output the centers. Call the centers CB_1, CB_2, ..., CB_k.
3. Concatenate the centers into a single data set and use PROC DISTANCE to compute the distance between centers.
4. In the resulting distance matrix, the block formed by the first k columns and the last k rows holds the distances between the Run A centers and the Run B centers. The smallest element in each column of that block tells you which center in Run B matches up with that center in Run A.
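The matching in step 4 can be sketched outside SAS as well. The following is only an illustration in plain Python, not part of PROC FASTCLUS or PROC DISTANCE: the function name `match_clusters` and the center coordinates are made up, and it assumes the centers from both runs have already been exported as cluster number and coordinates. For each Run B center it picks the nearest Run A center, which gives the same pairing as the column minima when the two clusterings really agree, and it raises an error when the pairing is not one-to-one.

```python
from math import dist  # Euclidean distance, Python 3.8+


def match_clusters(centers_a, centers_b):
    """Map each Run B cluster number to the nearest Run A cluster number.

    centers_a, centers_b: dicts of {cluster_number: (coordinates...)}.
    """
    mapping = {}
    for b_id, b_center in centers_b.items():
        # Pick the Run A center with the smallest distance to this Run B center.
        mapping[b_id] = min(
            centers_a, key=lambda a_id: dist(centers_a[a_id], b_center)
        )
    # If two Run B clusters land on the same Run A cluster,
    # the two clusterings are not really the same.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("centers do not match one-to-one")
    return mapping


# Hypothetical centers, for illustration only.
centers_a = {1: (0.0, 0.0), 2: (5.0, 5.0), 3: (10.0, 0.0)}
centers_b = {1: (0.1, -0.1), 2: (9.8, 0.2), 3: (5.1, 4.9)}

relabel = match_clusters(centers_a, centers_b)
print(relabel)  # {1: 1, 2: 3, 3: 2}
```

With such a mapping, each Run B cluster number can be recoded to its Run A counterpart, so cluster 2 in Run B would be renumbered to 3 in this made-up example.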
Actually, order does not matter here. What really matters is the "time period". I have one dataset from May 2016 and a second dataset from June 2016. The (customer) data comes from the same source, and experimentation shows that I can always find the same clusters, but with different numbers.
Thanks, it seems to work 🙂