I would like to know the SAS code for testing significance in kth-nearest-neighbor clustering.
Some additional information taken from the SAS/STAT documentation.
One useful descriptive approach to the number-of-clusters problem is provided by Wong and Schaack (1982) based on a kth-nearest-neighbor density estimate. The kth-nearest-neighbor clustering method developed by Wong and Lane (1983) is applied with varying values of k. Each value of k yields an estimate of the number of modal clusters. If the estimated number of modal clusters is constant for a wide range of k values, there is strong evidence of at least that many modes in the population. A plot of the estimated number of modes against k can be highly informative. Attempts to derive a formal hypothesis test from this diagnostic plot have met with difficulties, but a simulation approach similar to Silverman (1986) does seem to work (Girman 1994). The simulation, of course, requires considerable computer time.
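The "vary k and watch the number of clusters" idea can be sketched with PROC MODECLUS, which does nearest-neighbor density clustering and accepts a list of k values in one call. This is only a sketch: the data set name SHOPPERS and the items q1-q10 are made up, the range of k is arbitrary, and METHOD=1 is just one of the procedure's methods, not necessarily the exact Wong and Lane procedure described above.

```sas
/* Hypothetical data set SHOPPERS with Likert items q1-q10 (assumed names). */
proc modeclus data=shoppers
              method=1            /* one of the procedure's clustering methods */
              k=5 to 40 by 5      /* repeat the analysis for each value of k */
              std                 /* standardize the items to mean 0, std 1 */
              short               /* suppress the per-cluster statistics */
              out=knn_clusters;   /* cluster assignments for every k */
   var q1-q10;
run;
```

If the number of clusters reported in the summary stays the same over a wide stretch of k values, that is the stability the quoted passage describes. PROC MODECLUS also documents TEST and JOIN= options for approximate significance tests; check the procedure documentation for which neighborhood options they require before relying on them.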
You could also check the CCC statistic and its limitations.
Sarle (1983) used extensive simulations to develop the cubic clustering criterion (CCC), which can be used for crude hypothesis testing and estimating the number of population clusters. The CCC is based on the assumption that a uniform distribution on a hyperrectangle will be divided into clusters shaped roughly like hypercubes. In large samples that can be divided into the appropriate number of hypercubes, this assumption gives very accurate results. In other cases the approximation is generally conservative. For details about the interpretation of the CCC, consult Sarle (1983).
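One way to look at the CCC across candidate numbers of clusters is to ask PROC CLUSTER for it and plot it from the OUTTREE= data set. A minimal sketch, assuming a hypothetical data set SHOPPERS with items q1-q10; Ward's method and the cutoff of 10 clusters are my choices for illustration, not something from the documentation quoted above.

```sas
/* Hierarchical clustering with the CCC requested (data set and items are assumed). */
proc cluster data=shoppers method=ward ccc pseudo std
             print=10              /* show only the last 10 steps of the history */
             outtree=tree;         /* holds _NCL_ and _CCC_ for each step */
   var q1-q10;
run;

/* Order the steps by number of clusters so the line plot connects them properly. */
proc sort data=tree;
   by _ncl_;
run;

/* Plot CCC against the number of clusters for the last few steps. */
proc sgplot data=tree;
   where _ncl_ <= 10;
   series x=_ncl_ y=_ccc_ / markers;
   xaxis label="Number of clusters" integer;
   yaxis label="Cubic clustering criterion";
run;
```

Look for a clear peak in the plot, and read Sarle (1983) before treating any particular value as evidence.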
You can use ANOVA (PROC GLM) to check whether these clusters are significantly different.
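A sketch of that check, assuming a hypothetical data set SHOPPERS with items q1-q10 and a 3-cluster k-means solution from PROC FASTCLUS. One caution: the clusters were built from these same items to be as far apart as possible, so the F tests are best read as a description of which items separate the clusters, not as valid significance tests.

```sas
/* k-means with 3 clusters; OUT= adds a CLUSTER variable to each respondent. */
proc fastclus data=shoppers maxclusters=3 out=clustered noprint;
   var q1-q10;
run;

/* One-way ANOVA of each item on cluster membership. */
proc glm data=clustered;
   class cluster;
   model q1-q10 = cluster;
run;
quit;
```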
We need much more background. What is the study design, model, and hypothesis?
-Kevin
Survey data were collected from respondents about their shopping orientation using Likert-scaled questions. Using the k-means method, I get 3 clusters of shoppers. However, I want to cross-check my results with the kth-nearest-neighbor clustering method, as it is an unbiased method. My interest is in the kth-nearest-neighbor clustering method and in how a significance test is conducted with it to determine the true number of clusters.
That is a very hard question. As far as I know, there is no well-known or uniform criterion for the number of clusters you should get.
But I think you can perform a principal component analysis to get a rough idea of how many clusters there will be.
I will try principal components analysis. I think plotting the objects along the first two canonical variates will give me some idea of the possible number of clusters. I am new to SAS, so please let me know the proper SAS code for this analysis. Thank you in advance.
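A minimal sketch of the principal components plot, assuming the survey items sit in a hypothetical data set SHOPPERS as q1-q10 (replace these with your own names). PROC PRINCOMP writes the component scores as Prin1 and Prin2, which can then be plotted.

```sas
/* Compute the first two principal components and save the scores. */
proc princomp data=shoppers n=2 out=pcscores;
   var q1-q10;
run;

/* Scatter plot of respondents on the first two components. */
proc sgplot data=pcscores;
   scatter x=Prin1 y=Prin2;
run;
```

Visible clumps in the scatter plot give a rough idea of how many clusters to expect. If you specifically want canonical variates, which need a cluster variable to begin with, PROC CANDISC is the procedure to look at.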