Calcite | Level 5

Hi all,


I want to do cluster analysis in SAS and I am a little bit confused. I know that you first have to normalise your data and choose an appropriate distance measure (for example, Euclidean distance), and then you do your cluster analysis.


However, in SAS this is not very clear. Most examples I have seen, such as those for PROC CLUSTER or PROC FASTCLUS, don't use a distance matrix. Why?



SAS Employee

PROC CLUSTER allows you to pass both coordinate data and distance data.


If the data are coordinates, PROC CLUSTER computes Euclidean distances. If you want non-Euclidean distances, you can use the DISTANCE procedure to compute an appropriate distance data set that can then be used as input to PROC CLUSTER.
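
As a rough sketch of the non-Euclidean route, the following computes city-block distances on standardized variables and passes the result to PROC CLUSTER. The data set name mydata, the variables x1-x3, and the ID variable obs_id are hypothetical placeholders, and the choice of METHOD=CITYBLOCK and METHOD=AVERAGE is only for illustration.

```sas
/* Assumed input: WORK.MYDATA with numeric x1-x3 and an ID variable obs_id */
proc distance data=mydata method=cityblock out=dist;
   var interval(x1 x2 x3 / std=std);   /* standardize before computing distances */
   id obs_id;
run;

/* The OUT= data set from PROC DISTANCE is already TYPE=DISTANCE */
proc cluster data=dist method=average outtree=tree;
   id obs_id;
run;
```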

If you want to use Euclidean distance, you do not need the extra step of converting your data to distances.
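
For the Euclidean case, a minimal sketch might look like the following, where the coordinates go straight into PROC CLUSTER and the STD option handles the standardization. The names mydata, x1-x3, and obs_id are hypothetical, and Ward's method and the three-cluster cut are arbitrary choices for the example.

```sas
/* Assumed input: WORK.MYDATA with numeric x1-x3 and an ID variable obs_id */
proc cluster data=mydata method=ward std outtree=tree;
   var x1 x2 x3;    /* coordinate variables; STD standardizes them first */
   id obs_id;
run;

/* Cut the tree into a chosen number of clusters (3 is arbitrary here) */
proc tree data=tree nclusters=3 out=clusters noprint;
run;
```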


If you already have a distance matrix as input, you can use the TYPE=DISTANCE data set option in a DATA step to tell PROC CLUSTER that the data set is a distance matrix. (K-means procedures such as PROC FASTCLUS expect coordinate data.)
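
To illustrate the distance-matrix case, here is a small sketch that reads a lower-triangular matrix into a TYPE=DISTANCE data set and clusters it. The three objects, their distances, and the variable names are made up for the example.

```sas
/* Hypothetical 3x3 distance matrix, lower triangle only */
data dist(type=distance);
   input name $ a b c;
   datalines;
a 0 . .
b 2 0 .
c 5 3 0
;
run;

/* No VAR statement: the numeric variables are treated as the distance matrix */
proc cluster data=dist method=single outtree=tree;
   id name;
run;
```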


I hope this information helps.

