viollete

Hi all,

I want to do a cluster analysis in SAS and I am a little confused. I know that you first have to normalize your data and choose an appropriate distance measure (for example, Euclidean distance) to build a distance matrix, and then you run the cluster analysis.

However, in SAS this is not very clear. Most of the examples I have seen, such as those for PROC CLUSTER or PROC FASTCLUS, don't use a distance matrix. Why?

Thanks.

KevinScott
SAS Employee

PROC CLUSTER accepts either coordinate data or distance data as input.

 

If the data are coordinates, PROC CLUSTER computes Euclidean distances. If you want non-Euclidean distances, you can use the DISTANCE procedure to compute an appropriate distance data set and then use that data set as input to PROC CLUSTER.
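A minimal sketch of that two-step route, assuming a hypothetical data set WORK.CUSTOMERS with a character ID variable NAME and numeric variables X1-X3: PROC DISTANCE standardizes the variables and computes city-block (Manhattan) distances, and PROC CLUSTER then clusters on the resulting distance data set. The data set names, METHOD=CITYBLOCK, and average linkage are illustrative choices, not requirements. NOSQUARE is included because PROC CLUSTER otherwise squares input distances for METHOD=AVERAGE.

```sas
/* Hypothetical input: WORK.CUSTOMERS with a character ID variable NAME
   and numeric variables X1-X3 (all names are for illustration only). */

/* Step 1: city-block (Manhattan) distances on standardized variables.
   STD=STD rescales each variable to mean 0 and standard deviation 1
   before the distances are computed. The OUT= data set is TYPE=DISTANCE. */
proc distance data=work.customers out=work.dist method=cityblock;
   var interval(x1 x2 x3 / std=std);
   id name;
run;

/* Step 2: hierarchical clustering directly on the distance data set.
   NOSQUARE keeps average linkage from squaring the input distances. */
proc cluster data=work.dist method=average nosquare outtree=work.tree;
   id name;
run;
```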

If Euclidean distance is what you want, you do not need the extra step of converting your data to distances.
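On the normalization part of the question: for Euclidean clustering on coordinates, you can standardize inside PROC CLUSTER with its STANDARD option, or beforehand with PROC STDIZE, which is a common route before PROC FASTCLUS (k-means). A sketch, assuming a hypothetical data set WORK.CUSTOMERS with variables NAME and X1-X3; Ward's method and the choice of 3 clusters are only illustrative.

```sas
/* Hypothetical input: WORK.CUSTOMERS with an ID variable NAME and
   numeric variables X1-X3 (names are for illustration only). */

/* Hierarchical clustering on coordinates: STANDARD rescales X1-X3 to
   mean 0 and standard deviation 1, and PROC CLUSTER computes the
   Euclidean distances from the coordinates itself. */
proc cluster data=work.customers method=ward standard outtree=work.tree;
   var x1 x2 x3;
   id name;
run;

/* Cut the tree into 3 clusters (an arbitrary, illustrative number) */
proc tree data=work.tree nclusters=3 out=work.hclus noprint;
run;

/* k-means route: standardize first, then run PROC FASTCLUS on the
   standardized coordinates (Euclidean, no distance matrix needed). */
proc stdize data=work.customers out=work.cust_std method=std;
   var x1 x2 x3;
run;

proc fastclus data=work.cust_std maxclusters=3 out=work.kclus;
   var x1 x2 x3;
   id name;
run;
```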

 

If you already have a distance matrix, you can use the TYPE=DISTANCE data set option (for example, in the DATA step that creates the data set) to tell PROC CLUSTER or KMEANS that the data set contains distances rather than coordinates.

 

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=ledsoptsref&docsetTarg...
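For example, a distance matrix entered by hand can be marked with TYPE=DISTANCE in the DATA step and passed straight to PROC CLUSTER. The four objects, their distances, and the variable name NAME are made up for illustration; only the lower triangle is filled in, with the upper triangle left missing. With no VAR statement, PROC CLUSTER uses the remaining numeric variables as the distance columns.

```sas
/* Made-up distances among four objects A-D. TYPE=DISTANCE tells SAS
   procedures that the numeric columns are distances, not coordinates.
   Only the lower triangle is filled in; the upper triangle is missing. */
data work.mydist(type=distance);
   input name $ A B C D;
   datalines;
A 0 . . .
B 3 0 . .
C 5 4 0 .
D 9 8 6 0
;

/* With no VAR statement, the numeric variables A-D are used as the
   distance columns; NAME labels the observations. */
proc cluster data=work.mydist method=single outtree=work.tree;
   id name;
run;
```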

 

I hope this information helps.
