viollete
Calcite | Level 5

Hi all,

 

I want to do cluster analysis in SAS and I am a little bit confused. I know that you first have to normalise your data and choose an appropriate distance measure (for example, Euclidean distance) to build a distance matrix, and then you do your cluster analysis.

 

However, in SAS this is not very clear. Most examples I have seen, such as those for PROC CLUSTER or PROC FASTCLUS, don't use a distance matrix. Why?

 

Thanks.

1 REPLY 1
KevinScott
SAS Employee

PROC CLUSTER accepts either coordinate data or distance data as input.

 

If the data are coordinates, PROC CLUSTER computes Euclidean distances. If you want non-Euclidean distances, you can use the DISTANCE procedure to compute an appropriate distance data set, which can then be used as input to PROC CLUSTER.
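
As a rough sketch of that two-step route, the example below computes city-block distances with PROC DISTANCE and passes the result to PROC CLUSTER. The data set names (work.points, work.dist, work.tree), the variables (name, x1-x3), the data values, and the choice of METHOD=CITYBLOCK and METHOD=AVERAGE are all assumptions made for illustration, not something from your data.

```sas
/* Hypothetical coordinate data: one ID variable and three numeric variables */
data work.points;
   input name $ x1 x2 x3;
   datalines;
A 1.0 2.0 3.0
B 1.5 1.8 2.9
C 8.0 9.0 7.5
D 8.2 9.1 7.7
E 4.0 5.0 4.5
;
run;

/* Step 1: standardize the variables and compute a non-Euclidean
   (city-block) distance data set */
proc distance data=work.points out=work.dist method=cityblock;
   var interval(x1 x2 x3 / std=std);
   id name;
run;

/* Step 2: cluster using the distance data set as input */
proc cluster data=work.dist method=average outtree=work.tree noprint;
   id name;
run;
```

The STD=STD suboption in the VAR statement is one way to do the normalisation step inside PROC DISTANCE itself.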

If Euclidean distance is what you want, you do not need the extra step of converting your data to distances.
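
A minimal sketch of that direct route, assuming the same hypothetical coordinate data (work.points with variables name and x1-x3): PROC CLUSTER reads the coordinates and works with Euclidean distances itself. The STD option, which standardizes the variables before clustering, and METHOD=WARD are choices made here only for illustration.

```sas
/* Hypothetical coordinate data: one ID variable and three numeric variables */
data work.points;
   input name $ x1 x2 x3;
   datalines;
A 1.0 2.0 3.0
B 1.5 1.8 2.9
C 8.0 9.0 7.5
D 8.2 9.1 7.7
E 4.0 5.0 4.5
;
run;

/* Coordinates go straight in; no separate distance matrix is built.
   STD standardizes x1-x3 to mean 0 and standard deviation 1 first. */
proc cluster data=work.points method=ward std outtree=work.tree noprint;
   var x1 x2 x3;
   id name;
run;
```

This is why many examples never show a distance matrix: the normalisation and the Euclidean distance calculation both happen inside the procedure.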

 

If you already have a distance matrix as input, you can use the TYPE= data set option (for example, TYPE=DISTANCE on the data set created in a DATA step) to indicate to PROC CLUSTER or KMEANS that the data set is a distance matrix.
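
For example, a precomputed matrix could be read in roughly like this. The data set name work.mydist, the object labels A-D, and the distance values are made up for illustration. Only the lower triangle is filled in; the upper triangle is left missing.

```sas
/* Hypothetical precomputed distances among four objects.
   TYPE=DISTANCE marks the data set as a distance matrix. */
data work.mydist(type=distance);
   input name $ A B C D;
   datalines;
A 0   .   .   .
B 1.2 0   .   .
C 5.0 4.8 0   .
D 5.3 5.1 0.6 0
;
run;

/* PROC CLUSTER treats the input as distances, not coordinates */
proc cluster data=work.mydist method=average outtree=work.tree noprint;
   id name;
run;
```

Note that the number of numeric variables has to match the number of rows for the data set to be read as a square distance matrix.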

 

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=ledsoptsref&docsetTarg...

 

I hope this information helps.
