viollete
Calcite | Level 5

Hi all,


I want to do cluster analysis in SAS and I am a little bit confused. I know that first you have to normalise your data and choose an appropriate distance measure (for example, Euclidean distance), and then you do your cluster analysis.
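The first two steps of that workflow (standardize each variable, then compute pairwise Euclidean distances) can be sketched as below. This is a language-neutral illustration in Python, not SAS code and not how any SAS procedure is implemented; the function names and the sample data are hypothetical, and z-scoring with the sample standard deviation is assumed as the normalisation.

```python
import math

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1 divisor); assumes every column actually varies.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def euclidean_matrix(rows):
    """Square, symmetric matrix of pairwise Euclidean distances (requires Python 3.8+)."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical data: two variables on very different scales.
data = [[1.0, 100.0], [2.0, 110.0], [10.0, 300.0], [11.0, 290.0]]
z = standardize(data)
d = euclidean_matrix(z)
```

Without the standardization step, the second variable (range of about 200) would dominate the distances and the first variable (range of about 10) would barely matter.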


However, in SAS it is not very clear. Most examples I have seen, for example with PROC CLUSTER or PROC FASTCLUS, don't use a distance matrix. Why?


Thanks.

KevinScott
SAS Employee

PROC CLUSTER allows you to pass both coordinate data and distance data.


If the data are coordinates, PROC CLUSTER computes Euclidean distances. If you want non-Euclidean distances, you can use the DISTANCE procedure to compute an appropriate distance data set and then use that data set as input to PROC CLUSTER.

This is why many examples pass coordinate data directly: if Euclidean distance is what you want, you do not need the extra step of converting your data to distances.
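To see why the extra step is redundant for Euclidean distance, here is a toy sketch in Python (not SAS, and not PROC CLUSTER's algorithm). The hypothetical `cluster` function accepts either coordinates, from which it computes Euclidean distances internally, or a ready-made distance matrix; the clustering itself is an assumed naive average-linkage routine. Both inputs produce the same merge history because the routine only ever looks at the distances.

```python
import math

def average_linkage(dist):
    """Naive agglomerative clustering with average linkage on a square distance matrix.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a sorted tuple of original observation indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Average of all pairwise distances between members of a and b.
                avg = sum(dist[p][q] for p in a for q in b) / (len(a) * len(b))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        history.append((clusters[i], clusters[j], avg))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

def cluster(data, is_distance=False):
    """Hypothetical front end: coordinates get Euclidean distances; a distance matrix is used as-is."""
    dist = data if is_distance else [[math.dist(a, b) for b in data] for a in data]
    return average_linkage(dist)

coords = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
precomputed = [[math.dist(a, b) for b in coords] for a in coords]
same = cluster(coords) == cluster(precomputed, is_distance=True)
print(same)  # True
```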


If you already have a distance matrix as input, you can use the TYPE=DISTANCE data set option (for example, in the DATA step that creates the data set) to indicate to PROC CLUSTER or KMEANS that the data set is a distance matrix.
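When you bring your own matrix, for instance one built from a non-Euclidean measure such as city-block (Manhattan) distance, it helps to confirm that it really is a distance matrix before handing it to a clustering procedure. The Python sketch below is illustrative only: `cityblock_matrix` and `check_distance_matrix` are hypothetical helpers, and the checks (square, symmetric, non-negative, zero diagonal) are the generic properties of a distance matrix, not a description of what SAS validates for a TYPE=DISTANCE data set.

```python
def cityblock_matrix(rows):
    """Pairwise city-block (Manhattan) distances: sum of absolute coordinate differences."""
    return [[sum(abs(x - y) for x, y in zip(a, b)) for b in rows] for a in rows]

def check_distance_matrix(d, tol=1e-12):
    """Raise ValueError unless d is square, symmetric, non-negative, with a zero diagonal."""
    n = len(d)
    if any(len(row) != n for row in d):
        raise ValueError("matrix is not square")
    for i in range(n):
        if abs(d[i][i]) > tol:
            raise ValueError(f"non-zero diagonal at row {i}")
        for j in range(n):
            if d[i][j] < 0:
                raise ValueError(f"negative distance at ({i}, {j})")
            if abs(d[i][j] - d[j][i]) > tol:
                raise ValueError(f"asymmetric entry at ({i}, {j})")
    return d

coords = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
d = check_distance_matrix(cityblock_matrix(coords))
print(d[0][1])  # 7.0 (the Euclidean distance between the same two points would be 5.0)
```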


https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=ledsoptsref&docsetTarg...


I hope this information helps.
