lionking19063

Hi,

I am doing a cluster analysis with 10 continuous variables and 3 categorical variables. Instead of converting the categorical variables into dummy variables, I am thinking of creating a distance matrix using "PROC DISTANCE".

1) Calculate 3 distance matrices, each containing the distance between one categorical variable (id category_var1) and the 10 continuous variables (var interval(continuous_var1-continuous_var10)).

2) Merge the 3 distance matrices back with the values of the 10 continuous variables.

3) Standardize all of these and use the standardized variables as the input variables in "PROC CLUSTER" or "PROC FASTCLUS".
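
Step 3 amounts to a column-wise z-score: subtract each column's mean and divide by its standard deviation, so that no variable dominates the Euclidean distances used in the clustering step just because of its scale. The Python sketch below only illustrates that arithmetic; it is not SAS code, and the function name `standardize_columns` and the data are hypothetical. In SAS this transformation is commonly done with PROC STANDARD (MEAN=0 STD=1) or PROC STDIZE.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Return a copy of `rows` in which every column has mean 0 and
    sample standard deviation 1 (z-scores).

    `rows` is a list of equal-length lists of numbers, one inner list per
    observation. Columns with zero variance are set to 0.0 to avoid
    dividing by zero (a choice made for this sketch).
    """
    columns = list(zip(*rows))          # transpose: one tuple per variable
    scaled_columns = []
    for col in columns:
        m = mean(col)
        s = stdev(col)                  # sample SD (n - 1 denominator)
        if s == 0:
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(x - m) / s for x in col])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back


# Hypothetical data: 4 observations, 3 variables on very different scales.
data = [
    [1.0, 200.0, 0.5],
    [2.0, 180.0, 0.7],
    [3.0, 260.0, 0.1],
    [4.0, 220.0, 0.3],
]
z = standardize_columns(data)
for row in z:
    print([round(v, 3) for v in row])
```

After this step every column is on the same footing, which matters for k-means style procedures such as PROC FASTCLUS because they rely on Euclidean distance.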

Question: does this logic make sense to you, particularly step 1? Thank you.

PGStats

Instead, you could get clusters from continuous_var1-continuous_var10 and test for a relationship between those clusters and your categories with proc freq.
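
This suggestion boils down to a contingency-table test: cross-tabulate the cluster assigned to each observation against one categorical variable and check whether the mix of categories differs across clusters (in SAS, PROC FREQ with the CHISQ option on the TABLES statement). The Python sketch below computes the Pearson chi-square statistic and its degrees of freedom by hand so the arithmetic is visible. The function name, cluster labels, and categories are hypothetical, and converting the statistic to a p-value (which PROC FREQ reports) is left to a statistics library or a chi-square table.

```python
from collections import Counter


def pearson_chi_square(row_labels, col_labels):
    """Pearson chi-square test of independence for two label sequences.

    Builds the contingency table of (row_label, col_label) counts and
    returns (statistic, degrees_of_freedom).
    """
    if len(row_labels) != len(col_labels):
        raise ValueError("label sequences must have the same length")
    n = len(row_labels)
    cell = Counter(zip(row_labels, col_labels))   # observed cell counts
    row_tot = Counter(row_labels)                 # row margins
    col_tot = Counter(col_labels)                 # column margins

    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n   # count expected under independence
            observed = cell[(r, c)]                  # Counter returns 0 for empty cells
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


# Hypothetical example: cluster labels from a run on the continuous
# variables, and one categorical variable for the same 8 observations.
clusters = [1, 1, 1, 1, 2, 2, 2, 2]
category = ["A", "A", "A", "B", "B", "B", "B", "A"]
stat, dof = pearson_chi_square(clusters, category)
print(f"chi-square = {stat:.3f} with {dof} df")
```

A statistic that is large relative to the chi-square distribution with `dof` degrees of freedom suggests that cluster membership and the category are associated; the test would be repeated for each of the 3 categorical variables.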

PG
lionking19063

You are right. However, I really want to assess the effects of the categorical variables together with the continuous variables at the same time. Thank you for your response.
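
One standard way to let categorical and continuous variables contribute to the same distance without dummy coding is Gower's dissimilarity, named here as an alternative technique rather than as what either poster proposed. Each continuous variable contributes |x_i - x_j| divided by that variable's range, each categorical variable contributes 0 if the two observations match and 1 otherwise, and the contributions are averaged. PROC DISTANCE also lists Gower measures among its METHOD= options (GOWER for similarity, DGOWER for dissimilarity); confirm the details in the documentation for your SAS release. The Python sketch below builds the full n-by-n matrix for a few hypothetical observations; the function name and data are made up, and giving every variable equal weight is an assumption.

```python
def gower_dissimilarity(rows, is_categorical):
    """Gower dissimilarity matrix for mixed continuous/categorical data.

    rows: list of observations, each a list of values.
    is_categorical: one bool per variable; True = compare by equality,
    False = compare by range-scaled absolute difference.
    All variables get equal weight (an assumption of this sketch).
    """
    n, p = len(rows), len(is_categorical)
    # Range of each continuous variable, used to scale differences into [0, 1].
    ranges = []
    for k in range(p):
        if is_categorical[k]:
            ranges.append(None)
        else:
            col = [r[k] for r in rows]
            ranges.append(max(col) - min(col))

    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                a, b = rows[i][k], rows[j][k]
                if is_categorical[k]:
                    total += 0.0 if a == b else 1.0
                elif ranges[k] > 0:
                    total += abs(a - b) / ranges[k]
                # a constant continuous variable (range 0) contributes 0
            d[i][j] = d[j][i] = total / p   # average over all p variables
    return d


# Hypothetical data: two continuous variables, then one categorical.
obs = [
    [1.0, 10.0, "red"],
    [2.0, 30.0, "red"],
    [3.0, 20.0, "blue"],
]
types = [False, False, True]
for row in gower_dissimilarity(obs, types):
    print([round(v, 3) for v in row])
```

Because every variable is scaled into [0, 1] before averaging, no separate standardization step is needed, and the resulting matrix can be passed to a hierarchical clustering routine that accepts precomputed distances.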
