lionking19063
Fluorite | Level 6

Hi,

I am doing a cluster analysis with 10 continuous variables and 3 categorical variables. Instead of converting the categorical variables into dummies, I am thinking of creating distance matrices using "PROC DISTANCE".

1) Calculate 3 distance matrices, each containing the distances between one categorical variable (id category_var1) and the 10 continuous variables (var interval(continuous_var1-continuous_var10))

2) Then merge the 3 distance matrices back with the values of the 10 continuous variables

3) Standardize them and use the standardized variables as the new variables in "PROC CLUSTER" or "PROC FASTCLUS"

 

Question: does this logic make sense to you, particularly step 1? Thank you.

2 REPLIES
PGStats
Opal | Level 21

Instead, you could get clusters from continuous_var1-continuous_var10 and test for a relationship between those clusters and your categories with proc freq.
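
A minimal sketch of that suggestion, under stated assumptions: the input data set name (mydata), the variable names category_var2 and category_var3, the choice of k-means via PROC FASTCLUS, and MAXCLUSTERS=4 are all placeholders for illustration, not part of the original post.

```sas
/* Assumption: mydata holds continuous_var1-continuous_var10 and
   category_var1-category_var3 (names are hypothetical). */

/* Standardize the continuous variables so no single scale dominates. */
proc stdize data=mydata out=std method=std;
   var continuous_var1-continuous_var10;
run;

/* Cluster on the continuous variables only.
   MAXCLUSTERS=4 is an arbitrary choice for this sketch. */
proc fastclus data=std maxclusters=4 maxiter=100 out=clustered noprint;
   var continuous_var1-continuous_var10;
run;

/* Cross-tabulate cluster membership (variable CLUSTER in the OUT= data set)
   against each categorical variable and request chi-square tests. */
proc freq data=clustered;
   tables cluster*(category_var1 category_var2 category_var3) / chisq;
run;
```

The chi-square output then indicates whether cluster membership is associated with each category, without the categories having influenced how the clusters were formed.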

PG
lionking19063
Fluorite | Level 6

You are right. However, I really want to test the effects of categorical variables along with other continuous variables at the same time. Thank you for your response.
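
One possible way to use categorical and continuous variables together in a single distance matrix is Gower's dissimilarity, which PROC DISTANCE offers as METHOD=DGOWER. This is a swapped-in technique, not the three-matrix approach described in the question. It is only a sketch: the data set name (mydata), the ID variable (obs_id), the variable names category_var2 and category_var3, the average-linkage method, and NCLUSTERS=4 are assumptions for illustration.

```sas
/* Assumption: mydata has one row per observation, a unique obs_id,
   10 continuous variables and 3 categorical variables (hypothetical names). */

/* One dissimilarity matrix over all 13 variables using Gower's method,
   which accepts interval and nominal variables together. */
proc distance data=mydata method=dgower out=dist;
   var interval(continuous_var1-continuous_var10)
       nominal(category_var1-category_var3);
   id obs_id;
run;

/* Hierarchical clustering on the distance matrix.
   METHOD=AVERAGE is an arbitrary choice for this sketch. */
proc cluster data=dist method=average outtree=tree noprint;
   id obs_id;
run;

/* Cut the tree into a fixed number of clusters (4 is arbitrary). */
proc tree data=tree nclusters=4 out=clusters noprint;
   id obs_id;
run;
```

Note that PROC FASTCLUS works on coordinate data rather than a distance matrix, so a sketch like this relies on PROC CLUSTER instead.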
