proc cluster for mixed data

Contributor
Posts: 35

I have a data set of about 600,000 observations. The variables I would like to use to group the observations (transactions) include both numeric and categorical variables.

In PROC CLUSTER, which METHOD or distance measure would be the most appropriate?
Super Contributor
Posts: 260

Re: proc cluster for mixed data

Posted in reply to datalligence
Hi.
1) PROC CLUSTER will take a very long time on that many observations. Consider using FASTCLUS to do the job, or at least to create first-level clusters that you then pass to CLUSTER (I think the SAS help calls this the two-stage method).
2) Use the PRINQUAL or CORRESP procedure to pre-process your data: these can create numeric (continuous) variables that summarize the information in the categorical variables. Then merge those with the numeric variables you already have, and cluster on the combined set.
Regards.
Olivier
Contributor
Posts: 35

Re: proc cluster for mixed data

FASTCLUS has a lot of limitations, and is not suitable for mixed data.

I guess I will have to use PROC DISTANCE with Gower's dissimilarity. But when I then run PROC CLUSTER on those distances, which METHOD= will be the most appropriate?

Thanks,
Romakanta
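
For readers unfamiliar with the Gower's dissimilarity mentioned above, here is a minimal sketch in Python (not SAS) of what the measure computes for mixed data. It assumes equal weights for all variables and range scaling for the numeric ones. The variable names (amount, items, channel, region) and the three sample rows are hypothetical, and this only illustrates the formula; it is not how PROC DISTANCE implements it.

```python
def gower_dissimilarity(a, b, numeric_ranges, categorical_keys):
    """Gower dissimilarity between two records (dicts), with equal weights."""
    total = 0.0
    count = 0
    for key, value_range in numeric_ranges.items():
        # Numeric variable: absolute difference scaled by the variable's range.
        # A constant variable (range 0) contributes 0 but is still counted.
        if value_range > 0:
            total += abs(a[key] - b[key]) / value_range
        count += 1
    for key in categorical_keys:
        # Categorical variable: 0 if the levels match, 1 otherwise.
        total += 0.0 if a[key] == b[key] else 1.0
        count += 1
    return total / count


# Hypothetical transactions with two numeric and two categorical variables.
rows = [
    {"amount": 20.0, "items": 1, "channel": "web", "region": "north"},
    {"amount": 120.0, "items": 5, "channel": "store", "region": "north"},
    {"amount": 70.0, "items": 3, "channel": "web", "region": "south"},
]
numeric_keys = ["amount", "items"]
categorical_keys = ["channel", "region"]

# Range (max - min) of each numeric variable over the whole data set.
numeric_ranges = {
    key: max(r[key] for r in rows) - min(r[key] for r in rows)
    for key in numeric_keys
}

# Full pairwise dissimilarity matrix: symmetric, with zeros on the diagonal.
dist = [
    [gower_dissimilarity(a, b, numeric_ranges, categorical_keys) for b in rows]
    for a in rows
]

for line in dist:
    print(line)
```

Every entry lies between 0 and 1. For these three rows the off-diagonal values are 0.75 (rows 1 and 2), 0.5 (rows 1 and 3) and 0.75 (rows 2 and 3). The matrix holds n × n values, which is worth keeping in mind with 600,000 observations.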