Programming the statistical procedures from SAS

Cluster analysis - questions

Occasional Contributor
Posts: 13

Hello,

I am rather new to SAS and trying to do a cluster analysis with my data.

I have respondents (SAMPLEID), the items (NSS__) that they eat on a regular basis, and the amount of each item. Each item is assigned a code, which is why you see NSS11, NSS25, etc.

It looks like this:

SAMPLEID NSS11 NSS13 NSS14 NSS15 NSS16 NSS20 NSS25 NSS30 NSS31 NSS35 NSS41 NSS42 NSS46 NSS47 NSS48 NSS50 NSS51 NSS120 NSS125 NSS132 NSS135
111111 5886400003560256700000270168369000
111112 0000000860003003567800002500150450
111113 01007000004500458050000300280450000
111114 4500570647931068000737906482000975368
111115 5020400381205205000050600537500050290

I have a much larger dataset with many more variables. I would like to create clusters based on similarities in eating patterns. So here, for example, 111111 and 111113 would be in cluster 1, while 111114 and 111115 would be in cluster 2, and so on, based on similarities in their diet.

I tried the FASTCLUS procedure with the number of clusters set from 3 to 10, and it seems that a single cluster contains the majority of respondents, while the others contain only a few. Given the size of the dataset, I would expect at least a few dominant dietary patterns (e.g. vegetarian or omnivore).
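
One thing that may be worth checking here is whether the variables were standardized before clustering. PROC FASTCLUS is sensitive to outliers and to variables with large variances, and outliers often end up as very small clusters of their own. The following is only a minimal sketch under stated assumptions: the data set name work.diet, the output names, and the choice of 5 clusters are hypothetical and not taken from the post.

```sas
/* Assumed names: work.diet is the input data set and every item variable
   starts with the prefix NSS. Both are illustrative assumptions. */

/* Standardize each item to mean 0 and standard deviation 1 */
proc stdize data=work.diet out=work.diet_std method=std;
    var NSS:;
run;

/* k-means style clustering on the standardized items;
   maxclusters=5 is an arbitrary illustrative choice */
proc fastclus data=work.diet_std out=work.clus5 maxclusters=5 maxiter=100;
    var NSS:;
run;

/* Cluster sizes: one very large cluster next to several tiny ones
   can be a sign that outliers are driving the solution */
proc freq data=work.clus5;
    tables cluster;
run;
```

The OUT= data set from PROC FASTCLUS carries the variables CLUSTER and DISTANCE, so the frequency table gives the number of respondents assigned to each cluster.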

I also tried the ACECLUS procedure, but after some time it gave me the error "Eigenvector computation failed".

I am following McCarthy's paper "Methodological approach to performing cluster analysis with SAS", but its example uses only 3 variables to cluster a number of countries by similarity.

I wonder whether I am using the right procedures.

I would greatly appreciate any suggestions or ideas!

Thank you very much,

Anastasia

Super User
Posts: 18,583

Re: Cluster analysis - questions

Common issues that can affect your cluster calculations are the size of your data set, having too many variables, and having too few observations. Depending on your variables, you may find that combining some of them works.
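
As a rough illustration of combining variables, item-level amounts could be summed into a handful of broader food groups before clustering. The grouping below is purely hypothetical: which NSS codes belong together, the group names, and the data set names are all assumptions made for the sketch, not something stated in this thread.

```sas
/* Hypothetical food groups: the assignment of NSS codes to groups
   is invented for illustration only. */
data work.diet_groups;
    set work.diet;                                   /* assumed input data set */
    grp_a = sum(of NSS11 NSS13 NSS14 NSS15 NSS16);   /* SUM ignores missing values */
    grp_b = sum(of NSS20 NSS25 NSS30 NSS31 NSS35);
    grp_c = sum(of NSS41 NSS42 NSS46 NSS47 NSS48 NSS50 NSS51);
    grp_d = sum(of NSS120 NSS125 NSS132 NSS135);
    keep SAMPLEID grp_a grp_b grp_c grp_d;
run;
```

Clustering on a few group totals instead of every individual item reduces the number of variables, at the cost of losing item-level detail.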

There is also PROC CLUSTER that you can look into.
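
With many observations, running PROC CLUSTER directly can be slow, so one commonly used pattern is to form a larger number of preliminary clusters with PROC FASTCLUS and then cluster those cluster means hierarchically. This is a sketch only: work.diet_std is assumed to be an already standardized copy of the data, and the numbers 50 and 5 are arbitrary illustrative choices.

```sas
/* Stage 1: 50 preliminary clusters (arbitrary number).
   MEAN= saves one row per preliminary cluster, including _FREQ_ and _RMSSTD_. */
proc fastclus data=work.diet_std maxclusters=50 maxiter=100
              mean=work.pre_means out=work.pre_out noprint;
    var NSS:;
run;

/* Stage 2: Ward's method on the preliminary cluster means,
   weighted by cluster size and within-cluster spread */
proc cluster data=work.pre_means method=ward outtree=work.tree;
    var NSS:;
    id cluster;          /* label each row by its preliminary cluster number */
    freq _freq_;
    rmsstd _rmsstd_;
run;

/* Cut the tree into 5 final clusters (arbitrary number) */
proc tree data=work.tree nclusters=5 out=work.tree_out noprint;
run;
```

The data set work.tree_out then maps each preliminary cluster to a final cluster; merging that back onto work.pre_out to label individual respondents is left out of the sketch.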

Occasional Contributor
Posts: 13

Re: Cluster analysis - questions

Hello Reeza,

Thank you for the prompt response.

The number of respondents is around 11,000 and the number of variables is around 2,000. Do you think this could be an issue?

The paper that I mentioned in my post says to use ACECLUS before the actual PROC CLUSTER, so I was unsure whether I can do this with my data.
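
For reference, PROC ACECLUS is an optional preprocessing step rather than a requirement for PROC CLUSTER. One possible way to attempt it with this many variables is to reduce the dimension first and run ACECLUS on the reduced scores. The sketch below assumes hypothetical data set names and an arbitrary choice of 20 principal components; whether it avoids the eigenvector error on this particular data is not something the thread establishes.

```sas
/* Assumed names: work.diet as input, NSS: as the item variables.
   n=20 components is an arbitrary illustrative choice. */
proc princomp data=work.diet out=work.diet_pc n=20 noprint;
    var NSS:;
run;

/* ACECLUS on the component scores; PROPORTION= (or MPAIRS= / THRESHOLD=)
   must be specified, and 0.03 is only an example value */
proc aceclus data=work.diet_pc out=work.ace proportion=0.03 noprint;
    var Prin1-Prin20;
run;

/* Hierarchical clustering on the canonical variables created by ACECLUS;
   with about 11,000 observations this step can be slow */
proc cluster data=work.ace method=ward outtree=work.ace_tree noprint;
    var Can1-Can20;
    id SAMPLEID;
run;
```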
