a2veeram
Calcite | Level 5

Hello,

I am rather new to SAS and trying to do a cluster analysis with my data.

I have respondents (SAMPLEID), the items (NSS__) that they eat on a regular basis, and the amounts eaten. Each item is assigned a code, which is why you see NSS11, NSS25, etc.

It looks like this:

SAMPLEID NSS11 NSS13 NSS14 NSS15 NSS16 NSS20 NSS25 NSS30 NSS31 NSS35 NSS41 NSS42 NSS46 NSS47 NSS48 NSS50 NSS51 NSS120 NSS125 NSS132 NSS135
111111 5886400003560256700000270168369000
111112 0000000860003003567800002500150450
111113 01007000004500458050000300280450000
111114 4500570647931068000737906482000975368
111115 5020400381205205000050600537500050290

I have a much larger dataset with many more variables. I would like to create clusters based on similarities in eating patterns. Here, for example, 111111 and 111113 would be in cluster 1, while 111114 and 111115 would be in cluster 2, and so on, based on similarities in their diets.

I tried the FASTCLUS procedure with the number of clusters set from 3 to 10, and in every case a single cluster holds the majority of respondents while the others hold only a few. Given the size of the dataset, I would expect at least a few dominant dietary patterns (e.g. vegetarian or omnivore).

I also tried the ACECLUS procedure, but after some time it gave me the error "Eigenvector computation failed".

I am following McCarthy's paper "Methodological approach to performing cluster analysis with SAS", but its example clusters a number of countries by similarity using only 3 variables.

I wonder whether I am using the right procedures?

I would greatly appreciate any suggestions or ideas!

Thank you very much,

Anastasia

2 REPLIES
Reeza
Super User

Common issues that can affect your cluster calculations are the size of your data set, having too many variables, and having too few observations. Depending on your variables, combining some of them may be an approach that works.
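
A rough sketch of what combining variables could look like, assuming the data set is named diet (a hypothetical name). The assignment of NSS codes to groups and the choice of 5 clusters are purely illustrative assumptions, so substitute groupings that make sense for your food codes. The group totals are standardized first because FASTCLUS works with Euclidean distances, so items measured in larger amounts would otherwise dominate.

```sas
/* Assumption: input data set DIET holds SAMPLEID and the NSS item amounts. */
/* The grouping of codes below is hypothetical, for illustration only.      */
data diet_groups;
    set diet;
    grp_a = sum(of NSS11 NSS13 NSS14 NSS15 NSS16);
    grp_b = sum(of NSS20 NSS25 NSS30 NSS31 NSS35);
    grp_c = sum(of NSS41 NSS42 NSS46 NSS47 NSS48);
    grp_d = sum(of NSS50 NSS51);
    grp_e = sum(of NSS120 NSS125 NSS132 NSS135);
    keep SAMPLEID grp_:;
run;

/* Put every group on the same scale (mean 0, standard deviation 1). */
proc stdize data=diet_groups out=diet_std method=std;
    var grp_:;
run;

/* k-means style clustering; MAXCLUSTERS=5 is an arbitrary starting point. */
proc fastclus data=diet_std maxclusters=5 maxiter=100 out=clus_out;
    var grp_:;
    id SAMPLEID;
run;

/* Check how respondents are spread across the clusters. */
proc freq data=clus_out;
    tables cluster;
run;
```

If one cluster still absorbs nearly everyone, the PROC FREQ table shows it right away, and it may be worth looking for a handful of extreme respondents, since FASTCLUS is sensitive to outliers.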

There is also PROC CLUSTER that you can look into.
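
A minimal sketch of a PROC CLUSTER run, assuming a data set named diet (a hypothetical name) with SAMPLEID and the NSS item variables. Ward's method and the cut at 5 clusters are illustrative choices, not recommendations for your data. Keep in mind that hierarchical clustering slows down considerably as the number of observations grows.

```sas
/* Assumption: DIET holds SAMPLEID plus the NSS item amounts.              */
/* STD standardizes the variables; CCC and PSEUDO print statistics that    */
/* can help judge the number of clusters; PRINT=15 limits the history to   */
/* the last 15 merges.                                                      */
proc cluster data=diet method=ward std ccc pseudo print=15 outtree=tree;
    var NSS:;
    id SAMPLEID;
run;

/* Cut the tree into 5 clusters (arbitrary) and save the assignments. */
proc tree data=tree nclusters=5 out=ward_clusters noprint;
    id SAMPLEID;
run;

/* Cluster sizes. */
proc freq data=ward_clusters;
    tables cluster;
run;
```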

a2veeram
Calcite | Level 5

Hello Reeza,

Thank you for the prompt response.

The number of respondents is around 11,000 and the number of variables is around 2,000. Do you think this could be an issue?

The paper that I mentioned in my post says to run ACECLUS before the actual PROC CLUSTER, so I was unsure whether I can do this with my data.
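
For reference, the ACECLUS-then-CLUSTER sequence could be sketched as below. This is not the paper's code. The data set name diet, the PROPORTION=0.03 setting, and the use of 20 components are all assumptions chosen for illustration. The PROC PRINCOMP step is a swapped-in addition that reduces the roughly 2,000 variables before ACECLUS sees them, which may make the eigen computation more tractable, though that is not guaranteed.

```sas
/* Assumption: DIET holds SAMPLEID plus the NSS item amounts. */

/* Step 1 (added, not from the paper): reduce the NSS variables to 20   */
/* principal components. PRINCOMP uses the correlation matrix by        */
/* default, so the variables are effectively standardized.              */
proc princomp data=diet out=pcs n=20 noprint;
    var NSS:;
run;

/* Step 2: ACECLUS estimates the within-cluster covariance and writes   */
/* canonical variables Can1-Can20 to the OUT= data set.                 */
proc aceclus data=pcs out=ace proportion=0.03 noprint;
    var Prin1-Prin20;
run;

/* Step 3: hierarchical clustering on the canonical variables. */
proc cluster data=ace method=ward ccc pseudo print=15 outtree=tree;
    var Can1-Can20;
    id SAMPLEID;
run;
```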
