Cluster Analysis Data Input and Proper Command


I have a question about how to represent my data appropriately in SAS, and whether the code I am using is correct.

 

I have classified each data point by stage (E = Early, M = Middle, L = Late) and support type (Tangible, Emotional, Informational, and Companionship). I created a matrix in Excel that shows the association between stage and support type, and clusters appear to have formed.

 

Below is the SAS code I used. A "1" indicates that the support type is provided at the specified stage, and a "0" indicates that it is not. I want to see whether there are any relationships among the support types themselves and/or the stages. Is this the correct way to input this type of data?

 

 

dm out 'clear';
dm log 'clear';
ods html close;
ods html;

data observations;
   input observation stage $ TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   cards;

1     E     1     0     1     0
2     E     1     0     1     0
3     M     1     0     1     0
4     E     1     0     1     1
5     E     0     0     1     1
6     E     1     0     0     1
7     L     1     1     1     0
8     E     1     0     1     1
9     L     1     0     0     1
10    E     1     0     1     1
11    E     1     0     1     1
12    E     1     0     1     0
13    E     1     0     1     0
14    M     1     0     1     0
15    E     1     0     1     0
16    E     1     0     1     1
17    M     1     1     1     1
18    M     1     0     1     0
19    M     1     0     1     0
20    M     1     1     0     0
;

proc cluster s method=average pseudo outtree=tree;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc tree sort height=NCL;
run;

proc tree noprint data=tree out=treeout nclusters=4;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data=treeout;
   by cluster;
run;

proc print data=treeout;
   id observation;
   var cluster;
   by cluster;
run;

proc tree noprint data=tree out=treeout nclusters=3;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data=treeout;
   by cluster;
run;

proc print data=treeout;
   id observation;
   var cluster;
   by cluster;
run;

proc tree noprint data=tree out=treeout nclusters=5;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data=treeout;
   by cluster;
run;

proc print data=treeout;
   id observation;
   var cluster;
   by cluster;
run;


Re: Cluster Analysis Data Input and Proper Command

Is this sample data? I'd be concerned about the low rates of 1s in Emotional and Companionship.

 

Can you build a distance-type matrix first to see the overlaps?

 

i.e.,

N              Tangible  Emotional  Informational  Companionship
Tangible
Emotional
Informational
Companionship

 

 

I think PROC CORR generates that. Then I'd consider looking at the slices for each of your stages to see whether the patterns differ between them.
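As a rough sketch of the overlap matrix described above, something like the following might work. It assumes the poster's data set name (observations) and variable names; obs_by_stage is a made-up name for the sorted copy. For 0/1 variables, each off-diagonal entry of the SSCP (sums of squares and crossproducts) table is the number of observations where both support types are 1, and each diagonal entry is the count of 1s for that support type.

```sas
/* Overall overlap: SSCP gives co-occurrence counts for 0/1 variables.     */
/* The Pearson correlations that PROC CORR also prints are phi             */
/* coefficients when both variables are binary.                            */
proc corr data=observations sscp nosimple;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* Per-stage slices: BY processing needs the data sorted by stage.         */
/* obs_by_stage is a hypothetical name for the sorted copy.                */
proc sort data=observations out=obs_by_stage;
   by stage;
run;

/* NOCORR suppresses the correlations and leaves only the counts, which    */
/* avoids missing values where a variable is constant within a stage.      */
proc corr data=obs_by_stage sscp nosimple nocorr;
   by stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;
```

With only 20 observations, some stage slices will be very small, so the per-stage counts are best read as a description of the data rather than as evidence of a pattern.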

 

Then I would look at a tree or splitting protocol. I'm not sure I'd use a cluster procedure, because those typically expect continuous data. Also, make sure to specify DATA= on your procs so you can be absolutely sure the correct data set is being used. Leaving it off is a good way to accidentally mess up your entire analysis.
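To illustrate the DATA= point using the code from the question: when DATA= is omitted, a procedure reads the most recently created data set, so inserting or reordering steps can silently change its input. A minimal sketch that names each input explicitly, using the poster's own data set names (observations, tree):

```sas
/* Name the input explicitly instead of relying on the last-created data set. */
proc cluster data=observations s method=average pseudo outtree=tree;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

/* PROC TREE reads the OUTTREE= data set written by PROC CLUSTER above. */
proc tree data=tree sort height=ncl;
run;
```

This only pins down which data each step reads. It does not address the concern about running a cluster procedure on binary variables.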

 

 
