Cluster Analysis Data Input and Proper Command

jnet7 · Posted 05-09-2018 04:28 PM

I have a question about how to represent my data appropriately in SAS code, and whether the code I am using is correct.

I have sorted each data point according to stage (E=Early, M=Middle, L=Late) and support type (Tangible, Emotional, Informational, and Companionship). I created a matrix within Excel that shows the association of stage to support type and it seems that clusters have formed.

Below, I have provided the SAS code I used. The "1" indicates the support type is provided at the specified stage, whereas "0" indicates the support type is not provided. I am looking to see if there are any relationships between the support types themselves and/or stages. Is this the correct way to input this type of data?

dm out 'clear';
dm log 'clear';
ods html close;
ods html;

data observations;
   /* list input: numeric ID, character stage code, then four 0/1 indicators */
   input observation stage $ TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   cards;
1 E 1 0 1 0
2 E 1 0 1 0
3 M 1 0 1 0
4 E 1 0 1 1
5 E 0 0 1 1
6 E 1 0 0 1
7 L 1 1 1 0
8 E 1 0 1 1
9 L 1 0 0 1
10 E 1 0 1 1
11 E 1 0 1 1
12 E 1 0 1 0
13 E 1 0 1 0
14 M 1 0 1 0
15 E 1 0 1 0
16 E 1 0 1 1
17 M 1 1 1 1
18 M 1 0 1 0
19 M 1 0 1 0
20 M 1 1 0 0
;

proc cluster s method=average pseudo outtree=tree;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc tree sort height = NCL;
run;

proc tree noprint data=tree out=treeout nclusters = 4;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data = treeout;
   by cluster;
run;

proc print data = treeout;
   id observation;
   variables cluster;
   by cluster;
run;

proc tree noprint data = tree out = treeout nclusters = 3;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data = treeout;
   by cluster;
run;

proc print data = treeout;
   id observation;
   variables cluster;
   by cluster;
run;

proc tree noprint data = tree out = treeout nclusters = 5;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data = treeout;
   by cluster;
run;

proc print data = treeout;
   id observation;
   variables cluster;
   by cluster;
run;

Reeza · Posted 05-09-2018 04:54 PM

Is this sample data? I'd be concerned about the low rates of 1s in Emotional and Companionship.

Can you build a distance-type matrix first to see the overlaps?

i.e.

                 N   Tangible   Emotional   Informational   Companionship
Tangible
Emotional
Informational
Companionship

I think PROC CORR generates that. Then I'd consider looking at the same matrix sliced by each of your stages to see whether the values differ between them.
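As a rough sketch of that check, assuming the data set is named observations as in the original post (obs_by_stage is just an illustrative name): for 0/1 variables the Pearson correlation that PROC CORR reports is the phi coefficient, so the matrix is still interpretable here. Some of the by-stage correlations will come back missing, because a variable can be constant within a stage (TANGIBLE is 1 for every M observation in the posted data, for example).

```sas
/* Overall association matrix for the four 0/1 indicators. */
proc corr data=observations;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* BY-group processing needs the data sorted by stage first.
   obs_by_stage is a hypothetical name for the sorted copy. */
proc sort data=observations out=obs_by_stage;
   by stage;
run;

/* Same matrix, one per stage (E, L, M). */
proc corr data=obs_by_stage;
   by stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* For a 0/1 variable the mean is the proportion of 1s, so this shows
   how often each support type occurs within each stage. */
proc means data=observations n mean;
   class stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;
```

With only 20 observations, and just two of them in stage L, the by-stage numbers are better read as a description of the sample than as evidence of a pattern.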

Then I would look at a tree or splitting protocol. I'm not sure I'd use a cluster procedure, because that typically expects continuous data. Also, make sure to specify DATA= on your procs so you can be absolutely sure that the correct data set is being used; leaving it off is a good way to accidentally mess up your entire analysis.
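To illustrate the DATA= point, and one possible way around the continuous-data concern, here is a hedged sketch. It swaps in a Jaccard dissimilarity computed by PROC DISTANCE, which is an assumption of this example rather than something proposed in the thread. The names obs_char, obs_id, dist_jaccard, and tree_jaccard are made up for illustration, and the options are worth checking against the PROC DISTANCE documentation before relying on them.

```sas
/* Build a character ID so PROC DISTANCE can use it to name the
   distance columns (OB1, OB2, ...). obs_id is a hypothetical helper. */
data obs_char;
   set observations;
   length obs_id $8;
   obs_id = cats('OB', observation);
run;

/* Jaccard dissimilarity treats the indicators as asymmetric nominal:
   two observations that both lack a support type (0 and 0) are not
   counted as a match on that type. */
proc distance data=obs_char method=djaccard absent=0 out=dist_jaccard;
   var anominal(TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP);
   id obs_id;
run;

/* Average-linkage clustering on the distance matrix. DATA= is spelled
   out so there is no doubt about which data set is being analyzed. */
proc cluster data=dist_jaccard method=average outtree=tree_jaccard;
   id obs_id;
run;
```

Many of the 20 observations have identical 0/1 patterns, so they sit at distance zero from each other and will merge at the very first steps. The resulting tree mostly reflects the handful of distinct patterns, not 20 separate points.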
