I have a question about how to represent my data appropriately in SAS code, and whether the code I am using is correct.
I have sorted each data point according to stage (E=Early, M=Middle, L=Late) and support type (Tangible, Emotional, Informational, and Companionship). I created a matrix within Excel that shows the association of stage to support type and it seems that clusters have formed.
Below, I have provided the SAS code I used. The "1" indicates the support type is provided at the specified stage, whereas "0" indicates the support type is not provided. I am looking to see if there are any relationships between the support types themselves and/or stages. Is this the correct way to input this type of data?
dm out 'clear';
dm log 'clear';
ods html close;
ods html;
data observations ;
input observation stage $ TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
cards;
1 E 1 0 1 0
2 E 1 0 1 0
3 M 1 0 1 0
4 E 1 0 1 1
5 E 0 0 1 1
6 E 1 0 0 1
7 L 1 1 1 0
8 E 1 0 1 1
9 L 1 0 0 1
10 E 1 0 1 1
11 E 1 0 1 1
12 E 1 0 1 0
13 E 1 0 1 0
14 M 1 0 1 0
15 E 1 0 1 0
16 E 1 0 1 1
17 M 1 1 1 1
18 M 1 0 1 0
19 M 1 0 1 0
20 M 1 1 0 0
;
proc cluster s method=average pseudo outtree=tree;
var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc tree sort height = NCL;
run;
proc tree noprint data=tree out=treeout nclusters = 4;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
proc print data = treeout;
id observation;
variables cluster;
by cluster;
run;
proc tree noprint data = tree out = treeout nclusters = 3;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
proc print data = treeout;
id observation;
variables cluster;
by cluster;
run;
proc tree noprint data = tree out = treeout nclusters = 5;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
proc print data = treeout;
id observation;
variables cluster;
by cluster;
run;
Is this sample data? I'd be concerned about the low rates of 1s in emotional and companionship.
Can you build a distance-type matrix first to see the overlaps?
i.e.
N Tangible Emotional Informational Companionship
Tangible
Emotional
Informational
Companionship
I think PROC CORR generates that. Then I'd consider building the same matrix within each of your stages to see whether the patterns differ.
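A minimal sketch of that overlap matrix, assuming the OBSERVATIONS data set from the question has been read in correctly. For 0/1 variables, the SSCP option of PROC CORR prints sums of squares and crossproducts, which here work as co-occurrence counts: each diagonal cell counts how often a support type appears, and each off-diagonal cell counts how often two types appear together. The data set name obs_by_stage is made up for illustration.

```sas
/* Overall overlap: with 0/1 variables the SSCP table holds co-occurrence counts */
proc corr data=observations sscp nosimple;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* Same matrix within each stage; BY processing needs sorted data. */
/* obs_by_stage is a hypothetical name, not from the original post. */
proc sort data=observations out=obs_by_stage;
   by stage;
run;

proc corr data=obs_by_stage sscp nosimple;
   by stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;
```

With only 20 rows, the stage groups are small, and a variable that is constant within a stage (EMOTIONAL is 0 for every E row) will show missing correlations there; the counts are still readable.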
Then I would look at a tree or splitting protocol. I'm not sure I'd use a cluster procedure, because those typically expect continuous data. Also, make sure to specify DATA= on your procs so you can be absolutely sure that the correct data set is being used. Without it, SAS uses the most recently created data set, which is a good way to accidentally mess up your entire analysis.
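As a sketch of the DATA= advice, here are the clustering steps from the question with the input data set named on every procedure. Nothing else is meant to change: SIMPLE is the spelled-out form of the S option, and only the NCLUSTERS=4 pass is shown.

```sas
/* Name the input explicitly instead of relying on the last data set created */
proc cluster data=observations simple method=average pseudo outtree=tree;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

/* Draw the tree from the OUTTREE= data set */
proc tree data=tree sort height=ncl;
run;

/* Cut the same tree into 4 clusters and save the assignments */
proc tree data=tree noprint out=treeout nclusters=4;
   copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   id observation;
run;

proc sort data=treeout;
   by cluster;
run;

proc print data=treeout;
   id observation;
   var cluster;
   by cluster;
run;
```

Repeating the last three steps with NCLUSTERS=3 or NCLUSTERS=5 reproduces the other cuts from the question, each time reading from data=tree.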