I have a question about how to represent my data appropriately in SAS, and whether the code I am using is correct.
I have classified each observation by stage (E=Early, M=Middle, L=Late) and by support type (Tangible, Emotional, Informational, and Companionship). I built a matrix in Excel showing the association between stage and support type, and clusters appear to have formed.
Below is the SAS code I used. A 1 indicates that the support type is provided at the specified stage; a 0 indicates that it is not. I want to see whether there are any relationships among the support types themselves and/or between support types and stages. Is this the correct way to input this type of data?
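/* Clear the Output and Log windows (SAS windowing environment only) */
dm out 'clear';
dm log 'clear';
/* Restart the HTML destination so results start fresh */
ods html close;
ods html;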
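/* One row per observation: stage (E/M/L) plus a 0/1 indicator for each support type.
   No & modifier on observation: with &, list input keeps reading across single
   blanks, so the whole line would be read as one (invalid) numeric value. */
data observations;
input observation stage $ TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;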
cards;
1 E 1 0 1 0
2 E 1 0 1 0
3 M 1 0 1 0
4 E 1 0 1 1
5 E 0 0 1 1
6 E 1 0 0 1
7 L 1 1 1 0
8 E 1 0 1 1
9 L 1 0 0 1
10 E 1 0 1 1
11 E 1 0 1 1
12 E 1 0 1 0
13 E 1 0 1 0
14 M 1 0 1 0
15 E 1 0 1 0
16 E 1 0 1 1
17 M 1 1 1 1
18 M 1 0 1 0
19 M 1 0 1 0
20 M 1 1 0 0
;
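/* Average-linkage hierarchical clustering of the observations on the four indicators.
   S (SIMPLE) prints descriptive statistics; PSEUDO adds pseudo F and t-squared. */
proc cluster s method=average pseudo outtree=tree;
var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;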
/* Plot the dendrogram, using the number of clusters as the height axis */
proc tree sort height = NCL;
run;
/* Cut the tree into 4 clusters, keeping the indicator variables */
proc tree noprint data=tree out=treeout nclusters = 4;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
/* List the observations in each cluster */
proc print data = treeout;
id observation;
var cluster;
by cluster;
run;
/* Cut the tree into 3 clusters, keeping the indicator variables */
proc tree noprint data = tree out = treeout nclusters = 3;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
/* List the observations in each cluster */
proc print data = treeout;
id observation;
var cluster;
by cluster;
run;
/* Cut the tree into 5 clusters, keeping the indicator variables */
proc tree noprint data = tree out = treeout nclusters = 5;
copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
id observation;
run;
proc sort data = treeout; by cluster;
run;
/* List the observations in each cluster */
proc print data = treeout;
id observation;
var cluster;
by cluster;
run;
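The three PROC TREE / SORT / PRINT groups differ only in NCLUSTERS=, so a macro loop is one way to remove the repetition. This is a sketch, not the poster's code: the macro name cut_tree and the data set names treeout_3, treeout_4, treeout_5 are hypothetical, and it assumes the TREE data set from the PROC CLUSTER step already exists. It also prints the copied indicator values, which makes it easier to see what each cluster has in common.

```sas
/* Hypothetical macro: cut the dendrogram in WORK.TREE into k clusters
   for each k from FROM to TO, then list cluster membership. */
%macro cut_tree(from=3, to=5);
   %do k = &from %to &to;
      /* One output data set per cut, e.g. TREEOUT_3 */
      proc tree noprint data=tree out=treeout_&k nclusters=&k;
         copy TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
         id observation;
      run;

      proc sort data=treeout_&k;
         by cluster;
      run;

      /* Show each observation's indicators within its cluster */
      proc print data=treeout_&k;
         by cluster;
         id observation;
         var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
      run;
   %end;
%mend cut_tree;

%cut_tree(from=3, to=5)
```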
Is this sample data? I'd be concerned about the low rates of 1s in Emotional and Companionship.
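A quick way to check those rates: the mean of a 0/1 variable is the proportion of 1s, so PROC MEANS shows how often each support type occurs, overall and within each stage. This sketch assumes the WORK.OBSERVATIONS data set from the question has been read correctly (one row per observation).

```sas
/* Count (SUM) and proportion (MEAN) of 1s for each support type, overall */
proc means data=observations n sum mean maxdec=2;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* Same statistics within each stage; CLASS does not require sorted data */
proc means data=observations n sum mean maxdec=2;
   class stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;
```

A support type that is almost always 1 (or almost always 0) carries little information for clustering or correlation.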
Can you compute a distance-type matrix first to see the overlaps?
i.e.:
N              Tangible  Emotional  Informational  Companionship
Tangible
Emotional
Informational
Companionship
I think PROC CORR can generate that. Then I'd look at the slices for each of your stages to see whether the metrics differ.
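A minimal sketch of that suggestion, assuming WORK.OBSERVATIONS exists as in the question. For 0/1 variables the Pearson correlation is the phi coefficient, and the SSCP (sums of squares and cross-products) table gives the overlap counts directly: each diagonal entry is the number of 1s for that support type, and each off-diagonal entry is the number of observations where both support types are 1. The second step repeats the correlations within each stage; the sorted copy name obs_by_stage is hypothetical.

```sas
/* Phi coefficients (Pearson on 0/1 data) plus co-occurrence counts (SSCP) */
proc corr data=observations pearson sscp nosimple;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;

/* BY-group processing requires the data to be sorted by stage */
proc sort data=observations out=obs_by_stage;
   by stage;
run;

/* Correlations sliced by stage */
proc corr data=obs_by_stage pearson nosimple;
   by stage;
   var TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
run;
```

Within a stage that has very few observations, or where a support type is constant (for example, all 1s), the corresponding correlations are undefined and SAS reports them as missing.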
Then I would look at a tree or splitting method. I'm not sure I'd use PROC CLUSTER, because it typically expects continuous data. Also, specify DATA= on every procedure so you can be absolutely sure the correct data set is being used; otherwise it is easy to accidentally mess up your entire analysis.
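One possible reading of "a tree or splitting method" is a classification tree that predicts stage from the four support-type indicators. This is my interpretation rather than something the reply specifies; it is a sketch using PROC HPSPLIT (part of SAS/STAT in recent releases, so check that your installation has it), with DATA= given explicitly and WORK.OBSERVATIONS assumed to exist as in the question.

```sas
/* Sketch: classification tree for stage using the 0/1 support indicators.
   All variables are treated as categorical via CLASS. */
proc hpsplit data=observations;
   class stage TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   model stage = TANGIBLE EMOTIONAL INFORMATIONAL COMPANIONSHIP;
   grow entropy;          /* entropy splitting criterion */
   prune costcomplexity;  /* cost-complexity pruning */
run;
```

With only 20 observations (and just two in the Late stage), any tree from this data should be treated as exploratory.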