Hi,
I use PROC CLUSTER and PROC TREE to do hierarchical clustering and draw dendrograms. My input data typically features around thousand observations. As part of the analysis I output (using the outtree= option) the hierarchy of linkages (denoted as CL1, CL2, CL3,…) between the observations (e.g., Obs1, Obs2, Obs3,…). This information is provided in the output variables _NAME_ and _PARENT_ .
My question is: how can I get the hierarchy of linkages (i.e., notes) listed for each observation, as a string? And can this be done in a generic manner such that it is applicable to data sets with thousands of observations?
Simple example: one has a data set with ten observations. Using PROC CLUSTER one gets an output specifying the following linkages between nodes and observations:
_NAME_ _PARENT_
CL1
CL2 CL1
CL3 CL1
CL4 CL2
CL5 CL3
CL6 CL4
CL7 CL5
CL8 CL4
CL9 CL2
Obs1 CL9
Obs10 CL7
Obs2 CL9
Obs3 CL3
Obs4 CL6
Obs5 CL6
Obs6 CL7
Obs7 CL5
Obs8 CL8
Obs9 CL8
I would like to generate (in a generic manner) an additional output specifying the hierarchy (sequence) of nodes for all the observations. Like this:
Observation Hierarchy
Obs1 CL1_CL2_CL9
Obs2 CL1_CL2_CL9
Obs4 CL1_CL2_CL4_CL6
Obs5 CL1_CL2_CL4_CL6
Obs8 CL1_CL2_CL4_CL8
Obs9 CL1_CL2_CL4_CL8
Obs10 CL1_CL3_CL5_CL7
Obs6 CL1_CL3_CL5_CL7
Obs7 CL1_CL3_CL5
Obs3 CL1_CL3
I would greatly appreciate if someone could point me towards a solution?
Thanks a lot in advance!