BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
onechrister
Calcite | Level 5

Hi,

 

I use PROC CLUSTER and PROC TREE to do hierarchical clustering and draw dendrograms. My input data typically features around thousand observations. As part of the analysis I output (using the outtree= option) the hierarchy of linkages (denoted as CL1, CL2, CL3,…) between the observations (e.g., Obs1, Obs2, Obs3,…). This information is provided in the output variables _NAME_ and _PARENT_ .

 

My question is: how can I get the hierarchy of linkages (i.e., notes) listed for each observation, as a string? And can this be done in a generic manner such that it is applicable to data sets with thousands of observations?

 

Simple example: one has a data set with ten observations. Using PROC CLUSTER one gets an output specifying the following linkages between nodes and observations:

 

_NAME_ _PARENT_ 

CL1        

CL2         CL1

CL3         CL1

CL4         CL2

CL5         CL3

CL6         CL4

CL7         CL5

CL8         CL4

CL9         CL2

Obs1       CL9

Obs10    CL7

Obs2       CL9

Obs3       CL3

Obs4       CL6

Obs5       CL6

Obs6       CL7

Obs7       CL5

Obs8       CL8

Obs9       CL8

 

I would like to generate (in a generic manner) an additional output specifying the hierarchy (sequence) of nodes for all the observations. Like this:

 

Observation           Hierarchy

Obs1       CL1_CL2_CL9

Obs2       CL1_CL2_CL9

Obs4       CL1_CL2_CL4_CL6

Obs5       CL1_CL2_CL4_CL6

Obs8       CL1_CL2_CL4_CL8

Obs9       CL1_CL2_CL4_CL8

Obs10    CL1_CL3_CL5_CL7

Obs6       CL1_CL3_CL5_CL7

Obs7       CL1_CL3_CL5

Obs3       CL1_CL3

 

I would greatly appreciate if someone could point me towards a solution?

 

Thanks a lot in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Hash objects are efficient for this kind of processing. Try :

 

data want;
if 0 then set have;
length hier $100;
if _N_ = 1 then do;  
    declare hash h(dataset: "have");
    h.definekey("_NAME_");
    h.definedata("_NAME_", "_PARENT_");
    h.definedone();
end;
declare hiter hList('h');
_rc = hList.first();
do until(hList.next() ne 0);
    if substr(_name_,1,2) ne "CL" then do;
        n = _name_; hier = _parent_;
        do while(h.find(key:_PARENT_) = 0);
            hier = catx("_", _parent_, hier);
            end;
        output;
        end;
    end;
stop;
drop _name_ _parent_ _rc;
rename n=_name_;
run;
PG

View solution in original post

2 REPLIES 2
PGStats
Opal | Level 21

Hash objects are efficient for this kind of processing. Try :

 

data want;
if 0 then set have;
length hier $100;
if _N_ = 1 then do;  
    declare hash h(dataset: "have");
    h.definekey("_NAME_");
    h.definedata("_NAME_", "_PARENT_");
    h.definedone();
end;
declare hiter hList('h');
_rc = hList.first();
do until(hList.next() ne 0);
    if substr(_name_,1,2) ne "CL" then do;
        n = _name_; hier = _parent_;
        do while(h.find(key:_PARENT_) = 0);
            hier = catx("_", _parent_, hier);
            end;
        output;
        end;
    end;
stop;
drop _name_ _parent_ _rc;
rename n=_name_;
run;
PG
onechrister
Calcite | Level 5

Thank you very much!

It works like a charm!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 374 views
  • 0 likes
  • 2 in conversation