Solved: Re: How to keep a certain level of the clusters from de Proc Cluster

Mathis1 · Posted 05-18-2020 06:24 AM

Hi,

I ran a proc cluster on the proc corresp output. I obtain a tree (see picture below), and the output table (see table attached).

In the table (Ward), next to each of my variables, I have the name of (what I believe to be) the smallest cluster I can get. What I'd like, one of my most cherished dreams, is to have several columns in my Ward table, each column representing one level of clustering.
For instance Var_Level_1 : Cluster_1 ; Cluster_2

Var_Level_2 : Cluster_1_1 ; Cluster_1_2 ; Cluster_2_1 ; Cluster_2_2 etc...

Thank you in advance for your help 🙂

RichardDeVen · Posted 05-18-2020 11:35 AM

A set of _NAME_ / _PARENT_ rows defines a hierarchical tree. Starting from node X and following the parent links up to the root gives the ancestor path of X. You want data columns 1..N containing the names from the root down to X. The nodes can be stored in a DATA step HASH object, and the ancestor path can be traversed with a series of FIND() operations.
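As a minimal sketch of the lookup idea, the step below loads a toy tree into a hash and asks for the parent of one node. The dataset name `tree` and the node names A to D are assumptions made up for illustration; only the _NAME_ and _PARENT_ variable names come from the thread.

```sas
/* toy tree (hypothetical data): A is the root, D is a leaf */
data tree;
  length _NAME_ _PARENT_ $8;
  input _NAME_ $ _PARENT_ $;   /* a lone period reads as a blank parent */
  datalines;
A .
B A
C B
D C
;

data _null_;
  if 0 then set tree;          /* put _NAME_ and _PARENT_ into the PDV */

  declare hash links (dataset:'tree');
  links.defineKey('_NAME_');
  links.defineData('_PARENT_');
  links.defineDone();

  /* one FIND() call moves one step up the tree */
  rc = links.find(key:'D');
  put 'parent of D: ' _PARENT_;   /* writes C to the log */
  stop;
run;
```

Each successful FIND() overwrites _PARENT_ with the parent of the key that was looked up, which is what makes chaining the calls possible.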

The number of steps it takes to traverse from a given node to the root is the depth of the node.
Suppose the longest path is P > Q > R > S > T > U > V > W, which has depth 7.
- You know you will need 7 columns to capture the pieces of the longest path.
- This computation needs to be done before the final DATA step, because the number of columns must be known when that step is compiled.
Suppose you have another path A > B > C > D of depth 3.
- Traversing up the parent links from D you have
  - step 1 C
  - step 2 B
  - step 3 A
  - step 4 done
- Stored in an array, the traversal comes out in reverse order (C B A), but you want the data as A B C.
  - The data captured during traversal needs to be reversed.
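The capture-then-reverse idea can be tried on the A > B > C > D path in isolation. This is only a sketch under stated assumptions: the dataset `tree`, the array `anc`, and the variables `n`, `i`, `j` and `hold` are hypothetical names, and the array size 3 is hard-coded because the depth of D is known here.

```sas
/* toy tree (hypothetical data): A is the root, D is a leaf */
data tree;
  length _NAME_ _PARENT_ $8;
  input _NAME_ $ _PARENT_ $;
  datalines;
A .
B A
C B
D C
;

data _null_;
  if 0 then set tree;            /* put _NAME_ and _PARENT_ into the PDV */
  array anc[3] $8;               /* anc1-anc3, one slot per ancestor of D */

  declare hash links (dataset:'tree');
  links.defineKey('_NAME_');
  links.defineData('_PARENT_');
  links.defineDone();

  /* walk up from D, storing each ancestor as it is met */
  _NAME_ = 'D';
  rc = links.find();             /* loads the parent of D */
  do n = 1 by 1 while (rc = 0 and _PARENT_ ne '');
    anc[n] = _PARENT_;
    rc = links.find(key:_PARENT_);
  end;
  depth = n - 1;                 /* 3 for node D */
  put 'leaf to root: ' anc1 anc2 anc3;   /* C B A */

  /* swap the two ends and work inward to reverse the order */
  do i = 1 to floor(depth/2);
    j = depth - i + 1;
    hold   = anc[i];
    anc[i] = anc[j];
    anc[j] = hold;
  end;
  put 'root to leaf: ' anc1 anc2 anc3;   /* A B C */
  stop;
run;
```

With an odd depth the middle element stays where it is, which is why the swap loop only runs to half the depth.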

Example:

* pass 1 - compute number of columns needed;
* find longest path;
data _null_;
  if 0 then set download.ward; * prep pdv;

  * lookup table from node name to parent name;
  declare hash links (dataset:'download.ward');
  links.defineKey('_NAME_');
  links.defineData('_PARENT_');
  links.defineDone();

  declare hiter iter('links');

  * for every node, count the FIND steps needed to reach the root;
  do index=1 by 1 while (iter.next() = 0);
    do step = 1 by 1 while (links.find(key:_parent_) eq 0);
      if _parent_ = '' then leave;
    end;
    max_step = max (max_step, step);
  end;

  call symput ('MAX_DEPTH', cats(max_step));
  stop; * no data set is read, so end the step explicitly;
run;

%put NOTE: &=MAX_DEPTH;

* final step;
data have;
  length depth 8;

  length level1-level&MAX_DEPTH $25;
  array level level1-level&MAX_DEPTH;

  set download.ward;

  if _n_ = 1 then do;
    declare hash links (dataset:'download.ward');
    links.defineKey('_NAME_');
    links.defineData('_PARENT_');
    links.defineDone();     
  end;

  * determine depth (number of steps to root) and capture tiers;
  * in this loop 'level' actually means 'ancestor';
  orig_parent = _parent_;
  level(1) = _parent_;
  do step = 2 by 1 while (links.find(key:_parent_) eq 0);
    if _parent_ = '' then leave;
    level(step) = _parent_;
  end;
  _parent_ = orig_parent;
  depth = step - 1;

  * reverse the captured tiers;
  do step = 1 to depth/2;

    opstep = depth - step + 1;

    hold = level(step);
    level(step) = level(opstep);
    level(opstep) = hold;
  end;

  drop step opstep hold orig_parent;
run;
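
As a possible follow-up, the tiers can be inspected by tabulating two adjacent level columns. This is a sketch that assumes the final step above ran and that MAX_DEPTH came out as at least 3. After the reversal, level1 holds the root for every row that has a parent, so the first split of the tree shows up in level2 and the next one in level3.

```sas
/* count the rows under each combination of the top two splits */
proc freq data=have;
  tables level2 * level3 / list missing;
run;
```

The MISSING option keeps rows whose path is shorter than three tiers in the listing instead of dropping them.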

