Mathis1
Quartz | Level 8

Hi, 

I ran PROC CLUSTER on the PROC CORRESP output. I obtained a tree (see picture below) and the output table (see attached table). [Image: Tree.PNG]

 

 

In the table (Ward), each of my variables has next to it the name of (what I believe to be) the smallest cluster it belongs to. What I'd like, and it is one of my most cherished dreams, is to have several columns in my Ward table, each column representing one level of clustering.
For instance Var_Level_1 : Cluster_1 ; Cluster_2

Var_Level_2 : Cluster_1_1 ; Cluster_1_2 ; Cluster_2_1 ; Cluster_2_2 etc...

 

 

Thank you in advance for your help 🙂

Accepted Solutions
RichardDeVen
Barite | Level 11

A set of _NAME_ / _PARENT_ data nodes defines a hierarchical tree. Following the parent links from a node X up to the root gives the ancestor path of X. You want data columns 1..N containing the names from the root down to X. The nodes can be stored in a DATA step HASH object, and the ancestor path can be traversed with a series of FIND() operations.

 

  • The number of steps it takes to traverse from a given node to the root is the depth of the node.
  • Suppose the longest path is P > Q > R > S > T > U > V > W, so W has depth 7.
    • You will need 7 columns to capture the pieces of the longest path.
    • This number has to be computed before the final DATA step, because the columns must be declared when that step is compiled.
  • Suppose you have another path A > B > C > D, where D has depth 3.
    • Traversing up the parent links from D you have
      • step 1 C 
      • step 2 B
      • step 3 A
      • step 4 done
    • Stored in an array, the traversal comes out in reverse order (C B A), but you want the data as A B C.
      • The data captured during traversal needs to be reversed.
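
The upward traversal described in the list can be tried on a toy tree before running it on real PROC CLUSTER output. The sketch below is only an illustration: the data set WORK.TOY and the variable RC are made up for the example, and it assumes, as in the Ward table, that the root is the node whose _PARENT_ is blank.

```sas
* toy tree with the single path A > B > C > D (A is the root);
data work.toy;
  length _NAME_ _PARENT_ $8;
  input _NAME_ $ _PARENT_ $;
  datalines;
A .
B A
C B
D C
;
run;

data _null_;
  if 0 then set work.toy;  * put _NAME_ and _PARENT_ in the PDV;

  declare hash links (dataset:'work.toy');
  links.defineKey('_NAME_');
  links.defineData('_PARENT_');
  links.defineDone();

  * start at D and follow the parent links until the root is reached;
  _NAME_ = 'D';
  rc = links.find();  * loads the parent of D into _PARENT_;
  do step = 1 by 1 while (rc = 0 and _PARENT_ ne '');
    put step= _PARENT_=;
    rc = links.find(key:_PARENT_);  * move one level up;
  end;

  stop;
run;
```

The log should list C, B and A for steps 1 to 3, which is the reversed order mentioned above.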

Example:

 

* pass 1 - compute number of columns needed;
* find longest path;
data _null_;
  if 0 then set download.ward; * prep pdv;

  declare hash links (dataset:'download.ward');
  links.defineKey('_NAME_');
  links.defineData('_PARENT_');
  links.defineDone();

  declare hiter iter('links');

  do index=1 by 1 while (iter.next() = 0);
    do step = 1 by 1 while (links.find(key:_parent_) eq 0);
      if _parent_ = '' then leave;
    end;
    max_step = max (max_step, step);
  end;

  call symputx ('MAX_DEPTH', max_step);
  stop; * SET is never executed, so end the step explicitly instead of relying on loop detection;
run;

%put NOTE: &=MAX_DEPTH;

* final step;
data have;
  length depth 8;

  length level1-level&MAX_DEPTH $25;
  array level level1-level&MAX_DEPTH;

  set download.ward;

  if _n_ = 1 then do;
    declare hash links (dataset:'download.ward');
    links.defineKey('_NAME_');
    links.defineData('_PARENT_');
    links.defineDone();     
  end;

  * determine depth (number of steps to root) and capture tiers;
  * in this loop 'level' actually means 'ancestor';
  orig_parent = _parent_;
  level(1) = _parent_;
  do step = 2 by 1 while (links.find(key:_parent_) eq 0);
    if _parent_ = '' then leave;
    level(step) = _parent_;
  end;
  _parent_ = orig_parent;
  depth = step - 1;

  * reverse the captured tiers so that level1 is closest to the root;
  do step = 1 to depth/2;

    opstep = depth - step + 1;

    hold = level(step);
    level(step) = level(opstep);
    level(opstep) = hold;
  end;

  drop step opstep hold orig_parent;
run;                               
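
To check the result, the new columns can be printed next to the original ones. This is only a sketch for inspecting the table HAVE created above; the OBS= limit of 20 is arbitrary. After the reversal, level1 should hold the root cluster for every node except the root itself, and each further column should be one step further down the tree.

```sas
* look at the first rows of the result;
proc print data=have (obs=20) noobs;
  var _name_ _parent_ depth level:;
run;
```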

 


