Programming the statistical procedures from SAS

Any idea on how to generate customized Hpsplit Tree Rule?

Occasional Contributor Vi_

Hi, 

 

If you specify the OUTPUT NODESTATS= option in PROC HPSPLIT, it gives you a table that I think is the key to generating the tree rules.

 

Basically, I need code that works like this: when a node (ID column) = 3 has parent node (PARENT column) = 1, go back to the ID column, find the rule (DECISION column) for ID = 1, and repeat recursively until the root node is reached.

 

image.png

 

Any suggestion?

 

(I am using Enterprise Guide 7.1)

 

Thank you in advance,

 

Vi

Accepted Solutions
Solution
‎05-31-2018 01:57 PM
Esteemed Advisor
Posts: 5,540

Re: Any idea on how to generate customized Hpsplit Tree Rule?

Understood. This code fragment might be useful. I did this some time ago to format the rules from a binary tree (it probably doesn't work for just any decision tree)

 

proc hpsplit...;
...;
output nodestats = SplitNS;
...;
run;

proc sql;
create unique index Id on SplitNS (Id);
quit;

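/* For every leaf, output the leaf record and then each ancestor up to the root */
data Leaves;
set SplitNS;
if not missing(leaf) then do;
    /* Remember the leaf's own attributes; they stay on every ancestor row */
    leafNo = leaf;
    leafPop = n;
    value = predictedvalue;
    output;
    id = parent;
    /* Climb the tree; a missing id also fails the >= 0 test and ends the loop */
    do while (id >= 0);
        set SplitNS key=Id/unique;   /* keyed read: loads the record whose Id equals id */
        if _iorc_ ne 0 then do;      /* Id not found: reset the error flag and stop climbing */
            _error_ = 0;
            leave;
            end;
        output;
        id = parent;
        end;
    end;
keep leafNo depth leafPop id decision insplitvar value;
run;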

proc sort data=Leaves; by leafNo leafPop inSplitVar depth; run;

data LeafText;
length ds str $128;
do until(last.inSplitVar);
    set Leaves; by leafNo inSplitVar;
    /* Remove "or Missing" from decision since there were no missing value in data (optional) */
    decision = left(prxChange('s/or Missing//o', -1, decision));
    select (first(decision));
        when ('<') lt = min(lt, input(scan(substr(decision,2),1," "),best.));
        when ('>') ge = max(ge, input(scan(substr(decision,3),1," "),best.));
        otherwise if missing(ds) or length(ds) > length(decision) then ds = decision;
        end;
    end;

if cmiss(ge, lt) = 0    then str = catx(" - ", ge, lt);
else if not missing(ge) then str = catx(" ", ge, "-");
else if not missing(lt) then str = catx(" ", "-", lt);
else if not missing(ds) then str = ds;

keep leafNo leafPop value inSplitVar str; 
run;

proc transpose data=LeafText out=LeafCond(drop=_name_);
where inSplitVar is not missing;
by leafNo leafPop value;
id inSplitVar;
var str;
run;
PG



All Replies
Esteemed Advisor
Posts: 5,540

Re: Any idea on how to generate customized Hpsplit Tree Rule?

HPSPLIT now has the RULES statement, which creates a text version of the rules that define the leaves of the final tree.

 

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_hpsplit_syntax11.htm&docsetVersion...
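
As a minimal sketch of the RULES statement: the SASHELP.CARS table, the chosen variables, and the output location in the WORK directory are illustrative assumptions, not taken from this thread.

```sas
/* Illustrative only: data set, variables, and output path are assumptions */
%let rulesFile = %sysfunc(pathname(work))/hpsplit_rules.txt;

proc hpsplit data=sashelp.cars;
    class origin type drivetrain;
    model origin = type drivetrain msrp horsepower weight;
    rules file="&rulesFile";   /* writes the leaf rules as plain text */
run;
```

The file is plain text, so it can be opened in any editor or read back with a DATA step.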

PG
Occasional Contributor Vi_
Occasional Contributor
Posts: 10

Re: Any idea on how to generate customized Hpsplit Tree Rule?

Hi PGStats,

Thank you for your response. I did try using the rules file, but I have trouble reading the .txt file from my SAS server library, and in the end the rule files have to be included in my PDF report. Additionally, since I am only interested in certain rules, I think it would be easier if I could create my own tree rules from the NODESTATS table.


(In my case, I have 72 rule files to include in the PDF report, and unless I find an easier way I will have to manually remove the rules I don’t need and copy-paste the rest into the PDF.)
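
One possible way to avoid the copy-paste step, sketched under assumptions: read the rules text file into a data set, filter it, and print it inside an ODS PDF block. The temporary file and its two placeholder lines stand in for a real RULES file, and the PDF location is hypothetical.

```sas
/* Stand-in for a RULES output file; the two lines are placeholders,
   not real HPSPLIT output */
filename rulesin temp;
data _null_;
    file rulesin;
    put "placeholder rule line 1";
    put "placeholder rule line 2";
run;

/* Read the text file, one line per observation */
data RuleLines;
    infile rulesin truncover;
    input line $char200.;
    if not missing(line);   /* drop blank lines; other filters could go here */
run;

/* Print the kept lines into a PDF (hypothetical output location) */
ods pdf file="%sysfunc(pathname(work))/rules_report.pdf";
proc print data=RuleLines noobs;
    var line;
run;
ods pdf close;
```

With 72 files, the same steps could be wrapped in a macro loop; that part is left out here.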


Occasional Contributor Vi_
Occasional Contributor
Posts: 10

Re: Any idea on how to generate customized Hpsplit Tree Rule?

Hi PG,

 

Thank you, your code works in my case. But I have some questions, and hopefully you can help me with them.

 

1. Why create a unique index on the ID column? Aren't the values already unique?

2. I don't fully understand this part of the code. Why is id set to parent, and what does the code in the loop do?

    id = parent;
    do while (id >= 0);
        set Node key=Id/unique;
        output;
        id = parent;
        end;

 

Thank you so much,

 

Vi

 

Esteemed Advisor
Posts: 5,540

Re: Any idea on how to generate customized Hpsplit Tree Rule?

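1) The index is created to allow random access to the parent records. The Id values are already unique, but a SET statement with KEY= can only look up a record through an index.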

 

2) This is precisely where the index comes in. The loop walks up the chain of parents (from the leaf to the root of the tree) and outputs every record along the way. The code

 

id = parent;
set Node key=Id/unique;

replaces the current record with the parent of the current record.
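
The keyed lookup can be tried in isolation on a toy table. This is only a sketch: the four rows and the startId variable are made up for illustration and are not real HPSPLIT output.

```sas
/* Toy node table (made-up values); the index is what makes KEY= lookups possible */
data Node(index=(Id / unique));
    infile datalines dlm="|" truncover;
    length Decision $32;
    input Id Parent Decision $;
    datalines;
0|.|
1|0|< 5.0
2|0|>= 5.0
3|1|< 2.5
;

/* Walk from node 3 up to the root, writing one row per node visited */
data Path;
    startId = 3;                    /* hypothetical starting node */
    Id = startId;
    do while (Id >= 0);             /* a missing Id fails this test */
        set Node key=Id / unique;   /* load the record with this Id */
        if _iorc_ ne 0 then do;     /* Id not found: stop instead of looping */
            _error_ = 0;
            leave;
        end;
        output;
        Id = Parent;                /* next lookup targets the parent */
    end;
    stop;                           /* no sequential SET, so end the step explicitly */
run;

proc print data=Path noobs;
run;
```

Path should hold the records for nodes 3, 1, and 0 in that order, which is the leaf-to-root path that the Leaves step builds for every leaf.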

 

PG
☑ This topic is solved.
