Solved: Any idea on how to generate customized Hpsplit Tree Rule?

Vi_ · Posted 05-25-2018 02:24 PM

Hi,

If you specify the OUTPUT NODESTATS= option in PROC HPSPLIT, it gives you a table that I think is the key to generating the tree rules.

Basically, I need code that works like this: when the node (ID column) is 3 and its parent node (PARENT column) is 1, go back to the ID column, find the rule (DECISION column) for ID=1, and repeat recursively until the root node is reached.

Any suggestions?

(I am using Enterprise Guide 7.1)

Thank you in advance,

Vi

PGStats · Posted 05-26-2018 04:27 PM

HPSPLIT now has the RULES statement, which creates a text version of the rules that define the leaves of the final tree:

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_hpsplit_syntax11.htm&docsetVersion...

PG
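For readers who have not used it, here is a minimal sketch of the RULES statement next to OUTPUT NODESTATS=, followed by a DATA step that reads the rules file back into a data set. The input data (sashelp.cars), the model variables, the MAXDEPTH= value, and the /tmp paths are assumptions chosen only for illustration. The file path is resolved on the machine running the SAS session, which is the server when working from Enterprise Guide.

```sas
/* Illustrative only: sashelp.cars, the model variables and the
   /tmp/rules.txt path are assumptions, not taken from this thread. */
proc hpsplit data=sashelp.cars maxdepth=3;
    class Origin Type;
    model Origin = Type MSRP Horsepower;
    rules file="/tmp/rules.txt";       /* text version of the leaf rules   */
    output nodestats=work.SplitNS;     /* node table, one row per node     */
run;

/* Read the rules file line by line so it can be filtered or printed */
data work.RuleLines;
    infile "/tmp/rules.txt" truncover;
    input line $char200.;
run;
```

Once the rules are in a data set, unwanted lines can be dropped with an ordinary WHERE clause instead of editing the text file by hand.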

Vi_ · Posted 05-26-2018 04:50 PM

Hi PGStats,

Thank you for your response. I did try to use the rules file, but I have trouble reading the text file from my SAS server library, and in the end the rule files have to be included in my PDF report. Additionally, since I am only interested in certain rules, I think it would be easier if I could create my own tree rules from the NODESTATS table.

(In my case, I have 72 rule files to include in the PDF report, and unless I find an easier way I'll have to manually remove the rules I don't need and copy and paste the rest into the PDF.)

PGStats · Posted 05-26-2018 05:13 PM

Understood. This code fragment might be useful. I wrote it some time ago to format the rules from a binary tree (it probably doesn't work for just any decision tree).

proc hpsplit...;
...;
output nodestats = SplitNS;
...;
run;

/* Index Id so that SET ... KEY= can fetch any node directly */
proc sql;
create unique index Id on SplitNS (Id);
quit;

/* For each leaf, output the leaf itself and every ancestor up to the root */
data Leaves;
set SplitNS;
if not missing(leaf) then do;
    leafNo = leaf;
    leafPop = n;
    value = predictedvalue;
    output;
    id = parent;
    /* Walk up the tree; the loop ends when id no longer satisfies id >= 0 */
    do while (id >= 0);
        set SplitNS key=Id/unique;
        output;
        id = parent;
        end;
    end;
keep leafNo depth leafPop id decision insplitvar value;
run;

proc sort data=Leaves; by leafNo leafPop inSplitVar depth; run;

/* Collapse all conditions on one split variable along a leaf's path into one string */
data LeafText;
length ds str $128;
do until(last.inSplitVar);
    set Leaves; by leafNo inSplitVar;
    /* Remove "or Missing" from decision since there were no missing values in the data (optional) */
    decision = left(prxChange('s/or Missing//o', -1, decision));
    select (first(decision));
        /* "< x": keep the smallest upper bound */
        when ('<') lt = min(lt, input(scan(substr(decision,2),1," "),best.));
        /* ">= x": keep the largest lower bound */
        when ('>') ge = max(ge, input(scan(substr(decision,3),1," "),best.));
        /* Anything else (e.g. a list of levels): keep the shortest decision text */
        otherwise if missing(ds) or length(ds) > length(decision) then ds = decision;
        end;
    end;

/* Build the text: "ge - lt", "ge -", "- lt", or the non-numeric decision */
if cmiss(ge, lt) = 0    then str = catx(" - ", ge, lt);
else if not missing(ge) then str = catx(" ", ge, "-");
else if not missing(lt) then str = catx(" ", "-", lt);
else if not missing(ds) then str = ds;

keep leafNo leafPop value inSplitVar str;
run;

/* One row per leaf, one column per split variable */
proc transpose data=LeafText out=LeafCond(drop=_name_);
where inSplitVar is not missing;
by leafNo leafPop value;
id inSplitVar;
var str;
run;

PG
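Since the goal stated earlier in the thread is to put only certain rules into a PDF report, the LeafCond table can be filtered and printed directly. This is a minimal sketch under stated assumptions: the leaf numbers 2 and 5, the title text, and the output path are placeholders, not values from this thread.

```sas
/* Hypothetical selection: leaf numbers 2 and 5 and the PDF path
   are placeholders to be replaced with real values. */
ods pdf file="/tmp/tree_rules.pdf";
title "Selected tree rules";

proc print data=LeafCond noobs;
    where leafNo in (2, 5);   /* keep only the leaves of interest */
run;

title;                        /* clear the title */
ods pdf close;
```

With many trees, the same PROC PRINT step could be repeated for each tree between a single ODS PDF FILE= and ODS PDF CLOSE pair, so that every rule table lands in one document.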

Vi_ · Posted 05-27-2018 01:42 PM

Hi PG,

Thank you, your code works in my case. I have a couple of questions that I hope you can help me with.

1. Why create a unique index on the ID column? Aren't the IDs already unique?

2. I don't fully understand this part of the code. Why set ID = PARENT, and what does the loop that follows do?

    id = parent;
    do while (id >= 0);
        set Node key=Id/unique;
        output;
        id = parent;
        end;

Thank you so much,

Vi

PGStats · Posted 05-27-2018 03:38 PM

1) The index is not there to enforce uniqueness. It is created to allow random (keyed) access to the parent records, which the KEY= option of the SET statement requires.

2) This is precisely where the index comes in. This loop goes up the parent list (from the leaf to the root of the tree) and outputs every record along the way. The code

id = parent;
set Node key=Id/unique;

replaces the current record with the parent of the current record.

PG
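The keyed lookup can be tried in isolation on a toy table. The sketch below is an illustration only: the five-node table, its decision strings, and the choice of node 3 as the starting leaf are made up, and it assumes (as the loop above does) that the root's PARENT value is missing, so the test id >= 0 fails and the walk stops.

```sas
/* Toy node table: hypothetical values, not real HPSPLIT output */
data Node;
    length id parent 8 decision $20;
    id = 0; parent = .; decision = " ";      output;   /* root */
    id = 1; parent = 0; decision = "x < 5";  output;
    id = 2; parent = 0; decision = "x >= 5"; output;
    id = 3; parent = 1; decision = "y < 2";  output;
    id = 4; parent = 1; decision = "y >= 2"; output;
run;

/* SET ... KEY= needs an index on the lookup variable */
proc sql;
    create unique index id on Node (id);
quit;

/* Start at node 3 and climb to the root, one keyed read per step */
data Path;
    set Node(where=(id = 3));
    leafNo = id;                      /* remember where the walk started */
    output;
    id = parent;
    do while (id >= 0);
        set Node key=id / unique;     /* replaces the current record with its parent */
        output;
        id = parent;
    end;
    keep leafNo id parent decision;
run;

proc print data=Path noobs;
run;
```

Path should list nodes 3, 1 and 0 in that order, each with its own DECISION text, which is the chain of conditions leading to leaf 3.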
