
Not understanding option MINLEAFSIZE in PROC HPSPLIT

When I run PROC HPSPLIT without the MINLEAFSIZE option, I get a tree that makes some sense to me:

proc hpsplit data=combined2 plots=zoomedtree(linkwidth=proportional nodes=("0" "7")) nodes=detail;
    class bad;
    model bad(event='1') = x1-x10;
    partition fraction(validate=0.25 seed=98390); /* Use random sample of 25% for validation */
    grow entropy;
    prune costcomplexity(leaves=10);
run;

Here is part of the decision tree. Note that both Node 1 and Node 2 are split further, and in each of the resulting splits, there are tens of thousands of observations.

Capture.PNG

 

Next, I add the MINLEAFSIZE=25 option and change nothing else in the code, because some leaves further down the tree (not shown) have very few observations and I want to prune them away. I assumed that MINLEAFSIZE=25 would eliminate only those splits that produce a node with fewer than 25 observations, which is clearly what the documentation for MINLEAFSIZE implies. But with MINLEAFSIZE=25 and no other changes, Node 2 is not split at all, even though the nodes produced by splitting Node 2 in the first tree each contain tens of thousands of observations. Can someone explain why this happens?

proc hpsplit data=combined2 plots=zoomedtree(linkwidth=proportional nodes=("0" "7")) nodes=detail 
    minleafsize=25; /* Changed code; added minleafsize=25 */
    class bad;
    model bad(event='1') = x1-x10;
    partition fraction(validate=0.25 seed=98390); /* Use random sample of 25% for validation */
    grow entropy;
    prune costcomplexity(leaves=10);
run;

Capture2.PNG
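
One check I can think of, offered as a sketch and not as an explanation: rerun the MINLEAFSIZE=25 model with pruning turned off and save the node statistics, to see whether Node 2 is split in the grown tree before cost-complexity pruning is applied. This assumes that PRUNE OFF and OUTPUT NODESTATS= behave the way I read them in the PROC HPSPLIT documentation; the data set name work.nodes_min25 is just a placeholder.

```sas
/* Diagnostic sketch: same model with MINLEAFSIZE=25, but no pruning,        */
/* so the grown tree can be inspected before PRUNE COSTCOMPLEXITY acts on it. */
proc hpsplit data=combined2 nodes=detail minleafsize=25;
    class bad;
    model bad(event='1') = x1-x10;
    partition fraction(validate=0.25 seed=98390); /* same partition as before */
    grow entropy;
    prune off;                                    /* assumption: keeps the fully grown tree */
    output nodestats=work.nodes_min25;            /* placeholder name for the node-level table */
run;

/* Look at the first rows of the node table to see whether Node 2 has children */
proc print data=work.nodes_min25(obs=20);
run;
```

If Node 2 is split here but not in the pruned tree, the difference would seem to come from the pruning step and not from the minimum leaf size rule itself; if it is not split even here, the growth step is where to look.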

--
Paige Miller