ngrundli
Calcite | Level 5

Hi all,

I have trained a random forest on continuous data. I now want to extract the values at which each variable was split, so that I can use these split points to bin/discretize my variables.

Your help or suggestions would be much appreciated!

Thanks

2 REPLIES
Mike_N
SAS Employee

I don't think this is possible for PROC HPFOREST (which creates a forest of decision trees). However, for a single tree model, you can get this information from PROC HPSPLIT. 

kodexolabs
Calcite | Level 5

The steps below describe how to extract the split points from a trained random forest model and use them to bin or discretize your variables. This converts your continuous data into discrete bins based on the decision thresholds learned by the model.
Each tree in a random forest can be accessed independently. In scikit-learn, for example, the fitted trees are available through the model's estimators_ attribute.

from sklearn.ensemble import RandomForestRegressor

# Assuming rf is your trained Random Forest model
trees = rf.estimators_
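
As a minimal, self-contained sketch of this step: the synthetic data and the hyperparameters below are assumptions chosen only for illustration, not taken from the original question. After fitting, the forest exposes its trees as ordinary decision tree estimators.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# Small forest so the example runs quickly
rf = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

trees = rf.estimators_   # list of fitted trees, one per estimator
first_tree = trees[0]
n_trees = len(trees)
is_tree = isinstance(first_tree, DecisionTreeRegressor)
```

Here n_trees equals n_estimators, and each element can be inspected like any standalone scikit-learn decision tree.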

The tree_ attribute of each tree stores the tree structure as parallel arrays, including the feature index (feature) and the split threshold (threshold) of each node. Leaf nodes have no split and are marked with the placeholder value -2 in both arrays.

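from collections import defaultdict

# Map each feature index to the thresholds the forest split on
split_points = defaultdict(list)
for tree in trees:
    tree_ = tree.tree_
    is_split = tree_.feature >= 0  # leaf nodes have feature == -2
    for f, t in zip(tree_.feature[is_split], tree_.threshold[is_split]):
        split_points[int(f)].append(float(t))

Because several trees may split at the same value, remove duplicates and sort the split points for each variable. Keep the split points separate per variable: a threshold is only meaningful for the feature it was learned on.

unique_split_points = {f: sorted(set(t)) for f, t in split_points.items()}

Discretize the variables. Bin each continuous variable using its own split points, for example with pandas' cut function. Add -inf and +inf as the outer edges so that values beyond the outermost split points still fall into a bin instead of becoming NaN.

import numpy as np
import pandas as pd

# feature_index (placeholder): column position of your_continuous_variable in the training data
edges = [-np.inf, *unique_split_points[feature_index], np.inf]
binned_data = pd.cut(your_continuous_variable, bins=edges)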
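
Putting the steps together, the following is a hedged end-to-end sketch rather than a definitive implementation. The helper names forest_split_points and bin_with_splits, the synthetic data, and the hyperparameters are all assumptions introduced for illustration. The sketch keeps thresholds separate per variable and pads the bin edges with -inf and +inf so that every value is assigned a bin.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def forest_split_points(forest, n_features):
    """Return {feature index: sorted unique thresholds used by the forest}."""
    points = {f: set() for f in range(n_features)}
    for est in forest.estimators_:
        t = est.tree_
        is_split = t.feature >= 0  # leaves are marked with feature == -2
        for f, thr in zip(t.feature[is_split], t.threshold[is_split]):
            points[int(f)].add(float(thr))
    return {f: sorted(v) for f, v in points.items()}


def bin_with_splits(values, thresholds):
    """Bin values into (a, b] intervals, matching the trees' x <= threshold rule."""
    edges = [-np.inf, *thresholds, np.inf]
    return pd.cut(values, bins=edges)


# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 2)), columns=["x0", "x1"])
y = np.where(X["x0"] > 0.5, 1.0, 0.0) + 0.1 * X["x1"]

rf = RandomForestRegressor(n_estimators=5, max_depth=2, random_state=0).fit(X, y)

splits = forest_split_points(rf, X.shape[1])
binned_x0 = bin_with_splits(X["x0"], splits[0])
```

pandas' cut uses right-closed intervals by default, which lines up with scikit-learn sending samples with value <= threshold to the left child. With many deep trees the number of unique thresholds can become large, so in practice you may want to limit tree depth or thin the thresholds before binning.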

We hope these instructions help you extract split points from your random forest model and use them to discretize your variables. If you have any further questions, please ask. Thank you for reaching out; we are glad to help!

