Hi all,
I have trained a random forest on continuous data. I now want to extract the points the variables were split on to be able to use these split points to bin/discretize my variables.
Your help or suggestions would be much appreciated!
Thanks
I don't think this is possible for PROC HPFOREST (which creates a forest of decision trees). However, for a single tree model, you can get this information from PROC HPSPLIT.
The steps below describe how to extract the split points from a trained Random Forest model in Python (scikit-learn) and use them to bin or discretize your variables. This converts your continuous data into discrete bins based on the decision thresholds the Random Forest learned.
Each tree in a Random Forest can be accessed independently. Scikit-learn, for example, exposes every fitted decision tree through the forest's estimators_ attribute.
    from sklearn.ensemble import RandomForestRegressor

    # Assuming rf is your trained Random Forest model
    trees = rf.estimators_
The tree_ attribute of each tree stores the tree structure as parallel arrays: tree_.threshold holds the split value of each node and tree_.feature holds the index of the variable that node splits on. Leaf nodes are marked with -2 in both arrays.
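    feature_idx = 0  # column index of the variable you want to bin
    split_points = []
    for tree in trees:
        tree_ = tree.tree_
        # Keep only nodes that split on this variable; leaves have feature == -2,
        # so they are excluded and thresholds of other variables are not mixed in
        is_split = tree_.feature == feature_idx
        split_points.extend(tree_.threshold[is_split])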
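To see what these arrays contain, here is a minimal sketch that prints the split rules of a single tree. The synthetic data, the hyperparameters, and the variable names are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)

rf = RandomForestRegressor(n_estimators=5, max_depth=2, random_state=0).fit(X, y)

t = rf.estimators_[0].tree_         # low-level structure of the first tree
is_split = t.children_left != -1    # leaves have no children (marked with -1)

# Samples with X[:, feature] <= threshold go to the left child
for node in np.flatnonzero(is_split):
    print(f"node {node}: X[:, {t.feature[node]}] <= {t.threshold[node]:.3f}")
```

Each printed line is one internal node, so the same variable can appear several times within a tree and many times across the forest.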
Because several trees may split at the same value, remove duplicates and sort the split points.
    unique_split_points = sorted(set(split_points))
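If several variables need to be binned, it may be convenient to collect the thresholds for all of them in one pass and keep them separate per variable. The sketch below does that under stated assumptions: the data is synthetic, and `split_points_by_feature` is a hypothetical helper name, not a scikit-learn function.

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 1, size=300)
rf = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0).fit(X, y)


def split_points_by_feature(forest):
    """Return {feature index: sorted unique thresholds} for a fitted forest."""
    collected = defaultdict(set)
    for est in forest.estimators_:
        t = est.tree_
        internal = t.children_left != -1  # skip leaf nodes
        for f, thr in zip(t.feature[internal], t.threshold[internal]):
            collected[int(f)].add(float(thr))
    return {f: sorted(v) for f, v in collected.items()}


points = split_points_by_feature(rf)
for f, thr in sorted(points.items()):
    print(f"feature {f}: {len(thr)} distinct split points")
```

A forest with many deep trees can produce a large number of distinct thresholds per variable, so the resulting bins may be very fine.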
Discretize the variables. Bin each continuous variable using its sorted, distinct split points as bin edges, for example with pandas' cut function.
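    import numpy as np
    import pandas as pd

    # Open-ended outer edges: without them, values below the smallest or above
    # the largest split point fall outside every bin and become NaN
    bins = [-np.inf, *unique_split_points, np.inf]
    binned_data = pd.cut(your_continuous_variable, bins=bins)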
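As a small worked example of the binning step, the sketch below uses hand-picked thresholds in place of ones extracted from a forest; the values and names are assumptions for illustration. It adds open-ended outer edges so that no value falls outside the bins, and it relies on the pd.cut default of right-closed intervals, which matches the tree rule that a value less than or equal to the threshold goes to the left branch.

```python
import numpy as np
import pandas as pd

# Hypothetical split points for one variable (stand-ins for forest thresholds)
thresholds = [2.5, 5.0, 7.5]
values = pd.Series([0.3, 2.5, 2.6, 6.1, 9.9], name="x")

# Open-ended outer edges cover values beyond the outermost split points
edges = [-np.inf, *thresholds, np.inf]

bins = pd.cut(values, bins=edges)                 # interval labels, right-closed
codes = pd.cut(values, bins=edges, labels=False)  # integer bin codes instead

print(pd.DataFrame({"x": values, "bin": bins, "code": codes}))
# codes are 0, 0, 1, 2, 3: the value 2.5 lands in (-inf, 2.5], like "x <= 2.5"
```

Integer codes (labels=False) are often easier to feed into downstream models, while the interval labels are easier to read in reports.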
We hope these instructions help you extract split points from your Random Forest model and use them to discretize your variables. If you have further questions or need more support, please ask. Thank you for reaching out; we are happy to help!