The Gradient Boosting node, located on the Model tab of the SAS Enterprise Miner toolbar, trains a gradient boosting model: a single predictive model built from a sequence of decision trees. Each tree in the sequence is fit to the residuals of the predictions made by the earlier trees, where the residuals are calculated in terms of the derivative of a loss function. The resulting ensemble, which combines the predictions from all of the decision trees, often outperforms other machine learning algorithms in prediction accuracy, which has made gradient boosting extremely popular.
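The boosting procedure described above can be sketched in a few lines of Python. This is a toy illustration with one-split decision stumps and squared-error loss (for which the negative gradient is simply the residual), not the SAS implementation; the data and function names are invented for the example:

```python
# Minimal gradient-boosting sketch: each "tree" is a one-split decision
# stump fit to the current residuals (squared-error loss, so the negative
# gradient of the loss IS the residual).

def fit_stump(x, residuals):
    """Find the split on x that best fits the residuals with two constant leaves."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_trees=20, shrinkage=0.3):
    f0 = sum(y) / len(y)                 # initial prediction: the target mean
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - p for yi, p in zip(y, preds)]  # what the next tree fits
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(xi) for p, xi in zip(preds, x)]
    return lambda xi: f0 + shrinkage * sum(s(xi) for s in stumps)

# Toy data: low response for small x, high response for large x
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
model = boost(x, y)
```

Each stump corrects the errors left by the stumps before it, and the shrinkage factor damps each correction so that no single tree dominates the ensemble.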
This modeling algorithm comes with several hyperparameters. When you train a gradient boosting model in SAS Enterprise Miner, some of these hyperparameters are exposed as properties of the Gradient Boosting node, and their optimal values are, of course, data dependent. The hyperparameters of a gradient boosting model fall into two categories: those that govern how the decision trees are grown (primarily in the Splitting Rule, Node, and Split Search property groups) and those that govern the boosting process (primarily in the Series Options group). Below are recommendations from several of my SAS colleagues on how to adjust the properties in each category to help you train an accurate but generalizable gradient boosting model. Note that my colleague Brett Wujek posted a similar tip for the HP Forest node that you might find helpful if you are fitting forest models in SAS Enterprise Miner.
Increasing the size of the decision trees in a gradient boosting model can improve prediction accuracy, though at the risk of overfitting the training data. Here are some properties in the Splitting Rule and Node groups that you can adjust to grow larger trees:
As with tree size, the length of the boosting process can improve prediction accuracy but can also introduce overfitting. This makes it important to use a validation partition and to check the Subseries Plot in the results to confirm that your model generalizes. With that in mind, you can try the following modifications to the Gradient Boosting node properties in the Series Options group:
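To see why the number of boosting iterations and the shrinkage (learning rate) trade off against each other, consider a toy Python sketch in which each "tree" is just a constant fit to the mean residual. This illustrates only the arithmetic of the series, not SAS code: after M iterations, a fraction (1 - shrinkage)**M of the initial residual remains, so smaller shrinkage needs proportionally more iterations.

```python
# Toy illustration of the shrinkage / iteration-count trade-off.
# Each "tree" here predicts the current residual exactly, so the
# remaining error shrinks by a factor (1 - shrinkage) per iteration.

def residual_after(m_iterations, shrinkage):
    y, pred = 10.0, 0.0
    for _ in range(m_iterations):
        residual = y - pred
        pred += shrinkage * residual   # damped step toward the target
    return y - pred

print(residual_after(50, 0.1))   # small steps: error decays slowly
print(residual_after(50, 0.5))   # larger steps: error decays much faster
```

In a real model, small shrinkage with many iterations usually generalizes better than large shrinkage with few, which is exactly why monitoring the Subseries Plot against validation data matters.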
The Gradient Boosting node bins interval inputs when creating decision tree splits and surfaces an Interval Bins property that controls the number of equal-width bins. If you have skewed interval inputs, however, other binning methods, such as quantile binning or tree-based binning (with respect to your target), can work better. To bin your skewed interval inputs before running the Gradient Boosting node, you can use either the Interactive Binning node for quantile binning or the Transform Variables node for quantile or tree-based ("Optimal") binning.
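As a rough illustration of why quantile binning helps with skewed inputs, the following sketch (plain Python, not the Interactive Binning or Transform Variables nodes; the data and function names are invented) compares equal-width and quantile bins on a variable with one extreme outlier:

```python
# Compare equal-width binning (what Interval Bins does) with quantile
# binning on a skewed input.

def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def quantile_bins(values, n_bins):
    ranked = sorted(values)
    # bin edges at the empirical quantiles, so each bin holds ~equal counts
    edges = [ranked[int(len(ranked) * k / n_bins)] for k in range(1, n_bins)]
    return [sum(v > e for e in edges) for v in values]

skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 100]   # one extreme outlier
print(equal_width_bins(skewed, 4))  # nearly everything lands in bin 0
print(quantile_bins(skewed, 4))     # roughly equal occupancy per bin
```

With equal-width bins the outlier stretches the bin boundaries so far that almost all observations collapse into a single bin, leaving the tree little to split on; quantile bins keep the split candidates informative.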
Finally, if you are using SAS Visual Data Mining and Machine Learning on SAS Viya, the GRADBOOST procedure is available for training gradient boosting models. You can access this procedure in SAS Studio either programmatically or through the Gradient Boosting task, or include it in your Model Studio pipeline with the Gradient Boosting node. Either way, you have access to the "autotuning" capabilities, which search for optimal hyperparameter values using a variety of methods, including grid search and Latin hypercube sampling. You can also incorporate the GRADBOOST procedure with autotuning enabled into your process flow diagram in SAS Enterprise Miner; see my tip on the SAS Viya Code node to learn how.
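To give a feel for how a Latin hypercube sample covers a hyperparameter space, here is a small Python sketch. This is an illustration of the sampling idea only, not the GRADBOOST autotune implementation, and the function name and parameter ranges are made up: each of the n strata on each axis is used exactly once, so the sample spreads over every dimension with far fewer points than a full grid.

```python
import random

def latin_hypercube(n, ranges, seed=0):
    """Draw n points over the given (lo, hi) ranges, one point per stratum
    in each dimension (a basic Latin hypercube sample)."""
    rng = random.Random(seed)
    # one independently shuffled set of stratum indices per dimension
    strata = [rng.sample(range(n), n) for _ in ranges]
    points = []
    for i in range(n):
        point = []
        for (lo, hi), s in zip(ranges, strata):
            cell = s[i]                                  # which stratum this point uses
            point.append(lo + (hi - lo) * (cell + rng.random()) / n)
        points.append(tuple(point))
    return points

# e.g. sample 5 candidate (number-of-trees, learning-rate) settings
candidates = latin_hypercube(5, [(50, 500), (0.01, 0.3)])
```

Compared with a grid search, which needs n**d points to get n distinct values per dimension, a Latin hypercube reaches the same per-dimension coverage with only n points, which is why it scales better as the number of tuned hyperparameters grows.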
Acknowledgments: thank you to Tobias Kuhn, Padraic Neville, Ralph Abbey, Sanford Gayle, and Lorne Rothman for their input on this post.
In SAS Enterprise Miner gradient boosting, for a binary target with the square loss function and best assessment value = average square error, I see in the scoring code that these two variables are initialized as _ARB_F = 1.09944342 and _ARBBAD_F = 0.
For an interval target I found that _ARB_F is initialized to the mean of my target variable, but in the case of a binary target, how is _ARB_F initialized? Please help.