With the 2022.10 release (October 2022) of Model Studio, the popular LightGBM gradient boosting framework is now available as a supervised learning algorithm in the Gradient Boosting node. LightGBM is an open-source gradient boosting package developed by Microsoft and first released in 2016.
Because LightGBM is a variant of gradient boosting and shares many of its properties, the algorithm has been integrated into Model Studio's Gradient Boosting node. The following image shows a pipeline with the Gradient Boosting node. The Perform LightGBM checkbox in the node properties enables the LightGBM algorithm. When you select Perform LightGBM, the node displays the available LightGBM properties.
Clicking on the Run pipeline button at the top executes the Gradient Boosting node. Under the covers, the node executes the SAS LIGHTGRADBOOST procedure, which calls the lightGradBoost.lgbmTrain CAS action to run LightGBM. This trains the LightGBM model with the options that you specified, and produces training and assessment reports in the results. When you right-click on the Gradient Boosting node and select Results, the training reports are displayed on the Node tab. One of those reports is the Iteration History report, a line plot illustrating the change in training and validation accuracy as the boosting iterations (number of trees) increase. Note that the right-hand pane provides an automated description to help you interpret the plot.
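Incidentally, you can invoke that same action yourself from SAS Studio with PROC CAS. The sketch below is a minimal illustration, not the code the node generates: the action set and action names are the ones the node calls, but the table, variables, and parameter names (table=, target=, inputs=, nominals=) follow common CAS action conventions and should be treated as assumptions to verify against the lightGradBoost.lgbmTrain documentation.

proc cas;
   /* Load the action set that the Gradient Boosting node uses */
   loadactionset "lightGradBoost";

   /* Train a LightGBM model on a hypothetical CAS table.       */
   /* Parameter names are assumptions; check the lgbmTrain      */
   /* documentation for the exact signature.                    */
   lightGradBoost.lgbmTrain /
      table={caslib="casuser", name="hmeq"},
      target="BAD",
      inputs={"LOAN", "MORTDUE", "VALUE", "DEBTINC"},
      nominals={"BAD"};
run;
quit;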
An additional report is the Training Code report, which contains the PROC LIGHTGRADBOOST training code. You can use this as example syntax for training your own LightGBM models in SAS Studio.
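For example, a minimal training call might look like the following sketch. Only the procedure name comes from the node; the statement pattern (TARGET and INPUT with LEVEL= options) mirrors other SAS Viya supervised learning procedures, and the CAS table and variables are hypothetical. Treat the Training Code report from your own run as the authoritative syntax.

/* Minimal sketch of training a LightGBM model in SAS Studio.     */
/* mycaslib.hmeq and its variables are hypothetical placeholders. */
proc lightgradboost data=mycaslib.hmeq;
   target BAD / level=nominal;
   input LOAN MORTDUE VALUE DEBTINC / level=interval;
   input REASON JOB / level=nominal;
run;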
Clicking on the Assessment tab in the results brings up a handful of model assessment reports. These reports assess the LightGBM model against all available data partitions, including Train, Validate, and Test, and are the standard assessment reports generated for any supervised learning node in Model Studio.
If you selected post-training node properties to produce one or more Model Interpretability reports, they are displayed when you click the Model Interpretability tab in the results. Gradient boosting models, while very accurate, are not very interpretable, which makes these reports especially important for understanding the LightGBM model. The reports displayed here include Surrogate Variable Importance, PD and ICE Plots (partial dependence and individual conditional expectation), LIME Explanations (local interpretable model-agnostic explanations), and HyperSHAP Values (Shapley values).
After exiting the node results, you can view and compare pipeline performance across pipelines by clicking on the Pipeline Comparison tab. Shown here are two LightGBM models that you can compare, with a flag that identifies the champion model. You can score new data by clicking your model and selecting Score holdout data from the Project pipeline menu (three vertical dots) at the top.
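If you prefer to score in code, and assuming your project saved the champion model as an analytic store (astore) in a CAS library, a scoring step might look like the sketch below. PROC ASTORE and its SCORE statement are standard SAS Viya tools; the table and store names here are hypothetical.

/* Score holdout data with a saved analytic store.      */
/* Table and store names are hypothetical placeholders. */
proc astore;
   score data=mycaslib.hmeq_holdout    /* new observations to score    */
         rstore=mycaslib.lgbm_store    /* analytic store for the model */
         out=mycaslib.hmeq_scored;     /* scored output table          */
run;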
You can also do a side-by-side assessment comparison by selecting both models and clicking Compare at the top, which produces assessment plots that include both models. You can then register your model in Model Manager by selecting Register models from the Project pipeline menu. Once registered, you can maintain and track the performance of your model in Model Manager, in addition to publishing it for deployment (you can also publish your model from the Project pipeline menu).
Given its popularity and wide usage, providing LightGBM as a modeling algorithm within Model Studio increases the breadth of modeling options available to Model Studio users. With the power of Model Studio, LightGBM users will appreciate how easily assessment and model interpretability reports can be generated, models can be compared, and models can be registered and published for deployment into production.
Appendix
Below are descriptions of the LightGBM-specific properties in the Gradient Boosting node, with corresponding open-source parameters in parentheses.
Basic Options
Tree-splitting Options