Machine learning pipeline automation has changed how organizations approach data analysis and model development. With automated pipeline creation, data scientists can quickly extract insights from diverse datasets. To get the most out of automated pipelines, however, users need to be able to customize them, which gives them more control and flexibility.
In this article, we look at machine learning pipeline automation and show how the advanced settings let users tailor pipelines to their own requirements.
Machine Learning Pipeline Automation (MLPA) Advanced Settings
The “Advanced Settings” window lets you customize your automated pipeline creation.
Within the Advanced Settings window, the algorithms option offers users the ability to customize their pipelines by specifying which models should be considered or excluded. This feature provides granular control, allowing data scientists to select the most relevant algorithms based on their specific use cases and data characteristics.
When a model is selected in the “Consider” list, it joins the pool of candidate models whose performance is compared within the pipeline. Being in the “Consider” list gives a model a fair chance but does not guarantee it a place in the final pipeline: the final selection favors the best-performing models according to a specified selection statistic, such as misclassification rate or average squared error.
In certain scenarios, users may have specific requirements where a particular model must be included in the final pipeline. For instance, in the banking industry, the interpretability of the "Regression" model makes it a preferred choice. In such cases, the “Force Include” option can be used to include the model in the final pipeline.
Note that when a Gradient Boosting model is considered, LightGBM is also taken into consideration for the final pipeline.
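The interaction between the “Consider” list, the selection statistic, and “Force Include” can be illustrated with a small Python sketch. This is not SAS Viya code or its actual selection logic. The Candidate class, the select_models function, the top_k parameter, and the scores are all hypothetical, chosen only to show the idea that considered models compete on a statistic where lower is better, while force-included models are kept regardless of rank.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    misclassification_rate: float  # selection statistic; lower is better


def select_models(candidates, consider, force_include=(), top_k=1):
    """Illustrative only: pick the top_k considered models, then add forced ones."""
    # Only models on the "Consider" list compete on the selection statistic.
    pool = [c for c in candidates if c.name in consider]
    ranked = sorted(pool, key=lambda c: c.misclassification_rate)
    selected = [c.name for c in ranked[:top_k]]
    # Force-included models are kept even if they did not rank in the top_k.
    for name in force_include:
        if name not in selected:
            selected.append(name)
    return selected


# Made-up scores, purely for demonstration.
candidates = [
    Candidate("Gradient Boosting", 0.12),
    Candidate("Forest", 0.14),
    Candidate("Neural Network", 0.16),
    Candidate("Regression", 0.18),
]

result = select_models(
    candidates,
    consider={"Gradient Boosting", "Forest", "Regression"},
    force_include=["Regression"],
    top_k=1,
)
print(result)  # ['Gradient Boosting', 'Regression']
```

In this sketch, Regression has the weakest score among the considered models, yet it still appears in the result because it was force-included, which mirrors the banking scenario described above.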
Another customization option within the Advanced Settings window is sampling, which lets users define how their data is sampled during pipeline creation. Sampling makes large datasets more efficient to process, reducing computational time and resource requirements while preserving representative data characteristics.
Users can specify either a fixed number of training rows or a percentage of the training rows.
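As a rough illustration of the two sampling choices, the Python sketch below turns either a fixed row count or a percentage into a sample size and then draws a simple random sample. It makes several assumptions: the function names sample_size and sample_rows, the parameters fixed_rows, percent, and seed, and the use of simple random sampling are all hypothetical, and the sampling method the product actually uses may differ.

```python
import random


def sample_size(n_rows, *, fixed_rows=None, percent=None):
    """Resolve a sample size from exactly one of fixed_rows or percent."""
    if (fixed_rows is None) == (percent is None):
        raise ValueError("specify exactly one of fixed_rows or percent")
    if fixed_rows is not None:
        if fixed_rows <= 0:
            raise ValueError("fixed_rows must be positive")
        # Never ask for more rows than the training data contains.
        return min(fixed_rows, n_rows)
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Keep at least one row when the data is non-empty.
    return min(n_rows, max(1, round(n_rows * percent / 100)))


def sample_rows(rows, *, fixed_rows=None, percent=None, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    k = sample_size(len(rows), fixed_rows=fixed_rows, percent=percent)
    return random.Random(seed).sample(rows, k)


training_rows = list(range(1000))  # stand-in for real training rows
by_count = sample_rows(training_rows, fixed_rows=100)
by_percent = sample_rows(training_rows, percent=25)
print(len(by_count), len(by_percent))  # 100 250
```

Fixing the seed keeps the sample reproducible between runs, which is useful when comparing pipelines built from the same sampled data.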
Lastly, you can access the same capabilities and much more through our public API. For additional information, please refer to the links provided here and here.
In summary, this article explored the advanced options available in SAS Viya’s machine learning pipeline automation (MLPA). With the advanced settings, users gain control and flexibility in tailoring pipelines to meet their specific needs.
Great Article.
I would also like to see an article on the assumptions Model Studio makes when reading in the data. For example, if a variable's name contains "target", the role of the variable is assigned as "Target"; if a variable's name begins with "r_", the role of the variable is assigned as "Residual", ...
Thanks.
@tom_grant That is a good topic to share. Thank you for suggesting.