The SAS website (at https://go.documentation.sas.com/?cdcId=vdmmlcdc&cdcVersion=8.3&docsetId=vdmmlref&docsetTarget=n0gn2...) has plenty of documentation on building a model using Python or R, and it mentions repeatedly that you can move the open source node to the "Supervised learning" category. It has almost no documentation on how to use the node otherwise. Is it not possible to, say, preprocess data with Python? If I want a Python node to simply transform or remove certain rows of my data, how do I accomplish this? The output variables/csv files mentioned for model training specify that the number of input and output rows must be the same. No other avenue of output is mentioned in the documentation.
Currently, I have some test data in a test pipeline. I have a single line of code that changes all the values in the input dataframe to "1". I've logged the input dataframe to the console to verify that the change occurred. The output dataframe of the node, however, is the original input, unchanged.
How do I transform and output data using the open source node?
In Model Studio, the capability provided by any node (preprocessing or modeling) is backed by underlying score code, which is what passes information through the pipeline. What I mean is, score code is necessary for the data transformations in one node to be passed along to the subsequent node. This was done for several reasons: to avoid creating multiple copies of the data, which can become unmanageable as it grows, and to allow the flow to be deployed into production when the project is done.
Since similar score code is not possible when working with Python or R, the Open Source Code node cannot support preprocessing data as suggested. The primary goal of the Open Source Code node is to enable users to train open source models in Python or R and compare them with other modeling nodes in the pipeline. This functionality is possible even without score code because the burden of providing the actual predictions in the dm_scoreddf data frame is placed on the user; model assessment, and thus comparison, is then done from those predictions.
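To make that concrete, here is a minimal sketch of the pattern in Python. It is not taken from the SAS documentation: only dm_scoreddf is named in this thread, while dm_inputdf, the column names (income, bad, P_bad1, P_bad0) and the toy threshold "model" are assumptions for illustration. Inside the node the input data frame would be supplied for you; it is built by hand here so the snippet runs on its own.

```python
import pandas as pd

# Stand-in for the data frame the node hands to the script.
# The name dm_inputdf and the columns are assumptions for this sketch.
dm_inputdf = pd.DataFrame({
    "income": [20.0, 35.0, 50.0, 80.0],
    "bad": [1, 1, 0, 0],
})

# Any preprocessing stays local to the script: it can feed the model,
# but the transformed frame itself is not what the node passes on.
preprocessed = dm_inputdf.copy()
preprocessed["income_scaled"] = (
    preprocessed["income"] / preprocessed["income"].max()
)

# Toy "model" (hypothetical): event rate of the target on each side
# of the median of the scaled input.
threshold = preprocessed["income_scaled"].median()
low = preprocessed["income_scaled"] <= threshold
rate_low = preprocessed.loc[low, "bad"].mean()
rate_high = preprocessed.loc[~low, "bad"].mean()

# The predictions go into dm_scoreddf, one row per input row.
dm_scoreddf = pd.DataFrame(index=dm_inputdf.index)
dm_scoreddf["P_bad1"] = low.map({True: rate_low, False: rate_high})
dm_scoreddf["P_bad0"] = 1.0 - dm_scoreddf["P_bad1"]

# Same number of rows in and out, as the documentation requires.
assert len(dm_scoreddf) == len(dm_inputdf)
```

The point of the sketch is that transformations can still be applied inside the script on the way to a model, but what leaves the node is the set of predictions, which is why dropping rows or returning a reshaped data set does not fit this node.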
Though there is no easy answer for what you want to do, you can choose to use the SAS Code node for any custom data preprocessing that needs to be done (assuming none of the existing nodes in Model Studio fulfil your needs) or you can choose to do the preprocessing in SAS Data Studio which is built for data manipulation.