Hello,
I am analyzing imbalanced classification data (2% vs. 98%) and evaluating different sampling techniques and classification algorithms. I use one dataset for training/validation and would like to use a separate dataset as test/scoring data, so that I can assess and visualize the model's performance on it. However, when I use a Model Comparison node after a Score node, I still see model performance only on the training and validation datasets, and no results for the separate dataset. I am attaching a screenshot of my workflow.
To see fit statistics for a test data set, make sure that your separate data set is read in before the modeling nodes and has a Role of Test. If you set the Role property of the data source to Test, it is treated as a test data set, and every fit statistic generated for the training and validation data sets is also generated for the separate test data set.
The Score data set works in a similar way: make sure that the Role property for that data set is set to Score. You will not get many fit statistics for it, since score data typically does not carry actual target values to compare against; however, you will get some summary statistics on the predicted probabilities or predicted values for the target variables.
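To illustrate why the two roles produce different output, here is a minimal Python sketch. It is not Enterprise Miner code and not how the nodes are implemented; the function names, the 0.5 cutoff, and all the numbers are made up for illustration. It only shows that fit statistics need the actual target (available in Test data), while Score data allows summaries of the predictions alone.

```python
from statistics import mean


def fit_statistics(y_true, p_event, cutoff=0.5):
    """Statistics that compare predictions with the actual target (Test-style data)."""
    pred = [1 if p >= cutoff else 0 for p in p_event]
    pairs = list(zip(y_true, pred))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
    return {
        "misclassification": sum(1 for y, yh in pairs if y != yh) / len(pairs),
        "avg_squared_error": mean((y - p) ** 2 for y, p in zip(y_true, p_event)),
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
    }


def score_summary(p_event):
    """Summaries that need only the predictions (Score-style data, no target)."""
    return {
        "n": len(p_event),
        "min": min(p_event),
        "mean": mean(p_event),
        "max": max(p_event),
    }


# Hypothetical data: the test set has known targets, the score set does not.
test_targets = [0, 0, 0, 1, 0, 1, 0, 0]
test_probs = [0.05, 0.10, 0.60, 0.80, 0.20, 0.30, 0.02, 0.15]
score_probs = [0.03, 0.40, 0.75, 0.08]

test_stats = fit_statistics(test_targets, test_probs)
score_stats = score_summary(score_probs)
print(test_stats)
print(score_stats)
```

With a rare event like yours, the misclassification rate alone can look good even when few events are caught, so event-focused measures such as recall and precision are worth looking at on the test data as well.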
Thank you very much! 🙂