gabon
Calcite | Level 5

Hello,

I am analyzing imbalanced classification data (2% to 98%) and evaluating different sampling techniques and classification algorithms. I use one dataset as training/validation data and would like to use a separate dataset as test/scoring data, so that I can assess and visualize the performance of a model on that data. However, when I use a Model Comparison node after a Score node, I still only see the model performance on the training and validation datasets, and I don't see any results for the separate dataset. I am attaching a screenshot of my workflow.

Screenshot 2017-02-03 18.36.10.png

1 ACCEPTED SOLUTION

Accepted Solutions
CraigDeVault
SAS Employee

To see the statistics that are generated for a test data set, make sure that your separate data set is read in before the modeling nodes and has a Role of Test. If you set the Role property in the data source to Test, the data set is treated as a test data set, and every fit statistic generated for the training and validation data sets is also generated for the separate test data set.
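
As a rough illustration of the idea (this is not Enterprise Miner code, just a plain Python sketch), the same fit statistics are computed for every partition that carries a target, so adding a partition with the role "test" produces a third set of results. The function `fit_statistics`, the two statistics chosen, the 0.5 cutoff, and the toy numbers are all assumptions made for this example.

```python
from typing import Dict, List


def fit_statistics(y_true: List[int], p_event: List[float],
                   cutoff: float = 0.5) -> Dict[str, float]:
    """Compute two common fit statistics for a binary target."""
    n = len(y_true)
    # Average squared error between the 0/1 target and the predicted probability.
    ase = sum((y - p) ** 2 for y, p in zip(y_true, p_event)) / n
    # Share of rows whose predicted class (at the cutoff) differs from the target.
    misc = sum((p >= cutoff) != bool(y) for y, p in zip(y_true, p_event)) / n
    return {"ase": ase, "misclassification": misc}


# Hypothetical partitions keyed by role:
# (actual target values, predicted event probabilities).
partitions = {
    "train": ([0, 0, 0, 1], [0.1, 0.2, 0.1, 0.8]),
    "validate": ([0, 0, 1, 0], [0.2, 0.1, 0.6, 0.3]),
    "test": ([0, 1, 0, 0], [0.1, 0.4, 0.2, 0.7]),
}

# The same statistics are produced for every role that has a known target.
report = {role: fit_statistics(y, p) for role, (y, p) in partitions.items()}
for role, stats in report.items():
    print(role, stats)
```

The point is only that the test partition is handled exactly like train and validate once it carries that role; without the role, it never enters the loop.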

 

The Score data set works similarly. Make sure that the Role property for that data set is set to Score. You will not get many fit statistics for it; however, you will get some summary statistics for the predicted probabilities or predicted values of the target variables.
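
To show why a score data set yields only summary statistics, here is a small Python sketch (again, not Enterprise Miner code). It assumes the score data has no known target, so nothing can be compared against the truth and only the distribution of predicted probabilities can be described. The function `score_summary` and the sample probabilities are made up for this example.

```python
import statistics
from typing import Dict, List


def score_summary(p_event: List[float]) -> Dict[str, float]:
    """Summarize predicted event probabilities when no target is available."""
    return {
        "n": len(p_event),
        "mean": statistics.mean(p_event),
        "min": min(p_event),
        "max": max(p_event),
        "std": statistics.stdev(p_event),  # sample standard deviation
    }


# Hypothetical predicted probabilities for five scored rows.
scored = [0.02, 0.01, 0.05, 0.90, 0.03]
summary = score_summary(scored)
print(summary)
```

Error-based measures such as misclassification rate need the actual target, which is why they are available for a Test role but not for a Score role.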


test_statistics.JPG


