Mike90
Quartz | Level 8

There are categorical and interval inputs, and a binary target.  The target outcomes are split 95 / 5.

I want to know how many predictions were made for the rare event, and how many of those were correct.

 

The score report node does not report this information.

 

The Model Comparison node has a Classification Chart with this information for the training and validation sets, but not for the test set.

 

Data Partition = 40/30/30

 

Data Source -> Data Partition -> Decision Tree ------> Model Comparison
                             \-> Decision Tree (2) -/

Data Source -> Data Partition -> Decision Tree -> Score Report

 

-------------------------------------------------------------------

Here is the only relevant output in the score report. It doesn't involve the P_... variables, and it doesn't tell me how many predictions were made for outcome 0 or how many of those were correct.

 

Data Role=TEST Output Type=C
 
                 Numeric    Formatted    Frequency
  Variable        Value       Value        Count      Percent
 
D_LIVEOUTCOME       .           0           3245      21.7246
D_LIVEOUTCOME       .           1          11692      78.2754
 
 
Data Role=TEST Output Type=CLASSIFICATION
 
                 Numeric    Formatted    Frequency
  Variable        Value       Value        Count      Percent
 
I_LiveOutcome       .           0            257       1.7206
I_LiveOutcome       .           1          14680      98.2794
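
For reference, what I'm after is the cross-tabulation of the actual target against the classification, not these one-way frequencies. A minimal sketch of the idea, assuming the scored test set were exported to a data set named scored_test and that the actual and predicted classes sit in F_LiveOutcome and I_LiveOutcome (the data set name and the F_ variable are assumptions; only I_LiveOutcome appears above):

   /* Sketch only: cross-tabulate actual vs. predicted class on the scored */
   /* TEST data. scored_test and F_LiveOutcome are assumed names -- adjust */
   /* them to match the exported data set.                                 */
   proc freq data=scored_test;
      tables F_LiveOutcome * I_LiveOutcome / norow nocol nopercent;
   run;

In that 2x2 table, the column total under formatted value 0 would be the number of predictions made for the rare outcome, and the (0, 0) cell the number of those that were correct.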
 

-------------------------------------------------------------------------------

Here is some output from the Model Comparison node. The True/False Negative/Positive section for the test data is MISSING; those results are reported only for the train and validation sets.

 
Data Role=Test
 
Statistics                                                           Tree2        Tree
 
Test:  Kolmogorov-Smirnov Statistic                                   0.42        0.50
Test: Average Profit for LiveOutcome                                  1.43        1.50
Test: Average Squared Error                                           0.04        0.04

...

...

 

Event Classification Table
Model Selection based on Test: Misclassification Rate (_TMISC_)
 
Model                   Data                   Target      False    True     False    True
Node  Model Description Role       Target       Label    Negative Negative Positive Positive
 
Tree  Decision Tree     TRAIN    LiveOutcome LiveOutcome    130      397      955     25403
Tree  Decision Tree     VALIDATE LiveOutcome LiveOutcome    122      266      749     19028
Tree2 Decision Tree (2) TRAIN    LiveOutcome LiveOutcome    126      335     1017     25407
Tree2 Decision Tree (2) VALIDATE LiveOutcome LiveOutcome    110      249      766     19040
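
If the scored test data were available as a data set, the missing TEST cells could be tallied directly. A sketch under the same assumed names as above, treating formatted value 1 as the event so the counts line up with the table:

   /* Sketch only: reproduce the four Event Classification cells for TEST. */
   /* scored_test and the F_/I_ variable names are assumptions.            */
   data _null_;
      set scored_test end=last;
      if      F_LiveOutcome='1' and I_LiveOutcome='0' then fn+1; /* false negative */
      else if F_LiveOutcome='0' and I_LiveOutcome='0' then tn+1; /* true  negative */
      else if F_LiveOutcome='0' and I_LiveOutcome='1' then fp+1; /* false positive */
      else if F_LiveOutcome='1' and I_LiveOutcome='1' then tp+1; /* true  positive */
      if last then put 'TEST: ' fn= tn= fp= tp=;
   run;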

1 REPLY
RandyCollica
SAS Employee

Hi, a simple way would be to score your data set with the Score node, since the test set isn't captured or used until the Model Comparison node. It is your best estimator for a hold-out sample.
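
For example, you could attach a SAS Code node after the Score node and cross-tabulate the test partition there. A minimal sketch, assuming the node's &EM_IMPORT_TEST macro variable resolves to the imported test data and that the usual F_/I_ classification variables are present:

   /* Sketch for a SAS Code node: &EM_IMPORT_TEST should resolve to the */
   /* TEST partition imported by the node when one flows into it.       */
   proc freq data=&EM_IMPORT_TEST;
      tables F_LiveOutcome * I_LiveOutcome / norow nocol nopercent;
   run;

The off-diagonal cells of that table are the false negatives and false positives that the Event Classification Table leaves out for the test set.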

 

Regards,

Randy

"All truths are easy to understand once they are discovered; the point is to discover them." G.G.
