There are categorical and interval inputs, and a binary target. The target outcomes are split 95 / 5.
I want to know how many predictions were made for the rare event, and how many of those were correct.
The score report node does not report this information.
The model comparison node has a Classification Chart that has this information for the training and validation sets, but not for the test set.
Data Partition = 40/30/30

Data Source -> Data Partition -> Decision Tree -----\
                              \-> Decision Tree -----> Model Comparison

Data Source -> Data Partition -> Decision Tree -> Score Report
-------------------------------------------------------------------
Here is the only relevant output in the score report. It doesn't involve the P_... variables,
and it doesn't tell me how many predictions were made for outcome 0 or how many of those were correct.
Data Role=TEST  Output Type=C

Variable        Numeric Value  Formatted Value  Frequency Count  Percent
D_LIVEOUTCOME   .              0                 3245            21.7246
D_LIVEOUTCOME   .              1                11692            78.2754

Data Role=TEST  Output Type=CLASSIFICATION

Variable        Numeric Value  Formatted Value  Frequency Count  Percent
I_LiveOutcome   .              0                  257             1.7206
I_LiveOutcome   .              1                14680            98.2794
-------------------------------------------------------------------------------
Here is some output from the model comparison. The True/False Negative/Positive section for the
test data is MISSING, but it reports those results for the train and validation sets.
Data Role=Test

Statistics                              Tree2   Tree
Test: Kolmogorov-Smirnov Statistic      0.42    0.50
Test: Average Profit for LiveOutcome    1.43    1.50
Test: Average Squared Error             0.04    0.04
...
Event Classification Table
Model Selection based on Test: Misclassification Rate (_TMISC_)

Model                      Data      Target       Target       False     True      False     True
Node   Model Description   Role      Target       Label        Negative  Negative  Positive  Positive
Tree   Decision Tree       TRAIN     LiveOutcome  LiveOutcome  130       397       955       25403
Tree   Decision Tree       VALIDATE  LiveOutcome  LiveOutcome  122       266       749       19028
Tree2  Decision Tree (2)   TRAIN     LiveOutcome  LiveOutcome  126       335       1017      25407
Tree2  Decision Tree (2)   VALIDATE  LiveOutcome  LiveOutcome  110       249       766       19040
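The counts in the Event Classification Table tie back to the selection criterion: the misclassification rate is (FN + FP) / N. A quick arithmetic check in Python against the Tree TRAIN row (note that which level counts as "positive" depends on the declared event level):

```python
# Tree, TRAIN row from the Event Classification Table above
fn, tn, fp, tp = 130, 397, 955, 25403

n = fn + tn + fp + tp              # total training observations
misclassification = (fn + fp) / n  # rate used for model selection (_TMISC_)

print(n)                           # -> 26885
print(round(misclassification, 4)) # -> 0.0404
```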
Hi,

A simple way to get these counts would be to score your test set with the Score node, since the test set isn't captured or used until the Model Comparison node. It is your best estimator for a hold-out sample.

Regards,
Randy