There are categorical and interval inputs, and a binary target. The target outcomes are split 95 / 5.
I want to know how many predictions were made for the rare event, and how many of those were correct.
The score report node does not report this information.
The model comparison node has a Classification Chart that has this information for the training and validation sets, but not for the test set.
Data Partition = 40/30/30

Data Source -> Data Partition -> Decision Tree -----\
                              \-> Decision Tree -----> Model Comparison

Data Source -> Data Partition -> Decision Tree -> Score Report
-------------------------------------------------------------------
Here is the only relevant output in the score report. It doesn't involve the P_... variables,
and it doesn't tell me how many predictions were made for outcome 0 or how many of those were correct.
Data Role=TEST  Output Type=C

Variable        Numeric Value  Formatted Value  Frequency Count  Percent
D_LIVEOUTCOME   .              0                 3245            21.7246
D_LIVEOUTCOME   .              1                11692            78.2754

Data Role=TEST  Output Type=CLASSIFICATION

Variable        Numeric Value  Formatted Value  Frequency Count  Percent
I_LiveOutcome   .              0                  257             1.7206
I_LiveOutcome   .              1                14680            98.2794
-------------------------------------------------------------------------------
Here is some output from the model comparison. The True/False Negative/Positive section for the
test data is MISSING, but it reports those results for the train and validation sets.
Data Role=Test

Statistics                              Tree2   Tree
Test: Kolmogorov-Smirnov Statistic      0.42    0.50
Test: Average Profit for LiveOutcome    1.43    1.50
Test: Average Squared Error             0.04    0.04
...
Event Classification Table
Model Selection based on Test: Misclassification Rate (_TMISC_)

Model                      Data      Target       Target       False     True      False     True
Node   Model Description   Role      Target       Label        Negative  Negative  Positive  Positive
Tree   Decision Tree       TRAIN     LiveOutcome  LiveOutcome  130       397       955       25403
Tree   Decision Tree       VALIDATE  LiveOutcome  LiveOutcome  122       266       749       19028
Tree2  Decision Tree (2)   TRAIN     LiveOutcome  LiveOutcome  126       335       1017      25407
Tree2  Decision Tree (2)   VALIDATE  LiveOutcome  LiveOutcome  110       249       766       19040
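The counts in the Event Classification Table tie back to the selection criterion: the misclassification rate is (FN + FP) / N. A quick arithmetic check in Python against the Tree TRAIN row (note that which level counts as "positive" depends on the declared event level):

```python
# Tree, TRAIN row from the Event Classification Table above
fn, tn, fp, tp = 130, 397, 955, 25403

n = fn + tn + fp + tp              # total training observations
misclassification = (fn + fp) / n  # rate used for model selection (_TMISC_)

print(n)                           # -> 26885
print(round(misclassification, 4)) # -> 0.0404
```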
Hi,

A simple way to get these counts would be to score your test set with the Score node, since the test set isn't captured or used until the Model Comparison node. It is your best estimator for a hold-out sample.

Regards,
Randy