I am using PROC HPSPLIT to create a decision tree. The resulting confusion matrix is below. The misclassification rate for the test data seems wrong, although the rates for the training and validation data are correct. The same thing happens with other data sets I have tried. What could be causing this?
Hello,
I suspect it has to do with missing values. Could that be the case?
Do the cell counts for 'Test' add up to the total number of observations in your 'Test' partition? Probably not.
What if you use the total number of observations in your 'Test' partition as the denominator? Do you get 0.2808 then?
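To make that check concrete, here is a minimal Python sketch that computes a misclassification rate from a confusion matrix in two ways: dividing by the sum of the table's cells, and dividing by a separately supplied number of observations used. The function name and the counts are hypothetical, not taken from the original output, and the sketch only reproduces the comparison suggested above; it does not claim to show how PROC HPSPLIT computes its rate internally.

```python
def misclassification_rate(confusion, n_used=None):
    """Misclassification rate from a square confusion matrix.

    confusion: list of rows of counts (row = actual class, column = predicted).
    n_used: optional total number of observations in the partition. When given,
    it replaces the sum of the cells as the denominator. The two totals differ
    when some observations do not appear in the table (e.g. missing values).
    """
    total_in_table = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    wrong = total_in_table - correct  # off-diagonal cells only
    denominator = total_in_table if n_used is None else n_used
    return wrong / denominator


# Hypothetical counts: the table holds 100 observations, but 120 were used.
cm = [[50, 10],
      [15, 25]]
print(misclassification_rate(cm))       # 25 / 100
print(misclassification_rate(cm, 120))  # 25 / 120
```

If the reported test rate matches the second form rather than the first, the cells and the denominator are counting different sets of observations.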
In your code, are you using assignmissing=similar?
If assignmissing=none is used instead, then I believe the cell counts in the Confusion Matrix table for the Test partition do add up to the Number of Test Observations Used.
Here are some workarounds so that you can move forward with your analysis.
Good luck,
Koen
Thank you for your help! SAS Support confirmed that this is a bug in the output, caused by the issue you identified, and suggested the same temporary workarounds. Thanks for taking the time to look into this!