nataliegerhart0
Calcite | Level 5

I am using PROC HPSPLIT to create a decision tree. The resulting confusion matrix is below. The misclassification rate for the test data seems wrong, although it is right for the training and validation data. This also happens on other data sets I have tried. What could be causing this?

[Image: Screenshot 2021-11-30 152511.png (confusion matrix output)]

sbxkoenk
SAS Super FREQ

Hello,

 

I suspect this is caused by missing values. Could that be the case?
Do the cell counts for the 'Test' partition add up to the total number of observations in that partition? Probably not.
What happens if you use the total number of observations in the 'Test' partition as the denominator? Do you get 0.2808 then?
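One way to test this idea is to count how many test observations have at least one missing input. The sketch below is only an illustration: the data set work.mydata, the character partition variable role (with value 'TEST'), and the inputs x_cat, x_num1, and x_num2 are hypothetical names, not taken from the original post.

```sas
/* Hypothetical names: work.mydata, role, x_cat, x_num1, x_num2. */
/* Flag test observations that have at least one missing input.  */
data work.test_check;
   set work.mydata;
   where role = 'TEST';
   /* CMISS counts missing values among numeric and character arguments */
   has_missing = (cmiss(x_cat, x_num1, x_num2) > 0);
run;

/* Tabulate how many test observations are affected */
proc freq data=work.test_check;
   tables has_missing;
run;
```

If the number of flagged observations roughly matches the gap between the test partition size and the sum of the 'Test' cells in the confusion matrix, that would support the missing-value explanation. The misclassification rate is the number of misclassified observations divided by the number of observations counted, so a different denominator gives a different rate.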

 

In your code, are you using assignmissing=similar?

 

If you use assignmissing=none instead, I believe the cells of the Confusion Matrix table for the Test partition do sum to the Number of Test Observations Used.

 

Here are some workarounds so that you can move forward with your analysis.

  • Impute the missing values with a procedure (PROC STDIZE, PROC MI, PROC FASTCLUS, and so on), or with values that make sense based on your subject knowledge.
  • Use assignmissing=none on the PROC HPSPLIT statement.
  • Remove observations that have missing values (may not be a viable option).
  • Remove variables that have missing values (may not be a viable option).
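The first two workarounds might look roughly like the sketch below. It is not the original poster's code: the data set work.mydata, the target variable target, the inputs x_cat, x_num1, and x_num2, and the partition variable role with its values are all hypothetical, and median imputation is just one possible choice.

```sas
/* Workaround 1 (hypothetical names throughout):                      */
/* replace only the missing numeric inputs with the variable median.  */
proc stdize data=work.mydata out=work.imputed
            reponly method=median;
   var x_num1 x_num2;
run;

/* Workaround 2: fit the tree with assignmissing=none, as suggested   */
/* above. Here it is run on the original, non-imputed data.           */
proc hpsplit data=work.mydata assignmissing=none;
   class target x_cat;
   model target = x_cat x_num1 x_num2;
   partition rolevar=role(train='TRAIN' validate='VALID' test='TEST');
run;
```

Note that PROC STDIZE handles numeric variables only, so a missing character input such as x_cat would need separate treatment. After either change, it is worth rechecking that the 'Test' cells of the confusion matrix sum to the number of test observations used.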

 

Good luck,

Koen

nataliegerhart0
Calcite | Level 5

Thank you for your help! SAS Support acknowledged there was a bug in the output for the same reason you identified. They offered the same temporary solutions. Thank you for your time with this!
