Misclassification rate on proc hpsplit

nataliegerhart0 · Posted 11-30-2021 04:27 PM

I am using PROC HPSPLIT to create a decision tree. The resulting confusion matrix is below. The misclassification rate reported for the test data seems wrong, although it is correct for the training and validation data. The same thing happens with other data sets I have tried. What could be causing this?

sbxkoenk · Posted 12-21-2021 10:47 AM

Hello,

I think it has to do with missing values. Could that be possible?
Do the cell counts for 'Test' add up to the total number of observations in your test partition? Probably not.
What if you use the total number of observations in your test partition as the denominator? Do you get 0.2808 then?
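
One way to check the missing-value explanation is to count, per partition, how many observations have at least one missing input. The sketch below is only an illustration under assumptions: the data set work.mydata, the partition variable role, and the inputs x1-x3 are hypothetical names, not taken from the original post.

```sas
/* Hypothetical names: work.mydata, role, x1-x3 */
data work.miss_flag;
   set work.mydata;
   /* CMISS counts missing values across numeric and character arguments */
   any_missing = (cmiss(of x1-x3) > 0);
run;

/* Cross-tabulate partition by the missing flag (counts only) */
proc freq data=work.miss_flag;
   tables role*any_missing / norow nocol nopercent;
run;
```

If the number of test observations flagged with any_missing=1 matches the gap between the test partition total and the sum of the 'Test' confusion matrix cells, that would support the missing-value explanation.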

In your code, are you using assignmissing=similar?

If assignmissing=none is used instead, then for the Test partition, the sum of the cells in the Confusion Matrix table does match the Number of Test Observations Used, I believe.
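
A minimal sketch of such a call, assuming a hypothetical data set work.mydata with a target y, numeric inputs x1-x3, and a categorical input c1 (all illustrative names). The fractions and seed are arbitrary, and the TEST= fraction is assumed to be supported by your release of PROC HPSPLIT:

```sas
/* Hypothetical names: work.mydata, y, x1-x3, c1 */
proc hpsplit data=work.mydata assignmissing=none;
   class y c1;                  /* categorical target and categorical input */
   model y = x1 x2 x3 c1;
   /* Arbitrary illustrative fractions; the remainder is used for training */
   partition fraction(validate=0.2 test=0.2 seed=12345);
run;
```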

Here are some workarounds so that you can move forward with your analysis.

- Impute the missing values with a procedure (PROC STDIZE, PROC MI, PROC FASTCLUS, and so on), or with values that make sense based on your subject knowledge.
- Use assignmissing=none on the PROC statement.
- Remove observations that have missing values (may not be a viable option).
- Remove variables that have missing values (may not be a viable option).
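
As an illustration of the imputation workaround, here is a minimal sketch that uses PROC STDIZE to replace missing numeric values with the variable median. The data set names and the inputs x1-x3 are hypothetical, and median replacement is only one possible choice, not a recommendation from SAS Support:

```sas
/* Hypothetical names: work.mydata, work.mydata_imputed, x1-x3 */
proc stdize data=work.mydata out=work.mydata_imputed
            reponly          /* replace missing values only, do not standardize */
            method=median;   /* use each variable's median as the replacement */
   var x1 x2 x3;             /* numeric variables only */
run;
```

The imputed data set can then be passed to PROC HPSPLIT in place of the original one. Character inputs are not handled by this step and would need a separate treatment.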

Good luck,

Koen

nataliegerhart0 · Posted 12-21-2021 01:06 PM

Thank you for your help! SAS Support acknowledged there was a bug in the output for the same reason you identified. They offered the same temporary solutions. Thank you for your time with this!
