JKarp_11

Dear community,

I would like to better understand what the property "Perform Cross Validation" in the "Cross Validation" section of a decision tree does in general.

As I understand it, the purpose of cross validation (CV) is not to help select a particular tree as the final model, but rather to qualify a model that was built on 100% of the training sample before the CV. That is, CV provides metrics such as the average MSE (averaged over all "sub-trees" generated by the CV), which help in assessing the level of precision one can expect from the model in application.
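A minimal sketch of this view of CV, in plain Python rather than Enterprise Miner: the final model is fitted on the full sample, and k-fold CV only supplies an average MSE as a quality estimate. The one-split tree `fit_stump`, the synthetic data, and `k=5` are assumptions made for illustration, not anything taken from the SAS node.

```python
import random

def avg(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimising the SSE."""
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = avg(left), avg(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def mse(model, xs, ys):
    return avg([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def cross_validate(xs, ys, k=5, seed=0):
    """Return the held-out MSE of each of the k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores

# Synthetic data (assumption): a noisy step function.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [(2.0 if x < 5 else 8.0) + rng.gauss(0, 1) for x in xs]

fold_mse = cross_validate(xs, ys, k=5)
cv_estimate = avg(fold_mse)      # average MSE over the k held-out folds
final_model = fit_stump(xs, ys)  # the model itself uses 100% of the sample
```

In this reading, none of the k fold models is kept; they exist only to produce `cv_estimate`.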

I have now run two trees separately, one with "Perform Cross Validation" = Yes and one without. The trees are different: the tree with CV = Yes has fewer leaves. From this outcome I assume that Enterprise Miner uses a specific tree created by the CV as the final model (probably the one with the smallest MSE), i.e. a tree trained on (100 − X)% instead of 100% of the initial training sample.

Or are the results of the cross validation (average MSE) used for pruning the original tree? In that case, however, pruning would be executed after the CV. In my case I have selected the method "Assessment" in the "Subtree" section.
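The second possibility can be sketched generically: CV is used only to choose the tree size, and the tree of that size is then built on 100% of the sample. This is the general idea behind CV-based pruning in CART-style trees, not a confirmed description of what Enterprise Miner does. The depth-limited 1-D tree, the synthetic data, and all function names below are hypothetical.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(pts):
    m = avg([y for _, y in pts])
    return sum((y - m) ** 2 for _, y in pts)

def build_tree(pts, depth):
    """Greedy regression tree on (x, y) pairs, limited to `depth` levels."""
    xs = sorted({x for x, _ in pts})
    if depth == 0 or len(xs) < 2:
        return avg([y for _, y in pts])  # a leaf stores its mean target
    best = None
    for t in xs[1:]:
        left = [p for p in pts if p[0] < t]
        right = [p for p in pts if p[0] >= t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    _, t, left, right = best
    return (t, build_tree(left, depth - 1), build_tree(right, depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

def cv_mse(pts, depth, k=5, seed=0):
    """Average held-out MSE over k folds for trees of the given depth."""
    idx = list(range(len(pts)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test_idx = idx[f::k]
        held = set(test_idx)
        tree = build_tree([pts[i] for i in idx if i not in held], depth)
        scores.append(avg([(predict(tree, pts[i][0]) - pts[i][1]) ** 2
                           for i in test_idx]))
    return avg(scores)

# Synthetic data (assumption): three noisy plateaus.
rng = random.Random(2)
pts = []
for _ in range(120):
    x = rng.uniform(0, 9)
    level = 1.0 if x < 3 else (5.0 if x < 6 else 9.0)
    pts.append((x, level + rng.gauss(0, 1)))

depths = [1, 2, 3, 4, 5]
cv_by_depth = {d: cv_mse(pts, d) for d in depths}
best_depth = min(cv_by_depth, key=cv_by_depth.get)

full_tree = build_tree(pts, max(depths))  # grown without CV-based size selection
final_tree = build_tree(pts, best_depth)  # size chosen by CV, trained on 100%
```

Under this mechanism the final tree can have fewer leaves than the tree grown without CV even though both are trained on every observation, which would also be consistent with the outcome described above.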

Thank you in advance for your valuable help! As this is a general question, I hope it can be answered without data or code.

Best regards
