Hi!
I have trained a model in Model Studio using gradient boosting with 5-fold cross validation. However, on the results page, the table that summarizes how much data was used for training and validation does not seem to reflect the 5 folds I chose.
In the node options, I chose cross validation as the validation method.
But when I look at the results table for the same node, it says the data is divided into approximately 60% and 30% for the training and validation sets.
What does this mean? Does the 5-fold cross validation not apply for some reason, or does the table show something else?
Thank you in advance!
Hello @yiyhio ,
The cross-validation you selected is used for assessing / selecting the model(s), not for constructing the model(s).
This is what the documentation says:
===========
For small to medium data tables, cross validation provides, on average, a better representation of error across the whole data table. Partition is the default value.
===========
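To illustrate the difference between a single partition and k-fold cross validation, here is a minimal sketch in plain Python. It is not SAS code and not what Model Studio runs internally. The function names `single_partition` and `k_fold_splits` are made up for this illustration, and the 10% left over after the 60%/30% split is assumed to be a holdout (test) set.

```python
import random


def single_partition(n_rows, train_frac=0.6, valid_frac=0.3, seed=0):
    """Split row indices ONCE into train / validation / remaining rows.

    The 60/30 fractions mirror the table described in the question;
    treating the remainder as a test set is an assumption.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = round(n_rows * train_frac)
    n_valid = round(n_rows * valid_frac)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    rest = idx[n_train + n_valid:]
    return train, valid, rest


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield k (train, validation) pairs; each row is validated exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled rows into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, valid


# Small demo on 100 hypothetical rows.
train, valid, rest = single_partition(100)
print("partition sizes:", len(train), len(valid), len(rest))
for fold_no, (tr, va) in enumerate(k_fold_splits(100, k=5), start=1):
    print("fold", fold_no, "train/valid sizes:", len(tr), len(va))
```

With 5 folds, each fit trains on about 80% of the rows and validates on the other 20%, and every row is used for validation exactly once. A table showing roughly 60% and 30% therefore looks like a single partition, not like the five folds.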
To use k-fold cross-validation for constructing the model, see here:
Cross Validation of a Forest Model
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casactml/casactml_mltools_example01.htm
The above example uses the crossValidateML action (in PROC CAS).
The crossValidateML Action doc:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casactml/casactml_mltools_details02.htm
Kind regards,
Koen
See also my previous response!!
For questions like this, it's better to post on this board:
Analytics > SAS Data Mining and Machine Learning.
Many more of the people in your target audience will read your question there.
Kind regards,
Koen