SlutskyFan
Obsidian | Level 7
I have 2 questions:

1) Cross-validated decision trees: In the cross-validation panel, if you select 'Yes' and set the number of subsets to 10 and the number of repeats to 10, are the results equivalent to 10-fold cross-validation?

Cross-validated regression: When you choose 'cross validation misclassification' as the selection criterion for the logistic regression node, it seems to be similar to an n-fold cross-validation where n = the total # of observations in your data set. Is that correct?

2) With cross-validation techniques, do you still partition your data into training and validation subsets? Based on the SAS Help documentation, cross-validation is primarily used when a data set is too small to partition, so I'm thinking you wouldn't generally use a cross-validation technique with partitioned data.
1 ACCEPTED SOLUTION
PadraicGNeville
SAS Employee
In SAS decision trees, '10 repeats' means that 10-fold cross-validation is run 10 times, for a total of 101 trees: 100 cross-validation trees plus the original tree.
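
To make the count of 101 concrete, here is a minimal Python sketch of repeated k-fold splitting. It only illustrates the arithmetic and is not how Enterprise Miner implements it internally. The function name `repeated_kfold`, the sample size of 200, and the reshuffling of subsets on every repeat are all assumptions made for the illustration.

```python
import random

def repeated_kfold(n_obs, n_subsets, n_repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: reshuffling per repeat is an assumption,
    not documented Enterprise Miner behavior.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        idx = list(range(n_obs))
        rng.shuffle(idx)  # new random assignment of rows to subsets
        folds = [idx[i::n_subsets] for i in range(n_subsets)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield repeat, fold, train_idx, test_idx

splits = list(repeated_kfold(n_obs=200, n_subsets=10, n_repeats=10))
cv_fits = len(splits)      # 10 subsets x 10 repeats = 100 trees
total_fits = cv_fits + 1   # plus the original tree fit on all the data
print(cv_fits, total_fits)  # prints: 100 101
```

Each of the 100 splits would train one tree on nine subsets and score it on the tenth; the extra fit is the tree grown on the full data.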

'Leave-one-out' cross-validation has been available in the EM Regression Node. In leave-one-out CV, n = the total # of observations in your data set.
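
As a rough illustration of why leave-one-out is the n-fold case, the Python sketch below builds k-fold splits and sets the number of subsets equal to the number of observations. The helper `kfold` and the tiny sample size of 8 are hypothetical; this is not the Regression node's code.

```python
def kfold(n_obs, n_subsets):
    """Yield (train_idx, test_idx) for k-fold CV over rows 0..n_obs-1."""
    idx = list(range(n_obs))
    for fold in range(n_subsets):
        test_idx = idx[fold::n_subsets]  # every n_subsets-th row from `fold`
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        yield train_idx, test_idx

n_obs = 8  # assumed sample size, for illustration only
loo_splits = list(kfold(n_obs, n_subsets=n_obs))

# With as many subsets as observations, each subset is a single row:
# the model is fit n times, each time on the other n - 1 rows.
for train_idx, test_idx in loo_splits:
    print(len(train_idx), test_idx)
```

Every split holds out exactly one observation, so there are as many model fits as rows in the data set.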

Re: Using CV, do you still partition your data into training and validation subsets?
Not for a single EM modelling node. However, partitioning into data-available-for-CV vs test-hold-out is still useful, and if comparing models from several EM modeling nodes, using a single validation data set for the comparison may be useful. It's up to the analyst.
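
One way to picture the "data-available-for-CV vs test-hold-out" arrangement is the Python sketch below: a test set is carved off first, and the folds are built only from the remaining rows. The function name, the 20% test fraction, and the sample size are assumptions for illustration; in Enterprise Miner the split itself would be done with the partitioning tools rather than code like this.

```python
import random

def holdout_then_kfold(n_obs, test_fraction, n_subsets, seed=0):
    """Split rows into a test hold-out plus k CV folds on the remainder.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_test = int(round(n_obs * test_fraction))
    test_idx, cv_idx = idx[:n_test], idx[n_test:]
    folds = []
    for fold in range(n_subsets):
        valid_idx = cv_idx[fold::n_subsets]
        held_out = set(valid_idx)
        train_idx = [i for i in cv_idx if i not in held_out]
        folds.append((train_idx, valid_idx))
    return test_idx, folds

test_idx, folds = holdout_then_kfold(n_obs=100, test_fraction=0.2, n_subsets=10)
print(len(test_idx), len(folds))  # prints: 20 10
```

The hold-out rows never appear in any fold, so they remain available for a final accuracy estimate or for comparing models built by different nodes.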

Re: primarily used when small data sets are not large enough for partitioning
That is my belief. Partitioning applies hold-out data directly to the model being deployed, providing a transparently unbiased estimate of accuracy. CV validates the model construction process. People disagree as to whether leave-one-out cross-validation provides unbiased or overly optimistic estimates of prediction.

However, many people prefer to CV anything, regardless of size.


2 REPLIES 2
SatishG
Calcite | Level 5
I'm not sure about the cross-validation in regression. I agree with the decision tree method and your second point.

