Suppose you first partition your data into training and validation sets with the Data Partition node, and then connect the Data Partition node to the Regression node available in the Model tab. If you run the Regression node without changing the default selection method (which is “None”), the validation set won’t be used at all. However, if you change the model selection method from None to any other selection method (such as stepwise) and also choose Validation Error as the Selection Criterion, then at each step of the selection process the model error is calculated on the validation data, and the model from the step with the smallest validation error is selected as your final model. If, instead of validation error, you pick cross validation as the Selection Criterion, then at each selection step the cross validation error is calculated using only the training part of the data, so the validation data set again isn’t used in the model selection process. Therefore, if you choose to use cross validation, you do not need to set aside an additional validation set. I hope it is clearer now.
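
If it helps to see the two criteria side by side, here is a minimal sketch in plain Python. It is not Enterprise Miner code and does not reproduce what the Regression node does internally. The data are synthetic, the "selection steps" are a fixed list of nested variable subsets rather than a real stepwise search, and every name (`fit_ols`, `validation_error`, `cv_error`, `steps`) is made up for illustration. The point is only which rows each criterion looks at: the validation error is scored on the held-out partition, while the cross validation error is computed from the training rows alone.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def mse(beta, X, y):
    """Mean squared error of the fitted model on the given rows."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def cols(X, idx):
    """Keep only the columns listed in idx."""
    return [[r[j] for j in idx] for r in X]

def validation_error(idx, X_tr, y_tr, X_va, y_va):
    """Fit on the training partition, score on the validation partition."""
    beta = fit_ols(cols(X_tr, idx), y_tr)
    return mse(beta, cols(X_va, idx), y_va)

def cv_error(idx, X_tr, y_tr, k=5):
    """Mean per-fold MSE from k-fold cross validation on training rows only."""
    n = len(y_tr)
    Xs = cols(X_tr, idx)
    total = 0.0
    for fold in range(k):
        hold = [i for i in range(n) if i % k == fold]
        keep = [i for i in range(n) if i % k != fold]
        beta = fit_ols([Xs[i] for i in keep], [y_tr[i] for i in keep])
        total += mse(beta, [Xs[i] for i in hold], [y_tr[i] for i in hold])
    return total / k

# Synthetic data (an assumption for this sketch): y depends on x0 and x1 only.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in X]
X_tr, y_tr, X_va, y_va = X[:70], y[:70], X[70:], y[70:]

# Stand-in for the selection steps: one more variable enters at each step.
steps = [[], [0], [0, 1], [0, 1, 2]]

val_errs = [validation_error(s, X_tr, y_tr, X_va, y_va) for s in steps]
cv_errs = [cv_error(s, X_tr, y_tr) for s in steps]

# The final model is the step with the smallest value of the chosen criterion.
best_by_validation = min(range(len(steps)), key=val_errs.__getitem__)
best_by_cv = min(range(len(steps)), key=cv_errs.__getitem__)
print("step chosen by validation error:", best_by_validation)
print("step chosen by cross validation error:", best_by_cv)
```

Note that `cv_error` never receives `X_va` or `y_va`, which is the sense in which a separate validation set is not needed when cross validation is the criterion.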