Hi Funda_SAS, I should have clarified my question. What I am trying to understand is how a specific modelling node (e.g. a regression node) uses the training and validation data to arrive at its chosen model. Take the case where I partition my data 60/40 into training and validation, with no test data. When I pass the data to a modelling node (regression, decision tree, neural network etc.), I am guessing EM uses the training and validation data together to select the best model iteratively: it trains candidate models on the training data and uses the validation data to check how well each one generalises and to avoid overfitting (you mention hyperparameter tuning in your reply). All of this happens before the result is sent to a model comparison node, which selects the best from a range of models. So my question is really about what happens to the training and validation data sets within an individual modelling node, and how this differs from other techniques such as cross validation.
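To make the distinction I am asking about concrete, here is a minimal Python sketch of the two approaches as I understand them. It is only an illustration and not a description of what EM does internally: the toy data, the one-parameter ridge model and the function names (`fit_ridge_slope`, `select_by_holdout`, `select_by_kfold`) are all my own assumptions. The holdout version fits each candidate on the 60% training part and scores it once on the 40% validation part; the k-fold version rotates the validation role across folds and averages the error.

```python
import random

def fit_ridge_slope(xs, ys, lam):
    """Closed-form slope of a no-intercept ridge regression y ~ b*x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(b, xs, ys):
    """Mean squared error of the fitted slope b on the given data."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_by_holdout(xs, ys, lams, train_frac=0.6):
    """Fit each candidate on the training part, score it once on validation."""
    n_train = int(len(xs) * train_frac)
    tr_x, tr_y = xs[:n_train], ys[:n_train]
    va_x, va_y = xs[n_train:], ys[n_train:]
    scores = {lam: mse(fit_ridge_slope(tr_x, tr_y, lam), va_x, va_y)
              for lam in lams}
    return min(scores, key=scores.get), scores

def select_by_kfold(xs, ys, lams, k=5):
    """Score each candidate by its average validation error over k folds."""
    n = len(xs)
    scores = {}
    for lam in lams:
        fold_errors = []
        for fold in range(k):
            val_idx = set(range(fold, n, k))  # every k-th row is held out
            tr_x = [xs[i] for i in range(n) if i not in val_idx]
            tr_y = [ys[i] for i in range(n) if i not in val_idx]
            va_x = [xs[i] for i in sorted(val_idx)]
            va_y = [ys[i] for i in sorted(val_idx)]
            b = fit_ridge_slope(tr_x, tr_y, lam)
            fold_errors.append(mse(b, va_x, va_y))
        scores[lam] = sum(fold_errors) / k
    return min(scores, key=scores.get), scores

# Toy data (assumed for illustration): y = 2x plus Gaussian noise.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]
lams = [0.0, 0.1, 1.0, 10.0, 100.0]  # candidate penalty strengths

best_holdout, holdout_scores = select_by_holdout(xs, ys, lams)
best_kfold, kfold_scores = select_by_kfold(xs, ys, lams)
print("holdout choice:", best_holdout)
print("k-fold choice:", best_kfold)
```

In the holdout version every row is used either for fitting or for validation, never both, whereas in the k-fold version every row takes a turn in each role. That is the difference I am trying to pin down for the modelling nodes.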