I'm new to Enterprise Miner. Could anyone point me to where I can find out more about what the different role options (raw, train, test, score, validate, transaction) mean in the Data Source Attributes step of adding a new data source?
The internet just seems to be full of examples telling you just to leave it as 'raw'.
Typically, when you import a data source that you will use to build a predictive/classification model, you set the role to RAW. You then use a successor Data Partition node to create these data sets:
Train is used for preliminary model fitting. The analyst attempts to find the best model weights using this data set.
Validation is used to assess the adequacy of the model in the Model Comparison node. The validation data set is also used to prevent overfitting by some modeling nodes.
Test is used to obtain a final, unbiased estimate of the generalization error of the model. It is a true holdout data set.
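To make the three roles concrete, here is a minimal sketch in plain Python of what a partition step does conceptually. It is not Enterprise Miner code and not how the Data Partition node is implemented; the function name `partition`, the 40/30/30 proportions, and the seed are assumptions chosen for illustration.

```python
import random

def partition(rows, train=0.4, validation=0.3, seed=12345):
    """Shuffle rows and split them into train/validate/test lists.

    Hypothetical helper: the proportions and seed are illustrative only.
    Whatever is left after train and validation goes to test.
    """
    if train < 0 or validation < 0 or train + validation > 1:
        raise ValueError("train and validation must be >= 0 and sum to at most 1")
    shuffled = list(rows)                  # copy so the raw table is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return {
        "train": shuffled[:n_train],
        "validate": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }

raw = list(range(100))   # stand-in for a table imported with role RAW
parts = partition(raw)
print({role: len(rows) for role, rows in parts.items()})
```

Each row ends up in exactly one of the three sets, which is what lets the test set serve as a holdout that played no part in fitting or model selection.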
A very common predictive modeling flow:
IDS node (RAW) to Data Partition to Explore/Modification/Modeling nodes
If you have already created train, validation, and/or test data sets outside of EM, you can define them separately and then feed them to successor nodes (tools). Tip: use a Control Point node to better manage the connections.
A score data set is a new table, typically one that does not contain the target (response), to which you want to apply the score code from the model(s) that you developed in EM and likely evaluated using the Model Comparison node.
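As a rough illustration of the difference between training and scoring, the sketch below "fits" a toy cutoff rule on rows that carry a target and then applies it to new rows that do not. This is plain Python rather than EM score code; the column names `income`, `target`, and `predicted_target` and the midpoint rule are all made up for the example.

```python
def fit_cutoff(train_rows):
    """Toy 'model': midpoint between the mean income of each target class."""
    mean_by_class = {}
    for cls in (0, 1):
        values = [r["income"] for r in train_rows if r["target"] == cls]
        mean_by_class[cls] = sum(values) / len(values)
    return (mean_by_class[0] + mean_by_class[1]) / 2

def score(rows, cutoff):
    """Apply the fitted rule to rows that have no target column."""
    return [{**r, "predicted_target": int(r["income"] > cutoff)} for r in rows]

# Training rows include the known target (hypothetical data).
train_rows = [
    {"income": 20, "target": 0},
    {"income": 30, "target": 0},
    {"income": 70, "target": 1},
    {"income": 80, "target": 1},
]
# Score rows are new records where the target is unknown.
score_rows = [{"income": 40}, {"income": 90}]

cutoff = fit_cutoff(train_rows)
scored = score(score_rows, cutoff)
print(cutoff, scored)
```

The point is only that the score table needs the model's input columns, not the target: the prediction is what gets added.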
Market Basket analysis requires transactional data, so there is a role for this. The help should define these roles in more detail.