deleted_user
Not applicable
Hi,
I'm new to Enterprise Miner. Could anyone point me to where I can find out what the different role options (raw, train, test, score, validate, transaction) mean in the Data Source Attributes step of adding a new data source?

The internet just seems to be full of examples telling you just to leave it as 'raw'.

Any help greatly appreciated,
Thanks,
Cat
2 REPLIES
WayneThompson
SAS Employee
Hi Cat,

Typically, when you import a data source that you will use to build a predictive or classification model, you set the role to RAW. You then use a successor Data Partition node to create these data sets:

Train is used for preliminary model fitting. The analyst attempts to find the best model weights using this data set.

Validation is used to assess the adequacy of the model in the Model Comparison node. Some modeling nodes also use the validation data set to prevent overfitting.

Test is used to obtain a final, unbiased estimate of the generalization error of the model. It is a true holdout data set.
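To make the three roles concrete, here is a minimal sketch in plain Python (not SAS, and not what the Data Partition node does internally) of splitting one raw table into train, validation, and test subsets. The function name `partition`, the 60/20/20 percentages, and the seed are all assumptions chosen for illustration.

```python
import random


def partition(rows, train_pct=60, valid_pct=20, seed=12345):
    """Shuffle rows and split them into train, validation, and test lists.

    The percentages and the seed are illustrative assumptions only.
    Whatever is left after train and validation becomes the test set.
    """
    if train_pct < 0 or valid_pct < 0 or train_pct + valid_pct > 100:
        raise ValueError("train_pct and valid_pct must be >= 0 and sum to at most 100")

    shuffled = list(rows)                   # copy so the raw table is untouched
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible

    n_train = len(shuffled) * train_pct // 100
    n_valid = len(shuffled) * valid_pct // 100

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


raw = list(range(100))                      # stand-in for a table with role RAW
train, valid, test = partition(raw)
print(len(train), len(valid), len(test))
```

Every raw row ends up in exactly one subset, so the test rows never influence model fitting or model selection.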


A very common predictive modeling flow:

IDS node (RAW) to Data Partition to Explore/Modification/Modeling nodes

If you have already created train, validation, and/or test data sets outside of EM, you can define them separately and then feed them to successor nodes (tools). Tip: use a Control Point node to manage the connections better.


A score data set is a new table that typically does not contain the target (response). You apply to it the score code from the model(s) that you developed in EM and likely evaluated using the Model Comparison node.
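As a rough illustration of what scoring means, the sketch below (plain Python, not EM-generated score code) applies a fixed, already-fitted model to new rows that have no target column. The coefficients, the column names `age` and `balance`, and the 0.5 cutoff are hypothetical values made up for this example.

```python
import math

# Hypothetical coefficients standing in for a model fitted earlier.
INTERCEPT = -1.0
WEIGHTS = {"age": 0.02, "balance": 0.0001}


def score_row(row):
    """Return a copy of the row with a predicted probability and class added."""
    z = INTERCEPT + sum(weight * row[name] for name, weight in WEIGHTS.items())
    p = 1.0 / (1.0 + math.exp(-z))          # logistic link
    return {**row, "p_target": p, "predicted": int(p >= 0.5)}


# New rows to score: inputs only, no target (response) column.
score_table = [
    {"id": 1, "age": 30, "balance": 5000.0},
    {"id": 2, "age": 20, "balance": 0.0},
]

scored = [score_row(row) for row in score_table]
for row in scored:
    print(row["id"], round(row["p_target"], 3), row["predicted"])
```

The model is not refit here. The score table only receives predictions, which is why it does not need the target.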

Market Basket analysis requires transactional data, so there is a role for that as well. The help should define these roles in more detail.
deleted_user
Not applicable
Thanks Wayne!
