deleted_user
Not applicable
Hi,
I'm new to Enterprise Miner. Can anyone point me to where I can find out more about what the different role options (raw, train, test, score, validate, transaction) mean in the Data Source Attributes step of adding a new data source?

The internet just seems to be full of examples telling you just to leave it as 'raw'.

Any help greatly appreciated,
Thanks,
Cat
2 REPLIES 2
WayneThompson
SAS Employee
Hi Cat,

Typically, when you import a data source that you will use to build a predictive/classification model, you set the role to RAW. You then use a successor Data Partition node to create these data sets:

Train is used for preliminary model fitting. The analyst attempts to find the best model weights using this data set.

Validation is used to assess the adequacy of the model in the Model Comparison node. The validation data set is also used to prevent overfitting by some modeling nodes.

Test is used to obtain a final, unbiased estimate of the generalization error of the model. It is a true holdout data source.
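To make the three roles concrete, here is a minimal Python sketch of what a partition step does conceptually: it shuffles the raw rows and cuts them into train, validation, and test subsets. This is not Enterprise Miner code and not how the Data Partition node is implemented. The function name `partition`, the 60/20/20 proportions, and the seed are all assumptions chosen for illustration.

```python
import random


def partition(rows, train=0.6, validate=0.2, seed=12345):
    """Split rows into (train, validation, test) by simple random sampling.

    The proportions and seed are illustrative assumptions, not EM defaults.
    Whatever is left after the train and validation cuts becomes the test set.
    """
    shuffled = list(rows)                  # copy so the raw data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (
        shuffled[:n_train],                   # fit model weights here
        shuffled[n_train:n_train + n_valid],  # compare models, curb overfitting
        shuffled[n_train + n_valid:],         # final holdout estimate
    )


raw = list(range(100))  # stand-in for a table with the RAW role
train_set, valid_set, test_set = partition(raw)
```

Each raw row lands in exactly one of the three subsets, which is what lets the test set act as a true holdout.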


A very common predictive modeling flow:

IDS node (RAW) to Data Partition to Explore/Modification/Modeling nodes

If you have already created train, validation, and/or test data sets outside of EM, you can define them separately and then feed them to successor nodes (tools). Tip: use a Control Point node to better manage the connections.


A score data set is a new table, typically without the target (response), to which you want to apply the score code from the model(s) that you developed in EM and likely evaluated using the Model Comparison node.
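As a rough illustration of the score role, the Python sketch below fits a toy rule on rows that have a target and then applies it to new rows that do not. It is not EM score code. The function `fit_threshold_rule`, the column names `x`, `target`, and `predicted`, and the data values are all hypothetical.

```python
def fit_threshold_rule(train_rows):
    """Toy 'model': predict 1 when x exceeds the midpoint of the class means."""
    pos = [r["x"] for r in train_rows if r["target"] == 1]
    neg = [r["x"] for r in train_rows if r["target"] == 0]
    cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def score(row):
        # Only the input column is needed at scoring time, never the target.
        return 1 if row["x"] > cutoff else 0

    return score


# Training rows carry the target.
train_rows = [
    {"x": 1.0, "target": 0},
    {"x": 2.0, "target": 0},
    {"x": 8.0, "target": 1},
    {"x": 9.0, "target": 1},
]
score = fit_threshold_rule(train_rows)

# Rows to score have inputs only; the prediction is appended as a new column.
score_rows = [{"x": 1.5}, {"x": 8.5}]
scored = [dict(row, predicted=score(row)) for row in score_rows]
```

The rows being scored never need a `target` column, which is why a table with the score role can be brand-new data.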

Market Basket analysis requires transactional data, so there is a role for this. The help should define these roles in more detail.
deleted_user
Not applicable
Thanks Wayne!
