Ujjawal
Quartz | Level 8

I have two data sources (data sets). One is a training dataset on which I have developed a decision tree model. The other is an out-of-time validation sample. I am trying to calculate AUC on the out-of-time sample using the predicted probabilities from the model built on the training sample. In SAS Enterprise Miner, there is an option to define the percentage of data for training, validation and test, but I have these samples in two separate datasets. How can I calculate AUC on the out-of-time sample? And where is the predicted probability column saved for each row?

3 REPLIES
M_Maldonado
Barite | Level 11

Hey ujjawal,

To make things easier let's call your "out-of-time validation sample" a test partition.

The way to incorporate a test partition in Enterprise Miner is to set the role of the data source to Test. You can specify the role of a data source during step 7 of the Create A Data Source wizard. Alternatively, once you drag and drop your data into a diagram, set the train property "role" to Test.

You can connect your Test partition to your flow diagram. I tend to connect my test data after a partition node, but you can also connect it directly to a Model Comparison node. The Model Comparison node will calculate several fit statistics for all your partitions, including the area under the ROC curve (note that it is called ROC index in this node). The exported data set for your test partition will also have the predicted probability as a new column with the prefix p_<target>.
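If you ever want to sanity-check the ROC index outside Enterprise Miner, here is a minimal sketch in Python of the area under the ROC curve computed from actual outcomes and predicted probabilities. It uses the standard rank-sum (Mann-Whitney) identity and is not necessarily how the Model Comparison node computes it internally. The function name `roc_auc`, the variable names and the four sample rows are all made up for illustration; in practice you would feed it the target column and the predicted probability column from your exported test data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the event, 0 for the non-event (assumed coding).
    scores: predicted event probability for each row.
    Tied scores receive the average of their ranks.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")

    # Sort row indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the events, scaled by the number of event/non-event pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical out-of-time rows: actual target and predicted probability.
actual = [0, 0, 1, 1]
p_1 = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(actual, p_1)
print(auc)
```

For these four made-up rows, three of the four event/non-event pairs are ranked correctly, so the sketch gives 0.75. A value of 0.5 means the scores rank events no better than chance.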

It is good practice to include a Data Partition node so that you don't end up with an overtrained model. Since you already have a test partition, you can specify 70 percent for training and 30 percent for validation in your Data Partition node.

Your flow would look like this:

[Image: test set flow diagram.png]


Good luck!

-Miguel

Ujjawal
Quartz | Level 8

Thanks a bunch for your detailed answer. Last question: where is the output file saved, i.e. the file in which the predicted probability (P_1) for each row is stored? Is it calculated automatically? I didn't find any option to view the probability column.

M_Maldonado
Barite | Level 11

Look at this thread; it shows you where to click to see the exported data sets. You can also use the Save Data node.

https://communities.sas.com/message/247767

Good luck!

