07-28-2015 04:56 PM
I have two data sources (data sets). One is a training data set on which I have developed a decision tree model. The other is an out-of-time validation sample. I am trying to calculate AUC on the out-of-time sample, based on predicted probabilities from the model built on the training sample. In SAS Enterprise Miner there is an option to define the % of data for training, validation, and test, but I have these samples in two separate data sets. How can I calculate AUC on the out-of-time sample? And where is the predicted probability column saved for each row?
07-28-2015 06:02 PM
To make things easier, let's call your "out-of-time validation sample" a test partition.
The way to incorporate a test partition in Enterprise Miner is to set the role of a data source to Test. You can specify the role of a data source during step 7 of the Create A Data Source wizard. Alternatively, once you drag-and-drop your data into a diagram, set the train property "Role" to Test.
You can connect your test partition to your flow diagram. I tend to connect my test data after a Data Partition node, but you can connect it directly to a Model Comparison node. The Model Comparison node will calculate several fit statistics for all your partitions, including the area under the ROC curve (note that it is called ROC Index in this node). The exported data set for your test partition will also contain the predicted probability as a new column with the prefix P_<target>.
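For intuition, the ROC Index that Model Comparison reports is just the standard AUC computed from the actual target and the exported P_<target> probabilities. A minimal Python sketch of that calculation (the column values below are made-up toy data, not output from Enterprise Miner) using the rank-based Mann-Whitney formulation:

```python
# AUC from actual targets and predicted probabilities, such as the
# P_<target> column in an exported test data set. AUC equals the
# probability that a randomly chosen positive row is scored above a
# randomly chosen negative row (ties count as half).

def auc(y_true, y_prob):
    pairs = sorted(zip(y_prob, y_true))      # rank rows by score
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Sum of ranks of the positive cases, averaging ranks over ties.
    rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1                           # j-i rows share this score
        avg_rank = (i + 1 + j) / 2.0         # ranks are 1-based
        rank_sum += avg_rank * sum(t for _, t in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy example: actual target vs. model-predicted probabilities.
actual   = [1, 0, 1, 1, 0, 0]
p_target = [0.9, 0.4, 0.7, 0.35, 0.3, 0.5]
print(auc(actual, p_target))                 # prints 0.7777777777777778 (= 7/9)
```

One misranked positive out of nine positive/negative pairs gives 7/9; a perfectly separating model would score 1.0.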
It is good practice to include a Data Partition node so that you don't end up with an overtrained model. Since you already have a test partition, you can specify 70% for training and 30% for validation in your Data Partition node.
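Conceptually, that node performs a random 70/30 split of the rows. A rough Python sketch of the idea (an illustration of simple random sampling, not Enterprise Miner code, which also supports stratified and clustered partitioning):

```python
import random

def partition(rows, train_fraction=0.70, seed=12345):
    """Randomly split rows into train and validation sets, mimicking a
    70/30 Data Partition (simple random sampling, no stratification)."""
    rng = random.Random(seed)        # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))              # stand-in for your training data set
train, valid = partition(rows)
print(len(train), len(valid))        # prints: 70 30
```

Holding out the 30% validation slice is what lets you detect overtraining: a model that fits the 70% well but scores poorly on the 30% has memorized the training data.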
Your flow would look like this:
07-29-2015 12:54 AM
Thanks a bunch for your detailed answer. Last question: where is the output file saved, i.e., the file in which the predicted probability (P_1) for each row is stored? Is it generated automatically? I couldn't find any option to view the probability column.
07-29-2015 09:33 AM
Look at this thread; it shows you where to click to see the exported data sets. Alternatively, you can use the Save Data node.