Solved: Different Datasets for Training/Testing and Validation

FK · Posted 06-03-2016 05:04 AM

Hello Everybody,

I'm trying to use two different datasets for a model: one for training/testing and one for validation. Please see the picture below:

As you can see, I partitioned my Raw dataset (after assigning the variable roles target, input, etc.) into 70% training and 30% testing. I also have a second dataset called "Validation", to which I assigned the role "Validate".
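
For readers who want the idea outside Enterprise Miner, here is a minimal plain-Python sketch of a seeded 70/30 random partition. It is only an analogy, not what the Data Partition node does internally: the `partition` helper, the seed value 12345 and the toy `raw` table are all hypothetical, and the split is simple random (not stratified by target).

```python
import random

def partition(rows, train_frac=0.7, seed=12345):
    """Randomly split rows into (train, test) with the given training fraction."""
    rng = random.Random(seed)      # fixed (hypothetical) seed so the split is reproducible
    shuffled = list(rows)          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical raw table: 100 rows with an id, one input "x" and a binary target.
raw = [{"id": i, "x": i % 10, "target": int(i % 10 >= 5)} for i in range(100)]
train, test = partition(raw)
print(len(train), len(test))  # 70 30
```

Every row lands in exactly one partition, and reusing the same seed reproduces the same split.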

Regarding the model (here: a decision tree), I now want Enterprise Miner (version 12.1) to use the "training" partition to build the model and the "test" partition to test it. AFTERWARDS I WANT THE GENERATED MODEL TO BE VALIDATED ON THE SECOND DATASET ("Validation"). In that dataset, however, I only have the target variable, an ID variable and one other variable, to which I assigned the role "Rejected":

When I run this model, I get the following error:

What am I doing wrong? Do I first have to use a "Score" node after the decision tree node?

Any suggestion would be appreciated.

Thank you,

Felix

JasonXin · Posted 06-05-2016 04:23 PM

Hi,
First, you don't really need two nodes as shown in your post. You can just drag the validation data set into the diagram, go to the panel on the left and change its role to Validate. Second, yes, you need a Score node, because what you want to do is assess the model on that data. So:
1. Delete the Assign Role node.
2. Change the data set's role to Validate.
3. Connect both the validation data set AND the DT node to a Score node. Then connect the Score node to a Model Comparison node.
Jason Xin
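
As a rough plain-Python analogy of that flow (fit on the training partition, assess on the test partition, then score the separate Validate table with the same fitted model), here is a minimal sketch. Everything in it is hypothetical: a one-split "stump" stands in for the decision tree, `score` only loosely mimics what a Score node produces, and the toy tables assume the validation data carries the same input column ("x") the model uses.

```python
# Hypothetical toy data: one numeric input "x", a binary "target", and an "id".
def make_rows(start, n):
    return [{"id": start + i, "x": i % 10, "target": int(i % 10 >= 5)} for i in range(n)]

train = make_rows(0, 70)            # stands in for the 70% training partition
test = make_rows(70, 30)            # stands in for the 30% test partition
validation = make_rows(1000, 20)    # stands in for the separate "Validate" table
validation[0]["target"] = 1         # one deliberately mislabelled row so the error is nonzero

def fit_stump(rows, feature):
    """Fit a one-split tree on a single numeric input by minimising training errors."""
    best_threshold, best_errors = None, None
    for threshold in sorted({r[feature] for r in rows}):
        # Rule: predict 1 when feature >= threshold, else 0.
        errors = sum(int(r[feature] >= threshold) != r["target"] for r in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def score(rows, feature, threshold):
    """Copy each row and attach a prediction, loosely like a Score node's output."""
    return [dict(r, predicted=int(r[feature] >= threshold)) for r in rows]

def misclassification(scored):
    """Fraction of scored rows whose prediction differs from the target."""
    return sum(r["predicted"] != r["target"] for r in scored) / len(scored)

threshold = fit_stump(train, "x")   # the model only ever sees the training rows
test_error = misclassification(score(test, "x", threshold))
validation_error = misclassification(score(validation, "x", threshold))
print(threshold, test_error, validation_error)  # 5 0.0 0.05
```

The point of the design is that the validation table is never used for fitting; it is only scored, so its error rate is an independent check on the finished model.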
