Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Different Datasets for Training/Testing and Validation

Occasional Contributor FK

Hello everybody,

I'm trying to use two different datasets for a model, i.e. training/testing and validation. Please see the picture below:

 

Test_Validate_problem.JPG

As you can see, I partitioned my Raw dataset (after assigning the variable roles target, input, etc.) into 70% training and 30% testing. I also have a second dataset called "Validation", to which I assigned the role "Validate".

Regarding the model (here, a decision tree), I now want Enterprise Miner (version 12.1) to use the "training" partition to build the model and the "test" partition to test it. Afterwards, I want the generated model to be validated on the second dataset ("Validation"). There, however, the only variables left are the target variable, an ID variable, and another variable to which I assigned the role "Rejected":


varsummary_validation.JPG

 

When I run this model I get the following error:
error_message.JPG

 

What am I doing wrong? Do I first have to use a "Score" node after the decision tree node?

Any suggestion would be appreciated.

Thank you,

Felix


Accepted Solutions
Solution
‎06-06-2016 07:39 AM
SAS Employee
Posts: 122

Re: Different Datasets for Training/Testing and Validation

Hi,
First, you don't really need two nodes as shown in your post. You can simply drag the validation data set into the diagram and, in the panel on the left, change its role to Validate. Second, yes, you need a Score node, because your goal is to assess the model on that data. So:
1. Delete the Assign Role node.
2. Change the data set's role to Validate.
3. Connect both the validation data set AND the decision tree (DT) node to a Score node.
Then connect the Score node to a Model Comparison node.

Jason Xin
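
For readers who think in code, the flow discussed in this thread (fit on the training partition, check on the test partition, then score a separate data set) can be sketched outside Enterprise Miner. The Python below is only a concept illustration and not what Enterprise Miner runs internally. The toy data, the one-split "stump" standing in for the decision tree, and all function and variable names are assumptions made for this example.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_stump(train):
    """Fit a one-split 'tree': choose the threshold on x with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in train}):
        errors = sum((x >= threshold) != bool(y) for x, y in train)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def score(threshold, rows):
    """Apply the fitted rule to rows; the stored target is ignored while scoring."""
    return [int(x >= threshold) for x, _ in rows]

def accuracy(threshold, rows):
    """Share of rows whose prediction matches the stored target."""
    predictions = score(threshold, rows)
    return sum(p == y for p, (_, y) in zip(predictions, rows)) / len(rows)

# Hypothetical raw data: one input x and a binary target (1 when x >= 50).
raw = [(x, int(x >= 50)) for x in range(100)]
# Hypothetical separate data set that is never used for fitting.
holdout = [(x, int(x >= 50)) for x in range(0, 100, 7)]

train, test = partition(raw)          # 70% training, 30% testing
threshold = fit_stump(train)          # model is built on the training partition only
print("test accuracy:", accuracy(threshold, test))
print("holdout accuracy:", accuracy(threshold, holdout))
```

The point of the sketch is the separation of roles: only `train` is passed to the fitting step, while `test` and `holdout` are touched only by scoring and assessment, which is the job the Score node does in the diagram.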

☑ This topic is SOLVED.
