Hello,
I am building a decision tree model in SAS EMINER, and I have 2 separate datasets: training and validation.
Scenario 1: I use only the training dataset (with its role set to 'TRAIN'), and it gives a specific result.
Scenario 2: I also include the validation dataset (with its role set to 'VALIDATE') and use the exact same training dataset as in Scenario 1, but I get different model results than in Scenario 1.
Can anyone kindly explain the reason behind this?
I am assuming SAS EMINER tries to give the best possible model from the datasets I supply as input. If I give EMINER only a training dataset (Scenario 1), I think it produces results based on that one dataset alone. When I give it 2 separate datasets (TRAINING and VALIDATION, in Scenario 2), I think EMINER tries to fit the best possible model whose accuracy is consistent across both datasets, which would explain why the two Scenarios give different model results.
I would like to confirm this, or find out whether there is another reason. Any input would be appreciated.
Thank you!
Look in the "Subtree" section of the Decision Tree properties. More than likely, the Method option is set to "Assessment". When you click on Method, the help text says: "...ASSESSMENT (the smallest subtree with the best assessment value is chosen)"
Two lines down is the Assessment Measure option. Its help text says, "Use the Assessment Measure property to specify the method that you want to use to select the best tree, based on the validation data."
I think that answers your question. Let me know if you need more than that or if you have any questions!
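If it helps to see the idea outside of EMINER, here is a rough Python sketch of "pick the smallest subtree with the best assessment value". It is not EMINER's actual algorithm: the toy data, the function names (grow, select_subtree, and so on), the use of depth-truncated subtrees, and misclassification rate as the assessment measure are all made up for illustration. It also assumes that when no validation data is supplied, the training data is used for assessment.

```python
from collections import Counter

def majority(labels):
    # Most common label; ties go to the smallest label so runs are repeatable.
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

def leaf_errors(rows):
    # Misclassifications if these rows formed one leaf.
    pred = majority([y for _, y in rows])
    return sum(y != pred for _, y in rows)

def grow(rows, depth=0, max_depth=4):
    # Grow a tree on (x, y) rows with one numeric input x (toy setup).
    labels = [y for _, y in rows]
    node = {"pred": majority(labels), "depth": depth}
    if depth == max_depth or len(set(labels)) == 1:
        return node
    best = None
    xs = sorted(set(x for x, _ in rows))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        errs = leaf_errors(left) + leaf_errors(right)
        if best is None or errs < best[0]:
            best = (errs, t, left, right)
    if best is None:
        return node
    _, t, left, right = best
    node["split"] = t
    node["left"] = grow(left, depth + 1, max_depth)
    node["right"] = grow(right, depth + 1, max_depth)
    return node

def predict(node, x, depth_limit):
    # Stopping at depth_limit is the same as scoring a pruned subtree.
    while "split" in node and node["depth"] < depth_limit:
        node = node["left"] if x <= node["split"] else node["right"]
    return node["pred"]

def error_rate(tree, rows, depth_limit):
    wrong = sum(predict(tree, x, depth_limit) != y for x, y in rows)
    return wrong / len(rows)

def select_subtree(tree, assess_rows, max_depth=4):
    # Smallest subtree (here: shallowest depth) with the best assessment value.
    errs = [error_rate(tree, assess_rows, d) for d in range(max_depth + 1)]
    return errs.index(min(errs))  # index() returns the first, i.e. smallest, winner

# Toy training data: label is 1 when x > 5, plus one noisy row at x = 3.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
# Toy validation data from the same pattern, without the noise.
valid = [(1.2, 0), (2.7, 0), (3.1, 0), (4.4, 0),
         (5.3, 0), (6.1, 1), (7.5, 1), (8.8, 1)]

tree = grow(train)  # the same grown tree in both scenarios

print("Scenario 1 (assessed on training data):", select_subtree(tree, train))
print("Scenario 2 (assessed on validation data):", select_subtree(tree, valid))
```

In this sketch the grown tree is identical in both scenarios. Only the data used to choose the subtree changes, and that alone is enough to change the final model: assessed on the training rows, the deepest subtree wins because it also fits the noisy row, while assessed on the validation rows a much shallower subtree wins.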
Thank you so much. I really appreciate it!