blue34
Calcite | Level 5

Hi All,

 

I have been working on a decision tree in SAS Enterprise Miner (EM). I created Train and Validation data sets for hold-out validation and ran the decision tree model. I have two questions and would appreciate your help.

 

1. How can I make sure the train and validation data look reasonable, besides checking Gini and ROC? Is there anything else I should look at?

 

2. Does it make sense to do out-of-time validation for a decision tree model? I know this type of validation is expected in credit scoring, but I am not sure whether it is worthwhile or appropriate for decision tree models.

 

All feedback is welcome, and I appreciate it! Please feel free to add any other thoughts on decision tree development and validation.

DougWielenga
SAS Employee


I'm not sure what you mean by the train and validation data looking reasonable. In most modeling scenarios, both the TRAIN and VALIDATION data sets are representative of the same population. Candidate models are typically built on the TRAIN data and then compared on the VALIDATION data. There is no clear threshold for when your results are reasonable, but if the model performs far better on one of these data sets than on the other, you should check things such as:

     * were important stratification variables taken into account when the observations were divided between TRAIN and VALIDATE?

     * are there sufficient observations overall, and for each target level if building a classification tree, in both the TRAIN and VALIDATE data sets?  (Too few observations can cause wildly varying results.)

 

Note:  These questions are not limited to decision tree models but should be considered anytime partitioning is done.    
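To make the two checks above concrete outside of Enterprise Miner, here is a minimal Python sketch, not the way SAS EM does its partitioning. The function names (`stratified_split`, `level_report`), the `bad` target column, the 70/30 split, and the toy data are all assumptions made up for illustration. It stratifies on the target when partitioning and then reports the count and share of each target level in both partitions, so that sparse levels or mismatched event rates are easy to spot.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, target_key, train_fraction=0.7, seed=42):
    """Split rows into train/validation, preserving target-level proportions."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target_key]].append(row)

    train, valid = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]          # copy so the input is not mutated
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid


def level_report(rows, target_key):
    """Return {level: (count, share)} for the target variable."""
    counts = Counter(row[target_key] for row in rows)
    total = sum(counts.values())
    return {level: (n, n / total) for level, n in counts.items()}


# Hypothetical toy data: 1000 rows with a 10% event rate.
rows = [{"id": i, "bad": 1 if i % 10 == 0 else 0} for i in range(1000)]

train, valid = stratified_split(rows, "bad")
train_report = level_report(train, "bad")
valid_report = level_report(valid, "bad")

for name, report in (("TRAIN", train_report), ("VALIDATE", valid_report)):
    for level, (n, share) in sorted(report.items()):
        print(f"{name:8s} target={level} n={n} share={share:.3f}")
```

If the shares differ noticeably between the two partitions, or any level has only a handful of rows, that is worth fixing before reading much into a Gini or ROC gap.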

 

Regarding out-of-time validation, you are blending together two different ideas: the performance of the initial model at development time (which will likely be better than when the model is deployed) and its performance on future data. Using more recent data to validate might yield a model that performs better on recent data, but it requires you either to build the model on older data (because the most recent period is held back for validation) or to wait for new data before deciding which model is best. Neither is typically desirable, particularly when the relationships in the data might be constantly changing. In the end, you can evaluate both approaches, and comparing how these strategies perform will likely show which practice suits the business problem you are addressing. Both have advantages and drawbacks. In my own experience it is more common to use training and validation data drawn from the same time period, but there are certainly others who prefer the out-of-time approach.
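For anyone who wants to see what an out-of-time partition looks like mechanically, here is a small Python sketch under stated assumptions: the `out_of_time_split` helper, the `obs_date` field, the cutoff date, and the toy rows are all hypothetical. Everything observed before the cutoff is available for training, and everything on or after it is held back, which is exactly why the model ends up being built on older data.

```python
from datetime import date


def out_of_time_split(rows, date_key, cutoff):
    """Rows before the cutoff go to training; rows on/after it are held out."""
    train = [r for r in rows if r[date_key] < cutoff]
    holdout = [r for r in rows if r[date_key] >= cutoff]
    return train, holdout


# Hypothetical toy data: 10 observations per month across 2023.
rows = [{"id": i, "obs_date": date(2023, 1 + i % 12, 1)} for i in range(120)]

train, oot = out_of_time_split(rows, "obs_date", date(2023, 10, 1))

print("train rows:", len(train), "latest date:", max(r["obs_date"] for r in train))
print("out-of-time rows:", len(oot), "earliest date:", min(r["obs_date"] for r in oot))
```

A same-period hold-out would instead draw both partitions at random from all twelve months, so comparing model performance under the two schemes is one way to see how much the relationships drift over time.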

 

Either way, SAS Enterprise Miner allows you to use either approach with ease.

 

Hope this helps!

Doug
