Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Decision Tree Model uplift

Occasional Contributor

Dear Community Members,

If I have a set of 1K+ variables (raw plus transformed), will it help to perform data imputation and variable selection to achieve model uplift?

I have a binary prediction model that predicts event occurrence (1/0). I use multiple decision trees to compare which one is the best fit.

I change the tree depth and the split criterion for each tree.

Will it also help to convert all my continuous variables to bins (100 bins) for a longer model lifetime going forward, given that some of my variables may increase over time (lifetime, for example)?

Thanks for your help


Accepted Solution
05-24-2016 08:54 AM
SAS Employee

Re: Decision Tree Model uplift

Hi,

Your title says "uplift". Do you actually mean regular lift performance? "Uplift" usually refers to the incremental lift of a treatment group over a control group, and lift and uplift call for different imputation and variable selection strategies. I will assume you mean lift.

1. Your title says decision trees (DT). Decision trees do not require missing value imputation. Also, the imputation decision does not always have to go hand in hand with the variable selection decision, although the two are often made together.

2. When the number of input variables is large, distinguishing variable selection from variable screening becomes increasingly important. Variable selection traditionally implies "I have determined the best variables", whereas variable screening means "roughly cut out the obviously weakest variables". To screen, set the entry level / significance test values generously in the DT node. It is not unusual to repeat several rounds of screening; watch how many, and which, variables are left after each round. Generally speaking, you can be more confident in throwing out the obviously bad variables than in picking the final elite.

3. Apparently you have settled on using transformed variables as the starting point of your question. I am not going to question that in this thread, but if at all possible, consider running screening BEFORE transformation: the earlier the screening, and the rawer the variables, the better. Note, though, that this advice is in the context of decision trees.

4. If you are running regressions, the strategy may be very different.

5. If you use EM's DT node to screen or select variables, make sure to use the "split-based approach" first. That is, turn on variable importance, but do NOT set "observation-based approach" to YES.

Regarding binning: in the context of decision trees, binning generally makes the choice of split points on continuous variables less free (and therefore less optimal). But since you bin them into 100 bins, you may still be OK in this regard.
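To make the binning remark concrete: a tree searching a raw continuous variable can cut between any two adjacent distinct values, whereas after binning it can only cut at bin boundaries. The sketch below is a hypothetical illustration, not how EM bins variables; it assumes equal-frequency (quantile) binning, and the helper names are made up for this example. It counts candidate cut points before and after binning into 100 bins.

```python
import bisect
import random

def candidate_cuts(values):
    """Hypothetical helper: places a binary split 'x <= c' can fall (distinct values minus one)."""
    return max(len(set(values)) - 1, 0)

def quantile_edges(values, n_bins):
    """Hypothetical equal-frequency binning: bin edges taken at the 1/n_bins quantiles."""
    s = sorted(values)
    n = len(s)
    # Deduplicate so tied values always land in the same bin.
    return sorted(set(s[k * n // n_bins] for k in range(1, n_bins)))

def apply_bins(values, edges):
    """Map each value to a bin index; bin j holds edges[j-1] <= v < edges[j]."""
    return [bisect.bisect_right(edges, v) for v in values]

# Toy continuous input (an assumption, not real data).
rng = random.Random(1)
raw = [rng.random() for _ in range(10_000)]
edges = quantile_edges(raw, 100)
binned = apply_bins(raw, edges)
print("cut points on raw values:", candidate_cuts(raw))
print("cut points after 100 bins:", candidate_cuts(binned))
```

Before binning, the tree can choose among one fewer cut points than there are distinct raw values (close to 10,000 here); after binning into 100 bins it has at most 99. With 100 bins the remaining cut points are still fairly fine-grained, which matches the point that 100 bins may still be OK.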
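The multi-round screening described in point 2 can be sketched outside EM. The snippet below is a minimal, hypothetical illustration in Python, not the EM Decision Tree node's algorithm. It scores each variable by the best Gini reduction of a single binary split (a crude split-based importance), keeps variables above a deliberately generous threshold, and then tightens the threshold over a few rounds while recording how many and which variables survive each round. The function names, thresholds, and toy data are all assumptions.

```python
import random

def gini(pos, n):
    """Gini impurity of a node with `pos` events out of `n` rows."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_split_gain(x, y):
    """Hypothetical score: largest Gini reduction from one split 'x <= cut'.

    Missing values (None) are simply skipped to keep the sketch short.
    """
    rows = sorted((xi, yi) for xi, yi in zip(x, y) if xi is not None)
    n = len(rows)
    total_pos = sum(yi for _, yi in rows)
    parent = gini(total_pos, n)
    best, left_pos = 0.0, 0
    for i in range(n - 1):
        left_pos += rows[i][1]
        if rows[i][0] == rows[i + 1][0]:
            continue  # only cut between distinct values
        n_left, n_right = i + 1, n - i - 1
        child = (n_left * gini(left_pos, n_left)
                 + n_right * gini(total_pos - left_pos, n_right)) / n
        best = max(best, parent - child)
    return best

def screen(data, y, thresholds):
    """Hypothetical screening: one round per threshold, generous ones first."""
    scores = {name: best_split_gain(x, y) for name, x in data.items()}
    survivors, history = list(data), []
    for t in thresholds:
        survivors = [v for v in survivors if scores[v] >= t]
        history.append((t, list(survivors)))  # inspect these between rounds
    return scores, history

# Toy data (an assumption): one informative input and two pure-noise inputs.
rng = random.Random(0)
n = 200
signal = [rng.random() for _ in range(n)]
y = [1 if s + rng.gauss(0, 0.1) > 0.5 else 0 for s in signal]
data = {
    "signal": signal,
    "noise1": [rng.random() for _ in range(n)],
    "noise2": [rng.gauss(0, 1) for _ in range(n)],
}
scores, history = screen(data, y, thresholds=[0.01, 0.05, 0.2])
for t, kept in history:
    print(f"threshold {t}: {len(kept)} kept -> {kept}")
```

In a real project each round would typically re-fit the tree on the survivors, because a variable's importance depends on which competitors are still present; this sketch scores each variable once on its own to keep the idea visible.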
The general rule in using DT models is to interfere with the raw inputs as little as possible: no imputation, no transformation, and no binning unless there is a hard reason to do otherwise. Hope this helps.

Best regards,
Jason Xin

Occasional Contributor

Re: Decision Tree Model uplift

Thanks, Jason, for the generous information; you have covered all my doubts and inquiries.

Have a nice day!

Mohammed ElSofany