03-09-2015 08:31 AM
I am working on segmenting a data set (around 2,500 observations) to design a scorecard model. I have created all the variables required for the segmentation.
When I start the interactive decision tree, I don't see all the variables in the Split Node window. Can someone guide me on how to tackle this problem?
03-09-2015 09:46 AM
This depends on the Subtree Method you chose in the decision tree options. The default method (Assessment) gives you the smallest subtree with the best assessment value. Check the other options, Largest and N. The final decision tree can be affected by the other options too.
03-10-2015 02:44 AM
I think changing the Subtree Method only has an effect when a decision tree is trained automatically, but in this case I want to use the interactive decision tree.
Anyway, I changed the Subtree Method to Largest and to N, but I still couldn't see all the variables in the Split Node window.
Is there an upper limit on the number of variables in the input data set? (I have around 3,300 variables.)
03-10-2015 08:32 AM
Did you try opening the interactive mode right after connecting the node to the data set, without running the decision tree node? You should run the data set node first, though.
Also, I sometimes find that after working in interactive mode I need to prune the tree and reopen it to see all the variables again, because the refresh option in the Split Node window does not work.
03-10-2015 09:14 AM
Yes, I opened the interactive decision tree window without running the decision tree node, but some of the variables are still missing in the Split Node window.
Sometimes only some of the variables are shown, because that depends on the value of Number of Rules (default 5) in the Node properties of the decision tree node.
03-10-2015 05:41 PM
Can you try running the decision tree on the same data set but with only some of the variables (for example, only 50) instead of all 3,300? That would confirm whether the cause is the number of variables and not something else in your data.
03-11-2015 06:56 AM
I tried with fewer variables (only 100) and, surprisingly, every variable appeared in the Split Node window. But I want to try the segmentation with all the variables.
Is there any way around this?
03-11-2015 08:18 AM
If the remaining variables are similar to the 100 you chose previously, then I think the cause is the number of variables. Hopefully WendyCzika will get back to you about any limit. In the meantime, I hope you keep increasing the number of variables and tell us the limit you find.
And it is always valid advice to try one of the dimension reduction techniques that EM provides.
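As a rough illustration of what a simple dimension reduction pass does, independent of Enterprise Miner, here is a minimal Python sketch that drops near-constant inputs and inputs that are almost perfectly correlated with one already kept. The function names, the thresholds, and the dictionary-of-columns layout are assumptions made for the example; they are not EM features or settings.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def reduce_columns(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of the columns to keep.

    columns: dict mapping variable name -> list of numeric values.
    Both thresholds are illustrative defaults, not recommended values.
    """
    kept = []
    for name, values in columns.items():
        # Skip inputs that barely vary; they cannot drive a useful split.
        if variance(values) <= min_variance:
            continue
        # Skip inputs that nearly duplicate an input already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    demo = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],  # exact multiple of "a"
        "c": [7, 7, 7, 7, 7],   # constant
        "d": [5, 1, 4, 2, 3],
    }
    print(reduce_columns(demo))  # ['a', 'd']
```

The pairwise check grows quadratically with the number of kept inputs, so with 3,300 variables this is a sketch of the idea rather than something tuned for speed.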
03-11-2015 10:59 AM
Yes, there seems to be some sort of limit of around 2,000 variables in the UI. I tried changing various properties (setting Significance Level to 1, Leaf Size to 1, and Exhaustive to a very large number, and turning off all p-value adjustments), but the Split Node dialog still shows only around 2,000 inputs for me as well.
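If the Split Node dialog really is capped at around 2,000 inputs (an observation from this thread, not a documented figure), one possible workaround is to pre-rank the inputs outside the tree and pass only the top ones to the interactive session. Below is a minimal Python sketch, assuming numeric inputs and a 0/1 target. The names `top_k_inputs` and `pearson` and the default `k` are hypothetical choices for illustration, and absolute correlation with the target is just one simple ranking measure among many.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_inputs(columns, target, k=2000):
    """Return up to k variable names, strongest association first.

    columns: dict mapping variable name -> list of numeric values.
    target:  list of 0/1 outcomes, same length as each column.
    k:       assumed cap, based on the ~2,000 inputs seen in the dialog.
    """
    scores = {
        name: abs(pearson(values, target))
        for name, values in columns.items()
    }
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    demo = {
        "x3": [1, 1, 1, 1],  # constant, scores 0
        "x2": [1, 3, 2, 4],  # weaker association
        "x1": [1, 2, 3, 4],  # stronger association
    }
    print(top_k_inputs(demo, target, k=2))  # ['x1', 'x2']
```

A linear measure like this can miss inputs that only matter through interactions or non-monotonic effects, so the variables left out are worth a second look before they are discarded for good.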