Thank you very much, again. Below is a list of the various issues we are up against, but first I thought I would talk a little about the error.

Regarding ULTIMATE_LITIGATION & ULTIMATE_RTW: these were originally set in the data to Drop. What I did was make sure each was explicitly set to Rejected as well. This prevented the error from happening. I would think Drop & Rejected would be synonymous, but...

Regarding the HP Forest node itself: does it use some form of bootstrapping to get the varying results? I am a little worried that my cutoff results may be different the second time I run it. Then again, I have it creating a maximum of 100 trees, so theoretically it should converge.

The Report.pdf file: am I correct in assuming that the scoring is developed solely from the Training data?

Ultimately I need to come up with a set of scoring rules to submit to our IT department, so that the model can be scored outside of SAS. Originally I had 1,117 variables. The file says it selected 1,093. That is probably just too many. Honestly, I am trying to:
(1) Come up with a decent model.
(2) Maybe select a subset of the predictors that comes close to converging on the final model. An analogy would be using discriminant analysis to predict segments that I generated on a much fuller set of data. 1,000+ variables is just too many for my IT department to work with.

The Selected Variable Importance does not list the names of all the variables, but I think their details appear in alphabetical order below it. I can use that information at the bottom, but how are the variables ranked in terms of importance? It would be ideal if I could say that maybe the top 20 or 30 variables are "good enough" for approximating the overall HP Forest model. I hope that makes sense. Is there enough information here to get to that solution?
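To make the "top 20 or 30" idea concrete, here is a rough sketch of the check I have in mind, written in plain Python rather than SAS. Everything in it is an assumption for illustration: the variable names, the importance values, and the scores are made up, and the function names are my own, not anything produced by the HP Forest node. The idea is to rank variables by importance, keep the top k, and then measure how often a reduced model lands on the same side of the cutoff as the full model.

```python
# Illustrative sketch only: names and numbers are hypothetical, not HP Forest output.

def top_k_by_importance(importance, k):
    """Return the k variable names with the largest importance, highest first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def importance_share(importance, selected):
    """Fraction of the total importance carried by the selected variables."""
    total = sum(importance.values())
    if total == 0:
        return 0.0
    return sum(importance[name] for name in selected) / total


def agreement_rate(full_scores, reduced_scores, cutoff):
    """Share of records flagged the same way by both models at the cutoff."""
    if len(full_scores) != len(reduced_scores):
        raise ValueError("score lists must have the same length")
    if not full_scores:
        raise ValueError("score lists must not be empty")
    matches = sum(
        (full >= cutoff) == (reduced >= cutoff)
        for full, reduced in zip(full_scores, reduced_scores)
    )
    return matches / len(full_scores)


# Hypothetical importance values for five made-up variables.
hypothetical_importance = {
    "VAR_A": 0.40,
    "VAR_B": 0.05,
    "VAR_C": 0.30,
    "VAR_D": 0.20,
    "VAR_E": 0.05,
}

top3 = top_k_by_importance(hypothetical_importance, 3)
share = importance_share(hypothetical_importance, top3)

# Hypothetical scores for the same four records from a full and a reduced model.
full_model_scores = [0.9, 0.2, 0.6, 0.4]
reduced_model_scores = [0.8, 0.3, 0.4, 0.45]
rate = agreement_rate(full_model_scores, reduced_model_scores, cutoff=0.5)

print(top3, round(share, 3), rate)
```

If the agreement rate on validation data stays high as k shrinks, that would suggest the smaller variable set is close enough; if it drops quickly, the reduced model is probably not a safe stand-in for the full forest.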
Also note that I am going to experiment with limiting my variable splits to 2 rather than the larger number that happens by default. Am I correct that the Max Categories in Split Search property is what controls this? I can set it to 2, down from a default of 30, but it says it only applies to Nominal variables.

Lastly, I am assuming that the Scorecard Points provide the information that our programmers need to program all of this into our system. What do these specifically mean? Can you perhaps provide an example of how this works? How did the Reporter node come up with this single tree?

I apologize for all of the questions, but I guess I am back to being a little scared about the utility of HP Forests. I like the stability, and the solution is good in terms of my hit rates & false positives. Now I just need to see how it will be implemented in reality, and whether working with a basic subset of variables will be close enough to get us where we need to go.

Thank you, again, & as usual.