Hi Zach,

Thanks for including the screenshot and the log. That sure helped!

In general, you don't want to use the Model Comparison node to compare the fit statistics of models that you trained on different data sets. There are some special cases where you do want to combine the posterior probabilities of models trained on different data sets, for example when you are building a special type of ensemble model, but that's another conversation.

Quick fix: Copy and paste the subflow Model Comparison -> Score -> Reporter two more times. Connect each of your HPForest models to one of those subflows, run it, and you will have a Reporter that explains the variable importance of each of your HPForest models. Remember, this report uses a decision tree to explain the main drivers of a model.

Why did you get this error? From your log, it looks like the Reporter node knows which variables in the metadata are used as inputs. It errored out when one of the data sets did not have two of those inputs (ULTIMATE_LITIGATION and ULTIMATE_RTW). I am not sure whether those two input variables were missing from one of the data sets from the get-go or whether they were simply not passed along. Either way, the suggested quick fix should get you what you need. The one exception would be if the decision tree finds no rules, but that would only happen if no inputs drive your predicted target. As long as you see the variable importance chart in your PDF report and the log says something like "NOTE: The data set WORK.RULES has XXX observations and YYY variables.", everything is good.

About scoring your HPForest model: In short, the good news is that the Score node writes the SAS code you need to score new observations with your HPForest model. Open the Score node that you ran in your subflow and you will see the scoring code. HPForest is a special case that uses a specific procedure, PROC HP4SCORE, to score new observations.
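For reference, here is a minimal sketch of how that fits together outside of Enterprise Miner (the data set names, file path, and variables are hypothetical, not taken from your flow): PROC HPFOREST can save the trained forest to a binary model file, and PROC HP4SCORE reads that file to score new observations.

```sas
/* Train a forest and save it to a binary model file.
   All names here are hypothetical placeholders. */
proc hpforest data=work.train;
   target claim_event / level=binary;
   input ultimate_litigation ultimate_rtw / level=interval;
   save file="C:\models\forest_model.bin";
run;

/* Score a new data set with the saved forest */
proc hp4score data=work.new_claims;
   score file="C:\models\forest_model.bin" out=work.scored;
run;
```

The Score node generates the equivalent of that second step for you, so you normally don't have to write it by hand.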
The reason scoring is done through a procedure is that traditional SAS score code for a forest would take a long time to read and to write, and it would be a really big file (remember that your HPForest combines hundreds of a special type of tree).

Let me elaborate on the tradeoff between predictive power and explainability by comparing a single decision tree with an HPForest. A single decision tree is really easy to explain as a model: from the tree diagram or from the score code you can derive the set of rules that classify an observation as a predicted event (e.g. if X; or if X and Y; or if X, Y, and Z). But you cannot do the same for an HPForest. Even if you came up with the huge list of rules, you would still need to average them. Interpreting a forest is really hard unless you use a workaround like the Reporter node, which fits a single decision tree to explain the predicted outcome using the inputs of your HPForest model.

I hope this helps!
M
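P.S. To make that contrast concrete, here is a toy sketch of what single-tree score code essentially boils down to (the variables, cutoffs, and probabilities are entirely made up): a short, readable chain of if/then rules. A forest would need hundreds of such blocks plus an averaging step, which is why it ships as a binary model file scored by a proc instead.

```sas
/* Toy single-tree scoring rules -- names and values are hypothetical */
data work.scored_tree;
   set work.new_claims;
   if ultimate_litigation = 1 then p_event = 0.82;    /* rule: if X */
   else if claim_amount > 50000 then p_event = 0.64;  /* rule: if not X and Y */
   else p_event = 0.11;                               /* default leaf */
run;
```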