Hi Eric,

Sometimes translating numerical output into words is a challenge, so let me offer as clear an explanation as I can. You have posted a good example; thank you.

First, here are some facts about the Text Rule Builder node that are helpful to keep in mind:

(1) This is a predictive model that creates Boolean rules and assesses the predictive power of each rule as well as the overall classification rates.
(2) Every rule has a posterior probability associated with each target level. So Rule #1 (e.g. "sports" and not "channel") has a [potentially different] posterior probability associated with Target=TV, Target=Food, Target=Sports, etc., from your example. These posterior probabilities are estimated from the training data used to train the predictive model.
(3) The classification is based on the rule assessments, which are evaluated as a binary response. That is, each rule results in a 'True' or 'False' outcome (or pass/fail, or 0/1, if you prefer).

Your example shows the following columns:

(a) Your input variables. In this case, there is only one column: "Document".
(b) n "Predicted: Target=target level" columns, each giving the posterior probability that the record belongs to that target level. In this case, there are 8 target levels: TV, Food, ..., Game, Finance. The probabilities in these columns are the result of a naive Bayes algorithm that uses the values from (2) above.
(c) A "Why" column, which indicates the number of the rule that determined the assigned Target value. "32" stands for the 32nd rule, which can be found in the Text Rule Builder Results (without a number label) or in the textrule_conj_rule table under the field conj_id.
(d) An "Into: Target" column, which specifies the assigned Target value for each record.
(e) A column with the maximum of the n "Predicted: Target=" values, that is, the maximum across the columns from (b) above.

Now that I've finally set the stage, here is my response to your questions.
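To make item (b) a little more concrete, here is a minimal Python sketch of how a naive Bayes step can turn True/False rule outcomes into one probability per target level. I can't confirm the exact formula the node uses, so this swaps in a textbook naive Bayes over the binary rule outcomes from (3), with per-target rule firing rates standing in for the rule statistics in (2). RULES, P_FIRE, PRIOR, the helper functions, and every number below are hypothetical, invented only for illustration.

```python
from math import exp, log

# Hypothetical Boolean rules: each maps a document's token set to True/False.
RULES = {
    1: lambda toks: "sports" in toks and "channel" not in toks,
    2: lambda toks: "recipe" in toks,
    3: lambda toks: "channel" in toks,
}

# Hypothetical training-set estimates of P(rule fires | target level).
P_FIRE = {
    "Sports": {1: 0.80, 2: 0.05, 3: 0.10},
    "Food": {1: 0.05, 2: 0.70, 3: 0.05},
    "TV": {1: 0.10, 2: 0.05, 3: 0.60},
}

# Hypothetical prior probability of each target level.
PRIOR = {"Sports": 1 / 3, "Food": 1 / 3, "TV": 1 / 3}


def rule_outcomes(document):
    """Evaluate every rule on the document, giving a True/False outcome per rule."""
    toks = set(document.lower().split())
    return {rule_id: rule(toks) for rule_id, rule in RULES.items()}


def naive_bayes_posteriors(outcomes):
    """Combine binary rule outcomes into one probability per target level."""
    log_scores = {}
    for target, prior in PRIOR.items():
        score = log(prior)
        for rule_id, fired in outcomes.items():
            p = P_FIRE[target][rule_id]
            # Naive independence assumption: add each rule's log-likelihood.
            score += log(p if fired else 1.0 - p)
        log_scores[target] = score
    # Normalize so the probabilities sum to 1 (subtracting the max avoids underflow).
    top = max(log_scores.values())
    unnormalized = {t: exp(s - top) for t, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}


def score_record(document):
    """Return 'Predicted: Target=' style columns plus their maximum, as in (b) and (e)."""
    posteriors = naive_bayes_posteriors(rule_outcomes(document))
    columns = {f"Predicted: Target={t}": p for t, p in posteriors.items()}
    columns["max"] = max(posteriors.values())
    return columns


if __name__ == "__main__":
    print(score_record("Sports highlights tonight"))
```

Note that nothing in this step decides the "Into: Target" value; it only produces the probability columns.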
As your intuition tells you, most of the time the target level with the highest probability among the "Predicted: Target=" columns corresponds to the target level that the Text Rule Builder node assigns. In other words, the Text Rule Builder usually assigns the category in which the document has the "highest probability of belonging". This is usually close to a 1:1 pattern; in my experience with good-sized training and score sets, the agreement has always been above 90%.

So why would the assigned target level ever differ from the one with the highest posterior probability? Answer: it is all about the rules! The predictive model evaluates each of these Boolean rules and assigns the category/target level of the triggering rule. For example: if the text contains "cell" and "reproduction" but not "phone", then assign Target = Biology.

The probabilities are not the probabilities of the category the document is assigned to. If they were, they would simply be a matrix of 0s and 1s, because the assignment comes from Boolean rules. Instead, the probabilities are a prediction of where the document may belong, based on the posterior probabilities of the rules. To answer your last question: this is still a good predictor of the assigned category, so it can be used as such.

In practice, I think it is fair to say that records assigned to a category other than the one with the highest probability are good candidates for review. They may indicate reasons to augment the training data set and rerun the node, or they may represent documents that span two categories. For example, an article about game theory applied to ecosystems might explain your first example, and an article about cyber espionage may cross "Computer" and "Politics", as in your second.

Hopefully this clarification is helpful: the probabilities represent a prediction of category based on the rules' posterior probabilities, whereas the assignment is based on True/False rules.
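The mismatch described above can be sketched in Python under a few loudly stated assumptions. The post only says that the triggering rule decides the assignment; here I assume the rules are checked in order and the first one that fires supplies "Into: Target" and "Why". The probabilities are simply taken as given (as if read from the "Predicted: Target=" columns). RULES, assign_by_rules, review_row, the rule contents, and the probability values are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A hypothetical rule: (rule number, target level it assigns, Boolean condition on tokens).
Rule = Tuple[int, str, Callable[[Set[str]], bool]]

RULES: List[Rule] = [
    (1, "Game", lambda toks: "game" in toks and "theory" not in toks),
    (2, "Biology", lambda toks: "ecosystems" in toks),
    (3, "Game", lambda toks: "game" in toks),
]


def assign_by_rules(document: str, rules: List[Rule]) -> Tuple[Optional[str], Optional[int]]:
    """Return ('Into: Target', 'Why') from the first rule that evaluates to True."""
    toks = set(document.lower().split())
    for rule_id, target, condition in rules:
        if condition(toks):
            return target, rule_id
    return None, None  # no rule fired


def review_row(document: str, predicted: Dict[str, float], rules: List[Rule]) -> dict:
    """Compare the rule-based assignment with the highest 'Predicted:' probability."""
    into, why = assign_by_rules(document, rules)
    max_target = max(predicted, key=predicted.get)
    return {
        "into": into,
        "why": why,
        "max_target": max_target,
        # Flag for review when the assigned level is not the most probable one
        # (this also flags documents that no rule assigned).
        "review": into != max_target,
    }


if __name__ == "__main__":
    # Made-up "Predicted: Target=" values for illustration only.
    probs = {"Game": 0.55, "Biology": 0.40, "Finance": 0.05}
    print(review_row("game theory applied to ecosystems", probs, RULES))
```

The assignment step never looks at the probabilities, which is exactly why the two can disagree; filtering on the review flag gives a short list of documents worth a closer look.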
Thanks, Justin