<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to interpret the scoring data result of Text Rule Builder in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110090#M9271</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Eric,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sometimes translating numerical output into words is a challenge, so let me try my best to offer as clear an explanation as I can at the moment.&amp;nbsp; You have posted a good example - thank you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First, let me note some facts about the Text Rule Builder node that are helpful to keep in mind.&lt;/P&gt;&lt;P&gt;(1) This is a predictive model that creates Boolean rules and assesses the predictive power of each rule as well as the overall classification rates.&lt;/P&gt;&lt;P&gt;(2) Every rule has a posterior probability associated with each Target Level.&amp;nbsp; So Rule #1 (e.g. "sports" and not "channel") has a [potentially different] posterior probability associated with Target=TV, Target=Food, Target=Sport, etc., from your example.&amp;nbsp; These posterior probabilities are based on the training data used to train the predictive model.&lt;/P&gt;&lt;P&gt;(3) The classification is based on the rule-assessments, which are evaluated with a binary response.&amp;nbsp; That is, each rule results in a 'True' or 'False' outcome. 
Or pass/fail or 0/1, if you prefer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your example shows the following columns:&lt;/P&gt;&lt;P&gt;(a)&amp;nbsp; your input variables.&amp;nbsp; In this case, that is only one column:&amp;nbsp; "Document".&lt;/P&gt;&lt;P&gt;(b)&amp;nbsp; &lt;EM&gt;n &lt;/EM&gt;"Predicted: Target=&lt;EM&gt;target level&lt;/EM&gt;" columns which provide a posterior probability that the record belongs to this target level.&amp;nbsp; In this case, there are 8 target levels: TV, Food, ..., Game, Finance.&amp;nbsp;&amp;nbsp; The probabilities in these columns are the result of a naive Bayes algorithm which uses the values from (2) above.&lt;/P&gt;&lt;P&gt;(c)&amp;nbsp; a "Why" column which indicates the rule number that determined the 'assigned' Target value.&amp;nbsp; "32" stands for the 32nd rule, which can be found in the Text Rule Builder Results (without a number label) or in the &lt;EM&gt;textrule_conj_rule&lt;/EM&gt; table under the field &lt;EM&gt;conj_id.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;(d)&amp;nbsp; an "Into: Target" column, which specifies the assigned Target Value for each record. 
&lt;/P&gt;&lt;P&gt;(e)&amp;nbsp; a column with the maximum value of the &lt;EM&gt;n&lt;/EM&gt; "Predicted: Target=" values, or the maximum of the values in the columns from (b) above.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now I've finally set the stage for a response to your questions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As your intuition tells you, most of the time, the target level with the highest probability in the "Predicted: Target=" columns corresponds with the Target level that the Text Rule Builder node assigns.&amp;nbsp; In other words, most of the time, the Text Rule Builder will assign the value that corresponds to the category with the "highest probability in which it belongs".&amp;nbsp; &lt;EM&gt;Usually, this is close to a 1:1 pattern - in my experience, with good-sized training and 'score' sets, the agreement is 90+%.&lt;/EM&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So why would the &lt;SPAN style="text-decoration: underline;"&gt;assigned&lt;/SPAN&gt; Target level ever &lt;EM&gt;not&lt;/EM&gt; correspond to the target value with the highest posterior probability? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Answer: It is all about the rules!&amp;nbsp; The predictive model will evaluate each of these Boolean rules and assign the category/target level for the triggering rule.&amp;nbsp; For example: &lt;EM&gt;If the text contains "cell" and not "phone" and "reproduction" then assign the Target level = Biology&lt;/EM&gt;. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The probabilities are not probabilities of the category that the document would be &lt;EM&gt;assigned.&amp;nbsp; &lt;/EM&gt;Since we are talking about Boolean rules, those values would be a matrix of 0s and 1s.&amp;nbsp; Instead, the probabilities are a prediction of where the document &lt;EM&gt;may belong&lt;/EM&gt; based on the posterior probabilities of the rules.&amp;nbsp; This is also still a good predictor of the &lt;EM&gt;assigned &lt;/EM&gt;category, so it can be used as such, to answer your last question. 
&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In practice, I think it is fair to say that those records which are assigned to a category other than the one with the highest probability are good candidates for review.&amp;nbsp; They may indicate reasons to augment the training data set and rerun the node, or it may represent a document that may cross two categories.&amp;nbsp; For example, an article about game theory applied to ecosystems might explain your first example.&amp;nbsp; Or something about cyber espionage may cross "Computer" and "Politics" such as in your second.&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hopefully the clarification - that the probabilities represent a prediction of category based upon posterior probabilities where the assignment is based upon True/False rules - is helpful. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Justin&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 16 Oct 2013 19:02:11 GMT</pubDate>
    <dc:creator>JustinPlumley</dc:creator>
    <dc:date>2013-10-16T19:02:11Z</dc:date>
    <item>
      <title>How to interpret the scoring data result of Text Rule Builder</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110089#M9270</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am new to SAS Text Miner. I built a categorization model (8 category values for the target variable) in the Text Rule Builder node with training data, and now I am trying to score new data. I exported the scoring data and tried to understand each variable that was given by the model. Here is my scoring output data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" style="width: 1344px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD class="xl68" height="19" width="112"&gt;Document&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=TV&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Food&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Biology&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Politics&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Sport&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Computer&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Game&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Predicted: Target=Finance&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Why Into: L2&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Into: Target&lt;/TD&gt;&lt;TD class="xl68" style="border-left: none;" width="112"&gt;Probability of Classification&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 1…&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1.02E-05&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1.20E-05&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: 
none;"&gt;3.68E-04&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.004349291&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1.06E-05&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.025836599&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.969411505&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1.41E-06&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;32&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Biology&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.969411505&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 2…&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.00160021&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;3.44E-04&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.053080483&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.019703281&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;5.09E-04&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.697857485&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.225630023&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.00127548&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;53&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Politics&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.697857485&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 3…&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: 
none;"&gt;0.035478096&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.034664386&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.023956528&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.07893037&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.066393264&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.174361862&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.332943365&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.253272129&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;175&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Game&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.332943365&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 4…&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;5.98E-06&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;7.80E-06&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.011925956&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;8.08E-05&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;2.14E-05&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;3.44E-04&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.987613772&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;2.86E-07&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;31&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Biology&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: 
none;"&gt;0.987613772&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 5…&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.0103223&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.010505784&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.015682763&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.046567258&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.032731622&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.129644172&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.101578312&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.652967789&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;.&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Finance&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.652967789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl67" height="19" style="border-top: none;"&gt;Text data here 6…&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.012279932&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.019225879&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.00598867&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.104194986&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.033799801&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.065556881&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.621594848&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.137359003&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; 
border-left: none;"&gt;149&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;Game&lt;/TD&gt;&lt;TD class="xl67" style="border-top: none; border-left: none;"&gt;0.621594848&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;At first I thought that, since the “Probability of Classification” is the largest of the 8 “Predicted: Target=” variables (which, to my understanding, are the posterior probabilities of each category being assigned), the document should be assigned to the target category with the largest value, but obviously I am wrong. For example, the first observation has a “&lt;SPAN style="color: black;"&gt;&lt;STRONG&gt;Predicted: Target=Game&lt;/STRONG&gt;&lt;/SPAN&gt;” value of 0.9694, which is the largest number, but this document was assigned to Biology. So how should I interpret those “Predicted: Target=” values? How can I get a probability or membership-like number for each document to see how strongly it belongs to each of these 8 categories?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Eric&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 10 Oct 2013 22:36:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110089#M9270</guid>
      <dc:creator>EricWoo</dc:creator>
      <dc:date>2013-10-10T22:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to interpret the scoring data result of Text Rule Builder</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110090#M9271</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Eric,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sometimes translating numerical output into words is a challenge, so let me try my best to offer as clear an explanation as I can at the moment.&amp;nbsp; You have posted a good example - thank you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First, let me note some facts about the Text Rule Builder node that are helpful to keep in mind.&lt;/P&gt;&lt;P&gt;(1) This is a predictive model that creates Boolean rules and assesses the predictive power of each rule as well as the overall classification rates.&lt;/P&gt;&lt;P&gt;(2) Every rule has a posterior probability associated with each Target Level.&amp;nbsp; So Rule #1 (e.g. "sports" and not "channel") has a [potentially different] posterior probability associated with Target=TV, Target=Food, Target=Sport, etc., from your example.&amp;nbsp; These posterior probabilities are based on the training data used to train the predictive model.&lt;/P&gt;&lt;P&gt;(3) The classification is based on the rule-assessments, which are evaluated with a binary response.&amp;nbsp; That is, each rule results in a 'True' or 'False' outcome. 
Or pass/fail or 0/1, if you prefer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your example shows the following columns:&lt;/P&gt;&lt;P&gt;(a)&amp;nbsp; your input variables.&amp;nbsp; In this case, that is only one column:&amp;nbsp; "Document".&lt;/P&gt;&lt;P&gt;(b)&amp;nbsp; &lt;EM&gt;n &lt;/EM&gt;"Predicted: Target=&lt;EM&gt;target level&lt;/EM&gt;" columns which provide a posterior probability that the record belongs to this target level.&amp;nbsp; In this case, there are 8 target levels: TV, Food, ..., Game, Finance.&amp;nbsp;&amp;nbsp; The probabilities in these columns are the result of a naive Bayes algorithm which uses the values from (2) above.&lt;/P&gt;&lt;P&gt;(c)&amp;nbsp; a "Why" column which indicates the rule number that determined the 'assigned' Target value.&amp;nbsp; "32" stands for the 32nd rule, which can be found in the Text Rule Builder Results (without a number label) or in the &lt;EM&gt;textrule_conj_rule&lt;/EM&gt; table under the field &lt;EM&gt;conj_id.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;(d)&amp;nbsp; an "Into: Target" column, which specifies the assigned Target Value for each record. 
&lt;/P&gt;&lt;P&gt;(e)&amp;nbsp; a column with the maximum value of the &lt;EM&gt;n&lt;/EM&gt; "Predicted: Target=" values, or the maximum of the values in the columns from (b) above.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now I've finally set the stage for a response to your questions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As your intuition tells you, most of the time, the target level with the highest probability in the "Predicted: Target=" columns corresponds with the Target level that the Text Rule Builder node assigns.&amp;nbsp; In other words, most of the time, the Text Rule Builder will assign the value that corresponds to the category with the "highest probability in which it belongs".&amp;nbsp; &lt;EM&gt;Usually, this is close to a 1:1 pattern - in my experience, with good-sized training and 'score' sets, the agreement is 90+%.&lt;/EM&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So why would the &lt;SPAN style="text-decoration: underline;"&gt;assigned&lt;/SPAN&gt; Target level ever &lt;EM&gt;not&lt;/EM&gt; correspond to the target value with the highest posterior probability? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Answer: It is all about the rules!&amp;nbsp; The predictive model will evaluate each of these Boolean rules and assign the category/target level for the triggering rule.&amp;nbsp; For example: &lt;EM&gt;If the text contains "cell" and not "phone" and "reproduction" then assign the Target level = Biology&lt;/EM&gt;. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The probabilities are not probabilities of the category that the document would be &lt;EM&gt;assigned.&amp;nbsp; &lt;/EM&gt;Since we are talking about Boolean rules, those values would be a matrix of 0s and 1s.&amp;nbsp; Instead, the probabilities are a prediction of where the document &lt;EM&gt;may belong&lt;/EM&gt; based on the posterior probabilities of the rules.&amp;nbsp; This is also still a good predictor of the &lt;EM&gt;assigned &lt;/EM&gt;category, so it can be used as such, to answer your last question. 
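&lt;/P&gt;&lt;P&gt;To make the distinction concrete, here is a minimal Python sketch (not SAS code, and not how the node is implemented internally). It takes the first three rows of your table and flags the documents where the rule-based "Into: Target" value differs from the level with the highest "Predicted: Target=" probability. The names LEVELS, rows, top_level and needs_review are made up for this illustration.&lt;/P&gt;&lt;PRE&gt;
```python
# Illustrative sketch only: compare the rule-based assignment ("Into: Target")
# with the level that has the highest "Predicted: Target=" probability.
# All names here are hypothetical; the numbers are copied from the first
# three rows of the scoring table posted in this thread.

# Column order of the "Predicted: Target=" variables in the posted table.
LEVELS = ["TV", "Food", "Biology", "Politics", "Sport", "Computer", "Game", "Finance"]

rows = [
    {
        "doc": "Text data here 1",
        "probs": [1.02e-05, 1.20e-05, 3.68e-04, 0.004349291,
                  1.06e-05, 0.025836599, 0.969411505, 1.41e-06],
        "why": 32,           # rule number that fired
        "into": "Biology",   # category assigned by that rule
    },
    {
        "doc": "Text data here 2",
        "probs": [0.00160021, 3.44e-04, 0.053080483, 0.019703281,
                  5.09e-04, 0.697857485, 0.225630023, 0.00127548],
        "why": 53,
        "into": "Politics",
    },
    {
        "doc": "Text data here 3",
        "probs": [0.035478096, 0.034664386, 0.023956528, 0.07893037,
                  0.066393264, 0.174361862, 0.332943365, 0.253272129],
        "why": 175,
        "into": "Game",
    },
]


def top_level(probs):
    """Return the target level with the highest predicted probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LEVELS[best]


def needs_review(row):
    """True when the rule-based assignment disagrees with the top probability."""
    return top_level(row["probs"]) != row["into"]


# Documents whose assigned category is not the most probable one.
flagged = [row["doc"] for row in rows if needs_review(row)]
print(flagged)
```
&lt;/PRE&gt;&lt;P&gt;Rows 1 and 2 are flagged; row 3 is not, because rule 175 and the highest probability both point to Game.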
&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In practice, I think it is fair to say that those records which are assigned to a category other than the one with the highest probability are good candidates for review.&amp;nbsp; They may indicate reasons to augment the training data set and rerun the node, or it may represent a document that may cross two categories.&amp;nbsp; For example, an article about game theory applied to ecosystems might explain your first example.&amp;nbsp; Or something about cyber espionage may cross "Computer" and "Politics" such as in your second.&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hopefully the clarification - that the probabilities represent a prediction of category based upon posterior probabilities where the assignment is based upon True/False rules - is helpful. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Justin&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 16 Oct 2013 19:02:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110090#M9271</guid>
      <dc:creator>JustinPlumley</dc:creator>
      <dc:date>2013-10-16T19:02:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to interpret the scoring data result of Text Rule Builder</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110091#M9272</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Justin,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you so much for giving this excellent explanation, which I can't even find in the SAS EM help documentation! Your explanation is logically arranged and easy to understand! I got most of your points, but I still need to dig into it a little bit.&lt;/P&gt;&lt;P&gt;If I understand you correctly, this output contains two algorithms to predict the target categories: one is based on the posterior probability given by Naive Bayes, the other is based on the rules created by the Text Rule Builder node. And the "Into: Target" column represents the result of the rules-based prediction. Correct me if I am wrong, please.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have built a Naive Bayes classifier in Python NLTK for machine learning before, so I have no problem with Naive Bayes or with prior and posterior probabilities. My follow-up questions would be:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;How does the SAS Text Rule Builder node come up with those rules? Aren't these Boolean rules derived from the Naive Bayes likelihood P(word|category)? &lt;/LI&gt;&lt;LI&gt;In the Results &amp;gt; Output window of the Text Rule Builder, it gave the Target Percentage (Precision) and Outcome Percentage (Recall) for the training and validation datasets, and I mainly use these numbers to evaluate the classification result. So on which algorithm (Naive Bayes probability prediction or Boolean text rules) are these numbers based?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Your replies are appreciated! Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Eric&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 18 Oct 2013 20:47:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110091#M9272</guid>
      <dc:creator>EricWoo</dc:creator>
      <dc:date>2013-10-18T20:47:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to interpret the scoring data result of Text Rule Builder</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110092#M9273</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Eric,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You have it. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In response to your first question, the Text Rule Builder node uses a sequential approach of examining terms/combinations for those with the highest estimated precision, and iteratively looks at smaller and smaller subsets of the data after removing matches for earlier rules. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The Precision and Recall are based on the Boolean rules (all rules up to the line that you are looking at in particular) for their calculation.&amp;nbsp; It is easiest to tell with the first rules since the arithmetic is easier to check quickly. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Justin&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 31 Oct 2013 21:10:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110092#M9273</guid>
      <dc:creator>JustinPlumley</dc:creator>
      <dc:date>2013-10-31T21:10:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to interpret the scoring data result of Text Rule Builder</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110093#M9274</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you Justin. You are the master of it!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 Nov 2013 16:47:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-interpret-the-scoring-data-result-of-Text-Rule-Builder/m-p/110093#M9274</guid>
      <dc:creator>EricWoo</dc:creator>
      <dc:date>2013-11-01T16:47:15Z</dc:date>
    </item>
  </channel>
</rss>

