<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS EM Prior Probabilities in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/481601#M7225</link>
    <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I am building a behavioral scoring model to calculate probability of default. Where can I see the estimated probability from my model in SAS Enterprise Miner?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The additional columns I described in my previous note are added to any train/validate/test data set that is passed to a modeling node in SAS Enterprise Miner, as well as to any score (Role=Score) data set passed to a subsequent Score node.&amp;nbsp; You can view&amp;nbsp;a sample of the data containing these additional columns by clicking on a modeling node and then clicking on the ellipsis (...) to the right of Exported Data in the properties sheet on the left.&amp;nbsp; You can then highlight the row corresponding to any of the available data sets and click on Browse or Explore&amp;nbsp;to see the variables that have been added.&amp;nbsp; By default, they will be labeled something like "Predicted: &amp;lt;target variable name&amp;gt; = &amp;lt;target variable level&amp;gt;" for a categorical target variable.&amp;nbsp; You can right-click on a column heading and choose "Name" to see the actual variable name, which is of the form "P_&amp;lt;target variable name&amp;gt;&amp;lt;target variable level&amp;gt;".&amp;nbsp; So in my previous example, a target variable BAD taking on values BAD=1 or BAD=0 would have its probabilities in columns named P_BAD1 and P_BAD0, which would have the labels "Predicted: BAD=1" and "Predicted: BAD=0" respectively.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope this helps!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Cordially,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Doug&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 26 Jul 2018 17:11:13 GMT</pubDate>
    <dc:creator>DougWielenga</dc:creator>
    <dc:date>2018-07-26T17:11:13Z</dc:date>
    <item>
      <title>SAS EM Prior Probabilities</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/327655#M4921</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to understand how EM calculates the number of events vs. non-events in each&amp;nbsp;ranked demi-decile&amp;nbsp;after adjusting for prior probabilities.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my original data, I have 1% events and 99% non-events.&lt;/P&gt;&lt;P&gt;In my sample data for model development, I have 20% events and 80% non-events.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I&amp;nbsp;apply a random forest to my sample data.&amp;nbsp;&amp;nbsp;The model results show that my 1st bin (i.e. the demi-decile with the highest scores) contains 343&amp;nbsp;true&amp;nbsp;events&amp;nbsp;and 23&amp;nbsp;true non-events.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After applying the Decisions node to my model results, my 1st bin (i.e. the demi-decile with the highest ADJUSTED scores) now contains 36 true events and 332 true non-events.&amp;nbsp; How was this actually&amp;nbsp;determined?&amp;nbsp; I understand how the posterior probabilities are adjusted, but I don't understand how the&amp;nbsp;numbers of true&amp;nbsp;events and non-events are&amp;nbsp;adjusted.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would appreciate it if someone could help explain this.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2017 09:05:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/327655#M4921</guid>
      <dc:creator>PCKW</dc:creator>
      <dc:date>2017-01-26T09:05:01Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EM Prior Probabilities</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/386649#M5726</link>
      <description>&lt;P&gt;There are two different issues involved here -- the first is obtaining probabilities centered near your population estimates and the other is determining how to classify each observation based on that probability (adjusted for priors or not) and a decision weight if you have incorporated one. &amp;nbsp; By default, SAS Enterprise Miner generates a misclassification chart for the Train &amp;amp; Validate data sets&amp;nbsp;based on two variables which have the form&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-quilt-column lia-quilt-column-20 lia-quilt-column-right lia-quilt-column-main-right"&gt;
&lt;DIV class="lia-quilt-column-alley lia-quilt-column-alley-right"&gt;
&lt;DIV class="lia-component-body"&gt;
&lt;DIV id="messagebodydisplay_0_0" class="lia-message-body"&gt;
&lt;DIV class="lia-message-body-content"&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;F_&amp;lt;target variable name&amp;gt; : the actual target level&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;I_&amp;lt;target variable name&amp;gt; : the predicted target level&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Enterprise Miner will compute a predicted probability (adjusted for priors if requested) for each level of the target of the form&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;P_&amp;lt;target variable name&amp;gt;&amp;lt;target variable level&amp;gt; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So for a target variable named 'BAD' with levels 0 or 1, it will generate&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;P_BAD1 : &amp;nbsp;the predicted probability that BAD=1&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;P_BAD0&amp;nbsp;&lt;SPAN&gt;: &amp;nbsp;the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;predicted probability that BAD=0&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using my example, the variable F_BAD is simply the actual target level (0 or 1) and the variable I_BAD will take the level associated with the higher of the two predicted probabilities P_BAD1 and P_BAD0. &amp;nbsp; It is reasonable to assign observations to the target level which is most likely, but this presents problems in rare event scenarios. &amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;In your oversampled data, your target level of interest occurred 20% of the time overall. &amp;nbsp;Using my example, suppose that BAD=1 occurs 20% of the time in the sample. &amp;nbsp; To have P_BAD1 &amp;gt;&amp;nbsp;P_BAD0, the observation had to have P_BAD1 &amp;gt; 50%, which represents someone&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;U&gt;at least&lt;/U&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(50%) / (20%) = 2.5 times as likely to have the event compared to the overall average. &amp;nbsp; After adjusting for the prior probabilities so that the overall average is only 1%, you would now need someone who was&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;U&gt;at least&lt;/U&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(50%) / (1%) = 50&amp;nbsp;times as likely as average to have the rare event before it becomes the predicted level. &amp;nbsp; Since there are far fewer people in this category, there are far fewer people (possibly none!) classified as having the rare event according to I_BAD (using my example). &amp;nbsp; This is why the number of predicted events changes so dramatically in your example. &amp;nbsp;&lt;/P&gt;
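&lt;P&gt;To make the effect of the prior adjustment concrete, here is a minimal Python sketch (not SAS code). It assumes the adjustment is the standard Bayes-rule rescaling, in which each posterior is multiplied by the ratio of the population prior to the sample proportion and the results are renormalized; whether SAS Enterprise Miner computes it in exactly this way is an assumption, and the function names and the example probabilities are illustrative only.&lt;/P&gt;
&lt;PRE&gt;
```python
def adjust_for_priors(posteriors, sample_props, priors):
    """Rescale sample-based posteriors to population priors (assumed formula)."""
    # Weight each level by (population prior / sample proportion) ...
    weighted = [p * prior / prop
                for p, prop, prior in zip(posteriors, sample_props, priors)]
    # ... then renormalize so the adjusted probabilities sum to 1.
    total = sum(weighted)
    return [w / total for w in weighted]


def most_likely_level(probabilities, levels):
    """Return the level with the highest probability (the I_ style rule)."""
    best = max(range(len(levels)), key=lambda i: probabilities[i])
    return levels[best]


levels = [1, 0]                # BAD=1 (event), BAD=0 (non-event)
sample_props = [0.20, 0.80]    # oversampled development data
priors = [0.01, 0.99]          # population rates

unadjusted = [0.60, 0.40]      # hypothetical P_BAD1, P_BAD0 from the sample
adjusted = adjust_for_priors(unadjusted, sample_props, priors)

print(most_likely_level(unadjusted, levels))  # level chosen before adjustment
print(most_likely_level(adjusted, levels))    # level chosen after adjustment
print([round(p, 4) for p in adjusted])
```
&lt;/PRE&gt;
&lt;P&gt;Under this assumption, an observation with a 60% event probability in the 20%/80% sample drops to roughly a 6% event probability after adjustment, so the most-likely-level rule moves it from the event to the non-event.&lt;/P&gt;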
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In these situations, you can consider using a target weight to put more weight on the rare event. &amp;nbsp; If you do add Decision weights (either in the Decisions node or in the Input Data Source node), SAS&amp;nbsp;Enterprise Miner will also generate a D_&amp;lt;target variable name&amp;gt; variable which contains the 'decision' outcome based on the 'most profitable' or 'least costly' outcome. &amp;nbsp;In this situation, the decision weight is multiplied by the adjusted probability to get the 'expected value' of each decision, and the outcome with the best expected value is assigned. &amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assigning outcomes based on putting extra decision weight on rare events can also pose challenges since those outcomes will be predicted to occur more often than they actually do. &amp;nbsp; If you click on the button 'Default with Inverse Prior Weights', SAS Enterprise Miner will use the reciprocal of each specified prior (1 divided by the prior) as the weight. &amp;nbsp;Suppose the prior probabilities were specified as 20% and 80%. &amp;nbsp; Then using the 'Default with Inverse Prior Weights' button would yield weights of 1 / 0.2 = 5 for the rare event and 1 / 0.8 = 1.25 for the common event. &amp;nbsp;You will notice that the ratio of weights&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;5 / 1.25 = 4&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;is in the same ratio as the prior probabilities&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; 80% / 20% = 4&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;so simply leaving the weight on the common event as 1 and changing the rare event to have a weight of 4 will have the same impact. Notice now that for the 'average' observation, which has a 20% (0.2) probability of the rare event and an 80% (0.8) probability of the common event, the expected value is the same for both levels using the weights described above:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Level&lt;/TH&gt;&lt;TH&gt;Prior&lt;/TH&gt;&lt;TH&gt;Weight&lt;/TH&gt;&lt;TH&gt;Expected Value&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;rare event&lt;/TD&gt;&lt;TD&gt;0.2&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;0.2 * 4 = 0.8&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;common event&lt;/TD&gt;&lt;TD&gt;0.8&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.8 * 1 = 0.8&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;which suggests that using 'Default with Inverse Prior Weights' will assign the target event to anyone with a probability higher than 0.2 (in this scenario), that is, to anyone with a higher-than-average predicted probability. &amp;nbsp; This will generate a lot more predicted events based on the D_&amp;lt;target variable name&amp;gt; variable, since it is not unlikely that half or more of the observations have a predicted probability higher than average. &amp;nbsp;&lt;/P&gt;
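&lt;P&gt;The expected-value rule discussed here can be sketched in a few lines of Python (not SAS code). The function name and the example probabilities are hypothetical; the only logic taken from the discussion is multiplying each probability by its decision weight and choosing the level with the largest product.&lt;/P&gt;
&lt;PRE&gt;
```python
def decide(probabilities, weights):
    """Pick the index of the level with the largest weight * probability."""
    expected = [p * w for p, w in zip(probabilities, weights)]
    best = max(range(len(expected)), key=lambda i: expected[i])
    return best, expected


priors = [0.2, 0.8]                              # rare event, common event
inverse_prior_weights = [1 / p for p in priors]  # reciprocal of each prior

# Hypothetical observation slightly above the average event probability.
decision, expected = decide([0.25, 0.75], inverse_prior_weights)
print(decision, expected)

# With equal weights the same observation goes to the common event.
plain_decision, _ = decide([0.25, 0.75], [1, 1])
print(plain_decision)
```
&lt;/PRE&gt;
&lt;P&gt;Index 0 is the rare event and index 1 the common event, so an observation with only a 25% event probability is assigned to the rare event once the inverse prior weights are applied.&lt;/P&gt;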
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So what do you do? &amp;nbsp;Understand that the overall misclassification rate of the data set is not what is critical. &amp;nbsp; Look at the rate in each percentile of the data and determine how deep you want to go. &amp;nbsp;Then you can choose your own Decision threshold (e.g. probability higher than 0.35) above which you get a satisfactory misclassification rate. &amp;nbsp;The approach taken by SAS Enterprise Miner is a reasonable one since it has no business knowledge to base the outcome on other than what is provided -- either pick the most likely outcome or the most valuable outcome based on your weights -- but your best decisions will always incorporate your analytical needs.&lt;/P&gt;
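&lt;P&gt;One way to look at the event rate in each slice of the ranked data is sketched below in Python (not SAS code). The toy scored records, the bin count, and the function name are all made up for illustration; in practice the probabilities would come from a column such as P_BAD1 and the actual outcomes from the target variable.&lt;/P&gt;
&lt;PRE&gt;
```python
def event_rate_by_bin(scored, n_bins):
    """Rank (probability, actual) pairs by probability and return the
    observed event rate in each equal-sized bin, best bin first."""
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    size = len(ranked) // n_bins
    rates = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover rows.
        end = len(ranked) if b == n_bins - 1 else start + size
        chunk = ranked[start:end]
        rates.append(sum(actual for _, actual in chunk) / len(chunk))
    return rates


# Hypothetical scored data: (predicted event probability, actual 0/1 target).
toy = [(0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1),
       (0.40, 0), (0.30, 0), (0.20, 1), (0.10, 0)]

rates = event_rate_by_bin(toy, 4)
print(rates)
```
&lt;/PRE&gt;
&lt;P&gt;Reading the rates from the top bin downward shows how far into the ranked list you can go before the event rate falls below what your application can tolerate, which in turn suggests a probability cutoff.&lt;/P&gt;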
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, in some cases you might need an extremely low misclassification rate (e.g. maybe only looking at the top 1% or 2% of the scored data) because you are searching for fraud and don't want to annoy customers that are not acting fraudulently. &amp;nbsp;In other cases, you might be looking for a minimum response rate to make money (e.g. some direct mail advertisers only need a 2% response rate to be profitable). &amp;nbsp;Your best 'decision' should always incorporate your analytical and/or business objectives. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 09 Aug 2017 14:18:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/386649#M5726</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-08-09T14:18:43Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EM Prior Probabilities</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/476314#M7168</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am building a behavioral scoring model to calculate probability of default. Where can I see the estimated probability from my model in SAS Enterprise Miner?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Anshul.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Jul 2018 20:16:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/476314#M7168</guid>
      <dc:creator>AnshulS</dc:creator>
      <dc:date>2018-07-08T20:16:37Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EM Prior Probabilities</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/481601#M7225</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I am building a behavioral scoring model to calculate probability of default. Where can I see the estimated probability from my model in SAS Enterprise Miner?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The additional columns I described in my previous note are added to any train/validate/test data set that is passed to a modeling node in SAS Enterprise Miner, as well as to any score (Role=Score) data set passed to a subsequent Score node.&amp;nbsp; You can view&amp;nbsp;a sample of the data containing these additional columns by clicking on a modeling node and then clicking on the ellipsis (...) to the right of Exported Data in the properties sheet on the left.&amp;nbsp; You can then highlight the row corresponding to any of the available data sets and click on Browse or Explore&amp;nbsp;to see the variables that have been added.&amp;nbsp; By default, they will be labeled something like "Predicted: &amp;lt;target variable name&amp;gt; = &amp;lt;target variable level&amp;gt;" for a categorical target variable.&amp;nbsp; You can right-click on a column heading and choose "Name" to see the actual variable name, which is of the form "P_&amp;lt;target variable name&amp;gt;&amp;lt;target variable level&amp;gt;".&amp;nbsp; So in my previous example, a target variable BAD taking on values BAD=1 or BAD=0 would have its probabilities in columns named P_BAD1 and P_BAD0, which would have the labels "Predicted: BAD=1" and "Predicted: BAD=0" respectively.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope this helps!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Cordially,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Doug&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jul 2018 17:11:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/481601#M7225</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2018-07-26T17:11:13Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EM Prior Probabilities</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/767869#M8877</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;SAS Enterprise Miner will compute a predicted probability (adjusted for priors if requested) for each level of the target of the form"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Doug, would you have any detail on exactly how SAS makes the adjustment for priors? I'm looking to understand the calculation and the rationale behind the calculation.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Sep 2021 11:25:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/SAS-EM-Prior-Probabilities/m-p/767869#M8877</guid>
      <dc:creator>MB1983_</dc:creator>
      <dc:date>2021-09-15T11:25:13Z</dc:date>
    </item>
  </channel>
</rss>

