<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Oversampling and Decision tree help Plz! in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270197#M3991</link>
    <description>&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Hi Jason,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;thank you for responding&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt; orphans: auto; text-align: start; widows: 1; -webkit-text-stroke-width: 0px; word-spacing: 0px;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I dont think I was clear from the begining. let me walk you through the steps I have taken.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt; orphans: auto; text-align: start; widows: 1; -webkit-text-stroke-width: 0px; word-spacing: 0px;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I have an origninal dataset that I oversampled &amp;nbsp;,patitioned, placed a decisions node to adjust my posterior probabilities and lastly I used the decision tree to model it, (I have taken all these steps in SAS Enterprise Miner only, I havent used base sas) here is the view:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3143i2C1774F48DA03F9C/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="workflow.PNG" title="workflow.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now in the original dataset the event rate is 2% and the non-event rate is 98%, when I oversample &amp;nbsp;the event rate becomes 30% and the non-event rate is 70% .&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;In data partion node my training dataset &amp;nbsp;contains :&amp;nbsp;&lt;STRONG&gt;3035 non-event rate and&amp;nbsp;1301&lt;SPAN&gt;&amp;nbsp; event rate for a total of 4336 observations&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;In the decion node: I adjust the priors to 2% event and 98% non-event as shown below:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3148i5F9AF3B60DCFBB8A/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="decision.PNG" title="decision.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now, onto the decision tree:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;if&lt;STRONG&gt; I dont use &lt;/STRONG&gt;the decision node to adjust the priors , I get &amp;nbsp;these proportions (30% event, 70% non-event) &lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;and counts (1301 events ,3035 non-event)at the root node:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3150i4ABBE69F9586AC84/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="oversample.PNG" title="oversample.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;which is correct given I didnt adjust for priors.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now when I use the decision node to adjust the priors,&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt; I get these proportions (2% event,98%non-event) &lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;and counts (86.72 event,4249 non-event) at the root node:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3152iB688B8AFA61A4F0B/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="root.PNG" title="root.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;what I am trying to understand is that does sas enterprise miner think that I have only 86.72 events instead of 1301 &amp;nbsp;or what is going on here?&lt;STRONG&gt; ( I am really confused about this) (I know the total number of observation is correct =4336)&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Also when I build a logistic regression on the same oversampled dataset , I open the results and under view -&amp;gt;SAS Code , I get the updated probabilities as such:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;*** Update Posterior Probabilities;&lt;BR /&gt;_P0 = _P0 * 0.02 / 0.2997;&lt;BR /&gt;_P1 = _P1 * 0.98 / 0.7003;&lt;BR /&gt;drop _sum; _sum = _P0 + _P1 ;&lt;BR /&gt;if _sum &amp;gt; 4.135903E-25 then do;&lt;BR /&gt; _P0 = _P0 / _sum;&lt;BR /&gt; _P1 = _P1 / _sum;&lt;BR /&gt;end;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;that's how I know that sas adjusted my posterior probabilities ,on the other hand when using decion trees, I dont get this code.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I hope I explained myself better this time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Thanks you Jason so much for your help&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 12 May 2016 20:31:07 GMT</pubDate>
    <dc:creator>nismail1976</dc:creator>
    <dc:date>2016-05-12T20:31:07Z</dc:date>
    <item>
      <title>Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/267675#M3963</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;
&lt;P&gt;I am using SAS Enterprise Miner to build a model for a binary response variable (0/1).&lt;/P&gt;
&lt;P&gt;Since my event rate is about 2% and my non-event rate is about 98%, I oversampled so that the proportions are 30% events and 70% non-events.&lt;/P&gt;
&lt;P&gt;These are the results from oversampling:&lt;/P&gt;
&lt;P&gt;Data=TRAIN&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Variable&lt;/TH&gt;&lt;TH&gt;Value&lt;/TH&gt;&lt;TH&gt;Count&lt;/TH&gt;&lt;TH&gt;Percent&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Resp&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;3035&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;70%&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Resp&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;1301&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;30%&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
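&lt;P&gt;For readers following along, here is a minimal Python sketch of one common way to produce such a sample: keep every event and randomly undersample the non-events until events make up 30%. It is an illustration under that assumption, not a description of how the Enterprise Miner Sample node works internally; the function name undersample_nonevents and the 65050-row input are hypothetical.&lt;/P&gt;

```python
import random


def undersample_nonevents(labels, target_event_rate, seed=0):
    """Keep every event (label 1) and randomly keep just enough non-events
    (label 0) that events make up target_event_rate of the sample.
    Returns the sorted indices of the retained rows."""
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    # Solve events / (events + kept) = target rate for the number of non-events to keep
    n_keep = int(len(events) * (1 - target_event_rate) / target_event_rate)
    kept_nonevents = rng.sample(nonevents, n_keep)
    return sorted(events + kept_nonevents)


# Hypothetical raw data: 1301 events at a 2% event rate (65050 rows in total)
labels = [1] * 1301 + [0] * (65050 - 1301)
idx = undersample_nonevents(labels, 0.30)
n_events = sum(labels[i] for i in idx)
print(n_events, len(idx) - n_events)  # 1301 3035
```

&lt;P&gt;With 1301 events and a 30% target, the sketch keeps 3035 non-events (4336 rows in total), the same counts as in the table above.&lt;/P&gt;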
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At this point, I correct for the sampling bias by adding a Decision node to adjust the priors, placed immediately before the Decision Tree node. Here is the process flow:&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2992iA45727387521314A/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="Process Flow.PNG" title="Process Flow.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My question is: since the number of event instances after oversampling is &lt;STRONG&gt;1301&lt;/STRONG&gt;, why do I get only &lt;STRONG&gt;86.72&lt;/STRONG&gt; event instances in the root node?&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/2993i11221931C2B114DC/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" alt="root.PNG" title="root.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just to be clear: after oversampling I have 1301 events and 3035 non-events, but when I add the Decision node I get 86.72 events and 4249 non-events. Why is that?&lt;/P&gt;
&lt;P&gt;Thank you in advance; your help is greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2016 17:18:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/267675#M3963</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-02T17:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268156#M3968</link>
      <description>Hi, &lt;BR /&gt;What is the purpose of adding the Decision node after you have already oversampled? Could you share the settings inside the Decision node? It apparently flips the data back to the pre-oversampling ratio. Hope this helps. Jason Xin</description>
      <pubDate>Wed, 04 May 2016 03:17:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268156#M3968</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-05-04T03:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268272#M3971</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Thank you for responding.&lt;/P&gt;
&lt;P&gt;Shouldn't I add a Decision node after I oversample, or should I add it beforehand? This is how I corrected for the oversampling in the Decision node:&lt;/P&gt;
&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3038iF707B2C922930D6F/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="decision.PNG" title="decision.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you could tell me what I am doing wrong (is my process flow wrong?), I would really appreciate it. I am really stuck here.&lt;/P&gt;
&lt;P&gt;Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Wed, 04 May 2016 13:29:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268272#M3971</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-04T13:29:39Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268338#M3972</link>
      <description>No problem. Let me try. 

There are two different things here. One is physically resampling your input file before model building. The other is logically resampling: you don't physically change the input data set, but you tell the modeling node to treat the sample as if it had been physically resampled.
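
To make the 'logical' option concrete, here is a minimal Python sketch, assuming logical resampling is done with per-class frequency weights: no rows are added or dropped, but each event row counts as w1 observations and each non-event row as w0. The function class_weights and the row counts are hypothetical, and this is not a description of Enterprise Miner internals.

```python
def class_weights(n_event, n_nonevent, target_event_rate):
    """Per-row weights that make the weighted event rate equal
    target_event_rate while the weighted total stays equal to the row count."""
    n = n_event + n_nonevent
    w_event = target_event_rate * n / n_event
    w_nonevent = (1 - target_event_rate) * n / n_nonevent
    return w_event, w_nonevent


# Hypothetical raw data at a 2% event rate
n_event, n_nonevent = 1301, 63749
w1, w0 = class_weights(n_event, n_nonevent, 0.30)

weighted_events = n_event * w1
weighted_total = weighted_events + n_nonevent * w0
print(round(weighted_events / weighted_total, 4))  # 0.3
```

The weighted total stays equal to the number of rows; only the class mix changes.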

In your case, you want to change the initial response rate from 2% to 30%, for whatever reason. You did it physically (you probably ran it through Base SAS code), which is fine. As your screenshot indicates, you have already accomplished this, since the input to the Decision node shows 30% for target = 1. You don't really need to add the Decision node, because if you do, and you enter 0.02 there as you did, it flips back to 2%.

More often, we don't go back to Base SAS to recode the data physically. A 'better' practice is to carry the raw data set forward and apply a Decision node, where you can reset the ratio. I would encourage you to click through all four tabs at the top (Target, Prior Probabilities, Decisions, and Decision Weights) to get a fuller understanding of what each one means. As you will see, the Decision node is very flexible. Hope this helps. Jason Xin</description>
      <pubDate>Wed, 04 May 2016 17:42:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268338#M3972</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-05-04T17:42:13Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268697#M3973</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Thank you for responding!&lt;/P&gt;
&lt;P&gt;If I understand you correctly, I need to place the Decision node before oversampling, &lt;STRONG&gt;right?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Nabil&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2016 22:16:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268697#M3973</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-05T22:16:55Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268780#M3974</link>
      <description>&lt;P&gt;Would this process flow be more appropriate, and will it adjust my posterior probabilities?&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 12:40:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268780#M3974</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-06T12:40:46Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268782#M3975</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Would this process flow be more appropriate, and will it adjust my posterior probabilities?&lt;/SPAN&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3068iFD7AB49E6119F130/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="flow.PNG" title="flow.PNG" /&gt;&lt;/P&gt;
&lt;P&gt;Thanks again, you have been awesome!&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 12:43:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268782#M3975</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-06T12:43:09Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268959#M3976</link>
      <description>If you put the Decision node right after the Input Data node to effect the ratio change (logical reweighting, that is), the job you initially wanted is essentially done. As for the Sample node, the only legitimate purpose I can imagine that would not alter the reweighting you just did is to sample the data set down proportionately. I am not sure why you put the Sample node in the flow here. Typically, the Decision node is used if you want to change the ratio logically, and the Sample node is used if you want a physically different data set that reflects the new ratio. So I would say the two nodes are either/or, not both (unless the data set is large and you want a subset to represent it).&lt;BR /&gt;&lt;BR /&gt;The Data Partition node is different: it creates the training, validation, and test sets. If that is your goal, it stays in the flow. Hope this helps. Jason Xin</description>
      <pubDate>Sat, 07 May 2016 01:47:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/268959#M3976</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-05-07T01:47:06Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270197#M3991</link>
      <description>&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Hi Jason,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;thank you for responding&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt; orphans: auto; text-align: start; widows: 1; -webkit-text-stroke-width: 0px; word-spacing: 0px;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I dont think I was clear from the begining. let me walk you through the steps I have taken.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt; orphans: auto; text-align: start; widows: 1; -webkit-text-stroke-width: 0px; word-spacing: 0px;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I have an origninal dataset that I oversampled &amp;nbsp;,patitioned, placed a decisions node to adjust my posterior probabilities and lastly I used the decision tree to model it, (I have taken all these steps in SAS Enterprise Miner only, I havent used base sas) here is the view:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3143i2C1774F48DA03F9C/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="workflow.PNG" title="workflow.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now in the original dataset the event rate is 2% and the non-event rate is 98%, when I oversample &amp;nbsp;the event rate becomes 30% and the non-event rate is 70% .&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;In data partion node my training dataset &amp;nbsp;contains :&amp;nbsp;&lt;STRONG&gt;3035 non-event rate and&amp;nbsp;1301&lt;SPAN&gt;&amp;nbsp; event rate for a total of 4336 observations&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;In the decion node: I adjust the priors to 2% event and 98% non-event as shown below:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3148i5F9AF3B60DCFBB8A/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="decision.PNG" title="decision.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now, onto the decision tree:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;if&lt;STRONG&gt; I dont use &lt;/STRONG&gt;the decision node to adjust the priors , I get &amp;nbsp;these proportions (30% event, 70% non-event) &lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;and counts (1301 events ,3035 non-event)at the root node:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3150i4ABBE69F9586AC84/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="oversample.PNG" title="oversample.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;which is correct given I didnt adjust for priors.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Now when I use the decision node to adjust the priors,&lt;/SPAN&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt; I get these proportions (2% event,98%non-event) &lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;and counts (86.72 event,4249 non-event) at the root node:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/3152iB688B8AFA61A4F0B/image-size/original?v=v2&amp;amp;px=-1" border="0" alt="root.PNG" title="root.PNG" /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;what I am trying to understand is that does sas enterprise miner think that I have only 86.72 events instead of 1301 &amp;nbsp;or what is going on here?&lt;STRONG&gt; ( I am really confused about this) (I know the total number of observation is correct =4336)&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Also when I build a logistic regression on the same oversampled dataset , I open the results and under view -&amp;gt;SAS Code , I get the updated probabilities as such:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;*** Update Posterior Probabilities;&lt;BR /&gt;_P0 = _P0 * 0.02 / 0.2997;&lt;BR /&gt;_P1 = _P1 * 0.98 / 0.7003;&lt;BR /&gt;drop _sum; _sum = _P0 + _P1 ;&lt;BR /&gt;if _sum &amp;gt; 4.135903E-25 then do;&lt;BR /&gt; _P0 = _P0 / _sum;&lt;BR /&gt; _P1 = _P1 / _sum;&lt;BR /&gt;end;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;that's how I know that sas adjusted my posterior probabilities ,on the other hand when using decion trees, I dont get this code.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;I hope I explained myself better this time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&lt;SPAN style="font-size: 10.5pt; font-family: 'Helvetica','sans-serif'; color: #333333;"&gt;Thanks you Jason so much for your help&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; margin-bottom: .0001pt; line-height: 15.0pt;"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 20:31:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270197#M3991</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-12T20:31:07Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270379#M3992</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;
&lt;P&gt;No, SAS EM does not think you only have 86.72 events. &amp;nbsp;The display is adjusting the counts to reflect the adjusted priors. There might be a display setting that turns the adjustment off (I don't know). &amp;nbsp;In any case, the computational code knows about all the observations.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The adjustments can change the computations in three places:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Depending on user properties, the split search acts as if there are 86.72 events or as if there are 1301 events.&lt;/LI&gt;
&lt;LI&gt;Depending on user properties, the tree can be retrospectively pruned based on either the adjusted or the unadjusted numbers.&lt;/LI&gt;
&lt;LI&gt;The posterior probabilities are adjusted.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;I am guessing that the default behaviour for 1 and 2 is not to incorporate the adjustments. The reason: the adjustment typically makes the event look rarer, and rare events tend to fool trees into being too small.&lt;/P&gt;
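&lt;P&gt;To make points 1 and 2 more concrete, here is a hedged Python sketch of what adjusted counts might look like. It assumes the display weights each class count by prior(j) / proportion_in_data(j), the same ratio used in the posterior adjustment; EM's exact display formula is not spelled out in this thread. The priors and proportions are taken from the logistic score code quoted earlier, while the node counts and the helper name adjusted_counts are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical sketch: weight each class count by prior / proportion_in_data.
PRIORS = {0: 0.02, 1: 0.98}          # priors from the logistic score code
DATA_SHARE = {0: 0.2997, 1: 0.7003}  # class shares in the oversampled data


def adjusted_counts(raw_counts):
    """Return prior-adjusted counts for a node (assumed weighting, not EM's code)."""
    return {j: n * PRIORS[j] / DATA_SHARE[j] for j, n in raw_counts.items()}


node = {0: 300, 1: 700}  # made-up raw counts in one tree node
adj = adjusted_counts(node)
share0 = adj[0] / sum(adj.values())
print({j: round(v, 2) for j, v in adj.items()}, round(share0, 4))
```
&lt;/PRE&gt;
&lt;P&gt;In this made-up node, 300 of 1000 rows (30%) belong to class 0, but after weighting class 0 accounts for only about 2% of the adjusted total (roughly 20 of 1000), which is the sense in which the adjustment makes the event look rarer.&lt;/P&gt;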
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you point out, the DATA step code coming out of logistic regression includes code at the end to adjust the posterior probabilities. The decision tree does not output corresponding code because the posterior probabilities it outputs are already adjusted: the Decision Tree node computes the adjustments before writing out the DATA step code.&lt;/P&gt;
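&lt;P&gt;Why the tree needs no trailing adjustment block can be illustrated with a small Python sketch. A tree's score code essentially assigns the posterior stored for the leaf an observation lands in, so applying the adjustment to each leaf once, before the code is written, gives the same scores as adjusting after scoring, which is what the logistic code does. The adjust function follows the formula P(j) = scale * unadjusted_P(j) * prior(j) / proportion_in_data(j) given later in this thread; the leaf names, leaf posteriors, and function names are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical sketch: folding the prior adjustment into a tree's leaf table.
PRIORS = {0: 0.02, 1: 0.98}
DATA_SHARE = {0: 0.2997, 1: 0.7003}


def adjust(post):
    """P(j) = scale * unadjusted_P(j) * prior(j) / proportion_in_data(j)."""
    weighted = {j: p * PRIORS[j] / DATA_SHARE[j] for j, p in post.items()}
    scale = 1.0 / sum(weighted.values())  # makes the adjusted P(j) sum to 1
    return {j: scale * v for j, v in weighted.items()}


# Made-up leaf posteriors estimated on the oversampled training rows
LEAVES = {"leaf_a": {0: 0.80, 1: 0.20}, "leaf_b": {0: 0.10, 1: 0.90}}

# Tree-style: adjust every leaf once, then scoring is a plain lookup
ADJUSTED_LEAVES = {name: adjust(p) for name, p in LEAVES.items()}


def score_tree(leaf):
    return ADJUSTED_LEAVES[leaf]


# Logistic-style: produce the unadjusted posterior first, then adjust it at the end
def score_then_adjust(leaf):
    return adjust(LEAVES[leaf])


for leaf in LEAVES:
    print(leaf, score_tree(leaf) == score_then_adjust(leaf), round(score_tree(leaf)[0], 3))
```
&lt;/PRE&gt;
&lt;P&gt;For example, leaf_a, which is 80% class 0 among the oversampled training rows, scores about 0.16 for class 0 either way.&lt;/P&gt;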
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let us know if you still have questions.&lt;/P&gt;
&lt;P&gt;-Padraic&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 15:34:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270379#M3992</guid>
      <dc:creator>PadraicGNeville</dc:creator>
      <dc:date>2016-05-13T15:34:29Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270401#M3993</link>
      <description>Thank you so much, you are a lifesaver!</description>
      <pubDate>Fri, 13 May 2016 16:26:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270401#M3993</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-13T16:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270403#M3994</link>
      <description>&lt;P&gt;Do decision trees compute the adjusted posteriors the same way as logistic regression?&lt;/P&gt;
&lt;P&gt;Thanks again!&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 16:28:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270403#M3994</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-13T16:28:41Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270413#M3995</link>
      <description>&lt;P&gt;Yes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;P(class j) = scale * unadjusted_P(j) * prior(j) / proportion_in_data(j),&lt;/P&gt;
&lt;P&gt;where the scale is chosen to get sum over j of P(j) = 1.&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 17:20:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270413#M3995</guid>
      <dc:creator>PadraicGNeville</dc:creator>
      <dc:date>2016-05-13T17:20:17Z</dc:date>
    </item>
    <item>
      <title>Re: Oversampling and Decision tree help Plz!</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270414#M3996</link>
      <description>&lt;P&gt;Thank you very much for your help, I really appreciate that!&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 17:22:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Oversampling-and-Decision-tree-help-Plz/m-p/270414#M3996</guid>
      <dc:creator>nismail1976</dc:creator>
      <dc:date>2016-05-13T17:22:31Z</dc:date>
    </item>
  </channel>
</rss>

