<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why do you require adjust probability after over sampling? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753494#M8791</link>
    <description>&lt;P&gt;Hi, I am in the same situation. I have oversampled my data and now I'm stuck on how to calibrate the predicted probabilities. Kindly assist.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Jul 2021 13:10:59 GMT</pubDate>
    <dc:creator>Solly7</dc:creator>
    <dc:date>2021-07-12T13:10:59Z</dc:date>
    <item>
      <title>Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/752224#M8788</link>
      <description>&lt;P&gt;Many articles say you need to adjust the predicted probability after oversampling a rare target. I am really confused here. I thought the purpose of oversampling is that you believe your target subgroup is underrepresented, so you just do some copy-and-paste work for the rare target group. As a result, you might end up with a higher predicted probability for a given X variable. But if you are required to adjust the probability from the oversampled data using the ratio of the original odds to the oversampled odds, aren’t you just reverting everything back to the original state without oversampling? So why do it?&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 07:53:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/752224#M8788</guid>
      <dc:creator>gyambqt</dc:creator>
      <dc:date>2021-07-06T07:53:23Z</dc:date>
    </item>
    <item>
      <title>Re: Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753347#M8790</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;gt; But if you are required to adjust the probability from the oversampled data using the ratio of the original odds to the oversampled odds, aren’t you just reverting everything back to the original state without oversampling?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;No, because by then the predictive model has already been built, and the model-building process has profited from oversampling the very rare target. I deliberately say "very rare" target because, I find, many analysts are too quick to oversample. Oversampling should be used primarily to avoid too much input data and for the smoothness and speed of modelling, as most techniques can deal perfectly well with a rare outcome category.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, you have to correct for the real priors (that seems logical to me); otherwise your predicted probabilities are not honest. Not correcting for the real priors will still give you the correct ranking, but your predicted probabilities for the rare category will be artificially high. In general, you do not want the latter; you want the probabilities to be honest (such that they can really be interpreted as probabilities / likelihoods).&lt;/P&gt;
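&lt;P&gt;As a rough illustration of what correcting for the real priors does to a single score, here is a minimal sketch in Python (not SAS). It assumes the standard prior-correction formula, which reweights the event and nonevent parts of the score by the ratio of true to sample proportions. The function and argument names are hypothetical and are not SAS option names.&lt;/P&gt;

```python
def adjust_for_priors(p_sample, sample_event_rate, true_event_rate):
    """Rescale a probability from a model trained on oversampled data.

    p_sample: predicted event probability from the oversampled model.
    sample_event_rate: event proportion in the oversampled training data.
    true_event_rate: event proportion in the original population.
    All names are illustrative assumptions, not part of any SAS procedure.
    """
    # Weight each class by how much oversampling distorted its share.
    event_weight = true_event_rate / sample_event_rate
    nonevent_weight = (1.0 - true_event_rate) / (1.0 - sample_event_rate)
    numerator = p_sample * event_weight
    return numerator / (numerator + (1.0 - p_sample) * nonevent_weight)


# Training data balanced 50:50, true event rate of 1 in 1001.
scores = [0.2, 0.5, 0.9]
adjusted = [adjust_for_priors(p, 0.5, 1 / 1001) for p in scores]
```

&lt;P&gt;Under these assumptions the adjusted values keep the same order as the raw scores, so the ranking is untouched; only the scale moves back toward the true event rate.&lt;/P&gt;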
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But, &lt;SPAN&gt;to reassure you,&lt;/SPAN&gt; correcting for the real priors is not a complex analysis task. It's just an option in a procedure or a checkbox in Enterprise Miner / Model Studio VDMML.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Sat, 10 Jul 2021 15:08:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753347#M8790</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-07-10T15:08:11Z</dc:date>
    </item>
    <item>
      <title>Re: Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753494#M8791</link>
      <description>&lt;P&gt;Hi, I am in the same situation. I have oversampled my data and now I'm stuck on how to calibrate the predicted probabilities. Kindly assist.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jul 2021 13:10:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753494#M8791</guid>
      <dc:creator>Solly7</dc:creator>
      <dc:date>2021-07-12T13:10:59Z</dc:date>
    </item>
    <item>
      <title>Re: Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753495#M8792</link>
      <description>&lt;P&gt;How to adjust probability after oversampling&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/kb/22/601.html" target="_blank"&gt;22601 - Adjusting for oversampling the event level in a binary logistic model (sas.com)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jul 2021 13:15:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/753495#M8792</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-07-12T13:15:48Z</dc:date>
    </item>
    <item>
      <title>Re: Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/754237#M8796</link>
      <description>Hi, let’s consider an extreme case. &lt;BR /&gt;Assume you have 1001 records: 1000 of them are nonevents and 1 is an event, so the ratio is 1000:1. You build a logistic regression model on these 1001 rows and get log(p/(1-p)) = 1 + 5*age. You use this model to score the training data, and for the event you get a probability of 0.1. &lt;BR /&gt;Now you use oversampling to boost the rare target, so you get a 1000:1000 ratio of events to nonevents. You build a model, log(p/(1-p)) = 1 + 200*age, and use it to score the training data; for each event record you get a probability of, let’s say, 0.9. This 0.9 is the unadjusted probability and needs to be adjusted according to the original data proportion. So you do some math, and after adjusting you may get 0.1 as the adjusted probability. So what is the point of doing the whole thing? Please ignore the calculations above; they are only for demonstration.</description>
      <pubDate>Thu, 15 Jul 2021 05:53:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/754237#M8796</guid>
      <dc:creator>gyambqt</dc:creator>
      <dc:date>2021-07-15T05:53:07Z</dc:date>
    </item>
    <item>
      <title>Re: Why do you require adjust probability after over sampling?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/754260#M8797</link>
      <description>&lt;P&gt;Your model has a better chance of predicting the bad cases if there are more bad cases in the data set used to create the model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You will get a different model fit, and different predicted probabilities, if your data set is 1000:1 versus 50% good and 50% bad.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jul 2021 12:34:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Why-do-you-require-adjust-probability-after-over-sampling/m-p/754260#M8797</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-07-15T12:34:56Z</dc:date>
    </item>
  </channel>
</rss>

