<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to predict PD with logistic regression? in SAS Forecasting and Econometrics</title>
    <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604444#M3660</link>
    <description>&lt;P&gt;As Paige said, your event probability is too small: 220/40000=0.0055.&lt;/P&gt;
&lt;P&gt;PROC LOGISTIC does not work well for such a rare event.&lt;/P&gt;
&lt;P&gt;I advise oversampling to raise this probability, for example good:bad =&amp;nbsp;1000:220, and using these 1220 observations to build the model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or try the PEVENT= option to specify the population probability of bad.&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token procnames"&gt;Proc&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;logistic&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;data&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;TRAINING_DATA&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token statement"&gt;class&lt;/SPAN&gt; CATEGORY_VAR1&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;PARAM&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;REF REF&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'FIRST'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; CATEGORY_CAR2&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;PARAM&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;ref ref&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'FIRST'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt; 
&lt;SPAN class="token procnames"&gt;model&lt;/SPAN&gt; default&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;event&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'1'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt; VAR1&lt;SPAN class="token operator"&gt;-&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;-&lt;/SPAN&gt;VAR50&lt;SPAN class="token operator"&gt;/&lt;/SPAN&gt;selection&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;stepwise &lt;STRONG&gt; pevent=0.0055&lt;/STRONG&gt; &lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token procnames"&gt;run&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2019 12:29:34 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2019-11-15T12:29:34Z</dc:date>
    <item>
      <title>How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604349#M3657</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;I have tried to find another topic that could help me out, but with no success so far.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me start by describing my dataset:&lt;BR /&gt;I have a real-world application dataset. It is a collection of people who have applied to borrow money. I have information about them such as income, age, children, marital status, LTV, etc., almost 200 variables in total. My response variable is their default status: whether or not they defaulted in the first year.&amp;nbsp;&lt;/P&gt;&lt;P&gt;My dataset includes 40,000 observations and 220 defaults (default value=1).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I cleaned the dataset as follows: if a variable has more than 5% missing values, I removed the variable; if it has less than 5% missing, I removed the affected rows.&lt;/P&gt;&lt;P&gt;Now I am down to approx. 50 variables. Furthermore, I divided the original dataset into a training and a test dataset (70% training, 30% test).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To investigate which variables I should work further with, I do the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Proc logistic data=TRAINING_DATA;
class CATEGORY_VAR1(PARAM=REF REF='FIRST') CATEGORY_CAR2(PARAM=ref ref='FIRST'); 
model default(event='1')= VAR1--VAR50/selection=stepwise;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This gives 6 significant variables, a c-value of 0.701, Somers' D: 0.42, AIC: 2340.40.&lt;/P&gt;&lt;P&gt;I'm not very happy with the c-value, but I can live with it.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;My next step is to calculate the probability of default given these 6 variables, using the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC LOGISTIC DATA = TRAINING_DATA descending;
class CATEGORY_VAR1(PARAM=REF REF='FIRST');
MODEL default(event='1') = VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 / link = probit ctable
pprob=(0.05 to 1 by 0.05);
output out=PREDICTED_PROB predicted=PD_probit;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;(I also tried link=logit.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I then check how many of these predictions are actually correct hits, with the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data CHECK;
set PREDICTED_PROB;
where PD_probit &amp;gt; 0.5 and default=1;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I got 0 hits! This indicates that my model cannot predict anything...&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What am I doing wrong? How should I approach it?&amp;nbsp;&lt;BR /&gt;My wish would be to check what percentage of cases the model gets correct (hopefully a lot), and then try the model out on the test dataset.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry if the post is too long; let me know if there is something I should add or remove. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Best regards.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Nov 2019 16:20:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604349#M3657</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-17T16:20:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604361#M3658</link>
      <description>&lt;P&gt;About 0.5% of your data has Y=1. It's not surprising that this particular logistic regression doesn't predict that any observations will default. The huge mass of data driving the regression did not default. Try something called oversampling. Go to your favorite internet search engine and type in&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;logistic regression oversampling in sas&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 00:51:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604361#M3658</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-11-15T00:51:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604362#M3659</link>
      <description>&lt;P&gt;If you look at the number of defaults in your data, the chance of randomly selecting an account that will default is 0.55%. I suggest you compare that with the average PD of your defaulting accounts. If the average PD is significantly greater than 0.55%, then I'd suggest your model has some predictive power, as it is doing better than a random selection.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 00:55:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604362#M3659</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-11-15T00:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604444#M3660</link>
      <description>&lt;P&gt;As Paige said, your event probability is too small: 220/40000=0.0055.&lt;/P&gt;
&lt;P&gt;PROC LOGISTIC does not work well for such a rare event.&lt;/P&gt;
&lt;P&gt;I advise oversampling to raise this probability, for example good:bad =&amp;nbsp;1000:220, and using these 1220 observations to build the model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or try the PEVENT= option to specify the population probability of bad.&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token procnames"&gt;Proc&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;logistic&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;data&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;TRAINING_DATA&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token statement"&gt;class&lt;/SPAN&gt; CATEGORY_VAR1&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;PARAM&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;REF REF&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'FIRST'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; CATEGORY_CAR2&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;PARAM&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;ref ref&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'FIRST'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt; 
&lt;SPAN class="token procnames"&gt;model&lt;/SPAN&gt; default&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;event&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'1'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt; VAR1&lt;SPAN class="token operator"&gt;-&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;-&lt;/SPAN&gt;VAR50&lt;SPAN class="token operator"&gt;/&lt;/SPAN&gt;selection&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;stepwise &lt;STRONG&gt; pevent=0.0055&lt;/STRONG&gt; &lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;SPAN class="token procnames"&gt;run&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2019 12:29:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604444#M3660</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-15T12:29:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604700#M3661</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you guys for the suggestions. I will try them out!&lt;/P&gt;&lt;P&gt;I will let you know whether it goes well or badly. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;: Could you please&amp;nbsp;elaborate a bit more, from a theoretical point of view if you can, on the "pevent=" option?&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Again, thank you for your time! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2019 14:27:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604700#M3661</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-16T14:27:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604779#M3662</link>
      <description>&lt;P&gt;Sorry, my bad. Since your data is the whole population, not a sample from it, you don't need PEVENT=.&lt;/P&gt;
&lt;P&gt;I still suggest that you oversample your data to raise the bad:good ratio, and then use PEVENT= to adjust for the population event probability.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Nov 2019 10:36:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604779#M3662</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-17T10:36:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604845#M3663</link>
      <description>&lt;P&gt;When I oversample, should it be done before or after splitting the data into training and validation?&amp;nbsp;I am still uneasy about using this kind of trick.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another question: am I totally wrong in saying that I want a model that predicts a PD from which I can conclude "if PD &amp;gt; 0.5 (50%) then default=1"? Which leads me back to the topic: I want to be able to find out&amp;nbsp;&lt;SPAN&gt;which applicants can be identified as possible defaulters at the time of application.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 17 Nov 2019 21:52:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604845#M3663</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-17T21:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604852#M3664</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/295941"&gt;@Norbit&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;When I oversample, should it be done before or after splitting the data into training and validation?&amp;nbsp;I am still uneasy about using this kind of trick.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another question: am I totally wrong in saying that I want a model that predicts a PD from which I can conclude "if PD &amp;gt; 0.5 (50%) then default=1"? Which leads me back to the topic: I want to be able to find out&amp;nbsp;&lt;SPAN&gt;which applicants can be identified as possible defaulters at the time of application.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You oversample, then randomly split the resulting data into training and validation data sets.&lt;/P&gt;
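&lt;P&gt;A minimal sketch of the split step, assuming the oversampled data set is called OVERSAMPLED and the output names SPLIT, TRAIN and VALID (all hypothetical), with default as the response. It uses PROC SURVEYSELECT with a STRATA statement so that both parts keep roughly the same default rate; that is one reasonable choice, not the only one:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: 70/30 random split, stratified on the response */
proc sort data=OVERSAMPLED;
by default;   /* STRATA requires the input sorted by the strata variable */
run;

proc surveyselect data=OVERSAMPLED out=SPLIT method=srs samprate=0.7
                  outall seed=12345;   /* the seed value is arbitrary */
strata default;
run;

/* OUTALL keeps every row and adds Selected (1 = sampled, 0 = not sampled) */
data TRAIN VALID;
set SPLIT;
if Selected=1 then output TRAIN;
else output VALID;
run;&lt;/CODE&gt;&lt;/PRE&gt;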
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding your 2nd paragraph, you are not wrong.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Nov 2019 22:58:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/604852#M3664</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-11-17T22:58:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605587#M3668</link>
      <description>&lt;P&gt;And again... Since you all suggested oversampling, I started reading more and more about it. It seems to be the "correct" way to handle the data, BUT another question came to mind after reading this quote:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"&lt;STRONG&gt;Oversampling&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;the minority class using SMOTE or other algorithms has the disadvantage that it suffers from over-fitting. That is, you may perform well on the training set but on the test set your performance may suffer badly. Similarly&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;under-sampling&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;the majority class may under-fit your algorithm if the minority class is very small.&lt;/SPAN&gt;"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way I can use some kind of penalized logistic regression, where the regression knows that I have oversampled the minority class of the data?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 01:08:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605587#M3668</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-20T01:08:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605596#M3669</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/295941"&gt;@Norbit&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;And again... Since you all suggested oversampling, I started reading more and more about it. It seems to be the "correct" way to handle the data, BUT another question came to mind after reading this quote:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"&lt;STRONG&gt;Oversampling&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;the minority class using SMOTE or other algorithms has the disadvantage that it suffers from over-fitting. That is, you may perform well on the training set but on the test set your performance may suffer badly. Similarly&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;under-sampling&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;the majority class may under-fit your algorithm if the minority class is very small.&lt;/SPAN&gt;"&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there any way I can use some kind of penalized logistic regression, where the regression knows that I have oversampled the minority class of the data?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Did you actually try oversampling and compare the results on the training and validation data sets for your data?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 02:06:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605596#M3669</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-11-20T02:06:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605677#M3671</link>
      <description>&lt;P&gt;Yes. That is why you need the option PEVENT=0.005 to adjust for the event probability in the population (0.005).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"&lt;SPAN&gt;Is there any way where I can use some Penalized Logistic regression?&amp;nbsp;"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;PROC LOGISTIC has such a method (the FIRTH option):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;model good_bad(event='good')= &amp;amp;varlist /outroc=roc lackfit scale=none aggregate rsquare&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt; firth&lt;/STRONG&gt;&lt;/FONT&gt; corrb ;&lt;/SPAN&gt;&lt;/P&gt;
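&lt;P&gt;For context, a minimal complete step using the names from the original post (default as the response, VAR1 to VAR6 as the selected variables; these are assumptions about your data). Note that FIRTH is a penalized-likelihood method that helps with rare events and small-sample bias; it is not itself a correction for oversampling:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: Firth penalized-likelihood logistic regression */
proc logistic data=TRAINING_DATA;
/* CLPARM=PL requests profile-likelihood confidence limits for the parameters */
model default(event='1') = VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 / firth clparm=pl;
run;&lt;/CODE&gt;&lt;/PRE&gt;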
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or, if your data size is small, you can also try exact logistic regression via the EXACT statement.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 11:21:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605677#M3671</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-20T11:21:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605949#M3673</link>
      <description>&lt;P&gt;Hi again,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried to oversample (or call it under-sample), using the following code:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data OVERSAMPLING; 
set TRAINING_DATASET;
   if y=1 then output;
   if y=0 then do;
     if ranuni(10000)&amp;lt;1/20 then output;
   end;
run;

proc freq data=OVERSAMPLING;
tables y;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I actually followed this&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/38196015/offsetting-oversampling-in-sas-for-rare-events-in-logistic-regression" target="_self"&gt;LINK&lt;/A&gt;, and the default rate went up to 10.5% while the non-default rate fell to 89.5%. Now the frequencies are (approx.) 0's: 1904, and 1's: 200. I also tried applying the offset calculation; it gave me almost the same intercept as before oversampling.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When counting defaults with PD &amp;gt; 0.025 (2.5%), I previously had hits on 9.42%. Now I hit on 96.85%; only 3.15% have a PD under 2.5%.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;.. but!&amp;nbsp;&lt;/P&gt;&lt;P&gt;With oversampling I get these percentages: high sensitivity (97.4), low specificity (10.7), false positive (88.9), false negative (2.7), correct (19.6). How should I interpret these now?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 21:16:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605949#M3673</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-20T21:16:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605958#M3674</link>
      <description>&lt;P&gt;I am still confused about how to use it correctly.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;As in my reply above, I tried to oversample my data.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But how do I use this on the validation data to predict the defaults correctly?&amp;nbsp;&lt;BR /&gt;I still don't have a model where I can say: "If PD &amp;gt;0.5 then this applicant would be counted as going default in one year".&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 22:03:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/605958#M3674</guid>
      <dc:creator>Norbit</dc:creator>
      <dc:date>2019-11-20T22:03:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/606078#M3675</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I still don't have a model where I can say: "If PD &amp;gt;0.5 then this applicant would be counted as going default in one year".&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Use the options&lt;/SPAN&gt;&amp;nbsp; PEVENT=&lt;SPAN&gt;&amp;nbsp;"event prob in population"&lt;/SPAN&gt;&amp;nbsp; and FIRTH in the MODEL statement, and score the test data, not the validation data (which avoids the overfitting problem).&lt;/P&gt;
&lt;P&gt;There are many ways to score, such as the CODE statement, the SCORE statement, and PROC PLM. Also calling&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13633"&gt;@StatDave&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
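&lt;P&gt;A rough sketch of the SCORE statement route, assuming the model is fit on the oversampled training data (OVERSAMPLING, with response y, as in your code) and that TEST_DATA, TEST_SCORED and TEST_FLAGGED are placeholder names. The PRIOREVENT= option of the SCORE statement supplies the population event probability, so the scored probabilities are adjusted back towards the population scale; 0.0055 is the 220/40000 from the original post:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: fit on the oversampled data, score the untouched test data */
proc logistic data=OVERSAMPLING;
model y(event='1') = VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 / firth;
/* PRIOREVENT= is the population event probability used for the scored probabilities */
score data=TEST_DATA out=TEST_SCORED priorevent=0.0055;
run;

/* P_1 is the predicted probability of y=1 in the SCORE output data set */
data TEST_FLAGGED;
set TEST_SCORED;
/* the cutoff is an assumption; choose it to suit the business use */
flag_default = (P_1 &amp;gt; 0.0055);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With a cutoff of 0.5, few or no test observations are likely to be flagged for such a rare event, which is why the cutoff here sits near the population event rate purely as an illustration.&lt;/P&gt;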
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/iml/2019/02/11/proc-plm-regression-models-sas.html" target="_blank" rel="noopener"&gt;https://blogs.sas.com/content/iml/2019/02/11/proc-plm-regression-models-sas.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/iml/2019/11/20/predicted-values-generalized-linear-models-ilink-sas.html" target="_blank" rel="noopener"&gt;https://blogs.sas.com/content/iml/2019/11/20/predicted-values-generalized-linear-models-ilink-sas.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2019 12:08:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/606078#M3675</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-21T12:08:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to predict PD with logistic regression?</title>
      <link>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/606164#M3676</link>
      <description>&lt;P&gt;Going back to your original post, use the PREDPROBS=INDIVIDUAL option in the OUTPUT statement rather than the PREDICTED= option. The resulting data set will contain a variable holding the predicted response category, _INTO_. These predicted response categories are determined using a maximum predicted probability rule, meaning that whichever predicted probability is larger, event or nonevent, determines the predicted response category. With such a rare event, it is unlikely that any predicted event probability will exceed 0.5, but some will likely exceed the observed event rate, which you say is 220/40000=0.0055. Oversampling is probably not necessary. See the description of the PREDPROBS= option and the description of the resulting data set in the "Input and Output Data Sets" Details section of the PROC LOGISTIC documentation.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2019 15:02:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/How-to-predict-PD-with-logistic-regression/m-p/606164#M3676</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2019-11-21T15:02:32Z</dc:date>
    </item>
  </channel>
</rss>

