- Urgent,how to adjust probabilities after oversampl...

10-03-2014 04:43 AM

Hi,

I have oversampled my data (50/50) to build a logistic regression model. The original response rate was, for example, 0.6%.

Is there a formula that will help me adjust my scores? I found the one below online (attached PDF), but I am struggling to understand how it works...

Your help would be much appreciated.

Many Thanks

Accepted Solutions

Solution

07-06-2017
02:36 PM

10-10-2014 08:51 AM

Hi Kanyange,

You can use this equation:

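$$P_i^{**} = \frac{P_i^{*}\, R_0\, P_1}{(1 - P_i^{*})\, R_1\, P_0 + P_i^{*}\, R_0\, P_1}$$

where:

$P_i^{*}$ is the unadjusted probability you get from your model

$R_0$ and $R_1$ are the sample proportions of non-events (0) and events (1) respectively

$P_0$ and $P_1$ are the original non-event and event rates (population rates)

$P_i^{**}$ is the true probability, that is, the probability adjusted back to the population scale

For the 50/50 sample and 0.6% response rate in the question, $R_0 = R_1 = 0.5$, $P_1 = 0.006$ and $P_0 = 0.994$.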
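As a minimal sketch of this equation in Python, assuming the convention that subscript 1 denotes the event and subscript 0 the non-event. The function and parameter names are illustrative assumptions and are not part of SAS or Enterprise Miner:

```python
def adjust_probability(p_model, sample_event_rate, population_event_rate):
    """Map a probability from a model fitted on oversampled data
    back to the population scale (hypothetical helper)."""
    r1 = sample_event_rate          # R_1: event proportion in the training sample
    r0 = 1.0 - r1                   # R_0: non-event proportion in the training sample
    p1 = population_event_rate      # P_1: event rate in the population
    p0 = 1.0 - p1                   # P_0: non-event rate in the population
    numerator = p_model * r0 * p1
    denominator = (1.0 - p_model) * r1 * p0 + p_model * r0 * p1
    return numerator / denominator


# Example from the question: 50/50 sample, 0.6% population response rate.
for score in (0.1, 0.5, 0.9):
    print(score, round(adjust_probability(score, 0.5, 0.006), 6))
```

With a 50/50 sample, a model score of 0.5 maps back to roughly the population rate of 0.006, which is a quick sanity check on the formula. The mapping is monotonic, so the ranking of the scored records does not change.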

All Replies

10-03-2014 10:45 AM

Use PRIOREVENT=0.006 (the 0.6% population event rate expressed as a proportion) in the SCORE statement of PROC LOGISTIC to get adjusted scores.

10-06-2014 06:59 AM

Hi

Thank you for your response....

I used Enterprise Miner to build the model, and I would like to apply the adjustment outside Enterprise Miner.

I found this formula in this forum, and I think it is what I need: 1/(1+(1/population proportion-1)/(1/sample proportion-1)*(1/score-1));

Thank you

Alice
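The one-line formula quoted above is the same prior correction written in terms of odds. A rough Python sketch follows, assuming the population term is (1/population proportion - 1), i.e. the population odds against the event, and that both proportions refer to the event level. The function name is hypothetical:

```python
def adjust_score(score, sample_proportion, population_proportion):
    """Odds-form prior correction (hypothetical helper).

    score must lie in (0, 1]; both proportions are event proportions in (0, 1).
    """
    # Each term is "odds against the event": (1 - p) / p == 1/p - 1.
    population_odds_against = 1.0 / population_proportion - 1.0
    sample_odds_against = 1.0 / sample_proportion - 1.0
    score_odds_against = 1.0 / score - 1.0
    return 1.0 / (
        1.0 + population_odds_against / sample_odds_against * score_odds_against
    )


# 50/50 sample, 0.6% population response rate, model score of 0.5.
print(adjust_score(0.5, 0.5, 0.006))
```

For a 50/50 sample the sample odds are 1, so the correction reduces to rescaling the score's odds by the population odds.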

10-03-2014 11:21 AM

Hi Kanyange,

The way I do it in Enterprise Miner is I add a Decisions node after my 50/50 sample.

Click the Decisions ellipsis, go to the Decisions tab, and set "Do you want to use decisions" to Yes.

Go to the Decision Weights and fill the matrix according to each level.

Remember that you are using the inverse of the prior probabilities. For your example, if event A happens 0.6% of the time (0.006), then the inverse prior probability is 1/0.006=166.67. Event B then has a prior probability of 0.994 and an inverse prior probability of 1/0.994=1.006.

Then your table would look like:

Level | Decision1 | Decision2 |
---|---|---|
A | 166.67 | 0 |
B | 0 | 1.006 |

The decision node will take care of adjusting the weights of your model.
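The decision weights in the table above are just the reciprocals of the prior probabilities. A small Python sketch of that arithmetic, where the variable names are illustrative only and nothing here calls Enterprise Miner:

```python
# Population (prior) probabilities for the two target levels.
event_rate = 0.006
priors = {"A": event_rate, "B": 1.0 - event_rate}

# Inverse prior probability for each level, used on the diagonal
# of the decision weight matrix.
weights = {level: 1.0 / prior for level, prior in priors.items()}

for level, weight in weights.items():
    print(level, round(weight, 3))
```

The ratio of the two weights, 0.994/0.006, is about 165.67, which is the population odds against the event.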

Your flow can look like: Data->Sample->Decisions->Partition->Regression

Is this the answer you were looking for?

I generally don't use logistic regression for rare events. Does logistic regression work well with your data?

Thanks,

Miguel

08-27-2015 07:45 PM

Hi,

I don't understand how to use the Decisions node for adjusting probabilities. I have a 0.7% response rate with 6,000 responses. I used the Sample node to create a sample with a 50/50 response rate. When I run the sample through Neural Network and Random Forest, I see the results with ROC, but my score data was not adjusted to 0.7%.

Then I redid everything with the Decisions node: Data-->Sample-->Decisions-->Data Partition, with Decision Weights as described above. In Prior Probabilities, the prior is not 50%; it is already 0.7%. I still set the adjusted prior to 0.7%. The count is 6000/6000.

Now all my results have changed. All I see in my ROC is the baseline, with all the models overlapping (zero true positives and zero false positives). I have the score data with adjusted probabilities, but I am not sure I am doing the right thing. I just don't understand how EM uses this process with the sample data. I would appreciate any comments.

Thanks.

10-09-2014 08:58 AM

This link might be useful:

22601 - How do I adjust for oversampling the event level in a binary logistic model?

12-16-2016 11:26 AM

Hi Aysin, your problem is similar to mine. Did you find a solution? As I understand it, you used two Sample nodes to get the prior adjusted. Can I use two Decisions nodes to do the same? That looks like a solution to me. The advantage of the Decisions node is that EM does all the adjustment by itself.