A small percentage of response : Please Help Thank...

04-25-2013 06:55 AM

Hi all, I would like to predict which customers have a propensity to buy basketball tickets. My whole database has 1,800,000 customers, and only 23,509 of them (about 1.3%) have purchased tickets in the past. How should I proceed? Your help would be much appreciated. Many thanks

Posted in reply to Question

04-25-2013 11:05 AM

First of all, the methodology is logistic regression. But there are two ways to do the prediction:

1. Select the whole database as your targeted customers. In this case, since you only have about a 1% response rate, the predicted probabilities won't be high (p_1 = 0.1 could be high enough to say this person will buy a ticket).

2. Select part of your database as your targeted customers. In this case, you have to do some preliminary data mining to reduce the data size and increase the response rate; the predicted probabilities will then increase too.

Keep in mind that neither way will capture all potential buyers. There is no way to cover all buyers unless you contact the whole database.
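To make the point about low predicted probabilities concrete, here is a minimal sketch using the figures from the question. The helper names `lift` and `flag_buyers` and the example scores are hypothetical, and the 0.1 cutoff is only the illustrative value mentioned above, not a recommended threshold.

```python
# Figures from the original question.
total_customers = 1_800_000
past_buyers = 23_509

# Overall response rate in the whole database (roughly 1.3%).
base_rate = past_buyers / total_customers


def lift(predicted_probability, base_rate):
    """How many times more likely than average a scored customer is to buy."""
    return predicted_probability / base_rate


def flag_buyers(scores, threshold):
    """Flag customers whose predicted probability meets the threshold."""
    return [score >= threshold for score in scores]


print(f"base rate: {base_rate:.4f}")
print(f"lift at p_1 = 0.1: {lift(0.1, base_rate):.1f}x")

# Hypothetical model scores, for illustration only.
example_scores = [0.02, 0.15, 0.50]
print(flag_buyers(example_scores, threshold=0.1))
```

With a base rate this low, a cutoff far below 0.5 can still pick out customers who are several times more likely than average to buy.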

Posted in reply to Question

04-25-2013 11:33 AM

When the rate (expectation) is so small, modeling should be based on the Poisson distribution, right?

Posted in reply to marxyst

04-25-2013 11:40 AM

Never heard of that one, what is the reason?

Poisson is usually used for count data instead.

You can always oversample your data and then use Bayesian priors to correct for the oversampling.
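As a rough sketch of what that correction can look like: if the model is fit on a sample where buyers are oversampled, each predicted probability can be rescaled by the ratio of the true rate to the sample rate. The function name `correct_for_oversampling` and the 50/50 sample are assumptions made for illustration; in SAS the equivalent adjustment would be handled through the procedure's own options rather than by hand.

```python
def correct_for_oversampling(p_sample, true_rate, sample_rate):
    """Map a probability from a model fit on oversampled data
    back to the scale of the full population.

    p_sample    -- predicted probability from the oversampled model
    true_rate   -- proportion of buyers in the whole database
    sample_rate -- proportion of buyers in the modeling sample
    """
    buyer_weight = true_rate / sample_rate
    non_buyer_weight = (1 - true_rate) / (1 - sample_rate)
    numerator = p_sample * buyer_weight
    return numerator / (numerator + (1 - p_sample) * non_buyer_weight)


# True rate from the question; the balanced sample is hypothetical.
true_rate = 23_509 / 1_800_000
sample_rate = 0.5

for p in (0.5, 0.8, 0.95):
    adjusted = correct_for_oversampling(p, true_rate, sample_rate)
    print(f"sample-scale p = {p:.2f} -> population-scale p = {adjusted:.4f}")
```

A useful sanity check is that a score equal to the sample rate maps back to the true rate, and that the correction does nothing when the two rates are the same.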

I'd make sure to use several different samples/simulations to get a better idea. This approach has its benefits and drawbacks, which you can find with some googling.

You can use proc logistic, I think proc discrim is also an option.

Are you using JMP, EG, EM or Base SAS?

Posted in reply to Reeza

04-25-2013 12:31 PM

proc discrim may do the same job as logistic regression, but since LR is a well-established method for this case, it is the best and easiest way.

To get better results, in-depth data mining and modeling skills are necessary.

Posted in reply to marxyst

04-25-2013 12:18 PM

Poisson regression assumes the dependent variable follows a Poisson distribution, which means Y takes non-negative integer values (counts). In this case, Y only has two values: buy or not.

Also, 23,509 is not a small amount.

Posted in reply to marxyst

04-25-2013 12:36 PM

The limiting distribution of Binomial(N, p) when p is small and N is large is Poisson(pN). That's probably the origin of the confusion. Poisson regression could model, for instance, the number of buyers per group of 10,000 randomly selected people.
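For anyone who wants to see how close the two distributions are here, the following is a small numerical check, assuming the response rate from the question and groups of 10,000 people. The function names are mine, and the probabilities are computed through log-gamma to avoid overflow.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


p = 23_509 / 1_800_000   # response rate from the question
n = 10_000               # group size used in the example above
lam = n * p              # Poisson mean, pN

# Largest pointwise gap between the two probability mass functions.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)
)

print(f"expected buyers per group: {lam:.1f}")
print(f"largest pmf difference: {max_gap:.6f}")
```

The gap is small relative to the peak probability, which is why the Poisson model works for counts of buyers per group, even though each individual outcome is still binary.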

hth

PG
