Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

A small percentage of response : Please Help Thank you

Hi all, I would like to predict which customers have a propensity to buy basketball tickets. My whole database has 1,800,000 customers, and only 23,509 have purchased tickets in the past (about 1.3%). How should I proceed? Your help would be much appreciated. Many thanks

jf

Re: A small percentage of response : Please Help Thank you

First of all, the method to use is logistic regression. There are, however, two ways to set up the prediction:

1. Select the whole database as your targeted customers. In this case, since you have only a 1% response rate, the predicted probabilities won't be high (a predicted p_1 of 0.1 could be high enough to say this customer will buy a ticket).

2. Select part of your database as your targeted customers. In this case, you have to do some preliminary data mining to reduce the data size and increase the response rate; the predicted probabilities will then increase too.

Keep in mind that neither way will capture all potential buyers. There is no way to reach every buyer unless you contact the whole database.
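To put numbers on both points, here is a small stdlib-only Python sketch (Python rather than SAS, purely for illustration). `lift` compares a predicted probability with the base rate from the question, so a p_1 of 0.1 is roughly 7.7 times the average buyer rate. `target_top_fraction` ranks customers by predicted probability and reports the response rate among those contacted and the share of all buyers captured. Both function names and the toy scores are hypothetical; in practice the scores would come from a fitted model.

```python
# Base rate from the question: 23,509 buyers out of 1,800,000 customers (about 1.3%).
BASE_RATE = 23_509 / 1_800_000


def lift(probability, base_rate=BASE_RATE):
    """How many times more likely than average a customer with this predicted probability is to buy."""
    return probability / base_rate


def target_top_fraction(scores, labels, fraction):
    """Contact only the top `fraction` of customers ranked by predicted probability.

    scores: predicted probabilities of buying; labels: 1 = bought, 0 = did not.
    Returns (response rate among contacted customers, share of all buyers captured).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_contacted = max(1, round(len(ranked) * fraction))
    buyers_contacted = sum(label for _, label in ranked[:n_contacted])
    total_buyers = sum(labels)
    response_rate = buyers_contacted / n_contacted
    capture_rate = buyers_contacted / total_buyers if total_buyers else 0.0
    return response_rate, capture_rate


# Hypothetical toy scores for ten customers, three of whom actually bought.
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

print(f"{lift(0.1):.2f}")                         # prints 7.66
print(target_top_fraction(scores, labels, 0.2))   # prints (0.5, 0.3333333333333333)
print(target_top_fraction(scores, labels, 1.0))   # prints (0.3, 1.0)
```

In this toy list, contacting the top 20% raises the response rate from 30% to 50% but captures only one of the three buyers, which is the trade-off described above: a targeted selection never keeps every buyer.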

Re: A small percentage of response : Please Help Thank you

When the rate (expectation) is this small, the modeling should be based on the Poisson distribution, right?

Super User

Re: A small percentage of response : Please Help Thank you

Never heard of that one; what is the reason?

Poisson is usually used for count data instead.

You can always oversample your data and then use Bayesian priors to correct for the oversampling.
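The prior correction usually means applying Bayes' rule: weight each class's predicted probability by (population share / training-sample share) and renormalize. A minimal Python sketch, assuming the model's probabilities are calibrated to the oversampled training data; the function name and the 50/50 sample in the example are illustrative choices, not something from the thread.

```python
def correct_for_oversampling(p_sample, true_rate, sample_rate):
    """Rescale a predicted P(buy) from a model trained on oversampled data to the population.

    p_sample:    probability predicted by the model fitted on the oversampled training set
    true_rate:   share of buyers in the full population (the prior), e.g. 23,509 / 1,800,000
    sample_rate: share of buyers in the oversampled training set, e.g. 0.5
    """
    if not 0 <= p_sample <= 1:
        raise ValueError("p_sample must be in [0, 1]")
    if not (0 < true_rate < 1 and 0 < sample_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    # Bayes' rule: reweight each class by (population share / sample share), then renormalize.
    buyer_weight = true_rate / sample_rate
    non_buyer_weight = (1 - true_rate) / (1 - sample_rate)
    numerator = p_sample * buyer_weight
    return numerator / (numerator + (1 - p_sample) * non_buyer_weight)


TRUE_RATE = 23_509 / 1_800_000  # from the question

# A customer scored 0.5 by a model trained on a 50/50 sample is only as likely as average to buy.
print(round(correct_for_oversampling(0.5, TRUE_RATE, 0.5), 4))  # prints 0.0131
```

The correction preserves the ranking of customers, so it matters for interpreting probabilities and setting cutoffs rather than for deciding who scores highest.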

I'd make sure to use several different samples/simulations to get a better idea. This approach has its benefits and drawbacks, which you can find with a little googling.
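The "several different samples" advice can be sketched as follows: keep all buyers in every sample, draw a different random set of non-buyers each time, refit the model on each sample, and compare the coefficients and lift across fits. Python is used purely for illustration; the function name, the default 1:1 ratio, and the seed are assumptions.

```python
import random


def oversampled_training_sets(buyer_ids, non_buyer_ids, non_buyers_per_buyer=1, n_sets=5, seed=2024):
    """Yield n_sets training samples; each keeps every buyer plus a fresh random draw of non-buyers.

    When there are enough non-buyers, the share of buyers in each sample is
    1 / (1 + non_buyers_per_buyer), which is 0.5 by default.
    """
    rng = random.Random(seed)  # fixed seed so the same samples can be regenerated
    buyers = list(buyer_ids)
    non_buyers = list(non_buyer_ids)
    n_non_buyers = min(len(non_buyers), non_buyers_per_buyer * len(buyers))
    for _ in range(n_sets):
        sample = buyers + rng.sample(non_buyers, n_non_buyers)
        rng.shuffle(sample)
        yield sample


# Hypothetical IDs: 10 buyers and 900 non-buyers, 3 non-buyers drawn per buyer.
samples = list(oversampled_training_sets(range(10), range(100, 1000), non_buyers_per_buyer=3, n_sets=4))
print([len(s) for s in samples])  # prints [40, 40, 40, 40]
```

The buyer share of each sample is also the sample rate you would need when correcting the predicted probabilities back to the population's 1.3%.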

You can use proc logistic; I think proc discrim is also an option.

Are you using JMP, EG, EM or Base SAS?

jf

Re: A small percentage of response : Please Help Thank you

proc discrim may do the same job as logistic regression, but since logistic regression (LR) is well suited to this kind of problem, LR is the best and easiest way.


To get better results, you will need solid data mining and modeling skills.
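For readers who want to see what fitting a logistic regression means mechanically, here is a stdlib-only Python sketch that maximizes the Bernoulli log-likelihood by plain batch gradient descent. It is an illustration, not a replacement for proc logistic, which uses Newton-type iterations (Fisher scoring by default) and reports standard errors and diagnostics; the function names, learning rate, and iteration counts are assumptions. The usage at the end also shows why predicted probabilities stay low when only about 1% of the data are buyers: a model with no informative predictors predicts the base rate for everyone.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(b, w, x):
    """Predicted probability of buying for one customer with feature list x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(rows, labels, learning_rate=1.0, iterations=2000):
    """Fit P(y = 1 | x) = sigmoid(b + w . x) by batch gradient descent on the mean log-loss.

    rows:   list of equal-length feature lists (empty lists give an intercept-only model)
    labels: list of 0/1 outcomes (1 = bought)
    Returns (intercept, weights).
    """
    if not rows or len(rows) != len(labels):
        raise ValueError("need the same non-zero number of rows and labels")
    n = len(rows)
    b, w = 0.0, [0.0] * len(rows[0])
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * len(w)
        for x, y in zip(rows, labels):
            error = predict(b, w, x) - y  # derivative of the log-loss w.r.t. the linear predictor
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        b -= learning_rate * grad_b / n
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


# Intercept-only model on hypothetical data with 1 buyer in 100: it predicts the base rate.
b, w = fit_logistic([[] for _ in range(100)], [1] + [0] * 99, iterations=3000)
print(round(predict(b, w, []), 4))  # prints 0.01
```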

jf

Re: A small percentage of response : Please Help Thank you

Poisson regression assumes the dependent variable follows a Poisson distribution, which means Y takes non-negative integer values (counts). In this case, Y has only two values: buy or not buy.

Also, 23,509 buyers is not a small number.

PG

Re: A small percentage of response : Please Help Thank you

The limiting distribution of Binomial(p, N) when p is small and N is large is Poisson(pN). That's probably the origin of the confusion. Poisson regression could, for instance, model the number of buyers per group of 10,000 randomly selected persons.
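This point can be checked numerically. The stdlib-only Python sketch below compares the Binomial(N, p) and Poisson(Np) probability mass functions for hypothetical groups of N = 10,000 customers, using the buyer rate from the question; the group size and function names are illustrative. Probabilities are computed on the log scale with `math.lgamma` so that N = 10,000 does not overflow.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p), computed on the log scale to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam), also on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


N = 10_000                 # hypothetical group size: 10,000 randomly selected customers
p = 23_509 / 1_800_000     # buyer rate from the question, about 1.3%
lam = N * p                # expected buyers per group, about 130.6

binom = [binomial_pmf(k, N, p) for k in range(N + 1)]
poisson = [poisson_pmf(k, lam) for k in range(N + 1)]
max_gap = max(abs(bk - qk) for bk, qk in zip(binom, poisson))

print(f"expected buyers per group: {lam:.1f}")  # prints: expected buyers per group: 130.6
print(f"largest difference between the two pmfs: {max_gap:.5f}")
```

The two distributions are nearly indistinguishable for group counts, but each individual customer's outcome is still a 0/1 (Bernoulli) variable, so scoring individual propensity remains a job for logistic regression.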

hth

PG
