Question
Fluorite | Level 6

Hi all, I would like to predict which customers have a propensity to buy basketball tickets. My whole database holds 1,800,000 customers, and only 23,509 of them (about 1.3%) have purchased tickets in the past. How should I proceed? Your help would be much appreciated. Many thanks.

6 REPLIES
jf
Fluorite | Level 6

First of all, the methodology is logistic regression. There are two ways to set up the prediction, though:

1. Use the whole database as your target population. In this case, since the response rate is only about 1%, the predicted probabilities won't be high (p_1 = 0.1 could be high enough to say this customer will buy a ticket).

2. Use part of your database as your target population. In this case, you first have to do some preliminary data mining to reduce the data size and raise the response rate; the predicted probabilities will then increase too.

Keep in mind that neither way will capture all potential buyers. The only way to reach every buyer is to contact the whole database.
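To put a number on the remark in option 1 that a predicted probability of 0.1 may already be "high": a common way to read a score against a rare event is its lift over the base rate. The sketch below is only an illustration in Python, not SAS output; the function names are made up for this example, and the counts are the ones given in the question.

```python
# Counts taken from the original question.
TOTAL_CUSTOMERS = 1_800_000
PAST_BUYERS = 23_509


def base_rate(buyers, total):
    """Share of the database that bought in the past."""
    return buyers / total


def lift(predicted_probability, rate):
    """How many times more likely than an average customer this score is."""
    return predicted_probability / rate


rate = base_rate(PAST_BUYERS, TOTAL_CUSTOMERS)
score_lift = lift(0.1, rate)  # 0.1 is the example score from option 1

print(f"base rate: {rate:.4f}")
print(f"lift at p = 0.1: {score_lift:.1f}x")
```

With these counts the base rate is about 1.3%, so a customer scored at 0.1 is roughly 7.7 times more likely to buy than an average customer, even though 0.1 looks small in absolute terms.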

marxyst
Calcite | Level 5

When the rate (expectation) is so small, the modeling should be based on the Poisson distribution, right?

Reeza
Super User

Never heard of that one, what is the reason?

Poisson is usually used for count data instead.

You can always oversample your data and then use Bayesian priors to correct for the oversampling.

I'd make sure to use several different samples/simulations to get a better idea. This has its benefits and drawbacks, which can be found through some googling :)
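For reference, the prior correction for oversampling mentioned above is usually a rescaling of the model's odds by the ratio of population to sample class rates. Here is a minimal Python sketch of that arithmetic. The function name is hypothetical, the 50/50 sample rate is an assumption for illustration, and this is not the code any particular SAS procedure runs.

```python
def correct_for_oversampling(p_sample, sample_rate, population_rate):
    """Map a probability from a model fit on an oversampled data set
    back to the population scale.

    p_sample        -- predicted event probability from the oversampled model
    sample_rate     -- event share in the training sample (assumed known)
    population_rate -- event share in the full population (the prior)
    """
    # Re-weight each class by how under- or over-represented it was.
    event = p_sample * population_rate / sample_rate
    non_event = (1 - p_sample) * (1 - population_rate) / (1 - sample_rate)
    return event / (event + non_event)


# Population prior from the question; the 50/50 sample is an assumption.
population_rate = 23_509 / 1_800_000
sample_rate = 0.5

for p in (0.2, 0.5, 0.9):
    corrected = correct_for_oversampling(p, sample_rate, population_rate)
    print(f"sample-scale p = {p:.1f} -> population-scale p = {corrected:.4f}")
```

A quick sanity check on the formula: a score equal to the sample event rate (0.5 here) maps back to exactly the population rate, and the ranking of customers is unchanged because the correction is monotone.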

You can use proc logistic, I think proc discrim is also an option.

Are you using JMP, EG, EM or Base SAS?

jf
Fluorite | Level 6

proc discrim may do the same job as logistic regression, but since LR is well suited to this case, it is the best and easiest way.


To get better results, deeper data mining and modeling skills are necessary.

jf
Fluorite | Level 6

Poisson regression assumes the dependent variable follows a Poisson distribution, which means Y takes non-negative integer values. In this case, Y has only two values: buy or not.

Also, 23,509 is not a small number.

PGStats
Opal | Level 21

The limiting distribution of Binomial(N, p) when p is small and N is large is Poisson(pN). That's probably the origin of the confusion. Poisson regression could model the number of buyers per group of 10,000 randomly selected persons, for instance.
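The approximation can be checked numerically. The Python sketch below compares the binomial and Poisson probabilities for the number of buyers in a group of 10,000, using the purchase rate from the question as p. The helper names are made up for this example, and both probability functions are computed in log space to avoid overflow.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed via log-gamma."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed via log-gamma."""
    log_pmf = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_pmf)


p = 23_509 / 1_800_000   # purchase rate from the question
n = 10_000               # group size from the example above
lam = n * p              # Poisson mean, about 130.6 buyers per group

# Largest pointwise difference between the two distributions.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(0, 401)
)

for k in (100, 130, 160):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
print("largest pointwise gap:", max_gap)
```

The two distributions agree closely for counts per group, which is why Poisson regression suits grouped counts, while the individual buy/not-buy outcome is still a binary response.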

hth

PG
