08-25-2015 11:17 AM
I cannot be specific about the data because it is proprietary.
However, suppose we have data for a large number of customers for about 150 consecutive days. On each day, each customer can decide to renew or not. Some people never renew; some do. Let's call that variable RENEW. So RENEW has 150 values per customer: either 150 0's, or 149 0's and a single 1. There is also cumulative renew, which is similar except that once a person renews, it stays at 1. So each person again has 150 values: some number of 0's followed by some number of 1's.
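To make the layout concrete without showing the real data, here is a minimal sketch in Python. Everything in it is made up for illustration: the function names `daily_renew` and `cumulative_renew` are hypothetical, the window is assumed to be exactly 150 days, and the renewal on day 40 is an arbitrary example.

```python
from itertools import accumulate

N_DAYS = 150  # assumption: the post only says "about 150" days


def daily_renew(renew_day=None, n_days=N_DAYS):
    """Daily RENEW series: all 0's, or a single 1 on renew_day (1-based)."""
    return [1 if day == renew_day else 0 for day in range(1, n_days + 1)]


def cumulative_renew(renew):
    """Running maximum of RENEW: 0 until the renewal day, 1 from then on."""
    return list(accumulate(renew, max))


never = daily_renew(None)     # customer who never renews
on_day_40 = daily_renew(40)   # hypothetical customer who renews on day 40
cum = cumulative_renew(on_day_40)

print(sum(never), sum(on_day_40), sum(cum))  # prints: 0 1 111
```

Note that once the cumulative series reaches 1, every later value for that customer is fully determined by the earlier ones.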
I have been asked to model cumulative renew with logistic regression on a number of other variables about the client.
I pointed out that this violates independence of errors, and said I thought we should instead model ever renew, which would be a single value for each customer. But my client insists on the former. I also suggested survival analysis.
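For what it's worth, both alternatives need only one record per customer. A minimal sketch in Python of how the 150 daily values might be collapsed, under some assumptions of mine: the helper name `collapse` and the field names are hypothetical, the series is the cumulative renew variable described above, and customers who never renew are treated as censored on the last observed day.

```python
def collapse(cum_renew):
    """Collapse one customer's cumulative-renew series to a single record.

    ever_renew: 1 if the customer renewed at any point, else 0.
    time:       day of renewal (1-based), or the last observed day if
                the customer never renewed (assumed censoring time).
    event:      1 if renewal was observed, 0 if censored.
    """
    n_days = len(cum_renew)
    ever = max(cum_renew)
    time = cum_renew.index(1) + 1 if ever else n_days
    return {"ever_renew": ever, "time": time, "event": ever}


# Made-up examples: one customer renews on day 40, one never renews.
renewer = [0] * 39 + [1] * 111
non_renewer = [0] * 150

print(collapse(renewer))      # {'ever_renew': 1, 'time': 40, 'event': 1}
print(collapse(non_renewer))  # {'ever_renew': 0, 'time': 150, 'event': 0}
```

The `ever_renew` column is what an ordinary logistic regression would use, with one row per customer. The `time` and `event` pair is the usual input for a survival model.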
I don't think even a multilevel model (with GLIMMIX) will work here, since the repeat pattern is so strong.
Am I missing something?