10-24-2012 04:28 PM
I need to understand how the weight statement is used.
Background - I had to build a logistic regression response model on rare-event data with an event rate as low as 0.008%. I increased the event rate to 3% by creating two separate datasets through oversampling (increasing the number of events) and undersampling (decreasing the number of non-events). It is believed that the model equation obtained after such sampling differs only in the intercept term, while the coefficients remain the same. To adjust for this, weights are used. Suppose I had sampled 1/10th of the non-event group; then I would take the weight for the event group as 1 and that for the non-event group as 10.
To correct the intercept term, I use the weight statement in proc logistic. But that is where the problem starts. The concordance falls from 70% in the unadjusted model to 20% in the weight-adjusted model (when I use the weight statement). I did not use the weight statement in any proc during my analysis before building the model; I used it only when producing the final model output, to get the correct intercept. Am I getting a model with abysmally low concordance under the weight statement because I never used it in my bivariate profiling (proc freq, proc summary, proc univariate, etc.), even though I created variable transformations, indicator variables, etc. based on that bivariate analysis?
If yes, then my question is this: if I had used the weight statement in the initial steps of my analysis, each step would have tried to replicate the event rate of the original dataset (0.008%), and if that is what I wanted, there was no need for oversampling or undersampling in the first place. Or am I missing something here?
Please help me solve this conundrum. I need to know how exactly the weight statement works, and why my concordance is falling to such a low value.
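For concreteness, the weighting arithmetic described in the background can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the model. The function names and the value of `sample_intercept` are hypothetical. The intercept shift uses the usual log-ratio-of-sampling-fractions correction, which assumes that sampling depended only on the event/non-event outcome.

```python
import math

def sampling_weights(event_fraction, nonevent_fraction):
    """Inverse-sampling-fraction weights for (events, non-events)."""
    return 1.0 / event_fraction, 1.0 / nonevent_fraction

def intercept_offset(event_fraction, nonevent_fraction):
    """Amount by which the sampled-data intercept exceeds the population one,
    assuming sampling depended only on the outcome."""
    return math.log(event_fraction / nonevent_fraction)

# Example from the post: all events kept, 1/10th of the non-events kept.
w_event, w_nonevent = sampling_weights(1.0, 0.1)
offset = intercept_offset(1.0, 0.1)

# Hypothetical intercept from the unweighted fit, for illustration only.
sample_intercept = -3.5
corrected_intercept = sample_intercept - offset

print("weights (event, non-event):", w_event, w_nonevent)
print("intercept offset:", round(offset, 4))
print("corrected intercept:", round(corrected_intercept, 4))
```

In this sketch the slope coefficients are left untouched and only the intercept is shifted, which matches the belief stated above that sampling changes the intercept alone.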