Hello all,

I am working on a data set that has 800,000 records, of which only 80 are events (Target = 1); all the remaining records are non-events. I do not want to do oversampling (taking all the event observations and matching them with an equal number of non-events), because that would leave me with just 160 records.

So I decided to use weighting instead. That is, I weighted up all the events and weighted down all the non-events to make the proportion of events to non-events 50:50, using a weight variable called good_bad_wgt, which I used in my logistic regression:

    proc logistic data = dummies outest = est;
        model Target (event = '1') = %goodvariables
            / selection = stepwise slstay = 0.05 slentry = 0.05;
        weight good_bad_wgt;
    run;

What I want to know is:

1. Are the resulting probabilities over-estimated?
2. If so, how do I adjust the probabilities?

If someone can help me better understand how the WEIGHT statement in PROC LOGISTIC works, I would really appreciate it.

Thanks.
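To make the weighting scheme concrete, here is a minimal sketch of one way a weight like good_bad_wgt could be constructed. It assumes each class should receive half of the total weight, so that the weights sum to the number of records. The post does not say how the weight was actually scaled, so that normalization is an assumption, and the macro variable names (n_event, n_nonevent, n_total) and the output data set dummies_wgt are hypothetical.

```sas
/* Sketch only: count events, non-events, and all records in dummies. */
/* Assumes Target is coded 0/1, as in the model statement above.      */
proc sql noprint;
    select sum(Target = 1), sum(Target = 0), count(*)
        into :n_event, :n_nonevent, :n_total
    from dummies;
quit;

/* Assumed scaling: each class gets half of the total weight, so the  */
/* weighted event to non-event proportion is 50:50.                   */
data dummies_wgt;
    set dummies;
    if Target = 1 then good_bad_wgt = &n_total / (2 * &n_event);
    else if Target = 0 then good_bad_wgt = &n_total / (2 * &n_nonevent);
    /* Records with a missing Target are left with a missing weight.  */
run;
```

With 80 events out of 800,000 records, this scaling gives each event a weight of 800,000 / 160 = 5,000 and each non-event a weight of 800,000 / 1,599,840, or roughly 0.50005. The weighted event rate is then 0.5, whereas the event rate in the raw data is 80 / 800,000 = 0.0001.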