Hi Jason,
Thank you for responding.
I don't think I was clear from the beginning, so let me walk you through the steps I have taken.
I have an original dataset that I oversampled and partitioned. I then added a Decisions node to adjust my posterior probabilities, and lastly I used a Decision Tree to model it. (I have done all of these steps in SAS Enterprise Miner only; I haven't used Base SAS.) Here is the view:
In the original dataset the event rate is 2% and the non-event rate is 98%. After oversampling, the event rate becomes 30% and the non-event rate 70%.
In the Data Partition node, my training dataset contains 3035 non-events and 1301 events, for a total of 4336 observations.
In the Decisions node, I adjust the priors to 2% event and 98% non-event, as shown below:
Now, onto the decision tree:
If I don't use the Decisions node to adjust the priors, I get these proportions (30% event, 70% non-event)
and counts (1301 events, 3035 non-events) at the root node:
which is correct, given that I didn't adjust for priors.
When I do use the Decisions node to adjust the priors, I get these proportions (2% event, 98% non-event)
and counts (86.72 events, 4249 non-events) at the root node:
What I am trying to understand is this: does SAS Enterprise Miner think that I have only 86.72 events instead of 1301, or is something else going on? I am really confused about this. (I know the total number of observations is correct: 4336.)
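For reference, the adjusted root-node counts match what you get by multiplying the training total by the priors. The Python sketch below is only my own arithmetic check, not anything produced by Enterprise Miner. It assumes the tree reweights each class by prior / training proportion, which I have not confirmed, and the variable names are made up for illustration.

```python
# Arithmetic check of the root-node counts (not Enterprise Miner output).
# Assumption: each class is reweighted by prior / training proportion.

n_event, n_nonevent = 1301, 3035        # training counts after oversampling
n_total = n_event + n_nonevent          # 4336 observations
prior_event, prior_nonevent = 0.02, 0.98

# Training proportions after oversampling (roughly 30% / 70%)
prop_event = n_event / n_total
prop_nonevent = n_nonevent / n_total

# Per-observation weights under the assumed reweighting
w_event = prior_event / prop_event
w_nonevent = prior_nonevent / prop_nonevent

# Weighted counts: about 86.72 events and 4249.28 non-events
adj_event = n_event * w_event
adj_nonevent = n_nonevent * w_nonevent

print(round(adj_event, 2), round(adj_nonevent, 2), round(adj_event + adj_nonevent, 2))
```

If that assumption holds, all 1301 events would still be used, each counting for roughly 0.067 of an observation, which would account for the 86.72. I would still like confirmation that this is what the node is doing.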
Also, when I build a logistic regression on the same oversampled dataset, open the results, and go to View -> SAS Code, I see the posterior probabilities being updated like this:
*** Update Posterior Probabilities;
_P0 = _P0 * 0.02 / 0.2997;
_P1 = _P1 * 0.98 / 0.7003;
drop _sum;
_sum = _P0 + _P1;
if _sum > 4.135903E-25 then do;
   _P0 = _P0 / _sum;
   _P1 = _P1 / _sum;
end;
That's how I know that SAS adjusted my posterior probabilities. With the decision tree, on the other hand, I don't get this code.
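To make sure I am reading the logistic regression score code correctly, here is the same update restated in Python. This is only an illustration: the function name and argument names are hypothetical, and the constants are copied from the SAS code above, where _P0 is paired with the 2% prior and _P1 with the 98% prior.

```python
# Python restatement of the posterior update in the score code (illustration only).
# The function and argument names are hypothetical; constants come from the SAS code.

def adjust_posteriors(p0, p1, prior0=0.02, prior1=0.98,
                      sample0=0.2997, sample1=0.7003):
    """Rescale each posterior by prior / sample proportion, then renormalize."""
    p0 = p0 * prior0 / sample0
    p1 = p1 * prior1 / sample1
    total = p0 + p1
    # Same guard as the SAS code: only normalize when the sum is not essentially zero
    if total > 4.135903e-25:
        p0 = p0 / total
        p1 = p1 / total
    return p0, p1

# A posterior equal to the oversampled rates maps back to roughly the priors
print(adjust_posteriors(0.2997, 0.7003))
```

With posteriors equal to the oversampled proportions, the result is approximately (0.02, 0.98), which is what I would expect from a prior adjustment.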
I hope I explained myself better this time.
Thank you so much for your help, Jason.