07-20-2015 05:52 AM
I am modelling an imbalanced dataset and trying to improve my results with the Cutoff node in SAS Enterprise Miner. For instance, I set the cutoff value to 0.05 and connect a SAS Code node to the Cutoff node with the following statement (taken from http://support.sas.com/resources/papers/proceedings12/127-2012.pdf):
I_top_order = EM_CUTOFF;
After the SAS Code node, I connect another Model Comparison node and finally a Score node in order to obtain the score code.
This score code seems to be OK: it contains my new statement. But right after that statement it creates 10 segments, and the threshold values are exactly the same as when modelling without the cutoff. See this extract of the score code:
(P_top_order1 ge 0.01470274377704) then do;
(P_top_order1 ge 0.00634736646348) then do;
b_top_order = 2;
(P_top_order1 ge 0.00470666288792) then do;
b_top_order = 3;
How can I change these values so that they reflect my cutoff value? Could you also tell me how these values are calculated?
Thanks for your assistance.
07-27-2015 01:51 PM
The workaround that Yogen suggested in the paper you cited will only work if you have a binary target and you are predicting the event "1".
How Yogen's code works
I took a look at the score code produced by the Cutoff node. It creates the EM_Cutoff variable as a flag indicating whether the predicted probability of the event exceeds a given cutoff. The Cutoff node can determine that cutoff with one of its built-in methods, or you can specify your own in the Cutoff node properties.
For example, I created a binary model to predict the event "good" on the binary target "good_bad". Then I used the Cutoff node with the method "Event Precision Equal Recall", which selected 0.75 as the cutoff. The Cutoff node created the flag EM_Cutoff based on the new cutoff, as below:
IF P_good_badgood > 0.75 THEN EM_CUTOFF = 1;
ELSE EM_CUTOFF = 0;
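To see what the flag does when you specify your own cutoff, here is a minimal sketch using the 0.05 value and the P_top_order1 variable from the question. The dataset names, the probabilities, and the flag_default variable are made up for illustration; only the comparison itself mirrors the Cutoff node score code above.

```sas
/* User-specified cutoff; 0.05 is the value mentioned in the question */
%let cutoff = 0.05;

/* Hypothetical scored data: made-up event probabilities for a rare event */
data scored;
   input P_top_order1;
   datalines;
0.200
0.060
0.014
0.004
;
run;

data flagged;
   set scored;
   /* Illustrative flag at 0.5, for comparison only (not created by the node) */
   flag_default = (P_top_order1 > 0.5);
   /* Same comparison as the Cutoff node score code, with the custom cutoff */
   EM_CUTOFF = (P_top_order1 > &cutoff);
run;

proc print data=flagged;
run;
```

With these made-up probabilities, no row exceeds 0.5, while the first two rows exceed 0.05 and get EM_CUTOFF = 1. That is the effect you would want from a lower cutoff on an imbalanced target.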
Yogen's workaround does not work for my example: the "into" variable I_good_bad holds the predicted target level, whereas EM_Cutoff is a binary 0/1 flag, so it does not make sense to set one equal to the other.
How to fix this
Instead of I_good_bad = EM_Cutoff, I should use the following in my SAS Code node:
if EM_cutoff = 1 then I_good_bad="good";
Now this works!
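A slightly fuller sketch of the same idea, runnable outside Enterprise Miner, is below. It adds an ELSE branch so that cases the model originally classified as "good" but that fall below the new cutoff are relabeled. The ELSE branch, the dataset names, and the probabilities are my assumptions, not part of the statement above. Also check how the levels are actually stored in your I_ variable (for example, upper case versus lower case) before copying the literals.

```sas
/* Hypothetical scored data: three cases with made-up probabilities */
data scored;
   input P_good_badgood;
   datalines;
0.90
0.60
0.30
;
run;

data rescored;
   set scored;
   /* In a SAS Code node I_good_bad already exists; it is declared here
      only so that this sketch runs on its own */
   length I_good_bad $4;

   /* Flag exactly as in the Cutoff node score code (cutoff 0.75) */
   if P_good_badgood > 0.75 then EM_CUTOFF = 1;
   else EM_CUTOFF = 0;

   /* Map the flag to a target level instead of copying the 0/1 value */
   if EM_CUTOFF = 1 then I_good_bad = "good";
   else I_good_bad = "bad";   /* assumption: the non-event level is "bad" */
run;

proc print data=rescored;
run;
```

In this made-up data the 0.60 case ends up as "bad", because it is below the 0.75 cutoff. Without the ELSE branch it would keep whatever classification the model had assigned.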
This fixed the workaround for my example. I was not exactly sure what you were doing or what the b_top_order variables represent in your example, but I think this corrected workaround should help you. If not, please explain a bit more and someone from this community, or I, can help you refine the code further.