ajosh
Calcite | Level 5

Hi All,

This is in continuation of my earlier post on whether the original priors (which are not 50:50 but 2:98 for Y and N) have to be used when I am modeling on a balanced dataset in SAS EM. Thanks for your response. I can see that the original priors have to be used to remove the "bias" introduced by oversampling the minority instances. Later on, I can use these results on the same dataset to check how many instances originally tagged as "Y" have been correctly classified.

Some questions that I am still not able to resolve in this analysis are:

1) There is another strategy for handling class imbalance, called cost-sensitive learning, which uses the profit/loss matrix in the decision-processing options for the input dataset. I specify the profit/loss weights for each of the decisions TP, TN, FP, and FN there, and no oversampling is used. Decision trees are supposed to choose the decision that yields the maximum profit by considering the weights as well as the proportions of Y and N at each node. But it seems that here again SAS EM has used 0.5 as the decision threshold, because of which only 75 out of about 1,840 minority-class instances have been classified as TP.
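
For reference, standard expected-profit algebra (my own notation, not EM output) implies that a profit matrix carries its own optimal cutoff. Writing the decision weights as w_TP, w_FP, w_FN, and w_TN, deciding Y for a case with event probability p has expected profit p*w_TP + (1 - p)*w_FP, while deciding N has expected profit p*w_FN + (1 - p)*w_TN. Choosing Y whenever the former is larger gives the break-even probability

break-even probability = (w_TN - w_FP) / ((w_TP - w_FN) + (w_TN - w_FP)),

which sits well below 0.5 whenever false negatives are penalized much more heavily than false positives.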

2) In the exported dataset of the above decision tree model, I compared two pairs of flags, (From_Target_Flag, Into_Target_Flag) and (Actual Target Flag, Decision_Target_Flag), and found the following results:

a) The above-mentioned count of TP = 75 occurs in only 2 nodes of the decision tree, where the condition From_Target_Flag = Y and Into_Target_Flag = Y is satisfied. These nodes have predicted probability > 0.5 (one has 0.53 and the other 0.9).

b) A count of TP (which could be a proxy) = 1271 occurs across around 7 nodes of the decision tree, where the condition Actual Target Flag = Y and Decision_Target_Flag = Y is satisfied. The predicted probabilities of these nodes vary from 0.14 to 0.9. (A sketch of how such counts can be cross-tabulated is below.)
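
A minimal sketch of this comparison (my own code, not generated by EM), assuming the exported table is named EXPORTED and the target variable is Target_Flag, so the exported columns are F_Target_Flag (actual), I_Target_Flag (classification at the 0.5 cutoff), and D_Target_Flag (profit-matrix decision):

          /* Actual vs. classification: TP is the F_ = Y, I_ = Y cell */
          proc freq data=exported;
             tables F_Target_Flag*I_Target_Flag / norow nocol nopercent;
          run;

          /* Actual vs. decision: TP is the F_ = Y, D_ = Y cell */
          proc freq data=exported;
             tables F_Target_Flag*D_Target_Flag / norow nocol nopercent;
          run;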

So to summarize: which is the correct approach for finding the count of TP in question 2), and, if necessary, how do I implement the Cutoff node so as to select a smaller decision threshold for question 1)?

Also attached is the table from the Cutoff node for the default value of 0.5, generated after running the decision tree.

Thanks and Regards,

Aditya.

[Cutoff node output. Columns: Cut Off, Cumulative Expected Profit, Count of TP, Count of FP, Count of TN, Count of FN, Count of Predicted Positives, Count of Predicted Negatives, Count of FP and FN, Count of TP and TN, Overall Classification Rate, Change Count TP, Change Count FP, TP Rate, TN Rate, FP Rate, Event Precision Rate, Non Event Precision Rate, Overall Precision Rate, Data Role. The rows sweep the cutoff from 0.99 down to 0.00 in steps of 0.01, with one TRAIN row (64328 observations, of which 1286 are events) and one VALIDATE row (27572 observations, of which 552 are events) per cutoff. At the default cutoff of 0.50 the counts are:

Cut Off   TP   FP   TN      FN     Data Role
0.50      55   89   62953   1231   TRAIN
0.50      20   46   26974   532    VALIDATE]
M_Maldonado
Barite | Level 11

Hi Aditya,

Thanks for including your results so clearly. It sure helps!

Comments to your questions

If I read your post right, you are doing cost-sensitive learning in SAS Enterprise Miner by using the profit matrix in the Decision Weights options?

It is correct that if a profit matrix is defined, the Decision Tree node will select the tree that has the largest average profit (or smallest average loss).

When you say that this pruned tree has a "0.5 decision threshold", you mean that the predicted probability cutoff is 0.5? This default cutoff is expected for all models unless you change it in the Cutoff node. The fact that you used a profit matrix in your decision weights to find a pruned tree does not override the 0.5 cutoff for your predicted probabilities.

For 2), you are checking the number of true positives, correct? A suggestion on how to do it is below.

Connect a SAS Code node to your diagram, open the editor, paste the code below, and run it. You will get TP/FP/TN/FN tables for your training and validation data. It will work as long as you have defined a target in your metadata and you have both training and validation data.

          title1 "Training data TP FP TN FN";

          proc tabulate data=&EM_IMPORT_DATA;

          class f_%EM_TARGET i_%EM_TARGET;

          table f_%EM_TARGET,i_%EM_TARGET*n;

          run;

          title1"";

          title1 "Validation data TP FP TN FN";

          proc tabulate data=&EM_IMPORT_VALIDATE;

          class f_%EM_TARGET i_%EM_TARGET;

          table f_%EM_TARGET,i_%EM_TARGET*n;

          run;

          title1"";

Additional comments

It seems to me that your decision tree is not helping you very much. I plotted your positive rates and your cumulative expected profit. Your positive rate curves do not look good, and your cumulative expected profit is basically telling you to "accept everybody so that you don't lose money and break even". I would suggest trying to get a better model before paying too much attention to the cumulative expected profits.

[for ajosh.png: plot of the positive rates and the cumulative expected profit]

Further reading

For an example of what "good" positive rates look like, see the positive rates of the example in this tip.

Since you are dealing with a rare target, the only other doc I can think of is the "Detecting Rare Classes" section in the SAS Enterprise Miner Reference Help; it is in the Predictive Modeling chapter.

I hope it helps. Good luck!

Miguel

ajosh
Calcite | Level 5

Hi Miguel,

I would like to reiterate a few aspects of this analysis (which you also pointed out in your reply above):

1) Using a profit/cost matrix is one of the solutions/strategies for addressing the class imbalance problem (in my case, Y:N = 2%:98%).

2) Even if we use the profit/cost matrix, this does not override the default probability cutoff for forming the decisions (predictions) Y or N. Hence I have used a Cutoff node after the model comparison (trees vs. logistic regression).

3) I used the following weights (with the maximize option) in the decision properties: 5, -100, -10, and 1 for TP, FN, FP, and TN respectively. The issue I see here is that FN has been given 20 times more weight than TP. Is there a specific logic by which these weight assignments can be made correctly? (A worked check of what these weights imply is below.)
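
As a worked check (standard expected-profit algebra, not EM output): with these weights, deciding Y for a case with event probability p has expected profit 5p - 10(1 - p), while deciding N has expected profit -100p + (1 - p). The two are equal when 116p = 11, i.e. p ≈ 0.095, so this matrix implies that the profit-maximizing decision flips to Y at roughly a 9.5% predicted probability, far below the default 0.5 cutoff.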

For context, this analysis is about identifying fraudulent persons, who are far fewer in number than the non-fraudulent ones. Hence I gave the highest penalty (negative profit) to those people who commit fraud but are tagged as non-fraudulent by the model. Please suggest an alternative scheme for correctly specifying the profit/loss matrix in SAS Enterprise Miner, if there is one.

Thanks a lot for your comments in this regard.

Aditya.

ajosh
Calcite | Level 5

Hi Miguel,

As a follow-up reply, I forgot to mention that one method for specifying the optimal cutoff (which could be below 0.5) is as follows:

1) We first calculate the total profit at each cutoff value (from 0.01 to 0.99) as: (weight for TP * TP count) + (weight for FN * FN count) + (weight for FP * FP count) + (weight for TN * TN count). Note that in my case the weights for FN and FP are negative, and they should be treated as such when deriving the total profit.

2) Then we calculate the average profit at each cutoff as: total profit / predicted positives (where predicted positives are the sum of TP and FP).

3) We pick the cutoff value that gives the highest average profit on the validation dataset and is also close to the average profit on the training dataset. (A sketch of this calculation is below.)
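
A minimal sketch of steps 1) and 2) (my own code; the dataset CUTOFF_COUNTS and its variables cutoff, tp, fp, tn, and fn are placeholders for the per-cutoff counts):

          /* Total and average profit per cutoff, using the weights 5, -100, -10, 1 */
          data profit_by_cutoff;
             set cutoff_counts;
             total_profit = 5*tp + (-100)*fn + (-10)*fp + 1*tn;
             if tp + fp > 0 then avg_profit = total_profit / (tp + fp);
             else avg_profit = .;   /* undefined when nothing is predicted positive */
          run;

          /* Step 3): look for the highest average profit on the validation counts */
          proc sort data=profit_by_cutoff;
             by descending avg_profit;
          run;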

Do let me know if you could help me with assigning proper weights to each outcome, and whether the above formula for the average profit needs to be modified. I think that in general, for fraud detection, we need high TP along with low FP and low FN, is that right?

Thanks,

Aditya.

ajosh
Calcite | Level 5

Hi All,

Could someone please provide me with pointers on how to construct the misclassification matrix, and how to choose an optimal cutoff probability other than 0.5 in SAS Enterprise Miner, for the details explained in my earlier replies?

M_Maldonado
Barite | Level 11

Hi Aditya,

Sorry, I haven't had time to look at all your questions in detail. See my crosspost in the related discussion.

Also, check out this paper. The authors use the Cutoff node and custom coding to do something similar to what I think you are trying to do. Link to the paper: http://support.sas.com/resources/papers/proceedings12/127-2012.pdf

I hope it helps,

Miguel

DougWielenga
SAS Employee

ajosh,

 

Modeling rare events (which is actually quite common) is often challenging for several reasons:  

  * The null model is highly accurate (a 2% response rate means a model assigning every observation to the nonevent is 98% accurate)

  * Failing to put any additional weight on correctly predicting the rare event can lead to a null model (for the reasons above)

  * Increasing the weight on correctly predicting the rare event results in predicting the event for far more observations than actually have it

 

It might be helpful to separate the tasks of modeling an outcome and taking action on the outcome. When modeling a rare event, you must often either oversample the rare event, add weight to correctly predicting the rare event, choose a model selection criterion that is not based on classification, or use some combination of these. For the reasons stated above, misclassification is typically not a good selection criterion for modeling. SAS Enterprise Miner always provides a classification based on which outcome is most likely. When a target profile is created and decision weights are employed, SAS Enterprise Miner will also create variables containing the most profitable outcome based on the target profile you created. The meaningfulness of that prediction is directly related to the applicability of the target profile weights.

 

In general, modeling itself is more clear-cut in that each analyst can pick and choose their criteria for building the 'best' model and then build it. The resulting probabilities can then be used to order the observations. Unfortunately, for decision tree models all of the observations in a single node are given the same score, which is why some people run additional models within each terminal node to further separate the observations. The choice of what to do with the ordered observations typically involves business decisioning. The choice to investigate fraud can be costly, particularly if the person investigated is an honest, loyal customer who just had an unusual situation. The amount of money at stake, the customer's longevity and profitability with the business, and the future expected value of the customer are just a few things that might be considered. This business decisioning usually creates far more complex criteria than can be reduced to a misclassification matrix, which does not take the amount of money at risk into account.

 

Simply put, whether you take the default decision based on the most likely outcome (typically inappropriate for a rare event), use the decision-weighted predicted outcome (assuming the decision profile accurately represents the business decisioning), or use some other strategy for selecting cases to investigate (based on available resources, amount at risk, likelihood of fraud, etc.), the TP and FP come from the strategy you employ. I advocate business decisioning in determining how to proceed because the simple classification rate is not meaningful enough for rare events. Even ordering by the expected value of the money at risk (e.g., the product of the probability of fraud and the amount at risk) will yield a different ordering of the observations than ordering by probability alone. So there is no single answer to which cutoff to use without fully understanding the business objectives and priorities. I tend to use some oversampling (but not to 50/50, because that under-represents the non-event) together with decision weights and priors, to allow variable selection and to get reasonable probabilities, and then combine those probabilities with other information to determine the final prioritization/action for observations based on more complex rules.
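
To illustrate the expected-value ordering with made-up numbers: a case with a 90% fraud probability and $100 at risk has an expected loss of $90, while a case with a 20% fraud probability and $10,000 at risk has an expected loss of $2,000, so ranking by expected value puts the second case first even though ranking by probability alone puts it last.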

 
