Hi,
The target variable in my data source has an event rate of only about 0.0019%, and there are 7,240,251 observations overall. Could you please explain the oversampling approach step by step, or suggest another method for building a model?
I am using the Sample node now, but it does not seem to be working for me.
Thanks,
Sathya.
I think you actually want the other approach for dealing with rare targets: adjusting the posterior probabilities instead of entering decision weights (those affect only profit, not the other fit statistics). To do that, in the Decisions node, stop using the inverse priors on the diagonal of the decision matrix and revert those values to 1. Then click Refresh on the Targets tab and, on the Prior Probabilities tab, enter the original priors for your target (for example, the very rare proportion for your event). This applies an adjustment to your posterior probabilities; hopefully you will see better results this way.
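To give a feel for what a prior adjustment does to the posteriors, here is a minimal Python sketch of the standard prior-correction formula (rescale each posterior by true prior / sample proportion, then renormalize). It is an illustration only, not Enterprise Miner's own code: the function name `adjust_posteriors`, the 50/50 oversampled mix, and the example posteriors are all assumptions. Only the roughly 0.0019% event rate comes from the question.

```python
def adjust_posteriors(posteriors, sample_props, true_priors):
    """Rescale posteriors from the training-sample class mix to the true priors.

    Each posterior is weighted by (true prior / sample proportion) and the
    weighted values are renormalized so they sum to 1.
    """
    weighted = [
        p * (prior / prop)
        for p, prop, prior in zip(posteriors, sample_props, true_priors)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Assumption: the model was trained on a 50/50 oversampled set.
# The true event rate of about 0.0019% is taken from the question.
true_event_rate = 0.000019
adjusted = adjust_posteriors(
    posteriors=[0.10, 0.90],  # hypothetical [P(non-event), P(event)] from the model
    sample_props=[0.50, 0.50],
    true_priors=[1 - true_event_rate, true_event_rate],
)
print(adjusted)
```

Under these assumed numbers, an observation that looks like a near-certain event on the oversampled scale ends up with a very small event probability on the original scale, which is expected when the event is this rare.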
To answer your other question, EM_CLASSIFICATION is the generically named variable containing the predictions based on your model. Here are more details about those variables from the Score node:
Variable | Label | Description
EM_PROBABILITY | Probability of Classification | Posterior probability associated with the predicted classification. That is, it corresponds to the maximum of the posterior probabilities, max(P1, P2, ..., Pk).
EM_EVENTPROBABILITY | Probability for level n of vnm | Posterior probability associated with the target event.
EM_CLASSIFICATION | Prediction for vnm | I_variable, the prediction variable for a class target.
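As a rough illustration of how these three variables relate to each other, the sketch below derives them from a set of posterior probabilities for one scored row. This is a simplified reading of the descriptions above, not the Score node's actual implementation: the function `score_row`, the level names "0" and "1", and the probabilities are hypothetical.

```python
def score_row(posteriors, event_level):
    """Derive the three score variables from one row's posteriors.

    posteriors: dict mapping each target level to its posterior probability.
    event_level: the level treated as the target event.
    """
    # The predicted class is the level with the largest posterior.
    predicted = max(posteriors, key=posteriors.get)
    return {
        "EM_CLASSIFICATION": predicted,
        "EM_PROBABILITY": posteriors[predicted],        # max(P1, ..., Pk)
        "EM_EVENTPROBABILITY": posteriors[event_level],  # P(target event)
    }


# Hypothetical binary target with event level "1".
row = score_row({"0": 0.97, "1": 0.03}, event_level="1")
print(row)
```

With a very rare event, the adjusted event probability will often stay below the non-event probability, so the predicted class may be 0 for nearly every row. In that situation it can be more informative to rank observations by EM_EVENTPROBABILITY than to rely on EM_CLASSIFICATION alone.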
Please take a look at this post: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs.... This outlines one of the ways you can deal with rare targets. There is also a section "Detecting Rare Classes" under Analytics>Predictive Modeling in the SAS Enterprise Miner Reference Help.
Hope this helps!
Wendy
Also, after scoring, it creates some variables, for example:
EM_EVENTPROBABILITY
EM_PROBABILITY
EM_CLASSIFICATION (in 0s and 1s)
Which of these variables should I use as the predicted column?
Thanks,
Sathya.