Enterprise Miner 14.1
Hello,
I am following this example https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs... to familiarize myself with oversampling. As an additional learning exercise, I connected a Score node to the Model Comparison node. My thought was to take a copy of the original data set, sample it the same way as the first sample, and score the result. So I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percentages, .05/.95), and ran the workflow.
Class Variable Summary Statistics

Data Role=SCORE Output Type=CLASSIFICATION

              Numeric    Formatted    Frequency
Variable      Value      Value        Count        Percent
I_good_bad    .          BAD          204          34
I_good_bad    .          GOOD         396          66

Data Role=SCORE Output Type=MODELDECISION

              Numeric    Formatted    Frequency
Variable      Value      Value        Count        Percent
D_good_bad    .          BAD          226          37.6667
D_good_bad    .          GOOD         374          62.3333
I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but they appear close to the proportions in the original data set. When I look at the score code, I see the original data set's posterior probabilities with no adjustment:
Label P_good_badgood='Predicted: good_bad=good';
P_good_badgood = 0.7;
Label P_good_badbad='Predicted: good_bad=bad';
P_good_badbad = 0.3;
Am I approaching this problem incorrectly? Have I made an error in the workflow, or is it just an error in my understanding? I've attached a copy of my workflow, renamed with a .jpg extension. If you drop the .jpg extension, you should be able to import it into EM. Thanks!
I took a deeper dive into the example listed above and realized there are many inputs that affect the score percentages. The behavior I had questioned, the scoring percentages being closer to the original data set percentages, was the effect of the sample proportion. I adjusted the data partition percentages from Train/Validate 50/50 to 70/30 and noticed the change in the model. This change, in turn, affected the scoring proportions. I also saw the updated prior probabilities in the SAS score code node. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?
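
For anyone else working through the same confusion, here is a minimal Python sketch of the standard prior-correction formula for a model trained on a resampled data set: each posterior is multiplied by (population prior / sample proportion) and the results are renormalized. This is only an illustration of the arithmetic, not Enterprise Miner's actual score code. The function name `adjust_posteriors` is hypothetical, and using .05/.95 as the sample proportions and 0.3/0.7 as the priors is an assumption taken from the numbers quoted in this thread.

```python
def adjust_posteriors(posteriors, sample_props, priors):
    """Rescale posteriors from a model fit on resampled data to population priors.

    posteriors   : class -> posterior probability on the sample scale
    sample_props : class -> class proportion in the training sample
    priors       : class -> class proportion assumed for the population
    """
    # Weight each class by how much more (or less) common it is in the
    # population than it was in the training sample.
    weighted = {
        cls: posteriors[cls] * priors[cls] / sample_props[cls]
        for cls in posteriors
    }
    # Renormalize so the adjusted posteriors sum to 1.
    total = sum(weighted.values())
    return {cls: value / total for cls, value in weighted.items()}


# Assumed values, taken from the numbers quoted in this thread.
sample_props = {"bad": 0.05, "good": 0.95}  # event percentages in the Sample node
priors = {"bad": 0.30, "good": 0.70}        # proportions seen in the score code

# A record scored exactly at the sample base rate.
raw = {"bad": 0.05, "good": 0.95}
adjusted = adjust_posteriors(raw, sample_props, priors)
print(adjusted)
```

A record that sits at the sample base rate maps back to the population base rate after adjustment, which is one reason scored proportions can track the original data set rather than the sample when priors are applied.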