Enterprise Miner 14.1
Hello,
I am following this example https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs... to familiarize myself with oversampling. As an additional learning exercise, I connected a Score node to the Model Comparison node. My thought was to take a copy of the original data set, apply the same first sample to it, and score the result. So I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percent .05/.95), and ran the workflow.
Class Variable Summary Statistics
Data Role=SCORE Output Type=CLASSIFICATION
Variable      Numeric Value   Formatted Value   Frequency Count   Percent
I_good_bad    .               BAD               204               34
I_good_bad    .               GOOD              396               66
Data Role=SCORE Output Type=MODELDECISION
Variable      Numeric Value   Formatted Value   Frequency Count   Percent
D_good_bad    .               BAD               226               37.6667
D_good_bad    .               GOOD              374               62.3333
I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but they appear closer to the proportions in the original data set. When I look at the score code, I see the original data set's posterior probabilities with no adjustment:
Label P_good_badgood='Predicted: good_bad=good';
P_good_badgood = 0.7;
Label P_good_badbad='Predicted: good_bad=bad';
P_good_badbad = 0.3;
Am I just approaching this problem incorrectly? Have I made an error in the setup, or is it an error in my understanding? I've attached a copy of my workflow, renamed with a .jpg extension. If you drop the .jpg extension, you should be able to import it into EM. Thanks!
I took a deeper dive into the example listed above and I realize there are many inputs that affect the score percentages. The change I had questioned below, the scoring percentages being closer to the original data set percentages, was the effect of the sample proportion. I adjusted the data partition percentages from Train/Validate 50/50 to 70/30 and noticed a change in the model. This change, in turn, affected the scoring proportions. I also saw the updated prior probabilities in the SAS score code. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?
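For anyone who finds this thread later, here is a minimal Python sketch of the prior adjustment as I understand it. It is my own illustration, not Enterprise Miner's score code: the function name `adjust_posteriors` is hypothetical, and I am assuming the standard prior-correction formula (multiply each posterior by prior / sample proportion, then renormalize). The sample proportions (.05/.95) and priors (.3/.7) are the ones from my flow above.

```python
def adjust_posteriors(posteriors, sample_props, priors):
    """Rescale posteriors from a model trained on a resampled data set.

    Assumes the standard prior correction: weight each class posterior
    by prior / sample proportion, then renormalize so the values sum to 1.
    """
    weighted = {
        cls: posteriors[cls] * priors[cls] / sample_props[cls]
        for cls in posteriors
    }
    total = sum(weighted.values())
    return {cls: w / total for cls, w in weighted.items()}


# Proportions taken from the post: the Sample node's event percent
# and the priors that appear in the score code.
sample_props = {"bad": 0.05, "good": 0.95}
priors = {"bad": 0.30, "good": 0.70}

# An observation scored exactly at the sample base rate.
raw = {"bad": 0.05, "good": 0.95}
adjusted = adjust_posteriors(raw, sample_props, priors)
print(adjusted)
```

Under this formula, an observation that the model scores at the sample base rate maps back to the original priors, which would explain why scored proportions land near the original data set's percentages rather than the sample's.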