jlh368
Fluorite | Level 6

Enterprise Miner 14.1

Hello,

I am following this example https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs... to familiarize myself with oversampling. As an additional learning exercise, I connected a Score node to the Model Comparison node. My idea was to take a copy of the original data set, pass it through a copy of the first sample, and score the result. So I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percentages, .05/.95), and ran the workflow.

Class Variable Summary Statistics

Data Role=SCORE Output Type=CLASSIFICATION

               Numeric    Formatted    Frequency
Variable       Value      Value        Count        Percent

I_good_bad     .          BAD          204          34
I_good_bad     .          GOOD         396          66


Data Role=SCORE Output Type=MODELDECISION

               Numeric    Formatted    Frequency
Variable       Value      Value        Count        Percent

D_good_bad     .          BAD          226          37.6667
D_good_bad     .          GOOD         374          62.3333

 

I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but they appear close to those of the original data set. When I look at the score code, I see the original data set's posterior probabilities with no adjustment:

Label P_good_badgood='Predicted: good_bad=good';
P_good_badgood = 0.7;
Label P_good_badbad='Predicted: good_bad=bad';
P_good_badbad = 0.3; 
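
For reference, the usual way to map a posterior probability estimated on an oversampled training set back to the population scale is to reweight each class by the ratio of its population prior to its sample proportion. The Python sketch below illustrates that standard correction only. It is not Enterprise Miner's generated score code, and the function name `adjust_posterior` and its parameters are hypothetical. The example call assumes the proportions quoted in this post (bad = .05 in the sample, bad = 0.3 in the original data).

```python
def adjust_posterior(p_event, sample_prior, population_prior):
    """Rescale a posterior from an oversampled sample to the population scale.

    p_event          -- model posterior for the event, estimated on the sample
    sample_prior     -- event proportion in the (over)sampled training data
    population_prior -- event proportion in the original population
    (All names here are illustrative assumptions, not SAS identifiers.)
    """
    for name, value in (("sample_prior", sample_prior),
                        ("population_prior", population_prior)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Weight each class by population prior / sample proportion.
    event_weight = population_prior / sample_prior
    nonevent_weight = (1.0 - population_prior) / (1.0 - sample_prior)

    numerator = p_event * event_weight
    denominator = numerator + (1.0 - p_event) * nonevent_weight
    return numerator / denominator


# Assumed values from the post: 5% bad in the sample, 30% bad originally.
adjusted = adjust_posterior(p_event=0.05, sample_prior=0.05, population_prior=0.30)
print(adjusted)
```

With these inputs, a record scored at the sample base rate (0.05) maps back to roughly the population base rate (0.30), which is the kind of shift a prior adjustment is meant to produce.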

 

Am I approaching this problem incorrectly? Have I made an error in the workflow, or just an error in understanding? I've attached a copy of my workflow, renamed with a .jpg extension. If you drop the .jpg extension, you should be able to import it into EM. Thanks!

 

1 REPLY
jlh368
Fluorite | Level 6

I took a deeper dive into the example listed above and realized that many inputs affect the score percentages. The behavior I had questioned above, the scoring percentages being closer to the original data set's percentages, was the effect of the sample proportion. I adjusted the data partition percentages from Train/Validate 50/50 to 70/30 and noticed a change in the model. This change, in turn, affected the scoring proportions. I also saw the updated prior probabilities in the SAS score code node. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?
