jlh368
Fluorite | Level 6

Enterprise Miner 14.1

Hello,

I am following this example https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs... to familiarize myself with oversampling. As an additional exercise, I connected a Score node to the Model Comparison node. My thought was to copy the original data set and the first sample, then score that data set. So, I added a copy of the original German Credit data set with a role of Score, copied the first Sample node (same seed, same sample size, and same event percentages of .05/.95), and ran the workflow.
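For anyone following along, this is roughly what I understand the Sample node's oversampled (level-based) sampling to be doing, written as plain SAS. The data set names, seed, and sampling fraction are just placeholders, not my actual node settings, and I'm assuming "bad" is the rare level being kept in full.

data work.credit_sample;
   set work.german_credit;                      /* original German Credit data          */
   if _n_ = 1 then call streaminit(12345);      /* fixed seed, so the sample repeats    */
   if good_bad = 'bad' then output;             /* keep every rare (bad) observation    */
   else if rand('uniform') < 0.10 then output;  /* keep roughly 10% of the good records */
run;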

Class Variable Summary Statistics

Data Role=SCORE  Output Type=CLASSIFICATION

                 Numeric   Formatted   Frequency
Variable         Value     Value       Count       Percent

I_good_bad       .         BAD         204         34
I_good_bad       .         GOOD        396         66


Data Role=SCORE  Output Type=MODELDECISION

                 Numeric   Formatted   Frequency
Variable         Value     Value       Count       Percent

D_good_bad       .         BAD         226         37.6667
D_good_bad       .         GOOD        374         62.3333
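For my own reference, here is roughly how I understand those two summaries to relate to the posteriors in the scored data: I_good_bad is the classification into the level with the larger posterior probability, while D_good_bad also reflects the decision processing, which can shift the effective cutoff. The data set name and the 0.45 cutoff below are only illustrative guesses, not values from my flow.

data work.check;
   set work.scored;                              /* placeholder for the scored export */
   length I_check D_check $4;
   /* classification: pick the level with the larger posterior */
   if P_good_badbad > P_good_badgood then I_check = 'BAD';
   else I_check = 'GOOD';
   /* decision: a shifted cutoff can flag more BADs (226) than the
      plain classification does (204) */
   if P_good_badbad > 0.45 then D_check = 'BAD';
   else D_check = 'GOOD';
run;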

 

I had expected the results to be closer to the sample proportions (Bad .05 vs. Good .95), but the results appear close to the original data set's proportions. When I look at the score code, I see the original data set's posterior probabilities with no adjustment.

Label P_good_badgood='Predicted: good_bad=good';
P_good_badgood = 0.7;
Label P_good_badbad='Predicted: good_bad=bad';
P_good_badbad = 0.3; 
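This is the kind of prior adjustment I was expecting the score code to apply to posteriors from an oversampled model, sketched as a standalone DATA step. The priors, sample proportions, and data set name below are made-up illustrations, not values from my flow.

data work.adjusted;
   set work.scored;                    /* placeholder for the scored data set        */
   pi_bad  = 0.05;  pi_good  = 0.95;   /* assumed population (prior) proportions     */
   rho_bad = 0.30;  rho_good = 0.70;   /* assumed proportions in the modeling sample */
   /* re-weight each posterior by prior / sample proportion, then re-normalize */
   num_bad  = P_good_badbad  * pi_bad  / rho_bad;
   num_good = P_good_badgood * pi_good / rho_good;
   P_adj_bad  = num_bad  / (num_bad + num_good);
   P_adj_good = num_good / (num_bad + num_good);
run;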

 

Am I just approaching this problem incorrectly? Have I made an error, or is it simply an error in understanding? I've attached a copy of my workflow; I renamed it to .jpg. If you drop the .jpg extension, you should be able to import it into EM. Thanks!

 

1 REPLY
jlh368
Fluorite | Level 6

I took a deeper dive into the example listed above, and I realize there are many inputs that affect the score percentages. The behavior I questioned above (the scoring percentages being close to the original data set's percentages) was the effect of the sample proportions. I adjusted the data partition percentages from 50/50 train/validate to 70/30 and noticed a change in the model, which in turn affected the scoring proportions. I also saw the updated prior probabilities in the SAS score code. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?

