andreas_zaras
Pyrite | Level 9

Hello!

 

I am testing Model Studio DM and ML using the pva97nk data set. This data set has been oversampled, so the proportion of 1s to 0s is now 50%/50% instead of the original 5%/95%.

 

In SAS Enterprise Miner, to adjust for the balanced sampling, we had to enter 0.05 and 0.95 in the Prior Probabilities tab of the Decisions property. Now, in SAS Model Studio on SAS Viya, I go to the project settings and enter Event=5 and Non-Event=95 in the Event-Based Sampling tab.

 

When I run the pipeline, it stops at the Data node and the log shows the following messages:

NOTE: Oversampling is activated.
NOTE: Using SEED=12345 for sampling.
ERROR: There is not enough observations from non-event level to satisfy the event proportion.
ERROR: The action stopped due to errors.

 

What is wrong with the above process?

 

Thanks in advance,

 

Andreas

1 ACCEPTED SOLUTION

Accepted Solutions
WendyCzika
SAS Employee

In Model Studio, you can't specify priors for data that has already been sampled.  The percentages that you enter in Project Settings are the ones that you want to achieve via sampling the full data.  So since you already have a 50/50 sample, it is trying to create a 5/95 sample, which can't be done.  Hope that makes sense.
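
If the goal is still to correct predictions from the 50/50 sample back to the original 5/95 rate, one common approach is to rescale the scored probabilities by hand after the fact. The following is only a minimal Python sketch of that standard prior-correction formula, not something Model Studio does for you; the function name `adjust_posterior` and its parameters are hypothetical.

```python
def adjust_posterior(p_sample, prior_event, sample_event):
    """Rescale an event probability scored on resampled data to the population prior.

    p_sample     -- probability predicted by a model trained on the resampled data
    prior_event  -- event proportion in the original population (e.g. 0.05)
    sample_event -- event proportion in the training sample (e.g. 0.5)
    """
    # Each class is weighted by (population proportion / sample proportion).
    w_event = prior_event / sample_event
    w_nonevent = (1.0 - prior_event) / (1.0 - sample_event)
    numerator = p_sample * w_event
    return numerator / (numerator + (1.0 - p_sample) * w_nonevent)


# A prediction of 0.5 on the balanced sample maps back to roughly the 5% base rate.
print(adjust_posterior(0.5, prior_event=0.05, sample_event=0.5))
```

The adjustment preserves the ranking of the observations; it only shifts the probabilities toward the true base rate.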

BrettWujek
SAS Employee

One additional note that might clear up some confusion. @WendyCzika is exactly right: since you have already oversampled, you should not specify event-based sampling here (and this setting in Model Studio is NOT for specifying priors for posterior adjustment of predictions). What might still be confusing is why this failed instead of just oversampling AGAIN, i.e., continuing to sample from the non-events with replacement until it reaches the specified 5/95 proportions (which, again, is not really what you wanted to do here). The reality is that Model Studio actually UNDERsamples the non-event observations for event-based sampling. Since you already balanced the data set, there are not enough non-event observations to down-sample to the proportions you specified.
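
To make the arithmetic concrete, here is a small Python sketch of that feasibility check. It assumes that undersampling keeps every event row and only draws non-events without replacement, and the row counts are hypothetical (they are not the actual pva97nk counts). The function names are made up for illustration.

```python
import math


def nonevents_needed(n_event, target_event_pct):
    """Non-event rows required so that n_event events make up target_event_pct percent."""
    return math.ceil(n_event * (100 - target_event_pct) / target_event_pct)


def can_undersample(n_event, n_nonevent, target_event_pct):
    """True if enough non-events exist to reach the target without replacement."""
    return n_nonevent >= nonevents_needed(n_event, target_event_pct)


# Hypothetical balanced table: 1000 events and 1000 non-events.
n_event, n_nonevent = 1000, 1000

print(nonevents_needed(n_event, 5))              # 19000
print(can_undersample(n_event, n_nonevent, 5))   # False
print(can_undersample(n_event, n_nonevent, 50))  # True
```

With a 5% event target, every event needs 19 non-events, so a table that is already 50/50 can never satisfy the request by dropping rows.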

 

The log message is misleading...it is not oversampling. We should fix this message until we enhance the event-based sampling to add oversampling (which has been discussed in R&D).

 

Hope this helps.



ravnen55
Calcite | Level 5

One related question:

 

How do you oversample in Model Studio?

 

If by oversampling we simply mean duplicating some of the rare-event cases, and we do this before partitioning, then chances are that some of the cases in the train and test data sets are identical, and that seems to me like clear data leakage.

 

But how do you, in Model Studio, oversample only the train data set? Or should you do something else?
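
To illustrate the ordering I have in mind (outside Model Studio, and only as a rough Python sketch): partition first, then duplicate rare-event rows in the training partition only, so no duplicated row can end up in the test data. The function name, the `target` key, and the 70/30 split are all assumptions for the example.

```python
import random


def split_then_oversample(rows, label_key="target", train_frac=0.7, seed=12345):
    """Partition first, then duplicate event rows in the training partition only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)

    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]

    events = [r for r in train if r[label_key] == 1]
    nonevents = [r for r in train if r[label_key] == 0]

    # Sample training events with replacement until both classes have equal counts.
    extra = max(0, len(nonevents) - len(events))
    if events:
        train = train + [rng.choice(events) for _ in range(extra)]
    return train, test


# Hypothetical data: 1000 rows with a 5% event rate.
rows = [{"id": i, "target": 1 if i % 20 == 0 else 0} for i in range(1000)]
train, test = split_then_oversample(rows)

# No test row also appears in the (oversampled) training data.
print({r["id"] for r in train} & {r["id"] for r in test})
```

The test partition keeps the original event rate, which is what you want for an honest assessment.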
