Fluorite | Level 6

Problem: I have a data set in which a few independent variables are highly predictive but occur rarely. For example, fewer than 5% of claimants are diagnosed with mood disorders, but among those claimants the probability that the claim will be litigated is very high (> 30%). When I run variable selection, the methods drop these variables because of their low frequency. In such a scenario, what is typically done to make sure that we do not lose a highly predictive variable?


I have a feeling that oversampling is one approach, but even if that is correct, I would like an explanation of how it works.


Thank you,


SAS Employee

Hello Prajna_450 --


One solution is to use the Manual Selection property of the Variable Selection node.  You can use that property to override any of the Role settings that the node determined.  If a variable is rejected even though you want to include it, just change the role to Input.


Have a good week.



Fluorite | Level 6



First, thanks for the reply. My question, though, is how to increase the number of observations (records) for those predictors whose frequency is low.


Please help me find a method to handle this situation.


Thank you
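
On the oversampling idea raised in this thread, here is a minimal sketch, in Python rather than SAS, of what increasing the number of records for a rare predictor could look like. It duplicates rows where the rare flag is set (sampling with replacement) until they reach a chosen share of the data. The function name `oversample_rare`, the column names `claim_id` and `mood_disorder`, and the 20% target share are all hypothetical, chosen for illustration; this is not the behavior of the Variable Selection node or of any SAS procedure.

```python
import random


def oversample_rare(rows, flag, target_share, seed=0):
    """Return rows plus resampled copies of the rows where row[flag] is truthy,
    so that flagged rows make up roughly target_share of the result."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    rare = [r for r in rows if r[flag]]
    common = [r for r in rows if not r[flag]]
    if not rare:
        raise ValueError("no rows have the rare flag set")
    # Solve n_rare / (n_rare + n_common) = target_share for n_rare.
    n_rare = round(target_share * len(common) / (1 - target_share))
    # Never drop original rows; only add copies if the target needs more.
    n_extra = max(0, n_rare - len(rare))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    extra = [dict(rng.choice(rare)) for _ in range(n_extra)]
    return common + rare + extra


# Synthetic, made-up data: 1,000 claims, 5% flagged with a mood disorder.
claims = [{"claim_id": i, "mood_disorder": 1 if i < 50 else 0}
          for i in range(1000)]

resampled = oversample_rare(claims, "mood_disorder", target_share=0.20)
share = sum(r["mood_disorder"] for r in resampled) / len(resampled)
print(f"rows: {len(resampled)}, flagged share: {share:.3f}")
```

Because the resampled data no longer has the original 5% frequency, anything that depends on that frequency would need to be checked against the unsampled data.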


