Prajna_450
Fluorite | Level 6

Problem: I have a data set in which a few independent variables are highly predictive but occur at low frequency. An example is claims where the claimant is diagnosed with a mood disorder: this applies to fewer than 5% of claimants, but among that 5% the probability that the claim will be litigated is very high (> 30%). When I run the variable selection exercise, the methods drop these independent variables because of their low frequency. In such a scenario, what is typically done to make sure that we do not lose a highly predictive variable like this?
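To make the situation concrete, here is a minimal Python sketch of how little overall variance a rare 0/1 flag can explain even when it carries a strong effect. It computes the squared correlation between a binary predictor and a binary target from three numbers. The 5% prevalence and 30% litigation rate come from the description above; the 5% litigation rate among the other claimants and the function name `indicator_r_squared` are assumptions made only for illustration, and nothing here claims to reproduce the criterion any particular selection method uses.

```python
def indicator_r_squared(prevalence, rate_with, rate_without):
    """Squared correlation between a 0/1 predictor and a 0/1 target.

    prevalence   -- share of records where the predictor is 1
    rate_with    -- target rate among records where the predictor is 1
    rate_without -- target rate among records where the predictor is 0
    """
    p = prevalence
    target_rate = p * rate_with + (1 - p) * rate_without
    # Covariance of two binary variables: E[xy] - E[x]E[y]
    cov = p * (1 - p) * (rate_with - rate_without)
    var_x = p * (1 - p)
    var_y = target_rate * (1 - target_rate)
    return cov ** 2 / (var_x * var_y)


# 0.05 as the litigation rate without the diagnosis is an assumed value.
rare = indicator_r_squared(prevalence=0.05, rate_with=0.30, rate_without=0.05)
common = indicator_r_squared(prevalence=0.50, rate_with=0.30, rate_without=0.05)

print(f"R^2 with  5% prevalence: {rare:.4f}")
print(f"R^2 with 50% prevalence: {common:.4f}")
```

With the same effect size, the rare flag explains less of the target's variance than a common one would, which is one plausible reason a variance-based screen could rank it low.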

 

I have a feeling that oversampling is one approach, but even if I am correct I would like an explanation of why it works.

 

Thank you,

Prajna

2 REPLIES
MikeStockstill
SAS Employee

Hello Prajna_450 --

 

One solution is to use the Manual Selection property of the Variable Selection node. You can use that property to override any of the Role settings that the node determined. If a variable is rejected even though you want to include it, just change its role to Input.

 

Have a good week.

 

 

Prajna_450
Fluorite | Level 6

Hi,

 

First, thanks for the reply. My question, though, is how to increase the number of observations (records) for those predictors whose frequency is low.

 

Please help me find a method to handle this situation.

 

Thank you

Prajna
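For readers wondering what "increasing the number of records" for a rare predictor could look like in practice, here is a hedged Python sketch of one option: resampling the flagged rows with replacement and attaching a weight that restores the original proportions. The function `oversample_rare_flag`, the column names `mood_disorder` and `weight`, and the synthetic claims are all hypothetical. Whether a given modeling tool honors such a weight column is an assumption not confirmed anywhere in this thread. Note also that duplicated rows add no new information; they only change how much attention a procedure gives to the rare group.

```python
import random


def oversample_rare_flag(rows, flag, factor, seed=0):
    """Return the original rows plus resampled copies of rows where row[flag] == 1.

    Flagged rows end up `factor` times as numerous. Each output row carries a
    'weight' (1/factor for flagged rows, 1.0 otherwise) so that weighted counts
    match the original data.
    """
    rng = random.Random(seed)
    rare = [r for r in rows if r[flag] == 1]
    common = [r for r in rows if r[flag] != 1]

    # Keep every original flagged row, then draw the extra copies with replacement.
    extra = [rng.choice(rare) for _ in range(len(rare) * (factor - 1))]

    out = [{**r, "weight": 1.0} for r in common]
    out += [{**r, "weight": 1.0 / factor} for r in rare + extra]
    rng.shuffle(out)
    return out


# Synthetic example: 1,000 claims, 50 of them (5%) carrying the flag.
claims = [{"claim_id": i, "mood_disorder": 1 if i < 50 else 0} for i in range(1000)]
boosted = oversample_rare_flag(claims, flag="mood_disorder", factor=4)

flagged = [r for r in boosted if r["mood_disorder"] == 1]
print(len(boosted), len(flagged), sum(r["weight"] for r in flagged))
```

The weight column is the design choice that matters here: without it, any rates estimated on the boosted data would overstate how common the flagged group is.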

