Prajna_450
Fluorite | Level 6

Problem: I have a data set in which a few independent variables are highly predictive but occur at low frequency. For example, fewer than 5% of claimants are diagnosed with mood disorders, but among that 5% the probability that the claim will be litigated is very high (> 30%). When I run a variable selection exercise, the methods drop these independent variables because of their low frequency. In such a scenario, what is typically done to make sure that we do not lose a highly predictive variable?

 

I have a feeling that oversampling is one approach, but even if I am correct, I need an explanation of how to apply it.

 

Thank you,

Prajna

2 REPLIES
MikeStockstill
SAS Employee

Hello Prajna_450 --

 

One solution is to use the Manual Selection property of the Variable Selection node.  You can use that property to override any of the Role settings that the node determined.  If a variable is rejected even though you want to include it, just change the role to Input.
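
Before overriding a rejected variable, it can help to confirm that the rare flag really does separate the target. The following is a minimal Python sketch, not Enterprise Miner code. The column names `mood_disorder` and `litigated` are hypothetical, the data is synthetic, and the 5% non-flagged litigation rate is an assumption made only for illustration (the thread gives the 5% frequency and the > 30% litigation rate, nothing else).

```python
import random

def event_rate_by_flag(rows, flag_key, target_key):
    """Return {flag_value: (row_count, event_rate)} for a binary flag."""
    stats = {}
    for row in rows:
        n, events = stats.get(row[flag_key], (0, 0))
        stats[row[flag_key]] = (n + 1, events + row[target_key])
    return {flag: (n, events / n) for flag, (n, events) in stats.items()}

# Synthetic claims: about 5% flagged, 30% litigation among flagged rows.
# The 5% litigation rate for non-flagged rows is an assumed value.
rng = random.Random(0)
rows = []
for _ in range(10000):
    mood = 1 if rng.random() < 0.05 else 0
    p_litigated = 0.30 if mood else 0.05
    rows.append({"mood_disorder": mood,
                 "litigated": 1 if rng.random() < p_litigated else 0})

summary = event_rate_by_flag(rows, "mood_disorder", "litigated")
for flag, (n, rate) in sorted(summary.items()):
    print(f"mood_disorder={flag}: n={n}, litigation rate={rate:.3f}")
```

If the flagged group shows a clearly higher event rate on your real data, that is a reasonable basis for setting the role back to Input by hand.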

 

Have a good week.

 

 

Prajna_450
Fluorite | Level 6

Hi,

 

First, thanks for the reply. My question, though, is how to increase the number of observations (records) for those predictors whose frequency is low.

 

Please help me find a method to handle this situation.
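
For illustration, one way to read "increase the number of records" is random oversampling with replacement of the rows where the rare predictor is present. The Python sketch below shows only that idea; it is not a SAS or Enterprise Miner procedure. The function `oversample_flag`, the `weight` field, and the column name `mood_disorder` are all hypothetical. Note that oversampling is more commonly applied to a rare target than to a rare predictor, and that duplicated rows add no new information, so the sketch attaches a weight that can be used to undo the distortion.

```python
import random

def oversample_flag(rows, flag_key, target_share, seed=0):
    """Resample flagged rows with replacement until they make up roughly
    target_share of the output. Each output row gets a 'weight' that
    restores the original flagged/non-flagged proportions."""
    flagged = [r for r in rows if r[flag_key] == 1]
    others = [r for r in rows if r[flag_key] != 1]
    if not flagged or not 0 < target_share < 1:
        raise ValueError("need at least one flagged row and 0 < target_share < 1")

    # Solve needed / (needed + len(others)) = target_share for needed.
    needed = round(target_share * len(others) / (1 - target_share))
    needed = max(needed, len(flagged))  # never drop original flagged rows

    rng = random.Random(seed)
    extra = [rng.choice(flagged) for _ in range(needed - len(flagged))]

    flagged_weight = len(flagged) / needed
    out = [dict(r, weight=1.0) for r in others]
    out += [dict(r, weight=flagged_weight) for r in flagged + extra]
    return out

# Toy input: 5 flagged rows out of 100, matching the 5% frequency in the question.
rows = [{"mood_disorder": 1 if i < 5 else 0} for i in range(100)]
balanced = oversample_flag(rows, "mood_disorder", target_share=0.20)
n_flagged = sum(r["mood_disorder"] for r in balanced)
print(f"{n_flagged} flagged rows out of {len(balanced)}")
```

Any model fitted on the resampled rows should either use the weights or have its predicted probabilities adjusted afterwards, since the resampled data no longer reflects the true frequency of the flag.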

 

Thank you

Prajna
