Problem: I have a data set with a few independent variables that are highly predictive but occur infrequently. For example, fewer than 5% of claimants are diagnosed with mood disorders, but among those claimants the probability that the claim will be litigated is very high (> 30%). When I run a variable selection exercise, the methods drop these independent variables because of their low frequency. In such a scenario, what is typically done to make sure that we do not lose such a highly predictive variable?
I have a feeling that oversampling is one approach, but I would like an explanation even if I am correct.
Thank you,
Prajna
Hello Prajna_450 --
One solution is to use the Manual Selection property of the Variable Selection node. You can use that property to override any of the Role settings that the node determined. If a variable is rejected even though you want to include it, just change the role to Input.
Have a good week.
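The override described above can be pictured with a small Python sketch. This is not SAS Enterprise Miner code, and it does not reproduce how the Variable Selection node scores variables. The function `select_variables`, the `threshold` parameter, and the scores are all hypothetical and made up for illustration. It only shows the idea: roles are assigned automatically first, then a manual list forces chosen variables back to Input.

```python
def select_variables(scores, threshold, force_input=()):
    """Assign a role per variable from a selection score, then apply manual overrides.

    scores      -- dict of variable name -> selection score (hypothetical values)
    threshold   -- variables scoring below this are rejected automatically
    force_input -- names whose role is set to "Input" regardless of score
    """
    roles = {
        name: ("Input" if score >= threshold else "Rejected")
        for name, score in scores.items()
    }
    for name in force_input:
        if name not in roles:
            raise KeyError(f"unknown variable: {name}")
        roles[name] = "Input"  # manual override wins over the automatic role
    return roles


# Made-up scores: the rare flag scores low overall because it affects few rows.
scores = {"claim_amount": 0.12, "claimant_age": 0.03, "mood_disorder": 0.004}

automatic = select_variables(scores, threshold=0.01)
overridden = select_variables(scores, threshold=0.01, force_input=["mood_disorder"])

print(automatic)
print(overridden)
```

In this toy run the automatic pass rejects `mood_disorder`, and the override restores it without changing the roles of the other variables.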
Hi,
First, thanks for the reply. My question, though, is how to increase the number of observations (records) for those predictors whose frequency is low.
Please help me find a method to handle this situation.
Thank you
Prajna
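For readers following the thread, here is a minimal Python sketch of what "increasing the number of records" for a rare predictor could look like, assuming simple replication of the rows where the flag is set. It is not a SAS procedure and not a recommendation. The helper names (`oversample_rare_rows`, `share`, `rate`), the column names, and the toy counts are all hypothetical.

```python
def oversample_rare_rows(rows, flag_key, factor):
    """Return a new list where each row with rows[flag_key] == 1 appears `factor` times."""
    out = []
    for row in rows:
        copies = factor if row[flag_key] == 1 else 1
        out.extend(dict(row) for _ in range(copies))
    return out


def share(rows, key):
    """Fraction of rows where the 0/1 column `key` equals 1."""
    return sum(r[key] for r in rows) / len(rows)


def rate(rows, flag_key, target_key):
    """Target rate among rows where the flag equals 1."""
    group = [r for r in rows if r[flag_key] == 1]
    return sum(r[target_key] for r in group) / len(group)


# Toy data (made up): 1000 claims, 50 with the diagnosis (5%),
# 20 of those litigated (40%); 95 of the other 950 litigated (10%).
claims = (
    [{"mood_disorder": 1, "litigated": 1}] * 20
    + [{"mood_disorder": 1, "litigated": 0}] * 30
    + [{"mood_disorder": 0, "litigated": 1}] * 95
    + [{"mood_disorder": 0, "litigated": 0}] * 855
)

boosted = oversample_rare_rows(claims, "mood_disorder", factor=5)

print(len(claims), len(boosted))
print(share(claims, "mood_disorder"), share(boosted, "mood_disorder"))
print(rate(claims, "mood_disorder", "litigated"), rate(boosted, "mood_disorder", "litigated"))
print(share(claims, "litigated"), share(boosted, "litigated"))
```

Replication raises the frequency of the flag in the sample but leaves the litigation rate within the flagged group unchanged, because every flagged row is copied the same number of times. It also shifts the overall litigation rate, so anything estimated on the boosted sample no longer reflects the original population without a correction.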