sas eminer - help on a method for increasing sampling

Prajna_450 · Posted 01-29-2018 12:15 AM

Problem – I have a data set with a few independent variables that are highly predictive but occur with low frequency. One example is claims where the claimant is diagnosed with a mood disorder: fewer than 5% of claimants have this diagnosis, but among them the probability that the claim will be litigated is very high (> 30%). When I run the variable selection exercise, the selection methods drop this variable because of its low frequency. In such a scenario, what is typically done to make sure that we do not lose a highly predictive variable?
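To make the "rare but predictive" point concrete, here is a minimal sketch that summarizes a 0/1 flag against a 0/1 outcome. The function name `summarize_flag` and all the counts are hypothetical, made up to match the proportions in the question (about 5% flagged, more than 30% litigated when flagged); the 10% litigation rate for unflagged claims is an assumption, not from the post.

```python
def summarize_flag(rows):
    """Summarize a binary predictor against a binary outcome.

    rows: list of (flag, litigated) pairs, each value 0 or 1.
    Returns the flag's prevalence, the outcome rate when flagged,
    the overall outcome rate, and lift (flagged rate / overall rate).
    """
    n = len(rows)
    flagged = [y for f, y in rows if f == 1]
    overall_rate = sum(y for _, y in rows) / n
    flagged_rate = sum(flagged) / len(flagged)
    return {
        "prevalence": len(flagged) / n,
        "flagged_rate": flagged_rate,
        "overall_rate": overall_rate,
        "lift": flagged_rate / overall_rate,
    }

# Hypothetical counts: 1,000 claims, 50 flagged (5%), 18 of those litigated (36%);
# 95 of the 950 unflagged claims litigated.
rows = [(1, 1)] * 18 + [(1, 0)] * 32 + [(0, 1)] * 95 + [(0, 0)] * 855
summary = summarize_flag(rows)
```

With these made-up counts the flag covers only 5% of rows, yet its lift is roughly 3.2, which is why a selection criterion that rewards coverage can rank it low even though it is strongly associated with litigation.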

I have a feeling that oversampling is one approach, but I would appreciate an explanation even if I am correct.
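Oversampling is most often applied to a rare target level, but the mechanics are the same for a rare predictor level. Below is a minimal sketch of simple random oversampling by replication, assuming a frequency weight is carried along so that weighted statistics still match the original data. The names `oversample` and `weighted_rate` and the replication factor of 10 are hypothetical; this is not SAS Enterprise Miner's implementation.

```python
def oversample(rows, is_rare, factor):
    """Replicate each row where is_rare(row) is True so it appears `factor` times.

    Returns (row, weight) pairs. Each copy of a rare row gets weight 1/factor,
    so weighted totals and weighted rates equal those of the original rows.
    """
    out = []
    for row in rows:
        if is_rare(row):
            out.extend((row, 1.0 / factor) for _ in range(factor))
        else:
            out.append((row, 1.0))
    return out

def weighted_rate(pairs, flag):
    """Weighted outcome rate among rows whose flag (row[0]) equals `flag`."""
    num = sum(w * row[1] for row, w in pairs if row[0] == flag)
    den = sum(w for row, w in pairs if row[0] == flag)
    return num / den

# Hypothetical (flag, litigated) rows: 50 of 1,000 flagged, 18 of those litigated.
rows = [(1, 1)] * 18 + [(1, 0)] * 32 + [(0, 1)] * 95 + [(0, 0)] * 855
pairs = oversample(rows, lambda r: r[0] == 1, factor=10)
```

After replication the flagged rows make up about 34% of the records instead of 5%, while the weighted litigation rate among flagged rows stays at 36%. Note that copies add no new information: they make the level more visible to the selection and modeling steps but can encourage overfitting, so a validation partition is usually set aside before oversampling so duplicated rows do not appear on both sides.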

Thank you,

Prajna

MikeStockstill · Posted 01-29-2018 09:24 AM

Hello Prajna_450 --

One solution is to use the Manual Selection property of the Variable Selection node. You can use that property to override any of the Role settings that the node determined. If a variable is rejected even though you want to include it, just change the role to Input.
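The override described above amounts to "keep whatever passes the criterion, plus anything I name explicitly." As a language-neutral sketch of that logic (hypothetical function, variable names, scores, and threshold; not the Variable Selection node's actual code):

```python
def select_inputs(scores, threshold, force_include=()):
    """Return the sorted names of variables kept as inputs.

    scores: dict mapping variable name -> selection score (higher = more useful).
    threshold: minimum score for automatic selection.
    force_include: names kept regardless of score, analogous to manually
    setting a rejected variable's role back to Input.
    """
    kept = {name for name, s in scores.items() if s >= threshold}
    kept.update(name for name in force_include if name in scores)
    return sorted(kept)

# Hypothetical scores: the rare flag scores low because few rows carry it.
scores = {"age": 0.12, "injury_type": 0.08, "mood_disorder_flag": 0.01}
auto = select_inputs(scores, threshold=0.05)
forced = select_inputs(scores, threshold=0.05, force_include=("mood_disorder_flag",))
```

Here `auto` keeps only `age` and `injury_type`, while `forced` also keeps `mood_disorder_flag`. The design choice is that the override changes only which variables reach the model; it does not change the data, so it sidesteps the low-frequency problem at the selection step without any resampling.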

Have a good week.

Prajna_450 · Posted 01-30-2018 01:56 AM

Hi,

First, thanks for the reply. My question, however, is how to increase the number of observations (records) for predictors whose frequency is low.

Please help me find a method to handle this situation.

Thank you

Prajna
