<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Gradient Boosting is performing worse than random - Help please in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135806#M1247</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you, Wendy. I'm using inverse priors in the decision matrix, so would the misclassification rate of, let's say, a decision tree take into account that the data is sampled? Here's the situation driving my question: in situations where I deal with rare events (the event occurs in 5% of the data), I'll sometimes get a misclassification rate of, let's say, 15% on validation data. I then try oversampling (with inverse priors, of course), increasing the event proportion from 5% to 10%, 20%, or 30%, etc., and I end up getting misclassification rates higher than the original 15%. Is there a way to compare across different subsampling proportions? SAS's training material usually suggests oversampling for rare events, but I've been getting worse results when I do this.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 14 Mar 2014 17:46:59 GMT</pubDate>
    <dc:creator>Analyze_this</dc:creator>
    <dc:date>2014-03-14T17:46:59Z</dc:date>
    <item>
      <title>Gradient Boosting is performing worse than random - Help please</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135803#M1244</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello SASers,&lt;/P&gt;&lt;P&gt;I am working on a project with a binary target. The target distribution is 13.6% (event) vs. 86.4% (non-event). The decision tree, regression, and gradient boosting models all score around a 19% misclassification rate on the validation data. I have two questions, but first here are some details of my process flow:&lt;/P&gt;&lt;P&gt;I tried using inverse priors with the models' assessment statistic set to decision, but switched to misclassification after I realized the models performed marginally better under this setting.&lt;/P&gt;&lt;P&gt;The Data Partition node is set to 70% (train) and 30% (validation).&lt;/P&gt;&lt;P&gt;I tried oversampling the event cases to 33% of the data, but the misclassification rate rose to 20%.&lt;/P&gt;&lt;P&gt;First question: if I oversample, does the 20% misclassification rate take into account that I oversampled (i.e., the oversampled 20% misclassification is worse than the non-oversampled 19%)? Or is the oversampled 20% misclassification better than the non-oversampled 19%, because the event made up 33% of the oversampled observations and 20% is clearly an improvement?&lt;/P&gt;&lt;P&gt;Second question: do y'all have any suggestions about what is causing the models to perform worse than random, and how I might fix the problem?&lt;/P&gt;&lt;P&gt;Thank y'all so much for your time.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;RWB&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Mar 2014 20:58:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135803#M1244</guid>
      <dc:creator>Analyze_this</dc:creator>
      <dc:date>2014-03-06T20:58:54Z</dc:date>
    </item>
    <item>
      <title>Re: Gradient Boosting is performing worse than random - Help please</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135804#M1245</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Oops, I made a rookie mistake. I calculated the distribution from the histograms produced when exploring the variable, and I forgot to change my settings from (Top, Default) to (Random, Max). In actuality, the target distribution is around 30% (event) vs. 70% (non-event), so the models are adding to our predictive power.&lt;/P&gt;&lt;P&gt;I'm still curious about the first question I asked above. I'll restate it:&lt;/P&gt;&lt;P&gt;First question: if I oversample, does the 20% misclassification rate take into account that I oversampled (i.e., the oversampled 20% misclassification is worse than the non-oversampled 19%)? Or is the oversampled 20% misclassification better than the non-oversampled 19%, because the event made up 33% of the oversampled observations and 20% is clearly an improvement?&lt;/P&gt;&lt;P&gt;If y'all could help me solve this one, that would be great.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 07 Mar 2014 14:56:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135804#M1245</guid>
      <dc:creator>Analyze_this</dc:creator>
      <dc:date>2014-03-07T14:56:15Z</dc:date>
    </item>
    <item>
      <title>Re: Gradient Boosting is performing worse than random - Help please</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135805#M1246</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;No, oversampling is not being accounted for unless you adjust your prior probabilities and/or decision matrix, either in the Input Data node or a Decisions node after you have sampled.&amp;nbsp; The "Detecting Rare Classes" section under Analytics &amp;gt; Predictive Modeling in the Enterprise Miner Reference Help provides the best practices for handling rare events.&lt;/P&gt;&lt;P&gt;Hope that helps,&lt;/P&gt;&lt;P&gt;Wendy Czika&lt;/P&gt;&lt;P&gt;SAS Enterprise Miner R&amp;amp;D&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 14 Mar 2014 16:26:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135805#M1246</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2014-03-14T16:26:59Z</dc:date>
    </item>
    <item>
      <title>Re: Gradient Boosting is performing worse than random - Help please</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135806#M1247</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you, Wendy. I'm using inverse priors in the decision matrix, so would the misclassification rate of, let's say, a decision tree take into account that the data is sampled? Here's the situation driving my question: in situations where I deal with rare events (the event occurs in 5% of the data), I'll sometimes get a misclassification rate of, let's say, 15% on validation data. I then try oversampling (with inverse priors, of course), increasing the event proportion from 5% to 10%, 20%, or 30%, etc., and I end up getting misclassification rates higher than the original 15%. Is there a way to compare across different subsampling proportions? SAS's training material usually suggests oversampling for rare events, but I've been getting worse results when I do this.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 14 Mar 2014 17:46:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135806#M1247</guid>
      <dc:creator>Analyze_this</dc:creator>
      <dc:date>2014-03-14T17:46:59Z</dc:date>
    </item>
    <item>
      <title>Re: Gradient Boosting is performing worse than random - Help please</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135807#M1248</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I'm unclear about what exactly you are doing when you say oversampling with inverse priors. If you are using the Sample node to sample a higher proportion of rare events, then you would need a Decisions node following it to adjust the prior probabilities. When using the same prior probabilities, it is valid to compare models built with different event proportions from oversampling. The "Prior Probabilities" section in the same part of the EM Reference Help that I mentioned above explains this better than I can!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 14 Mar 2014 19:27:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Gradient-Boosting-is-performing-worse-than-random-Help-please/m-p/135807#M1248</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2014-03-14T19:27:31Z</dc:date>
    </item>
  </channel>
</rss>