YG1992
Obsidian | Level 7

Hi everyone,

 

I have several large populations, each with millions of observations, and my task is two-class classification. If I apply gradient boosting directly to the whole dataset (train:validate = 70:30), I always get an AUC of 0.5 and identical predicted probabilities for class 1 and class 2 across every observation. If I instead draw a sample of 100k or 200k observations first and run gradient boosting with the same hyperparameter settings, the results look relatively normal: some AUCs are above 0.5 and the predicted class probabilities differ across observations.
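For reference, here is a minimal sketch of the comparison I am describing, written in Python/scikit-learn purely as an illustration of the setup (the file name, column name "target", sample size, and hyperparameters are placeholders, not my actual EM flow):

# Illustration only: full-data vs. sampled-data comparison with a 70:30 split.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_and_score(df, target="target"):
    """70:30 train/validate split, fit a GBDT, return validation AUC."""
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    model = GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3)
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_va)[:, 1]
    return roc_auc_score(y_va, proba)

full = pd.read_csv("full_population.csv")          # millions of rows
sample = full.sample(n=100_000, random_state=42)   # 100k-row sample

print("AUC on full data:  ", fit_and_score(full))
print("AUC on 100k sample:", fit_and_score(sample))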

 

I would like to ask the SAS EM programmers here: could you please explain this behavior? My guess is that the algorithm simply stops updating any parameters at the very beginning, but I don't know the exact, concrete reason. Last but not least: there is no error message when running GBDT on either the large or the small datasets.
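This is roughly how I would check that guess outside EM, again only an illustrative sketch with the same placeholder file and column names as above:

# Quick diagnostics for the "model never really learns" hypothesis.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

full = pd.read_csv("full_population.csv")   # placeholder file, as above

# 1. How rare is the event? A very skewed target can lead a boosted model
#    to output (almost) the same probability for every observation.
print(full["target"].value_counts(normalize=True))

# 2. Are the predicted probabilities literally constant?
X, y = full.drop(columns=["target"]), full["target"]
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X, y)
proba = model.predict_proba(X)[:, 1]
print("distinct predicted probabilities:", np.unique(proba).size)
print("min/max predicted probability:", proba.min(), proba.max())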

 

Thank you very much.


