Hi everyone,
I have several large datasets with millions of observations each, and my task is two-class classification. If I apply gradient boosting directly to the whole dataset (train:validate = 70:30), I always get an AUC of 0.5, and the predicted probabilities of class 1 and class 2 are identical for every observation. If I first draw a sample of 100k or 200k observations and apply gradient boosting with the same hyperparameter settings, the results look relatively normal: some AUCs are above 0.5, and the predicted probabilities differ from one observation to the next.
I would like to ask the SAS EM programmers here: could you please explain this behavior? My guess is that the algorithm stops updating its parameters at the very beginning, but I don't know the exact reason. One more detail: there is no error message when running GBDT on either the large or the small datasets.
Thank you very much.
Hello YG1992 -
A first step is to check whether either of these notes is relevant to your situation when your AUC is 0.5.
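It may also help to confirm that the 0.5 AUC is simply a consequence of the constant predictions, since a model that assigns the same score to every observation gives an AUC of exactly 0.5 by construction. Below is a minimal Python sketch of that check, run outside SAS EM on exported scores. The function names `auc_with_ties` and `summarize_scores` are made up for illustration, and the labels are assumed to be recoded as 1 (event) and 0 (non-event) rather than class 1 and class 2.

```python
from collections import Counter


def auc_with_ties(labels, scores):
    """AUC as P(pos score > neg score) + 0.5 * P(pos score == neg score).

    Assumes labels are coded 1 (event) and 0 (non-event).
    """
    pos = Counter(s for y, s in zip(labels, scores) if y == 1)
    neg = Counter(s for y, s in zip(labels, scores) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one observation of each class")

    wins = 0.0      # pairs where the positive outranks the negative (ties count half)
    neg_below = 0   # negatives with a strictly lower score than the current one
    for s in sorted(set(pos) | set(neg)):
        wins += pos[s] * (neg_below + 0.5 * neg[s])
        neg_below += neg[s]
    return wins / (n_pos * n_neg)


def summarize_scores(scores):
    """Quick look at whether the model produced more than one distinct score."""
    return {"distinct": len(set(scores)), "min": min(scores), "max": max(scores)}


# Every observation gets the same predicted probability: AUC is 0.5.
labels = [1, 0, 1, 0, 0]
constant_scores = [0.3] * len(labels)
print(summarize_scores(constant_scores))
print(auc_with_ties(labels, constant_scores))
```

If `distinct` comes back as 1 on the full data, the 0.5 AUC is a symptom rather than the problem itself, and the question becomes why no split was made on the full dataset.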
Have a great day.