YG1992
Obsidian | Level 7

Hi everyone,

 

I have several large populations, each with millions of observations, and my task is two-class classification. If I apply gradient boosting directly to the whole dataset (train:validate = 70:30), I always get an AUC of 0.5 and identical predicted probabilities for class 1 and class 2 across every observation. If I instead draw a sample of 100k or 200k observations first and run gradient boosting with the same hyperparameter settings, the results look relatively normal: some AUCs are above 0.5 and the predicted class probabilities differ across observations.
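For reference, here is a minimal sketch of the comparison I am describing, written in Python/scikit-learn purely as an illustration of the setup (the file name, column name "target", sample size, and hyperparameters are placeholders, not my actual EM flow):

# Illustration only: full-data vs. sampled-data comparison with a 70:30 split.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_and_score(df, target="target"):
    """70:30 train/validate split, fit a GBDT, return validation AUC."""
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    model = GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3)
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_va)[:, 1]
    return roc_auc_score(y_va, proba)

full = pd.read_csv("full_population.csv")          # millions of rows
sample = full.sample(n=100_000, random_state=42)   # 100k-row sample

print("AUC on full data:  ", fit_and_score(full))
print("AUC on 100k sample:", fit_and_score(sample))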

 

I would like to ask the SAS EM programmers here: could you please explain this behavior? My guess is that the algorithm simply stops updating any parameters at the very beginning, but I don't know the exact, concrete reason. Last but not least: there is no error message when running GBDT on either the large or the small datasets.
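This is roughly how I would check that guess outside EM, again only an illustrative sketch with the same placeholder file and column names as above:

# Quick diagnostics for the "model never really learns" hypothesis.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

full = pd.read_csv("full_population.csv")   # placeholder file, as above

# 1. How rare is the event? A very skewed target can lead a boosted model
#    to output (almost) the same probability for every observation.
print(full["target"].value_counts(normalize=True))

# 2. Are the predicted probabilities literally constant?
X, y = full.drop(columns=["target"]), full["target"]
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X, y)
proba = model.predict_proba(X)[:, 1]
print("distinct predicted probabilities:", np.unique(proba).size)
print("min/max predicted probability:", proba.min(), proba.max())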

 

Thank you very much.


