YG1992

Hi everyone,

I have several large populations, each with millions of observations, and my task is two-class classification. If I apply gradient boosting directly to the whole dataset (train:validate = 70:30), I always get an AUC of 0.5, and every observation receives the same predicted probabilities for class 1 and class 2. If I first draw a sample of 100k or 200k observations and then apply gradient boosting with the same hyperparameter settings, the results look relatively normal: some AUCs are above 0.5, and the predicted probabilities of class 1 and class 2 differ across observations.
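One detail that may help narrow this down: an AUC of exactly 0.5 is what any model produces when it gives every observation the same score, because every positive/negative pair is then a tie. The following is a minimal Python sketch, not SAS code. The function name `auc` and the toy data are hypothetical, and the event class is assumed to be coded 1 with the other class coded 0. It computes AUC as the share of positive/negative pairs that the model ranks correctly, with ties counted as one half:

```python
def auc(labels, scores):
    # AUC = probability that a randomly chosen positive scores higher than a
    # randomly chosen negative; tied pairs count as 1/2.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 1, 0, 0, 1, 0]                       # toy data (hypothetical)
constant = [0.37] * len(labels)                         # same P(class 1) for every row
informative = [0.9, 0.2, 0.7, 0.4, 0.3, 0.1, 0.8, 0.5]  # scores that vary by row

print(auc(labels, constant))     # 0.5
print(auc(labels, informative))  # 0.9375
```

With constant scores every pair is tied, so the result is exactly 0.5 regardless of the class balance. The pairwise loop is O(n²) and is only meant to illustrate the definition, not to be run on millions of rows.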

I would like to ask the SAS Enterprise Miner (EM) programmers here: could you please explain this behavior? My guess is that the algorithm stops updating its parameters at the very beginning, but I don't know the exact reason. Last but not least, no error message appears when running gradient boosting (GBDT) on either the large or the small datasets.
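The guess that the algorithm stops at the very beginning would match the symptom if no tree in the ensemble ever makes a split. This sketch assumes a common formulation of gradient boosting for a binary target; it is not taken from the Enterprise Miner documentation. Under that formulation, every observation starts at the log-odds of the training event rate, and tree outputs are added to that starting value. If the trees contribute nothing, every observation's predicted probability equals the event rate. The function names below are hypothetical:

```python
import math

def initial_log_odds(labels):
    # Assumed starting value for binary log-loss boosting: log(p / (1 - p)),
    # where p is the training event rate (labels coded 1 = event, 0 = non-event).
    p = sum(labels) / len(labels)
    return math.log(p / (1 - p))

def predict_proba(f0, tree_outputs):
    # Score = starting value plus the (already learning-rate-scaled) tree outputs,
    # mapped to a probability with the logistic function.
    f = f0 + sum(tree_outputs)
    return 1.0 / (1.0 + math.exp(-f))

labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # toy data (hypothetical), event rate 0.2
f0 = initial_log_odds(labels)

# If no tree ever splits, a single-leaf tree fitted to the residuals (y - p)
# outputs zero at this starting point, so every row keeps the same score.
probs = [predict_proba(f0, []) for _ in labels]
print(probs[0])  # ≈ 0.2, the training event rate
```

If this guess is right, one way to check it would be to see whether the probabilities fitted on the full data equal the training event rate.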

Thank you very much.
