BookmarkSubscribeRSS Feed
YG1992
Obsidian | Level 7

Hi everyone,

 

I have got several large populations with millions of observations and my task is two-class classification. If I apply gradient boosting directly on the whole dataset (train:validate = 70:30), then I always get 0.5 AUC and always the same predicted probabilities of class 1 and 2 for each observation; if I draw a sample of 100k or 200k first and apply gradient boosting with same hyper-parameter settings, the results are relatively normal with some AUCs higher than 0.5 and different probabilities of class 1 and 2 for each observation.

 

I would like to ask some SAS EM programmer here: could you please explain this situation? I guess the algorithm just stops updating any parameters at the very beginning but I don't no the exact and concrete reason. Last but not least: there is no error message when running GBDT for both large and small datasets.

 

Thank you very much.

1 REPLY 1

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 660 views
  • 0 likes
  • 2 in conversation