YG1992
Obsidian | Level 7

Hi everyone,

 

In my project I have to build different models for large datasets, some of which may have more than 3 million observations and hundreds of input variables. For logistic regression (LR) and decision trees (DT), the corresponding nodes work fine, but for some machine learning methods such as SVM, Random Forest, Gradient Boosting and k-Nearest-Neighbors, the nodes sometimes fail to finish running and produce error messages.

 

If I draw a small subsample and apply those methods with exactly the same hyper-parameter settings, everything works. That is why I conclude that the errors are related to sample size.

 

In short, I wonder whether there is a way to use all of the training data (e.g. 3 million x 0.7 = 2.1 million training observations) to build SVM, RF, GBDT, kNN and similar models. I think the "Group" nodes might help by doing something like "batching" the data, but I am not sure, and it is not clear to me how that would work in practice.

 

If you have any suggestions, please share them; I would really appreciate it.

Thanks very much!

1 REPLY
MikeStockstill
SAS Employee

Hello YG1992 -

 

A first step is to examine the text of the error to find more specific information about the problem.  Based on the text of the error, try some searches on this page:

 

http://support.sas.com/notes/

 

 

Example: if the errors are out-of-memory errors, then try notes such as this one.

 

61376 - Overcoming "insufficient memory ." and "parameter larger than documented limit" error messag...
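Before digging through the notes, a back-of-the-envelope estimate can suggest whether memory is a plausible cause at all. The Python sketch below is only an illustration and is not part of any SAS tool: it assumes 8 bytes per numeric value, takes 300 input variables as a hypothetical stand-in for "hundreds", and uses a made-up helper name, `dense_matrix_gib`. It compares the size of the training table with the size of a full pairwise (n x n) matrix, which some kernel- or distance-based methods would need if they materialized it naively. Real implementations may avoid doing so, so treat the second number as a worst case.

```python
BYTES_PER_VALUE = 8  # assumption: each numeric value is stored as an 8-byte double


def dense_matrix_gib(n_rows, n_cols, bytes_per_value=BYTES_PER_VALUE):
    """Return the size in GiB of a dense n_rows x n_cols numeric matrix."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_total = 3_000_000                       # observations, from the question
train_fraction = 0.7                      # training share, from the question
n_train = round(n_total * train_fraction)
n_inputs = 300                            # hypothetical value for "hundreds of input variables"

# Size of the training data itself (rows x input variables).
data_gib = dense_matrix_gib(n_train, n_inputs)

# Worst case: a full pairwise matrix between all training observations.
pairwise_gib = dense_matrix_gib(n_train, n_train)

print(f"training observations: {n_train:,}")
print(f"training table:        {data_gib:,.1f} GiB")
print(f"full pairwise matrix:  {pairwise_gib:,.0f} GiB")
```

Under these assumptions the training table is a few GiB, while a full pairwise matrix is several orders of magnitude larger. That pattern would be consistent with methods working on a small subsample and failing on the full data, but only the actual error text can confirm it.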

 

 

If none of that information leads you to a resolution, then turn on the MPRINT option, create a model package, and contact technical support for assistance.

 

Have a great day.
