
Autotuning in SAS Visual Data Mining and Machine Learning (VDMML)

by SAS Employee BethEbersole on 12-15-2017 04:45 PM - edited on 12-18-2017 11:36 AM by Community Manager

What is autotuning?

 

When building a model, the data scientist can set the values of the model's hyperparameters.  Examples of hyperparameters include the number of layers in an artificial neural network, the number of trees in a random forest, and so on.  The modeler has the power to choose these hyperparameters, providing the flexibility to train the best model.  However, “with great power there must also come--great responsibility!”

 

Flexibility comes at the expense of added complexity, and so many choices can be overwhelming.  How does the data scientist select the best values for these hyperparameters, especially when their effects vary from one data set to another?

 

[Image: autotune1.png]

  

In both Enterprise Miner and VA|VS|VDMML, reasonable defaults are set for these hyperparameters, so the data scientist can take no action and use the defaults.  However, these defaults are not based on the data.  You can instead use the actual data to select better hyperparameters and improve model performance.  Three ways to optimize hyperparameters are:

 

  • Brute-force hyperparameter sweep.  The model is trained with every possible combination of a predefined set of hyperparameter values that the data scientist has selected.  With this method, the optimal hyperparameter values may not be part of the predefined set.
  • Random search.  The data scientist decides how many randomly chosen hyperparameter configurations to train models on.  Because the selection is random, the optimal set of hyperparameters may still be missed.
  • Optimization-guided search.  Optimization algorithms intelligently direct the search of the hyperparameter space.  This is what SAS’s autotune does (see “How does SAS autotuning work?” below).
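The first two approaches are easy to sketch.  Below is a minimal, hypothetical Python illustration (not SAS code) of a brute-force sweep versus random search over two hyperparameters; the `score` function is a stand-in for actually training and validating a model:

```python
import itertools
import random

def score(n_trees, max_depth):
    """Stand-in for training and validating a model; higher is better.
    A real score would come from fitting, e.g., a random forest."""
    return -((n_trees - 300) ** 2) / 1e4 - (max_depth - 8) ** 2

# 1. Brute-force sweep: try every combination of predefined values.
#    Note the true optimum (300 trees, depth 8) is not in the grid,
#    illustrating the limitation described above.
tree_grid = [50, 100, 200, 400]
depth_grid = [2, 4, 6, 10]
best_grid = max(itertools.product(tree_grid, depth_grid),
                key=lambda hp: score(*hp))

# 2. Random search: sample a fixed number of random configurations.
random.seed(0)
candidates = [(random.randint(10, 500), random.randint(1, 20))
              for _ in range(16)]
best_random = max(candidates, key=lambda hp: score(*hp))

print("grid search best:", best_grid)
print("random search best:", best_random)
```

Both methods evaluate a fixed candidate list; neither uses earlier results to steer where it looks next, which is exactly what the optimization-guided approach adds.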

VDMML automates the selection of hyperparameter values using an intelligent optimization-based methodology. This capability can significantly improve the accuracy of the resulting model with no additional effort from the modeler.

 

Remember that autotuning consumes substantial processing resources because it involves many model-training iterations.  In Viya 3.2, autotuning became multithreaded, allowing it to run in parallel and complete much more quickly.

 

Which machine learning procedures (algorithms) in SAS VA|VS|VDMML 8.2 on Viya 3.3 support autotuning?

 

VDMML Algorithm Hyperparameters Optimized by Autotune:

 

[Image: autotune2.png — table of VDMML algorithms and the hyperparameters autotune optimizes]

  

 

How do you use the Autotune feature in VDMML?

You can select autotune from either the visual GUI or from SAS Studio tasks.  In either case, you can adjust the maximum seconds, maximum iterations, and maximum evaluations to keep autotuning from running too long.  The screenshot below gives a sneak peek at how this looks in the visual GUI for VDMML 8.2 on Viya 3.3.

 

 

[Image: autotune3.png — autotune options in the visual GUI]
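Conceptually, those three limits (maximum seconds, iterations, and evaluations) act as independent stopping criteria for the search: whichever budget runs out first ends the tuning.  A toy Python sketch of that idea, under assumed names (`tune`, `objective`, `sample_candidate`) that are illustrations only, not SAS's implementation:

```python
import random
import time

def tune(objective, sample_candidate,
         max_seconds=60.0, max_iterations=5, max_evaluations=50):
    """Toy tuning loop that stops when any budget is exhausted:
    wall-clock time, search iterations, or model evaluations."""
    start = time.monotonic()
    evaluations = 0
    best_hp, best_score = None, float("-inf")
    for _ in range(max_iterations):
        if time.monotonic() - start > max_seconds:
            break
        # Evaluate a small batch of candidates per iteration.
        for _ in range(10):
            if evaluations >= max_evaluations:
                return best_hp, best_score, evaluations
            hp = sample_candidate()
            s = objective(hp)  # stand-in for train + validate one model
            evaluations += 1
            if s > best_score:
                best_hp, best_score = hp, s
    return best_hp, best_score, evaluations

random.seed(1)
hp, s, n = tune(lambda d: -(d - 8) ** 2,     # toy score, optimum at depth 8
                lambda: random.randint(1, 20),
                max_evaluations=25)
print(hp, s, n)
```

Here the evaluation cap (25) is hit before the iteration or time limits, so the loop returns early, exactly the kind of trade-off the GUI options let you control.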

    

How does SAS autotuning work?

 

The SAS VDMML autotuning feature uses optimization algorithms to select optimal hyperparameters by:

 

  1. Generating an initial candidate set of hyperparameters using Latin hypercube sampling.
  2. Feeding that set of hyperparameters in as the initial population for a genetic algorithm.  A genetic algorithm is an evolutionary algorithm that mimics natural selection, i.e., survival of the fittest from generation to generation.  At each iteration, a new set of candidate points is generated through the genetic-algorithm operations of crossover and mutation, and the best point in the population is carried forward.  VDMML does not force all crossover and mutation operations to involve the single best point found so far; instead, multiple points from the population are used to produce the next generation, making it a “mu + lambda” type algorithm.
  3. Performing an evaluation for each candidate in the population.  In other words, training a model with the algorithm using the candidate hyperparameter values and assessing that model either using a validation set from partitioning the data or using k-fold cross-validation.
  4. Repeating this process until a user-specified time limit is reached or other specified convergence criterion is met.
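The four steps above can be sketched in miniature.  This is a hypothetical Python illustration of the general scheme (Latin hypercube seeding, then a mu + lambda evolutionary loop), not SAS's actual implementation; the toy objective stands in for "train a model with these hyperparameters and score it on a validation partition":

```python
import random

def latin_hypercube(n, bounds, rng):
    """n samples; each dimension's range is split into n strata and
    each stratum is used exactly once, spreading points evenly."""
    strata = []
    for lo, hi in bounds:
        cells = list(range(n))
        rng.shuffle(cells)
        strata.append([lo + (hi - lo) * (c + rng.random()) / n
                       for c in cells])
    return [tuple(dim[i] for dim in strata) for i in range(n)]

def evolve(objective, bounds, pop_size=10, generations=15, rng=None):
    rng = rng or random.Random(0)
    pop = latin_hypercube(pop_size, bounds, rng)      # step 1: LHS seed
    scored = sorted(pop, key=objective, reverse=True)
    for _ in range(generations):                      # steps 2-4
        # Multiple good points (not just the single best) produce offspring.
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple((x + y) / 2 for x, y in zip(a, b))     # crossover
            child = tuple(min(max(x + rng.gauss(0, 0.5), lo), hi)  # mutation
                          for x, (lo, hi) in zip(child, bounds))
            children.append(child)
        # "mu + lambda": parents compete with children for survival,
        # so the best point found so far is always carried forward.
        scored = sorted(parents + children, key=objective,
                        reverse=True)[:pop_size]
    return scored[0]

# Toy objective with its optimum at hyperparameters (3, -2).
best = evolve(lambda p: -((p[0] - 3) ** 2 + (p[1] + 2) ** 2),
              bounds=[(0.0, 10.0), (-5.0, 5.0)])
print(best)
```

Each call to `objective` corresponds to step 3's full train-and-assess cycle, which is why autotuning is computationally expensive and benefits so much from running those evaluations in parallel.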

Summary

 

We know that Spiderman can spin a web any size. 

 

Yes, that’s right.  Any size.  It must be true, because it’s in the song:

 

“Spiderman, Spiderman

Does whatever a spider can

Spins a web, any size

Catches thieves, just like flies.

Look out!  Here comes the Spiderman.”

 

And just like Spiderman can spin a web any size, we can build a random forest from any number of decision trees.  We could use our spider-sense to guess how many trees we need.  Or, we can use autotuning with built-in optimization techniques to be sure we pick the best number of decision trees.  Likewise for choosing other hyperparameter values.  SAS Viya makes autotuning easy through either the visual GUI or SAS Studio tasks.

 

Just for Fun

 

But really, how does Spiderman decide which size web to spin?  Does he just need enough to create a rope to swing himself from one building to the next?  To snatch a gun?  Or to create a web trampoline?  Or does he need a web big enough to catch a car?   

Maybe he needs extra thread to wrap up a villain.

 

All I can say is that I bet Peter Parker sure wishes he had the power of SAS with autotuning to help him make optimal decisions.

 
