Neural Network Models: Supervised Learning in SAS Visual Data Mining and Machine Learning

by SAS Employee BethEbersole on 03-22-2017 11:26 AM - edited on 06-14-2017 11:41 AM by Community Manager

Neural Network Models (PROC NNET)

In a previous post, I summarized the supervised learning models (the regressions). In this post, I'll explore neural network models. 

 

Artificial neural networks are loosely modeled on the human brain. They are universal approximators: given enough hidden units, they can approximate any continuous input-output relationship to arbitrary accuracy. A neural network is composed of processing elements, commonly called units or neurons.

 

Examples of neural networks are multilayer perceptrons (MLPs) and radial basis function (RBF) networks. A multilayer perceptron can be thought of as a regression on hidden units, where each hidden unit is itself a regression on the original inputs.

 

[Figure: MLP with Two Hidden Layers]

 

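As the name “multilayer” implies, the network has multiple layers: an input layer, one or more hidden layers, and an output layer. Each hidden unit applies an activation function (the counterpart of a link function in regression), often the hyperbolic tangent by default. The connection weights are tuned based on “experience,” that is, over repeated training iterations in which the network's outputs are compared with the known targets and the weights are adjusted to reduce the error.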
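To make the "regressions on regressions" picture concrete, here is a minimal sketch of a forward pass through an MLP with two hidden layers and hyperbolic tangent activations. It is plain Python for illustration only, not PROC NNET code; the function names (`dense`, `mlp_forward`), the layer sizes, and all weight values are hypothetical.

```python
import math

def dense(inputs, weights, biases, activation):
    """One layer: each unit is a weighted sum of its inputs plus a bias,
    passed through an activation function."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def identity(z):
    """Linear output activation, as in an ordinary regression."""
    return z

def mlp_forward(x, layers):
    """Push one input vector through a list of (weights, biases, activation) layers."""
    out = x
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], math.tanh),
    ([[0.7, -0.5, 0.2], [0.3, 0.9, -0.4]], [0.05, -0.05], math.tanh),
    ([[1.2, -0.6]], [0.2], identity),
]

prediction = mlp_forward([1.0, 2.0], layers)
print(prediction)
```

Each hidden unit here is a small regression on the previous layer's outputs, squashed by tanh into the range (-1, 1); the output unit is a linear regression on the last hidden layer.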

 

SAS PROC NNET trains a multilayer perceptron neural network, which requires minimizing a nonlinear objective function. Because that function generally has many local minima, finding the global minimum is impractical, but a good solution can usually be found by training the network repeatedly from different sets of initial weight values. PROC NNET offers two optimization algorithms: limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) and stochastic gradient descent (SGD). Currently, up to 10 hidden layers are allowed. An autoencoder, in which the inputs themselves rather than a target serve as the output layer, can be trained as well.
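The idea of training repeatedly from different initial weights can be sketched as follows. This is a toy illustration in plain Python, not how PROC NNET is implemented: it trains a one-hidden-layer network with SGD on the XOR problem from several random seeds and keeps the run with the lowest error. The data set, network size, learning rate, and all function names are assumptions chosen for the example.

```python
import math
import random

# Toy data (XOR): not linearly separable, so a hidden layer is needed.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def init_weights(rng, n_in=2, n_hidden=3):
    """Draw random starting weights; a different seed gives a different start."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(p, x):
    """Return the tanh hidden activations and the linear output for input x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w1"], p["b1"])]
    y = sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]
    return h, y

def mse(p):
    """Mean squared error over the toy data set."""
    return sum((forward(p, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

def train_sgd(seed, epochs=2000, lr=0.1):
    """Train from one random start with plain SGD; return (weights, final error)."""
    rng = random.Random(seed)
    p = init_weights(rng)
    for _ in range(epochs):
        for x, t in rng.sample(DATA, len(DATA)):  # visit examples in random order
            h, y = forward(p, x)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j, hj in enumerate(h):
                # Backpropagate through tanh before w2[j] is updated.
                grad_h = err * p["w2"][j] * (1.0 - hj * hj)
                p["w2"][j] -= lr * err * hj
                p["b1"][j] -= lr * grad_h
                for i, xi in enumerate(x):
                    p["w1"][j][i] -= lr * grad_h * xi
            p["b2"] -= lr * err
    return p, mse(p)

# Several random starts; keep the one that ends at the lowest error.
runs = [train_sgd(seed) for seed in range(5)]
best_params, best_loss = min(runs, key=lambda r: r[1])
print([round(loss, 4) for _, loss in runs], "best:", round(best_loss, 4))
```

Different seeds can settle in different local minima, which is why the final errors may differ from run to run and why keeping the best of several starts is a practical substitute for a global search.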

 

PROC NNET supports hyperparameter tuning to find the best values for the number of hidden layers, the number of hidden units in each layer, the L1 and L2 regularization parameters, and, for SGD optimization, the annealing rate and the learning rate. You can choose among several objective functions for the tuning optimization and among several search methods, including one based on a genetic algorithm.
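As a rough illustration of what the tuned hyperparameters control, the sketch below shows one common way L1 and L2 penalties enter an objective function and one common way an annealing rate shrinks the SGD learning rate over time. The article does not give the exact formulas PROC NNET uses, so both functions, their names, and the 1/(1 + rate × step) schedule are assumptions for illustration.

```python
def penalized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Add an L1 penalty (sum of |w|) and an L2 penalty (sum of w^2) to a data-fit loss."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return data_loss + l1_term + l2_term

def annealed_learning_rate(base_rate, annealing_rate, step):
    """Shrink the step size as training proceeds (assumed 1 / (1 + rate * step) schedule)."""
    return base_rate / (1.0 + annealing_rate * step)

# Hypothetical weights and hyperparameter values.
weights = [0.5, -1.5, 2.0]
print(penalized_loss(0.30, weights, l1=0.01, l2=0.001))
print([round(annealed_learning_rate(0.1, 0.01, t), 4) for t in (0, 100, 1000)])
```

Larger L1 and L2 values penalize large weights more heavily, which discourages overfitting, and a larger annealing rate makes the learning rate decay faster. A tuner searches over combinations of such values and keeps the one that scores best on its chosen objective.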
