(asked by husseinmazaar)
I have a dataset of features extracted from videos, and I've been reading about deep learning, which is a very hot topic in machine learning.
How does SAS implement or support these techniques in SAS/STAT or SAS Enterprise Miner? The HPNeural procedure has capabilities to define up to 10 hidden layers and their neurons. Can these options help me build a deep learning model?
(answered by Patrick Hall, SAS Sr. Staff Scientist)
Test PROC NEURAL with many layers against PROC HPNEURAL with two layers to see which performs best on test data.
PROC NEURAL doc is here: http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf.
PROC HPNEURAL doc is available under the "secure documentation" link here: http://support.sas.com/software/products/miner/index.html#s1=3 (password available from tech. support)
Code examples are included below.
Is your data encoded video like an mpeg? If so you will need to use something besides SAS to decode your video into pixel intensity values. OpenCV (http://opencv.org/) is one possibility for decoding images and so is JMP (http://ow.ly/Ui4m7). Once your data is in a standard tabular format containing numerical columns (probably with pixels as the columns and frames as the rows), then you can read it into SAS easily using PROC IMPORT or a DATA step. Also, remember to standardize your input data before training a neural network.
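As a rough illustration of that tabular layout and the standardization step, here is a pure-Python sketch. The pixel values are made up for the example, and actual decoding of an mpeg into intensities would be done beforehand with a tool like OpenCV; only the frames-as-rows, pixels-as-columns layout and the z-score standardization are shown:

```python
# Hypothetical example: one row per frame, one column per pixel,
# then z-score standardization of each pixel column before training.
from statistics import mean, stdev

frames = [
    [0.0, 128.0, 255.0],   # frame 1: three pixel intensities
    [10.0, 120.0, 250.0],  # frame 2
    [5.0, 130.0, 245.0],   # frame 3
]

# Transpose to columns, standardize each column to zero mean / unit
# variance, then transpose back to the frames-as-rows layout.
cols = list(zip(*frames))
standardized_cols = [
    [(v - mean(col)) / stdev(col) for v in col] for col in cols
]
standardized = [list(row) for row in zip(*standardized_cols)]
```

After this step, each pixel column has mean 0 and unit standard deviation, which is the form the neural network procedures expect.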
If you are training a neural network with two layers, PROC HPNEURAL will be fine. (In releases <= 14.1, HPNEURAL does not provide protection against vanishing or exploding gradients caused by many layers.) If you are training a neural network with more than two layers, I would suggest using the FREEZE and THAW statements in PROC NEURAL to conduct layer-wise pre-training, and then training all the layers together again. I would also suggest testing a large two-layer network (many hidden units per layer) trained with HPNEURAL against a deeper network trained with PROC NEURAL. I would expect HPNEURAL to be faster than PROC NEURAL, even with PROC NEURAL's multithreading capabilities enabled.
The syntax for PROC HPNEURAL is straightforward, something like:
proc hpneural data=frames;
   input pixel:;
   hidden 1000; /* first layer */
   hidden 500;  /* second layer */
   target label / level=nom;
   train numtries=1 maxiter=5000;
   /* nthreads=number of cores you want to use */
   /* if you have SAS HPA then you can use the nodes= */
   /* option to use more than 1 machine - vroom, vroom! */
   performance nthreads=12 details;
   score out=frames_score;
run;
Now for PROC NEURAL ... which is more complicated. PROC NEURAL allows for layer-wise pre-training and can help you avoid one of the most common pitfalls in training deep neural networks: vanishing and exploding gradients.
A very short and simple explanation of vanishing and exploding gradients: Prior to deep learning, neural networks' parameters were typically initialized using random number schemes. During training, neural networks generally use the gradient of the network's error with respect to the network's parameters to adjust those parameters to better values in each training iteration. In backpropagation, the standard neural network training technique, evaluating this gradient involves the chain rule: each layer's parameters and gradients must be multiplied together across all the layers. This is a lot of multiplication, especially for networks with more than 2 layers. If most of the parameters across many layers are less than 1 in magnitude and they are multiplied many times, then eventually the gradient just vanishes into a machine-zero and training slows to a crawl or stops completely. If most of the parameters across many layers are greater than 1 in magnitude and they are multiplied many times, then eventually the gradient explodes into a huge number and the training process becomes intractable.
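You can see the effect numerically with a toy Python sketch, where a single per-layer factor stands in for the product of each layer's weights and activation derivatives:

```python
# Toy illustration of vanishing/exploding gradients: the chain rule
# multiplies one factor per layer, so the product shrinks or grows
# exponentially with depth.

def chain_product(factor, n_layers):
    """Multiply a per-layer gradient factor across n_layers layers."""
    grad = 1.0
    for _ in range(n_layers):
        grad *= factor
    return grad

print(chain_product(0.5, 50))   # tiny: the gradient has effectively vanished
print(chain_product(1.5, 200))  # huge: the gradient has exploded
```

With factors below 1 the gradient underflows toward machine-zero within a few dozen layers; with factors above 1 it blows up just as fast.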
Figure: A diagram representing the layerwise pre-training available in PROC NEURAL.
PROC NEURAL provides a mechanism to help you avoid vanishing and exploding gradients in deep networks by training only one layer of the network at a time. Once all the layers have been initialized through this pre-training process to values that are usually more suitable for the data, you can usually train the deep network using standard techniques without the problem of vanishing and exploding gradients.
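To make the freeze/thaw schedule concrete, here is a minimal pure-Python sketch of the idea. The layer names, weights, gradients, and learning rate are all made up for illustration; this is not how SAS implements it internally, only the pattern of which layers get updated at each stage:

```python
# Hypothetical sketch of layer-wise pre-training via freeze/thaw:
# each training stage updates only the layers that are not frozen.

def train_step(weights, grads, frozen, lr=0.1):
    """One gradient step that skips every frozen layer."""
    return {layer: w if layer in frozen else w - lr * grads[layer]
            for layer, w in weights.items()}

weights = {"i->h1": 0.5, "h1->h2": 0.5, "h2->h3": 0.5}
grads = {layer: 1.0 for layer in weights}

# Stage 1: pre-train the input layer; h1->h2 and h2->h3 are frozen
weights = train_step(weights, grads, frozen={"h1->h2", "h2->h3"})
# Stage 2: freeze i->h1 and thaw h1->h2
weights = train_step(weights, grads, frozen={"i->h1", "h2->h3"})
# Stage 3: freeze h1->h2 and thaw h2->h3
weights = train_step(weights, grads, frozen={"i->h1", "h1->h2"})
# Stage 4: thaw everything and train all layers together
weights = train_step(weights, grads, frozen=set())
```

Each layer receives exactly one pre-training update on its own plus one joint update at the end, mirroring the FREEZE/THAW/TRAIN sequence in the PROC NEURAL code below.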
The PROC NEURAL syntax looks like this, roughly:
proc neural
   data=frames /* you can assign validation or test data with validdata= or testdata= */
   dmdbcat=work.cat_frames /* create required catalog with PROC DMDB */
   random=12345;

   /* take advantage of multithreading */
   /* may also need to be allowed on SAS invocation or in SASv9.cfg */
   /* fill in <n> below */
   performance compile details cpucount=<n> threads=yes;

   /* L2 regularization */
   netoptions decay=0.1;

   /* define network architecture */
   archi MLP hidden=3;
   hidden 100 / id=h1;
   hidden 50 / id=h2;
   hidden 10 / id=h3;
   /* fill in <n> below */
   input pixel1-pixel<n> / id=i level=int;
   target label / id=t level=nom;

   /* tuning parameter that reduces the possibility that any neuron becomes */
   /* saturated during initialization */
   /* saturation discussion here: http://ow.ly/TGzuF */
   *initial infan=0.5;

   /* conduct preliminary training to find a better initialization; */
   /* time-consuming, sometimes problematic for deep nets */
   *prelim 10 preiter=10;

   /* pre-train input layer by freezing all other hidden layers */
   /* (I never freeze the target layer, but you can try that too) */
   freeze h1->h2;
   freeze h2->h3;
   train maxtime=10000 maxiter=5000;

   /* pre-train first hidden layer by freezing input layer, */
   /* and thawing first hidden layer */
   freeze i->h1;
   thaw h1->h2;
   train maxtime=10000 maxiter=5000;

   /* pre-train second hidden layer by freezing first hidden layer, */
   /* and thawing second hidden layer */
   freeze h1->h2;
   thaw h2->h3;
   train maxtime=10000 maxiter=5000;

   /* now that all hidden and input layers have been pre-trained, */
   /* train all layers together by thawing all frozen layers */
   thaw i->h1;
   thaw h1->h2;
   /* you can try the robust backprop optimization technique to help control for */
   /* vanishing/exploding gradients when training all layers */
   train maxtime=10000 maxiter=5000 /* tech=rprop */;

   /* you can score validation and test data as well */
   score data=frames outfit=frames_fit out=frames_score role=train;
run;
Please be aware that recent advances in deep learning are hot topics at SAS R&D too, and we are hoping to provide much more functionality for deep learning in coming releases ... but - as always - no promises or guarantees about products or timelines.