TL;DR:
Test PROC NEURAL with many layers against PROC HPNEURAL with two layers to see which does best.
PROC NEURAL doc is here: http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf.
PROC HPNEURAL doc is available under the "secure documentation" link here: http://support.sas.com/software/products/miner/index.html#s1=3 (password available from technical support).
Code examples here:
https://github.com/sassoftware/enlighten-deep
https://github.com/sassoftware/enlighten-apply/tree/master/SAS_Neural_PatternRecognition
Details:
Is your data encoded video, such as MPEG? If so, you will need to use something besides SAS to decode your video into pixel intensity values. I suggest OpenCV. Once your data is in a standard tabular format containing numerical columns (probably with pixels as columns and frames as rows), you can read it into SAS easily using PROC IMPORT or a DATA step. Also, remember to standardize the inputs before training a neural network.
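A minimal sketch of the standardization step, assuming the frames are already in a SAS data set named frames with inputs named pixel1, pixel2, and so on (the names used in the code further down). The output name frames_std is only an illustration; if you use it, point the data= option of the later procedures at it instead of frames.

```sas
/* Sketch only: frames_std is a hypothetical output name.          */
/* method=std subtracts each column's mean and divides by its      */
/* standard deviation, so every pixel input has mean 0 and std 1.  */
proc stdize data=frames out=frames_std method=std;
   var pixel:;   /* all variables whose names start with "pixel" */
run;
```

The label column is not listed in the VAR statement, so it is carried over to frames_std unchanged.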
If you are training a neural network with more than two layers, I would suggest using the FREEZE and THAW statements in PROC NEURAL to conduct layer-wise pretraining, and then training all the layers together again. In current releases, HPNEURAL does not provide protection against vanishing or exploding gradients for deep networks, but two layers should be fine with HPNEURAL. I would suggest testing a large two-layer network (many hidden units per layer) trained with HPNEURAL against a deeper network trained with PROC NEURAL. I would expect HPNEURAL to be faster than PROC NEURAL, even using PROC NEURAL's multithreading capabilities.
The syntax for PROC HPNEURAL is straightforward, something like:
proc hpneural
data=frames;
input pixel:;
hidden 1000; /* first layer */
hidden 500; /* second layer */
target label / level=nom;
train numtries=1 maxiter=5000;
/* nthreads=number of cores you want to use */
/* if you have SAS HPA then you can use the nodes= */
/* option to use more than 1 machine - vroom, vroom! */
performance nthreads=12 details;
score out=frames_score;
run;
Now for PROC NEURAL ... which is more complicated. PROC NEURAL allows for layer-wise pretraining and can help you avoid one of the most common pitfalls in training deep neural networks: vanishing/exploding gradients.
What are vanishing/exploding gradients? Prior to deep learning, neural networks were typically initialized using random numbers. Neural networks generally use the gradient of the network's error with respect to the network's parameters to adjust the parameters to better values in each training iteration. In back propagation, evaluating this gradient involves the chain rule, and you must multiply each layer's parameters and gradients together across all the layers. This is a lot of multiplication, especially for networks with more than 2 layers. If most of the weights across many layers are less than 1 in magnitude and they are multiplied many times, then eventually the gradient just vanishes into a machine-zero and training stops. If most of the parameters across many layers are greater than 1 in magnitude and they are multiplied many times, then eventually the gradient explodes into a huge number and the training process becomes intractable.
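To make the arithmetic concrete, here is a toy DATA step that multiplies one factor per layer. It is a scalar simplification, not real back propagation: the factors 0.5 and 1.5 and the depth of 50 are arbitrary values picked for illustration.

```sas
/* Toy illustration only: one scalar factor per layer stands in for */
/* the per-layer terms that the chain rule multiplies together.     */
data _null_;
   vanish  = 1;
   explode = 1;
   do layer = 1 to 50;            /* 50 layers, chosen arbitrarily */
      vanish  = vanish  * 0.5;    /* factors smaller than 1 */
      explode = explode * 1.5;    /* factors larger than 1  */
   end;
   put vanish= explode=;          /* writes both products to the log */
run;
```

After 50 multiplications the first product is roughly 9E-16 and the second roughly 6E8, which is the shrinking and blowing up described above.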
PROC NEURAL provides a mechanism to avoid vanishing/exploding gradients in deep networks, by training only one layer of the network at a time. Once all the layers have been initialized through this pre-training process to values that are more suitable for the data, you can usually train the deep network using gradient descent techniques without the problem of vanishing/exploding gradients. It looks like this, roughly:
proc neural
data=frames /* you can assign validation or test data with validdata= or testdata= */
dmdbcat=work.cat_frames /* create required catalog with PROC DMDB */
random= 12345;
/* take advantage of multithreading */
/* may also need to be allowed on SAS invocation or in SASv9.cfg */
performance compile details cpucount=12 threads= yes;
/* L2 regularization */
netoptions decay= 0.1;
/* define network architecture */
archi MLP hidden= 3;
hidden 100 / id=h1;
hidden 50 / id=h2;
hidden 10 / id=h3;
/* Fill in <n> - I noticed : notation sometimes does not work here */
input pixel1-pixel<n> / id=i level=int;
target label / id=t level=nom;
/* tuning parameter that reduces the possibility that any neuron becomes */
/* saturated during initialization */
/* saturation discussion here: http://ow.ly/TGzuF */
*initial infan=0.5;
/* conduct pretraining to find better initialization, time-consuming, */
/* sometimes problematic for deep nets */
*prelim 10 preiter=10;
/* pre-train input layer by freezing all other hidden layers */
/* (I never freeze the target layer, but you can try that too) */
freeze h1->h2;
freeze h2->h3;
train maxtime=10000 maxiter=5000;
/* pre-train first hidden layer by freezing input layer, */
/* and thawing first hidden layer */
freeze i->h1;
thaw h1->h2;
train maxtime=10000 maxiter=5000;
/* pre-train second hidden layer by freezing first hidden layer, */
/* and thawing second hidden layer */
freeze h1->h2;
thaw h2->h3;
train maxtime=10000 maxiter=5000;
/* now that all hidden and input layers have been pre-trained, */
/* train all layers together by thawing all frozen layers */
thaw i->h1;
thaw h1->h2;
/* you can try the resilient backprop (RPROP) optimization technique to help */
/* control for vanishing/exploding gradients when training all layers */
train maxtime=10000 maxiter=5000 /* tech=rprop */;
score
data=frames
outfit=frames_fit
out=frames_score
/* you can score validation and test data as well */
role=train;
run;
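The PROC NEURAL step above expects the catalog named in dmdbcat= to exist already, so PROC DMDB has to run first. A rough sketch, assuming the same frames data set, pixel inputs, and nominal label target as above; the out= name work.dmdb_frames is hypothetical.

```sas
/* Sketch only: run this before PROC NEURAL.                        */
/* work.dmdb_frames is a hypothetical name for the DMDB-encoded     */
/* copy of the data; work.cat_frames matches dmdbcat= above.        */
proc dmdb batch data=frames out=work.dmdb_frames dmdbcat=work.cat_frames;
   var pixel:;      /* interval inputs; if the prefix list is not   */
                    /* accepted, spell out the numbered range       */
   class label;     /* nominal target must be declared as a class   */
   target label;
run;
```

Run PROC DMDB on the same data you intend to train on, after any standardization, so that the catalog metadata matches what PROC NEURAL reads.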
Please be aware that recent advances in deep learning are hot topics at SAS R&D too, and we are hoping to provide much more functionality for deep learning in coming releases ... but, as always, no promises. Enterprise-grade scientific software takes time.