husseinmazaar
Quartz | Level 8

Dear Colleagues,

 

I have a dataset that represents features extracted from videos, and I have been reading about deep learning. It is a very hot topic in machine learning.

I need to know how SAS implements or supports these techniques in SAS/STAT or SAS Enterprise Miner. PROC HPNEURAL has options to define up to 10 hidden layers and the number of hidden neurons. Do these options allow me to build a deep learning model?

What are the differences between deep learning and the learning performed by HPNEURAL?

I would be very pleased to get help with this issue.

Best regards

1 ACCEPTED SOLUTION

Accepted Solutions
PatrickHall
Obsidian | Level 7

TL;DR:

 

Test PROC NEURAL with many layers against PROC HPNEURAL with two layers to see which does best.

PROC NEURAL doc is here: http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf. 

PROC HPNEURAL doc is available under the "secure documentation" link here: http://support.sas.com/software/products/miner/index.html#s1=3 (password available from tech. support)

 

Code examples here: 

https://github.com/sassoftware/enlighten-deep

https://github.com/sassoftware/enlighten-apply/tree/master/SAS_Neural_PatternRecognition

 

 

Details: 

 

Is your data encoded video, like an MPEG? If so, you will need to use something besides SAS to decode the video into pixel intensity values; I suggest OpenCV. Once your data is in a standard tabular format containing numerical columns (probably with pixels as columns and frames as rows), you can read it into SAS easily using PROC IMPORT or a DATA step. Also, remember to standardize before training a neural network.
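Something like this would cover the read-in and standardization steps (the file path, CSV format, and pixel column names are just placeholders for whatever your extraction step produces):

/* read the tabular pixel data - file path and format are placeholders */
proc import datafile="/path/to/frames.csv"
  out=frames dbms=csv replace;
  getnames=yes;
run;

/* standardize the pixel columns to mean 0, standard deviation 1 */
proc stdize data=frames out=frames_std method=std;
  var pixel:;
run;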

 

If you are training a neural network with more than two layers, I would suggest using the FREEZE and THAW statements in PROC NEURAL to conduct layer-wise pretraining, and then training all the layers together again. In current releases, HPNEURAL does not provide protection against vanishing or exploding gradients for deep networks; two layers should be fine with HPNEURAL. I would suggest testing a large two-layer network (many hidden units per layer) trained with HPNEURAL against a deeper network trained with PROC NEURAL. I would expect HPNEURAL to be faster than PROC NEURAL, even accounting for PROC NEURAL's multithreading capabilities.

 

The syntax for PROC HPNEURAL is straightforward, something like:

 

proc hpneural
  data=frames;
  input pixel:;
  hidden 1000; /* first layer */
  hidden 500; /* second layer */
  target label / level=nom;
  train numtries=1 maxiter=5000;
  /* nthreads=number of cores you want to use */
  /* if you have SAS HPA then you can use the nodes= */ 
  /* option to use more than 1 machine - vroom, vroom! */
  performance nthreads=12 details; 
  score out=frames_score;
run;

 

 

Now for PROC NEURAL ... which is more complicated. PROC NEURAL allows for layer-wise pretraining and can help you avoid one of the most common pitfalls in training deep neural networks: vanishing/exploding gradients.

 

What are vanishing/exploding gradients? Prior to deep learning, neural networks were typically initialized using random numbers. Neural networks generally use the gradient of the network's error with respect to the network's parameters to adjust the parameters to better values in each training iteration. In back propagation, evaluating this gradient involves the chain rule: each layer's parameters and gradients must be multiplied together across all the layers. This is a lot of multiplication, especially for networks with more than 2 layers. If most of the weights across many layers are less than 1 and they are multiplied many times, eventually the gradient vanishes to machine zero and training stops. If most of the parameters across many layers are greater than 1 and they are multiplied many times, eventually the gradient explodes into a huge number and the training process becomes intractable.
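As a back-of-the-envelope illustration (the 0.5 and 1.5 factors and the 30 multiplications are made-up numbers, not anything produced by PROC NEURAL):

/* repeated multiplication across 30 layers */
data _null_;
  vanish  = 0.5**30;  /* about 9.3E-10 - shrinks toward machine zero */
  explode = 1.5**30;  /* about 1.9E5   - blows up quickly            */
  put vanish= explode=;
run;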

 

PROC NEURAL provides a mechanism to avoid vanishing/exploding gradients in deep networks, by training only one layer of the network at a time. Once all the layers have been initialized through this pre-training process to values that are more suitable for the data, you can usually train the deep network using gradient descent techniques without the problem of vanishing/exploding gradients. It looks like this, roughly:
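PROC NEURAL expects the catalog named on DMDBCAT= to exist already; creating it with PROC DMDB looks roughly like this (treat it as a sketch and check the DMDB documentation for your release):

proc dmdb batch data=frames out=frames_dmdb dmdbcat=work.cat_frames;
  var pixel1-pixel<n>;   /* interval inputs - fill in <n> as in the INPUT statement below */
  class label;           /* nominal target */
run;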

 

proc neural
  data=frames /* you can assign validation or test data with validdata= or testdata= */ 
  dmdbcat=work.cat_frames /* create required catalog with PROC DMDB */
  random= 12345;
  
  /* take advantage of multithreading */
  /* (may also need to be allowed at SAS invocation or in sasv9.cfg) */
  performance compile details cpucount=12 threads=yes;

  /* L2 regularization */
  netoptions decay=0.1;

  /* define network architecture */
  archi MLP hidden=3;
  hidden 100 / id=h1;
  hidden 50 / id=h2;
  hidden 10 / id=h3;
  /* fill in <n> - I noticed the : notation sometimes does not work here */
  input pixel1-pixel<n> / id=i level=int;
  target label / id=t level=nom;

  /* tuning parameter that reduces the possibility that any neuron becomes */
  /* saturated during initialization */
  /* saturation discussion here: http://ow.ly/TGzuF */
  *initial infan=0.5;

  /* conduct pretraining to find a better initialization; time-consuming */
  /* and sometimes problematic for deep nets */
  *prelim 10 preiter=10;

  /* pre-train the input layer by freezing all other hidden layers */
  /* (I never freeze the target layer, but you can try that too) */
  freeze h1->h2;
  freeze h2->h3;
  train maxtime=10000 maxiter=5000;

  /* pre-train the first hidden layer by freezing the input layer */
  /* and thawing the first hidden layer */
  freeze i->h1;
  thaw h1->h2;
  train maxtime=10000 maxiter=5000;

  /* pre-train the second hidden layer by freezing the first hidden layer */
  /* and thawing the second hidden layer */
  freeze h1->h2;
  thaw h2->h3;
  train maxtime=10000 maxiter=5000;

  /* now that all hidden and input layers have been pre-trained, */
  /* train all layers together by thawing all frozen layers */
  thaw i->h1;
  thaw h1->h2;
  /* you can try the robust backprop optimization technique to help control for */
  /* vanishing/exploding gradients when training all layers */
  train maxtime=10000 maxiter=5000 /* tech=rprop */;

  score
    data=frames
    outfit=frames_fit
    out=frames_score  /* you can score validation and test data as well */
    role=train;
run;

 

 

Please be aware that recent advances in deep learning are hot topics at SAS R&D too, and we are hoping to provide much more functionality for deep learning in coming releases ... but, as always, no promises. Enterprise-grade scientific software takes time.


8 REPLIES 8

husseinmazaar
Quartz | Level 8

Thanks a lot, Mr. Patrick,

 

About the dataset: I applied feature extraction to get a detector and feature descriptor that represent each video as a feature vector of length 14700, so the dataset became tabular with 600 observations (600 videos). The dataset is 600 x 14700.

 

Thanks again for your cooperation and solution. I hope that SAS releases a deep learning node in upcoming releases.

 

Best regards

Testabcd
Calcite | Level 5

Patrick,

 

I'm trying to run a similar program via Enterprise Guide (7.1) against an EG server (9.3), and I can never get CPU utilization to go over 25%. I'm licensed for four cores on the EG server. I've tried editing sasv9.cfg and using CPUCOUNT=4 THREADS=YES. I have Enterprise Miner 13.

 

Any help would be greatly appreciated!       

 

 

AnnaBrown
Community Manager

Hi Testabcd,

 

I suggest posting your latest question on this thread as a new message on the SAS Enterprise Guide Community. There's a large pool of experts there who will likely help you out.

 

Anna

PatrickHall
Obsidian | Level 7

14700 is too many inputs for PROC NEURAL.

 

Either use fewer features, say < 500, for PROC NEURAL, or use HPNEURAL with 1 or 2 layers.
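For example, you could compress the pixel features into a smaller set of components before running PROC NEURAL; a rough sketch (the choice of PROC PRINCOMP and of 100 components is just one option for illustration):

/* compress the 14700 pixel columns into 100 principal component scores */
proc princomp data=frames n=100 out=frames_pc noprint;
  var pixel:;
run;
/* frames_pc now contains Prin1-Prin100, which can serve as inputs to PROC NEURAL */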

 

HTH,

p

Testabcd
Calcite | Level 5

Patrick,

Is < 500 features a hard limit for PROC NEURAL? I'm currently working with a set that has 730 features. I'm using your https://github.com/sassoftware/enlighten-deep code to duplicate Hinton and Salakhutdinov's work on dimensionality reduction with a 730-365-100-2 autoencoder.

 

 

 

   

PatrickHall
Obsidian | Level 7

No - not a hard number at all, but a bigger problem will take longer and at some point you may run out of resources during training if the training set is too big.

 

To give you some idea: I was able to roughly replicate the work you referenced using a 300-100-2-100-300 autoencoder built with PROC NEURAL, in about 6 hours using 12 cores on a server with 128 GB of RAM. Fewer/more cores and less/more memory means more/less time.

 

You may find this example code helpful:

https://github.com/sassoftware/enlighten-deep

 

And the exact code I used is in this paper:

https://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

 

I suggest using tech=CONGRA for the optimization.
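A rough sketch of how the mirrored layers and TECH=CONGRA fit together (layer sizes follow the 300-100-2-100-300 example above; the variable names, ACT= choice, and iteration limits are assumptions - the paper linked above has the exact code):

proc neural data=features dmdbcat=work.cat_features random=12345;
  performance compile details cpucount=12 threads=yes;
  archi MLP hidden=5;
  /* mirrored 300-100-2-100-300 architecture */
  hidden 300 / id=h1;
  hidden 100 / id=h2;
  hidden 2   / id=h3 act=linear;  /* the low-dimensional code layer */
  hidden 100 / id=h4;
  hidden 300 / id=h5;
  input x1-x730 / id=i level=int;
  /* an autoencoder reconstructs its own inputs as the target */
  target x1-x730 / id=t level=int;
  /* conjugate gradient optimization, as suggested above */
  train maxtime=36000 maxiter=1000 tech=congra;
  score data=features out=features_score role=train;
run;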

 

Hope that helps ... 

 

Testabcd
Calcite | Level 5

Patrick,

Thank you for the quick reply. The compute time you quoted for your hardware really puts this in perspective. I'm resource constrained on my SAS server.

 

Thank you for sharing your examples on github!  

 

 
