Watch this Ask the Expert session to learn the fundamental components of deep learning and how deep learning is different from traditional neural network modeling.
Watch the webinar
You will learn:
The fundamentals of deep learning.
How to use deep learning in SAS.
What autoencoder models are and how they can be used.
The questions from the Q&A segment held at the end of the webinar are listed below. The slides and code from the webinar are attached.
Q&A
What sample size is needed for these models?
Let me address it from two different perspectives. First, the mini batch: if you're using stochastic gradient descent, Adam, or a similar variant, then a smaller mini batch usually gives better performance, up to a certain point. Small mini batches can sometimes result in what's referred to as a floating point exception error, which is a pain in the butt in deep learning: it effectively means your model's errors have skyrocketed. SAS will stop and say, “Whoa, your model is out in space, it needs to come back to Earth, try again.” So, from a mini batch perspective, you typically want to steer towards a smaller mini batch, but know that there is a tradeoff: it can destabilize training to the point where you hit that floating point exception.
Second, the training sample size: the more data the better for generalization, as long as you can afford the computational cost. One caveat: if you're running a neural architecture search (NAS) or applying SAS autotuning to your deep learning model to search for the optimal architecture or hyperparameters, you will want to take a sample, because you want to iterate many times and test lots of different architectures and designs. From that perspective, you'd probably like a smaller sample, but one that's still representative.
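The sketch below makes the mini-batch tradeoff concrete. It is plain Python/NumPy, not SAS code, and runs mini-batch stochastic gradient descent on a toy least-squares problem. It stops with a `FloatingPointError` when the training loss becomes non-finite or passes an arbitrary `max_loss` threshold. That loosely mirrors how SAS halts with a floating point exception when a model diverges.

The function name, the threshold, and the toy data are assumptions for illustration. Divergence is triggered with an oversized learning rate because that reproduces the blow-up deterministically.

```python
import numpy as np


def minibatch_sgd(X, y, batch_size, lr, epochs=20, seed=0, max_loss=1e6):
    """Fit least-squares weights w (minimizing mean((X @ w - y)**2)) with mini-batch SGD.

    Returns (w, history), where history holds the full training loss after each
    epoch. Raises FloatingPointError if the loss becomes non-finite or exceeds
    max_loss, i.e. training has diverged (max_loss is an arbitrary cutoff).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    history = []
    # Silence overflow warnings; divergence is detected explicitly below.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(epochs):
            order = rng.permutation(n)  # reshuffle rows every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                # Gradient of the mean squared error on this mini-batch only;
                # smaller batches give noisier gradient estimates.
                grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
                w = w - lr * grad
            loss = float(np.mean((X @ w - y) ** 2))
            history.append(loss)
            if not np.isfinite(loss) or loss > max_loss:
                raise FloatingPointError(f"training diverged (loss={loss:.3g})")
    return w, history


if __name__ == "__main__":
    # Hypothetical toy data: 512 rows, 3 inputs, known true weights plus noise.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(512, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=512)

    w, history = minibatch_sgd(X, y, batch_size=16, lr=0.05)
    print("stable run, final loss:", round(history[-1], 4), "weights:", np.round(w, 2))

    try:
        minibatch_sgd(X, y, batch_size=4, lr=2.0)
    except FloatingPointError as err:
        print("unstable run stopped:", err)
```

With `batch_size=16, lr=0.05` the weights settle close to the true values. With `batch_size=4, lr=2.0` each update overshoots, the loss explodes, and the guard stops training after the first epoch.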
How good is SAS for deep learning when you have an unbalanced dataset?
SAS offers several tools for handling an unbalanced data set. I didn't show this in the webinar because our data is perfectly balanced, but you can include a weights column in your data and, in the dlTrain action, tell SAS, “I have a weights column; use those weights and apply them to the class output,” and SAS will adjust the probability threshold cutoff accordingly. It will either take the inverse of the value you supply or use the value itself to adjust the probability cutoff.
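The webinar does not show how the values in the weights column are chosen. One common choice, assumed here rather than taken from the session, is inverse class frequency, which gives every class the same total weight.

A minimal Python sketch follows; the function name is hypothetical. It computes one weight per row, which you could then write into a weights column before training. How dlTrain consumes that column is not shown here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per row so that every class contributes equally in total.

    weight = n_rows / (n_classes * rows_in_that_class): rare classes get weights
    above 1 and common classes get weights below 1.
    """
    counts = Counter(labels)
    n_rows, n_classes = len(labels), len(counts)
    return [n_rows / (n_classes * counts[label]) for label in labels]


if __name__ == "__main__":
    labels = ["car"] * 8 + ["person"] * 2  # hypothetical 80/20 imbalance
    print(inverse_frequency_weights(labels))
```

For 8 `car` rows and 2 `person` rows, each car row gets 0.625 and each person row gets 2.5. Both classes therefore sum to 5.0.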
We also have image augmentation, which I did not show, but I would use the image action set to create permutations of the images. And know thy data. A colleague and I worked on a project where we took images from different videos, taken at different times of day, that showed vehicles and people moving around. To create a good holdout data set, we sampled from different videos. Some of the vehicles were filmed only at night, whereas others were filmed only during the day. We caught this because we segmented out each entity we were detecting, looked at its average pixel intensity, and noticed that some entities had really low values and some had really high ones. When we looked at the images, we saw that some were filmed only at night and some only during the day.
So we took the daytime images and created new representations by darkening them, usually applying some other geometric transformation at the same time. For the vehicles captured only at night, we lightened the images and applied geometric transformations as well, such as flipping the images and correcting the object coordinates to match.
In short, for unbalanced data, augment your data and use the weights variable in the training process as well. We also have GANs, so you can generate synthetic data with them.
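The day/night workflow above can be sketched with NumPy as a stand-in for the SAS image action set; none of these functions are SAS APIs. The sketch does three things:

- It measures the average pixel intensity to spot night versus day footage.
- It darkens or lightens an image by scaling pixel values.
- It flips an image horizontally while mirroring its bounding boxes.

The function names, the brightness factors, and the `(x_min, y_min, x_max, y_max)` pixel box format are assumptions for illustration.

```python
import numpy as np


def mean_intensity(img):
    """Average pixel value; low values suggest night footage, high values day footage."""
    return float(np.mean(img))


def adjust_brightness(img, factor):
    """Scale pixel values by `factor` (<1 darkens, >1 lightens), clipped to the uint8 range."""
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def hflip_with_boxes(img, boxes):
    """Mirror an H x W (x C) image left to right and mirror its bounding boxes.

    Boxes are (x_min, y_min, x_max, y_max) in pixel units, with x measured from
    the left edge; after the flip, the old right edge becomes the new left edge.
    """
    width = img.shape[1]
    flipped = img[:, ::-1].copy()
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, new_boxes


if __name__ == "__main__":
    day_img = np.full((4, 6, 3), 200, dtype=np.uint8)  # hypothetical bright frame
    night_like = adjust_brightness(day_img, 0.5)        # darken a daytime image
    flipped, boxes = hflip_with_boxes(night_like, [(1, 0, 3, 2)])
    print(mean_intensity(day_img), mean_intensity(night_like), boxes)
```

The flip only changes x coordinates, and the two box edges swap roles: the old `x_max` becomes the new left edge `width - x_max`. That is the coordinate correction a horizontal flip requires.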
Is there a way to see the misclassified images?
Yes, I have a program for that, but I didn't include it. You saw dlTrain, and in the Python notebook you saw dlScore, but I didn't include dlScore in the SAS program. After you train your model, you can use dlScore to score data. It outputs a data set that contains the probability for every class, the predicted class, and the original label column. From there, you flag the rows where the prediction doesn't equal the actual label (say, with a 1) and then plot those images. That's what I've done in the past.
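The flagging step can be sketched in Python, assuming the dlScore output has already been brought into Python as a list of rows. The column names `label`, `predicted`, `P_car`, and `P_person`, as well as the function name, are hypothetical; the actual column names in your scored table may differ.

```python
def flag_misclassified(scored_rows, actual_col="label", predicted_col="predicted"):
    """Return copies of the scored rows with misclassified=1 where the
    predicted class differs from the actual label, else misclassified=0."""
    flagged = []
    for row in scored_rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        row["misclassified"] = int(row[predicted_col] != row[actual_col])
        flagged.append(row)
    return flagged


if __name__ == "__main__":
    # Hypothetical scored output: one row per image.
    scored = [
        {"id": "img_001", "label": "car", "predicted": "car", "P_car": 0.91, "P_person": 0.09},
        {"id": "img_002", "label": "person", "predicted": "car", "P_car": 0.64, "P_person": 0.36},
        {"id": "img_003", "label": "person", "predicted": "person", "P_car": 0.20, "P_person": 0.80},
    ]
    to_plot = [r["id"] for r in flag_misclassified(scored) if r["misclassified"] == 1]
    print(to_plot)
```

The rows flagged with 1 are the images to pull up and plot. The class probabilities on those rows show how confident the model was when it got them wrong.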
Can you please share some examples of deep learning on health care claims data?
To date, I have not applied deep learning to healthcare claims data. If prediction is of highest importance (for example, healthcare hot spotting), then I'd recommend including deep learning in your toolset alongside other models such as SAS's gradient boosting (PROC GRADBOOST, comparable to XGBoost). But if inference, that is, understanding the input-output relationship, is important, then you could use SAS Deep Causal (PROC DEEPCAUSAL), which may be found here:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_022/casecon/casecon_deepcausal_overview.htm
As a shortcut, does importing SWAT also work for other statistical models, such as logistic regression? If not, are there other Python libraries for specific functions?
Yes. SWAT allows R, Python, Java, or Lua programmers to use SAS's multithreaded models, including logistic regression, without having to code in another language. In addition, users can leverage other SAS tools beyond just the models.
Have you ever seen these models applied to survey data?
I personally have not applied deep learning to survey data, but I know SAS has a new procedure that leverages deep learning for causal analysis. One could perhaps combine survey-specific procedures, like the ones described in this white paper: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4635-2020.pdf, with Deep Causal to better understand the survey results. SAS Deep Causal may be found here:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_022/casecon/casecon_deepcausal_overview.htm
Can these models be run using the Viya GUI?
As of today, no, but an interactive GUI for deep learning is under active consideration at SAS. However, a user can create a custom task within SAS Studio that calls the deep learning functionality. I've worked on a project where the customer wanted their users to build computer vision models through a point-and-click interface, and we developed a custom task to meet their computer vision needs.
Recommended Resources
Deep Learning Using SAS® Software
Deep Learning for Computer Vision with SAS: An Introduction
Adam: A Method for Stochastic Optimization
How Does Batch Normalization Help Optimization?
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Robert Blanchard YouTube Playlist
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars.