Neural Network Hidden Nodes: How Many Do You Need?


Many people who are new to neural networks quickly run into a question: how many hidden nodes do I need for my neural network? New modelers ask it most often, but experienced modelers face it as well. Unfortunately, there is no clear answer. This post explores why the number of hidden nodes is so difficult to pin down.

To explore the question of the proper number of hidden nodes, we need a dataset to serve as an example. I will use a fictitious telecommunications company that has a problem with customer churn. The data is in a SAS table named commsdata, with 128 columns and 56,000 rows. The target variable, CHURN, is a binary variable indicating whether the customer churned. I will not go through all of the variables. The data will be analyzed using Model Studio in SAS Viya. If you want to get familiar with Model Studio in the context of machine learning, you may want to consider taking the course Machine Learning Using SAS Viya.

In Model Studio you start by establishing some project settings such as how to partition the data into Training and Validation sets. A Test set will not be used for this example. These project settings are an advantage here because once the data is partitioned (which is done at random), the same training and validation data will be used for all models in the project. Further, any other project-wide settings will also remain the same for all models in this project.
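
For readers working outside Model Studio, the idea of one fixed random partition shared by every model can be sketched in a few lines of Python. This is only an illustration: the 70/30 split, the seed value, and the function name partition_rows are my assumptions, not settings taken from the project.

```python
import random

def partition_rows(n_rows, train_percent=70, seed=12345):
    """Return (train_idx, valid_idx) for a random but reproducible split.

    train_percent and seed are illustrative assumptions, not project settings.
    """
    rng = random.Random(seed)            # fixed seed -> same split on every call
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = n_rows * train_percent // 100  # integer arithmetic avoids rounding surprises
    return sorted(indices[:cut]), sorted(indices[cut:])

# Every model in the comparison would reuse these same two index lists.
train_idx, valid_idx = partition_rows(56_000)
```

Because the seed is fixed, differences between models cannot be blamed on each one seeing a different training sample.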

Once the project settings are established, we can start building the modeling pipeline. For this example we will be doing some data cleanup. Although I will not go into the details of what is happening with the cleanup, I will at least present a high-level overview of the data processing before the modeling begins.

Once the data is read into Model Studio, a replacement node is used to replace outliers and/or incorrect data with a more acceptable value. In this case, the majority of the replacements were replacing negative values with zeros.
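
The most common replacement described above, turning negative values into zeros, amounts to a simple floor. Here is a minimal sketch, assuming a column is held as a Python list with None marking missing values. The function name replace_negatives is hypothetical and this is not how the replacement node is implemented.

```python
def replace_negatives(values, floor=0.0):
    """Replace values below `floor` with `floor`; leave missing values (None) alone."""
    return [floor if (v is not None and v < floor) else v for v in values]

# Missing values are kept as-is so that a later imputation step can handle them.
cleaned = replace_negatives([5.0, -2.0, None, 0.0])
print(cleaned)  # [5.0, 0.0, None, 0.0]
```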

Next comes a transformation node, which applies a log transformation to some highly skewed variables, followed by an imputation node, which replaces missing values. Finally, there is a text mining node and a variable selection node. The text mining node examines some comments in the file and converts them into possible input variables. The variable selection node identifies which input variables are most useful in predicting the target variable, CHURN.
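
The transformation and imputation steps can be sketched the same way. Two assumptions are mine rather than the pipeline's: I use log(1 + x) so that zeros stay defined, and I fill missing values with the median of the observed values. The post does not say which log variant or imputation method the nodes were configured with.

```python
import math
from statistics import median

def log_transform(values):
    """Apply log(1 + x) to each non-missing value (assumed variant of the log transform)."""
    return [None if v is None else math.log1p(v) for v in values]

def impute_median(values):
    """Fill missing values (None) with the median of the observed values (assumed method)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Same order as the pipeline: transform first, then impute.
skewed = [0.0, 3.0, None, 250.0, 12.0]
prepared = impute_median(log_transform(skewed))
```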

Once all of the data preprocessing steps are completed, the modeling effort can begin. In this case multiple neural network nodes will be used, each with a different number of hidden nodes. I will put 20 different neural network models into this pipeline, with the number of hidden nodes ranging from 1 to 20. So while it is hard to read, the pipeline looks like this:

Now for those of you who know machine learning models, you might be wondering why I don’t just use autotuning to find the correct number of hidden nodes. Autotuning works really well at finding the best settings for many of the hyperparameters of a neural network model. To do this quickly, it usually relies on some kind of search routine, which does not necessarily evaluate every candidate value. I wanted to make sure that every number of hidden nodes from 1 to 20 was tried. That way, the model comparison node at the end of the pipeline displays the fit statistics for every neural network model. This extra bit of insight helps show why choosing the correct number of hidden nodes is not clear-cut.
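
In code form, the exhaustive approach is just a loop that keeps every result instead of only the winner. The sketch below assumes a caller-supplied function, here called fit_and_score, that trains a network with a given number of hidden nodes and returns a score where higher is better. That function and both helper names are hypothetical and are not part of Model Studio.

```python
def sweep_hidden_nodes(fit_and_score, node_counts=range(1, 21)):
    """Fit one model per hidden-node count and keep every score, not just the best."""
    return {n: fit_and_score(n) for n in node_counts}

def best_setting(scores):
    """Return the hidden-node count with the highest score."""
    return max(scores, key=scores.get)

# Dummy scorer that peaks at 7 nodes, used only to show the call shape.
# A real scorer would train on the training partition and score the validation partition.
demo_scores = sweep_hidden_nodes(lambda n: -abs(n - 7))
demo_best = best_setting(demo_scores)
```

Keeping the full dictionary of scores is the point: it lets you see how the score moves across all 20 settings rather than only which one won.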

So how did it turn out? Here is a table of each neural network model with some of the fit statistics. The best model is chosen by the KS (Kolmogorov-Smirnov) statistic, where a higher value is better. The table is sorted by the number of hidden nodes in increasing order. The best model is identified by a star.
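
Model Studio reports the KS statistic for you, but it helps to see what it measures. For a binary target, the usual definition is the largest gap between the cumulative score distributions of the events and the non-events. The sketch below implements that common definition. I am assuming it matches what Model Studio reports, and the function name ks_statistic is my own.

```python
def ks_statistic(labels, scores):
    """Largest gap between the score CDFs of events (label 1) and non-events (label 0)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    seen_pos = seen_neg = 0
    largest_gap = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every row tied at this score before comparing the two CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                seen_pos += 1
            else:
                seen_neg += 1
            j += 1
        largest_gap = max(largest_gap, abs(seen_pos / n_pos - seen_neg / n_neg))
        i = j
    return largest_gap
```

A model that separates churners from non-churners perfectly scores 1.0, and a model whose scores carry no information scores close to 0.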

Looking at the table, you can see that as the number of hidden nodes increases, the KS statistic decreases until 4 nodes, where it takes a fairly large jump. Continuing down the table, the KS value bounces around. A graph displays this erratic behavior clearly.

Looking at the graph, there is no monotonic increase or decrease of the KS statistic as the number of hidden nodes changes. Although the 13-node model had the best results, the 9-node and 10-node models are also quite close to being the best. The 12-node model is close to being one of the worst performers, in spite of having only one fewer hidden node than the best model.
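
If you run a sweep like this yourself, two quick checks make the same observations without a graph: whether the scores move in one direction only, and which settings land within a small tolerance of the best. The helper names, the tolerance, and the example scores below are all made up for illustration. They are not the values from the table in this post.

```python
def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    steps = list(zip(values, values[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)

def near_best(scores, tolerance=0.01):
    """Hidden-node counts whose score is within `tolerance` of the best score."""
    top = max(scores.values())
    return sorted(n for n, s in scores.items() if top - s <= tolerance)

# Made-up scores keyed by hidden-node count (NOT the results from this post).
example = {1: 0.50, 2: 0.48, 3: 0.55, 4: 0.549, 5: 0.51}
print(is_monotonic(list(example.values())))  # False
print(near_best(example))                    # [3, 4]
```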

Unfortunately, this type of unpredictable pattern is common for neural network models, which is why there is no effective guidance on the best number of hidden nodes to use. Perhaps the best recommendation is to try several different numbers of hidden nodes, or, if possible, use autotuning to help determine the best number of hidden nodes to use. Further guidance can be found in this article written by Warren Sarle.

I hope this post at least illustrates why there is no overall recommendation on the correct number of hidden nodes to use in a neural network model. While I have not answered how many hidden nodes to use, hopefully some wise words that I once saw in a cartoon explain it well: "We have not succeeded in answering all our questions. We are, at least, confused on a much higher level."
