pvareschi
Quartz | Level 8

Re: Neural Network Modelling

I would appreciate it if someone could clarify the following points on using Early Stopping when fitting a Neural Network:

Q1. How do we choose the value of the parameter M used to calculate the number of hidden units required (see course text at page 3-9 and example at page 3.14)? Does M not depend on the sample size n and the number of hidden units h itself (i.e. M = n/h)?

Q2. How should h be estimated if the target is multinomial or ordinal (see note at page 3.14)?
Q3. Should we be concerned if the performance on Training and Validation shows a significant divergence at iteration 0, like the chart on page 3.15, instead of the more regular pattern shown on page 3.9? In other words, would that be an indication of possible issues with the initial parameter values?
Q4. Am I right in saying that Early Stopping can be used with both Multi-Layer Perceptron and Radial Basis Function networks?

Q5. Can Early Stopping be used with PROC NEURAL as well? Would it be a matter of partitioning the data and then assessing the performance on the Training and Validation datasets? (which would, of course, require some coding)

1 ACCEPTED SOLUTION

Accepted Solutions
RobertBlanchard
SAS Employee

Hey Pvareschi,

Great questions.
A1) The optimal number of neurons (h) is unknown, and therefore estimated using the described method. M can be thought of as a hyperparameter for the estimation method.
A2) I'm not sure. I tried to review the original paper (Harrell et al. 1996, "Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors"), but it is behind a Wiley paywall and I do not have access to it. My apologies.
A3) Yes. This is an indication that preliminary training has caused your model to overfit the data. Should this occur, consider reducing the number of steps/iterations for each preliminary training session.
A4) Yes. That would follow good practice.
A5) Yes. See the attached program as an example.
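For readers who cannot see the attached program, the partition-and-monitor idea behind A5 can be sketched generically. The following is a minimal illustration in Python, not the attached SAS code: a toy linear model stands in for the network, and the data, learning rate, and patience value are all assumptions chosen for the example. The mechanism is the same one Early Stopping relies on: train on one partition, score the other after every iteration, and keep the weights from the iteration with the best validation error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a linear signal plus noise, split into training and validation halves.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_tr, y_tr, X_va, y_va = X[:100], y[:100], X[100:], y[100:]

w = np.zeros(3)                 # model weights (linear model stands in for the network)
lr = 0.01                       # learning rate (illustrative)
patience, bad_epochs = 5, 0     # stop after 5 iterations with no validation improvement
best_err, best_w = np.inf, w.copy()

for epoch in range(500):
    # One gradient-descent step on the training partition (mean squared error).
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Early stopping: score the validation partition and keep the best weights seen.
    val_err = np.mean((X_va @ w - y_va) ** 2)
    if val_err < best_err:
        best_err, best_w, bad_epochs = val_err, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

w = best_w  # restore the weights from the best validation iteration
```

In PROC NEURAL the same effect is achieved by holding out a validation data set and scoring the network at each training iteration, then selecting the iteration with the lowest validation error, which is the coding Q5 anticipates.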

