Re: Neural Network Modelling
I would appreciate it if someone could clarify the following points on using Early Stopping when fitting a Neural Network:
Q1. How do we choose the value of parameter M to calculate the number of hidden units required (see course text at page 3-9 and example at page 3.14)? Does M not depend on the sample size n and the number of hidden units h itself (i.e. M = n/h)?
Q2. How should h be estimated if the target is multinomial or ordinal (see note at page 3.14)?
Q3. Should we be concerned if the Training and Validation performance diverges significantly at iteration 0, as in the chart on page 3.15, rather than following the more regular pattern shown on page 3.9? In other words, would that indicate possible issues with the initial parameter values?
Q4. Am I right in saying that Early Stopping can be used with both Multilayer Perceptron and Radial Basis Function networks?
Q5. Can Early Stopping be used with PROC NEURAL as well? Would it be a matter of partitioning the data and then assessing performance on the Training and Validation datasets (which would, of course, require some coding)?
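
To make the manual procedure in Q5 concrete, here is a rough sketch of what I have in mind. It is plain Python rather than SAS code, and it is not taken from the course text: the synthetic data, the function names (make_data, train_with_early_stopping, etc.), the learning rate, and the number of hidden units are all assumptions made up for illustration. The idea is to partition the data, train for a fixed number of iterations, record the error on both partitions at every iteration, and keep the weights from the iteration with the lowest Validation error.

```python
import math
import random

def make_data(n, rng):
    # Synthetic, illustrative data only: a noisy sine curve.
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

def predict(params, x):
    # One hidden layer of tanh units, linear output.
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * a for v, a in zip(w2, hidden)) + b2, hidden

def mse(params, data):
    return sum((predict(params, x)[0] - y) ** 2 for x, y in data) / len(data)

def gradient_step(params, data, lr):
    # One full-batch gradient descent update on the mean squared error.
    w1, b1, w2, b2 = params
    h = len(w1)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * h, [0.0] * h, [0.0] * h, 0.0
    for x, y in data:
        out, hidden = predict(params, x)
        err = 2.0 * (out - y) / len(data)
        g_b2 += err
        for j in range(h):
            g_w2[j] += err * hidden[j]
            back = err * w2[j] * (1.0 - hidden[j] ** 2)
            g_w1[j] += back * x
            g_b1[j] += back
    # New lists are built each step, so earlier parameter sets stay intact.
    return ([w - lr * g for w, g in zip(w1, g_w1)],
            [b - lr * g for b, g in zip(b1, g_b1)],
            [w - lr * g for w, g in zip(w2, g_w2)],
            b2 - lr * g_b2)

def train_with_early_stopping(train, valid, hidden_units=8,
                              max_iter=300, lr=0.02, seed=0):
    rng = random.Random(seed)
    params = ([rng.gauss(0, 1) for _ in range(hidden_units)],
              [rng.gauss(0, 1) for _ in range(hidden_units)],
              [rng.gauss(0, 0.1) for _ in range(hidden_units)],
              0.0)
    # Iteration 0 is the error at the initial parameter values.
    best_params, best_iter, best_val = params, 0, mse(params, valid)
    history = [(mse(params, train), best_val)]
    for it in range(1, max_iter + 1):
        params = gradient_step(params, train, lr)
        val = mse(params, valid)
        history.append((mse(params, train), val))
        if val < best_val:
            # Remember the weights with the lowest Validation error so far.
            best_params, best_iter, best_val = params, it, val
    return best_params, best_iter, history

rng = random.Random(42)
data = make_data(120, rng)
train, valid = data[:80], data[80:]  # simple Training / Validation partition
best_params, best_iter, history = train_with_early_stopping(train, valid)
print("selected iteration:", best_iter)
print("train / validation MSE there:", history[best_iter])
```

The history list holds one (Training, Validation) error pair per iteration, so it can be plotted to get the kind of chart referred to in Q3, including the gap at iteration 0. Whether PROC NEURAL exposes per-iteration weights in a way that allows the same selection is exactly what I am unsure about.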