I answered my own question. In a different part of the HELP section is this description: In linear models, statisticians routinely use the mean squared error (MSE) as the main measure of fit. The MSE is the sum of squared errors (SSE) divided by the degrees of freedom for error. (DFE is the number of cases less the number of weights in the model.) This process yields an unbiased estimate of the population noise variance under the usual assumptions. For neural networks and decision trees, there is no known unbiased estimator. Furthermore, the DFE is often negative for neural networks. There exist approximations for the effective degrees of freedom, but these are often prohibitively expensive and are based on assumptions that might not hold. Hence, the MSE is not nearly as useful for neural networks as it is for linear models. One common solution is to divide the SSE by the number of cases N, not the DFE. This quantity, SSE/N, is referred to as the average squared error (ASE).
... View more