I am building a predictive model (Neural Network) of interval data using PROC NEURAL. I have run into a difficulty while trying to use Fit Statistics to help select which of a set of candidate models is 'best.'
One option in PROC NEURAL (EM 14.1) is to output fit statistics (e.g. TRAIN outfit = data-set-name). The definitions (and some formulae) of these fit statistics are not easy to locate, but are listed in part here:
and here:
Generally the variable names listed in this second link correspond (typically as abbreviations) to a longer descriptive name of the variable. However, this second link lists _AIC_, _AVERR_, _ERR_, and _SBC_ as follows:
Fit Statistics Computed According to the Error Function
Name | Label |
_AIC_ | Sum of Frequencies |
_AVERR_ | Total Degrees of Freedom |
_ERR_ | Divisor for ASE |
_SBC_ | Train: Average Squared Error |
The first link above gives an equation for SBC that is not "Average Squared Error," but rather the generally accepted Schwarz's Bayesian Criterion, in line with the definition (and formula) in PROC GLMSELECT (see: http://documentation.sas.com/?docsetId=statug&docsetVersion=14.2&docsetTarget=statug_glmselect_detai...).
Are the four PROC NEURAL fit statistics from the second link, in the table copied above, simply mis-defined in the second link and in earlier documentation (i.e., is this just a typo in the EM developers' notes)? In: SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide / Predictive Modelling / Input and Output Data Sets
More generally, do the definitions and formulae used in PROC NEURAL fit statistics line up with those used in PROC GLMSELECT? The first link above mentions that the formulae 'adjust' for the type of training used (e.g. least squares vs. maximum likelihood) for SBC (on Predictive Modelling - Generalization). This suggests that in PROC NEURAL fit statistics, _SBC_ is indeed "SBC" and not "Average Squared Error."
Is this also true for _AIC_ (that it is indeed Akaike's Information Criterion, not "Sum of Frequencies")?
If so, what are the formulae for AIC for each type of training ( least squares, maximum likelihood, or M-estimation - i.e. NETOPTIONS OBJECT = DEV, LIKE, or MEST)?
John
Hey John,
Good catch. I believe the fit statistics you identified in the SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide documentation are incorrect. The table, which currently reads:
Name | Label |
_AIC_ | Sum of Frequencies |
_AVERR_ | Total Degrees of Freedom |
_ERR_ | Divisor for ASE |
_SBC_ | Train: Average Squared Error |
should instead read:
Name | Label |
_AIC_ | Akaike's Information Criterion |
_AVERR_ | Train: Average Error Function |
_ERR_ | Train: Error Function |
_SBC_ | Schwarz's Bayesian Criterion |
Their calculations should follow the generally accepted calculations for the respective fit statistic.
Akaike's Information Criterion (AIC): AIC = n*ln(SSE/n) + 2p
Schwarz's Bayesian Criterion (SBC): SBC = n*ln(SSE/n) + p*ln(n)
where n is the number of observations, SSE is the error sum of squares, and p is the number of estimated parameters (weights).
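For the least-squares case, the two criteria above can be sketched in Python (the function name and sample inputs here are illustrative, not part of any SAS procedure):

```python
import math

def aic_sbc(sse, n, p):
    """Least-squares forms of AIC and SBC.

    sse: error sum of squares
    n:   number of observations
    p:   number of estimated parameters (network weights)
    """
    aic = n * math.log(sse / n) + 2 * p
    sbc = n * math.log(sse / n) + p * math.log(n)
    return aic, sbc

# Example: 100 observations, SSE of 50, 5 weights.
aic, sbc = aic_sbc(sse=50.0, n=100, p=5)
```

Note that both share the goodness-of-fit term n*ln(SSE/n) and differ only in the complexity penalty, so for n > 7 (ln(n) > 2) SBC penalizes extra parameters more heavily than AIC.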
Error Function is determined by the target distribution. The SAS Enterprise Miner documentation has a detailed list of the calculations in the Neural Network Node: Reference section. Do you have a distribution in mind? If the assumed distribution is in the exponential family (normal, gamma, Poisson, Bernoulli, etc.), then minimizing deviance is equivalent to maximizing likelihood, but is more efficient.
Average Error Function is the average of the negative log likelihood function. That is, (negative log likelihood / sum of frequencies).
Best,
Robert
Hey John,
Just to follow up. I would recommend checking out the PROC NEURAL documentation for more details on implementing neural networks in SAS, as well as Warren Sarle's FAQ, which touches on neural network concepts.
Links:
http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf
ftp://ftp.sas.com/pub/neural/FAQ.html#questions
Best,
Robert
Hey John,
For multiple outputs, the SAS neural network sums the 2*(-log likelihood) value calculated for each target, and then adds twice the number of model parameters.
If you're using the fit statistics, that is AIC = _ERR_target1 + … + _ERR_targetk + 2*_DFM_.
I haven’t found any documentation mentioning this. However, I tested this twice to make sure I could match the AIC for the overall model. Once using a neural network to model two binary targets, and then again using a neural network modeling a binary and a multinomial target.
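A minimal sketch of that combination, assuming each per-target _ERR_ already holds 2*(-log likelihood) for that target (the function name and sample values are illustrative):

```python
def overall_aic(err_by_target, dfm):
    """Combine per-target _ERR_ values (each 2 * negative log likelihood)
    into an overall AIC, penalized by the model degrees of freedom _DFM_."""
    return sum(err_by_target) + 2 * dfm

# Example: two targets with _ERR_ of 120.0 and 98.0, 25 model parameters.
combined_aic = overall_aic([120.0, 98.0], dfm=25)
```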
Good questions. Keep them coming!
Best,
Robert