John_O
Calcite | Level 5

I am building a predictive model (Neural Network) of interval data using PROC NEURAL.  I have run into a difficulty while trying to use Fit Statistics to help select which of a set of candidate models is 'best.'  

 

One option in PROC NEURAL (EM 14.1) is to output fit statistics (e.g. TRAIN outfit = data-set-name). The definitions (and some formulae) of these fit statistics are not easy to locate, but are listed in part here:

 

http://support.sas.com/documentation/cdl/en/emxndg/67980/HTML/default/viewer.htm#n002icfvzhfd57n1c2p...  

 

and here:

 

http://support.sas.com/documentation/cdl/en/emxndg/67980/HTML/default/viewer.htm#n0qzrycxmdg039n1kwz...

 

 

Generally the variable names listed in this second link correspond (in some fashion - typically abbreviations) to a longer descriptive name of the variable.  However, this 2nd link lists _AIC_ and _SBC_ as follows: 

 

Fit Statistics Computed According to the Error Function

Name      Label
_AIC_     Sum of Frequencies
_AVERR_   Total Degrees of Freedom
_ERR_     Divisor for ASE
_SBC_     Train: Average Squared Error

 

 

The first link above gives an equation for SBC that is not "Average Squared Error" but rather the generally accepted Schwarz's Bayesian Criterion, in line with the definition (and formula) in PROC GLMSELECT (see: http://documentation.sas.com/?docsetId=statug&docsetVersion=14.2&docsetTarget=statug_glmselect_detai...).

Are the four PROC NEURAL fit statistics in the second link, and in the table copied from it above, simply mislabeled there and in earlier documentation (i.e. is this just a typo in the EM developer's notes)? The source is: SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide  /  Predictive Modelling  /  Input and Output Data Sets

 

More generally, do the definitions and formulae used for PROC NEURAL fit statistics line up with those used in PROC GLMSELECT? The first link above (Predictive Modelling - Generalization) refers to the formulae 'adjusting' for the type of training used (e.g. least squares vs. maximum likelihood) for SBC. This suggests that in PROC NEURAL fit statistics, _SBC_ is indeed "SBC" and not "Average Squared Error."

 

Is this also true for _AIC_ (that it is indeed Akaike's Information Criterion, not "Sum of Frequencies")?  

 

If so, what are the formulae for AIC for each type of training ( least squares, maximum likelihood, or M-estimation -  i.e.  NETOPTIONS OBJECT = DEV, LIKE, or MEST)? 

 

John

Accepted Solutions
RobertBlanchard
SAS Employee

Hey John,

 

Good catch. I believe the labels for the fit statistics you identified in the SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide documentation are incorrect. The table currently reads as follows:

Name      Label
_AIC_     Sum of Frequencies
_AVERR_   Total Degrees of Freedom
_ERR_     Divisor for ASE
_SBC_     Train: Average Squared Error

It should instead be written as follows:

Name      Label
_AIC_     Akaike's Information Criterion
_AVERR_   Train: Average Error Function
_ERR_     Train: Error Function
_SBC_     Schwarz's Bayesian Criterion

 

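Their calculations should follow the generally accepted formulas for each fit statistic. In the least-squares form:

Akaike's Information Criterion (AIC): AIC = n*ln(SSE/n) + 2p

Schwarz's Bayesian Criterion (SBC): SBC = n*ln(SSE/n) + p*ln(n)

Here n is the number of training cases, SSE is the sum of squared errors, and p is the number of estimated model parameters.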
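As a rough illustration of the two least-squares formulas above, here is a minimal Python sketch. The function names and the example numbers are hypothetical, chosen only for illustration; this is not PROC NEURAL output, and it assumes n is the number of training cases, SSE the sum of squared errors, and p the number of estimated parameters.

```python
import math

def aic_least_squares(sse: float, n: int, p: int) -> float:
    """AIC = n*ln(SSE/n) + 2p, the least-squares form quoted above."""
    return n * math.log(sse / n) + 2 * p

def sbc_least_squares(sse: float, n: int, p: int) -> float:
    """SBC = n*ln(SSE/n) + p*ln(n), the least-squares form quoted above."""
    return n * math.log(sse / n) + p * math.log(n)

# Hypothetical values, for illustration only.
n, sse, p = 100, 250.0, 5
aic = aic_least_squares(sse, n, p)
sbc = sbc_least_squares(sse, n, p)
print(f"AIC = {aic:.3f}, SBC = {sbc:.3f}")
```

The two criteria share the same fit term and differ only in the penalty: 2 per parameter for AIC versus ln(n) per parameter for SBC. Since ln(n) exceeds 2 once n is 8 or more, SBC penalizes each extra parameter more heavily on any realistically sized training set.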

The error function is determined by the target distribution.  The SAS Enterprise Miner documentation has a detailed list of the calculations in the Neural Network Node: Reference section.  Do you have a distribution in mind?  If the assumed distribution belongs to the exponential family (normal, gamma, Poisson, Bernoulli, etc.), then minimizing deviance is equivalent to maximizing likelihood, but is more efficient.

 

Average Error Function is the average of the negative log likelihood function.  That is, (negative log likelihood / sum of frequencies).
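To make that definition concrete, here is a small Python sketch for a binary (Bernoulli) target. It follows the sentence above literally (negative log likelihood divided by the sum of frequencies); the function names, the example data, and the choice of a Bernoulli target are assumptions for illustration, and whether PROC NEURAL applies any additional scaling to the error function is not something this sketch settles.

```python
import math

def negative_log_likelihood(y, p_hat, freq=None):
    """Frequency-weighted Bernoulli negative log likelihood."""
    if freq is None:
        freq = [1.0] * len(y)  # assume every case has frequency 1
    return -sum(
        f * (t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p, f in zip(y, p_hat, freq)
    )

def average_error(y, p_hat, freq=None):
    """Negative log likelihood divided by the sum of frequencies."""
    if freq is None:
        freq = [1.0] * len(y)
    return negative_log_likelihood(y, p_hat, freq) / sum(freq)

# Hypothetical targets and predicted probabilities.
y = [1, 0, 1, 1]
p_hat = [0.9, 0.2, 0.6, 0.7]
print(average_error(y, p_hat))
```

Dividing by the sum of frequencies rather than the row count means a case with frequency 2 contributes exactly as much as two identical rows with frequency 1.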

 

Best,

  Robert

 

RobertBlanchard
SAS Employee

Hey John,

Just to follow up: I would recommend checking out the PROC NEURAL documentation for more details on implementing neural networks in SAS, as well as Warren Sarle's FAQ, which touches on neural network concepts.

 

Links:

http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf

 

ftp://ftp.sas.com/pub/neural/FAQ.html#questions

 

Best,

  Robert

John_O
Calcite | Level 5
Robert,

Thanks for following up on this. I have been reading the details a lot over the past few weeks. One issue remains unclear, however, even after your recent reply to my inquiry. How is AIC calculated for multi-output Neural Networks in PROC NEURAL? Even going back to Warren Sarle's 1997 writings (your link below), this isn't exactly clear.

In 1999, Warren posted to a Google group on this issue (see: https://groups.google.com/forum/#!topic/comp.ai.neural-nets/iUzUBg04Ggs) giving an answer, but he left 'some wiggle room' regarding conditional independence of the target variables given the inputs. Is his answer, as posted there, how multi-output AIC was implemented in PROC NEURAL? If so, is this multi-output AIC (or even SBC) more fully described anywhere in SAS documentation (i.e. a citable publication)?

John

**************************************
John H Offenberg, Ph.D.
National Exposure Research Laboratory
US Environmental Protection Agency
109 T.W. Alexander Drive
Research Triangle Park, NC 27711
**************************************

RobertBlanchard
SAS Employee

Hey John,

 

For multiple outputs, the SAS neural network sums 2 * (-log likelihood) over the targets, and then adds 2 * (number of model parameters) to that total.

 

If you're using the fit statistics, then you would compute sum(_ERR_target1 + ... + _ERR_targetk) + 2*_DFM_, where _DFM_ holds the number of model parameters.
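The calculation described above can be sketched in a few lines of Python. The function name, the dictionary keys, and the numbers below are hypothetical; the per-target values stand in for the _ERR_ fit statistics and dfm for _DFM_, taken as given from the fit-statistics data set rather than recomputed here.

```python
def multi_target_aic(err_by_target: dict, dfm: int) -> float:
    """Sum the per-target error function values, then add 2 * model parameters."""
    return sum(err_by_target.values()) + 2 * dfm

# Hypothetical _ERR_ values for two targets and a hypothetical _DFM_.
err_by_target = {"target1": 412.7, "target2": 388.1}
dfm = 23
aic = multi_target_aic(err_by_target, dfm)
print(f"Overall AIC = {aic:.1f}")
```

Note that the parameter penalty is added once for the whole network, not once per target, since all targets share the same set of weights.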

 

I haven’t found any documentation mentioning this.  However, I tested this twice to make sure I could match the AIC for the overall model.  Once using a neural network to model two binary targets, and then again using a neural network modeling a binary and a multinomial target.   

 

Good questions.  Keep them coming!

Best,

  Robert
