I am building a predictive model (Neural Network) of interval data using PROC NEURAL. I have run into a difficulty while trying to use Fit Statistics to help select which of a set of candidate models is 'best.'
One option in PROC NEURAL (EM 14.1) is to output fit statistics (e.g. TRAIN outfit = data-set-name). The definitions (and some formulae) of these fit statistics are not easy to locate, but are listed in part here:
and here:
Generally the variable names listed in this second link correspond (typically as abbreviations) to a longer descriptive name of the variable. However, this second link lists _AIC_, _AVERR_, _ERR_, and _SBC_ as follows:
Fit Statistics Computed According to the Error Function
Name | Label |
_AIC_ | Sum of Frequencies |
_AVERR_ | Total Degrees of Freedom |
_ERR_ | Divisor for ASE |
_SBC_ | Train: Average Squared Error |
The first link above gives an equation for SBC that is not "Average Squared Error," but rather the generally accepted Schwarz's Bayesian Criterion, in line with the definition (and formula) in PROC GLMSELECT (see: http://documentation.sas.com/?docsetId=statug&docsetVersion=14.2&docsetTarget=statug_glmselect_detai...).
Are the four PROC NEURAL fit statistics from the second link, in the table copied above, simply mis-defined in the second link and in earlier documentation (i.e., is this just a typo in the EM developers' notes)? In: SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide / Predictive Modelling / Input and Output Data Sets
More generally, do the definitions and formulae used in PROC NEURAL fit statistics line up with those used in PROC GLMSELECT? The first link above mentions that the formulae 'adjust' for the type of training used (e.g. least squares vs. maximum likelihood) for SBC (on Predictive Modelling - Generalization). This suggests that in PROC NEURAL fit statistics, _SBC_ is indeed "SBC" and not "Average Squared Error."
Is this also true for _AIC_ (that it is indeed Akaike's Information Criterion, not "Sum of Frequencies")?
If so, what are the formulae for AIC for each type of training ( least squares, maximum likelihood, or M-estimation - i.e. NETOPTIONS OBJECT = DEV, LIKE, or MEST)?
John
Hey John,
Good catch. I believe the fit statistics you identified in the SAS(R) Enterprise Miner(TM) 14.1 Extension Nodes: Developer's Guide documentation are incorrect. The table, which currently reads:
Name | Label |
_AIC_ | Sum of Frequencies |
_AVERR_ | Total Degrees of Freedom |
_ERR_ | Divisor for ASE |
_SBC_ | Train: Average Squared Error |
should instead read:
Name | Label |
_AIC_ | Akaike's Information Criterion |
_AVERR_ | Train: Average Error Function |
_ERR_ | Train: Error Function |
_SBC_ | Schwarz's Bayesian Criterion |
Their calculations should follow the generally accepted calculations for the respective fit statistic.
Akaike's Information Criterion (AIC): AIC = n*ln(SSE/n) + 2p
Schwarz's Bayesian Criterion (SBC): SBC = n*ln(SSE/n) + p*ln(n)
where n is the number of observations, SSE is the error sum of squares, and p is the number of estimated parameters (weights).
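For the least-squares case, the two criteria above can be sketched in Python (the function name and sample inputs here are illustrative, not part of any SAS procedure):

```python
import math

def aic_sbc(sse, n, p):
    """Least-squares forms of AIC and SBC.

    sse: error sum of squares
    n:   number of observations
    p:   number of estimated parameters (network weights)
    """
    aic = n * math.log(sse / n) + 2 * p
    sbc = n * math.log(sse / n) + p * math.log(n)
    return aic, sbc

# Example: 100 observations, SSE of 50, 5 weights.
aic, sbc = aic_sbc(sse=50.0, n=100, p=5)
```

Note that both share the goodness-of-fit term n*ln(SSE/n) and differ only in the complexity penalty, so for n > 7 (ln(n) > 2) SBC penalizes extra parameters more heavily than AIC.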
Error Function is determined by the target distribution. The SAS Enterprise Miner documentation has a detailed list of the calculations in the Neural Network Node: Reference section. Do you have a distribution in mind? If the assumed distribution is in the exponential family (normal, gamma, Poisson, Bernoulli, etc.), then minimizing deviance is equivalent to maximizing likelihood, but is more efficient.
Average Error Function is the average of the negative log likelihood function. That is, (negative log likelihood / sum of frequencies).
Best,
Robert
Hey John,
Just to follow up. I would recommend checking out the PROC NEURAL documentation for more details on implementing neural networks in SAS, as well as Warren Sarle's FAQ, which touches on neural network concepts.
Links:
http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf
ftp://ftp.sas.com/pub/neural/FAQ.html#questions
Best,
Robert
Hey John,
For multiple outputs, the SAS neural network sums the 2*(-log likelihood) value calculated for each target, and then adds twice the number of model parameters.
If you're using the fit statistics, that is AIC = _ERR_target1 + … + _ERR_targetk + 2*_DFM_.
I haven’t found any documentation mentioning this. However, I tested this twice to make sure I could match the AIC for the overall model. Once using a neural network to model two binary targets, and then again using a neural network modeling a binary and a multinomial target.
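A minimal sketch of that combination, assuming each per-target _ERR_ already holds 2*(-log likelihood) for that target (the function name and sample values are illustrative):

```python
def overall_aic(err_by_target, dfm):
    """Combine per-target _ERR_ values (each 2 * negative log likelihood)
    into an overall AIC, penalized by the model degrees of freedom _DFM_."""
    return sum(err_by_target) + 2 * dfm

# Example: two targets with _ERR_ of 120.0 and 98.0, 25 model parameters.
combined_aic = overall_aic([120.0, 98.0], dfm=25)
```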
Good questions. Keep them coming!
Best,
Robert