Re: Calculation of ASE in SAS Academy for Data Science

Calculation of ASE

pvareschi — Mon, 11 May 2020 09:49:29 GMT

Re: Applied Analytics Using SAS Enterprise Miner

Would it be possible to clarify how ASE (Average Squared Error) is calculated (its definition is given on page 3-72 of the course notes)?

I am asking because, looking at the output from any modeling node, the denominator seems to be based on the total number of cases in the whole sample (Training + Validation), not just on the Training or the Validation data set (see the image on page 3-89 for the Decision Tree output; the same applies to the Regression node, see the example on page 4-43).

Moreover, in the output from a Regression node, the Mean Squared Error (MSE) should be calculated as the Sum of Squared Errors (SSE) divided by the Degrees of Freedom for Error (DFE); however, that does not seem to be the case. Here is a screenshot based on the model fitted on page 4-42 of the course notes:

Re: Calculation of ASE

gcjfernandez — Wed, 13 May 2020 05:50:55 GMT


Would it be possible to clarify how ASE (Average Squared Error) is calculated (its definition is given on page 3-72 of the course notes)?

My Answer:

When the target variable is interval, the denominator for ASE is N (the Training or Validation sample size); see page 3-72 of the course PDF.

When the target variable is binary, the denominator for ASE is 2N (two levels: event and non-event); see page 3-72 of the course PDF.

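In the demo data, because we use a 50:50 split between Training and Validation, 2N for the Training data equals the combined Training + Validation sample size, so the denominator appears to be based on the whole sample.

But actually, for the Training data, ASE = SSE/(2N), where N is the Training sample size.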
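A minimal Python sketch of the two denominators described above, assuming (as stated in this answer) that for a binary target the page 3-72 formula sums the squared differences over both target levels and divides by 2N. The function names `ase_interval` and `ase_binary` and the toy data are hypothetical and for illustration only; this is not SAS EM's own code.

```python
def ase_interval(y, y_hat):
    """ASE for an interval target: SSE / N."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sse / n


def ase_binary(y, p_event):
    """ASE for a binary target, summing over both levels: SSE / (2N).

    y is 1 for the event and 0 for the non-event; p_event is the
    predicted event probability, so the non-event probability is
    assumed to be 1 - p_event.
    """
    n = len(y)
    sse = 0.0
    for yi, pi in zip(y, p_event):
        sse += (yi - pi) ** 2              # event level
        sse += ((1 - yi) - (1 - pi)) ** 2  # non-event level
    return sse / (2 * n)


# Hypothetical 50:50 partition of 8 cases: 4 Training, 4 Validation.
n_train, n_valid = 4, 4
# With a 50:50 split, 2N for Training equals Training + Validation.
print(2 * n_train == n_train + n_valid)  # True

y_train = [1, 0, 1, 0]
p_train = [0.8, 0.3, 0.6, 0.1]
print(ase_binary(y_train, p_train))
print(ase_interval([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

The split check shows why the denominator can be mistaken for the whole-sample size: under a 50:50 partition the two quantities coincide, but with any other split ratio they would differ.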

My Answer:

In computing MSE for the Training and Validation data, SAS EM does not use DFE; it uses N as the denominator. Decision trees and neural networks have no model degrees of freedom, and therefore no error degrees of freedom. Similarly, no model is fitted to the Validation data. Therefore, to obtain an error estimate that is comparable across decision trees, regression, and neural networks, SAS EM uses N as the denominator in MSE.
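For comparison, here is a minimal Python sketch (hypothetical data and helper names, not SAS EM code) that fits a straight line by ordinary least squares and computes both SSE/N, the denominator described above, and the classical SSE/DFE, assuming DFE = N - p with p = 2 estimated parameters (intercept and slope).

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def error_summaries(x, y):
    """Return SSE, SSE/N, and SSE/DFE for the fitted line."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    n = len(y)
    p = 2  # estimated parameters: intercept and slope
    dfe = n - p
    return {"SSE": sse, "SSE/N": sse / n, "SSE/DFE": sse / dfe}


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
print(error_summaries(x, y))
```

Because DFE is smaller than N whenever parameters are estimated, SSE/DFE is always larger than SSE/N; the gap shrinks as N grows relative to p.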

Re: Calculation of ASE

pvareschi — Wed, 13 May 2020 11:48:41 GMT

Thank you for your explanations.

Based on the formula on page 3-72, is it correct to say that the numerator of ASE takes into account both predicted probabilities (primary and secondary class) for a given case Y_i, i.e., essentially counting the "residuals" twice?

Is that expression also used to calculate the SSE value for the Regression node? If so, it would differ from the "classic" definition, in which only differences from the predicted probability of the primary event are considered.

Last point: in the course "Predictive Modeling Using Logistic Regression", page B-11 of the course notes, ASE is calculated within the macro %ASSESS without using the expression from page 3-72 (likewise, SSE is just the sum of the squared differences between observed and fitted values). Would the two approaches yield different numerical results, or are they equivalent for a binary outcome?

Re: Calculation of ASE

gcjfernandez — Wed, 13 May 2020 15:44:15 GMT


Yes, that is correct.

The Regression node can fit both multiple linear regression (MLR, interval target) and binary logistic regression (BLR). For BLR, the formula above is the correct one for ASE.

For MLR, ASE = SSE/N.

Yes, you are correct. When using PROC LOGISTIC to fit BLR, we define the target level (whether we are modeling 1 or 0), whereas in the SAS EM Regression node we do not specify whether we want to model 1 or 0. Therefore, both predicted probabilities are included in the ASE computation.
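On the numerical question about %ASSESS: for a binary target, if the two predicted probabilities sum to 1, the non-event residual (1 - y) - (1 - p) equals p - y, so each case contributes the same squared residual twice. Under the two-level formula as described in this thread, SSE is then exactly twice the single-level SSE, and dividing by 2N gives the same ASE as dividing the single-level SSE by N. The minimal Python check below assumes %ASSESS divides its single-level SSE by N; the function names and data are hypothetical.

```python
import math


def single_level(y, p_event):
    """SSE and ASE using only the event probability (one residual per case)."""
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p_event))
    return sse, sse / len(y)


def two_level(y, p_event):
    """SSE and ASE summing over event and non-event levels, divided by 2N."""
    sse = 0.0
    for yi, pi in zip(y, p_event):
        sse += (yi - pi) ** 2              # event level
        sse += ((1 - yi) - (1 - pi)) ** 2  # non-event level, same magnitude
    return sse, sse / (2 * len(y))


y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.4, 0.7, 0.35]
sse1, ase1 = single_level(y, p)
sse2, ase2 = two_level(y, p)
print(sse2 / sse1)               # ratio of the two SSEs, approximately 2
print(math.isclose(ase1, ase2))  # True: the ASE values agree
```

So the two approaches would report SSE values that differ by a factor of 2, while their ASE values coincide up to floating-point rounding.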