Re: Applied Analytics Using SAS Enterprise Miner
Thank you for your explanations.
Based on the formula at page 3-72, is it correct to say that the numerator part of ASE takes account of both predicted probabilities, primary and secondary class, for a given case Yi? (i.e. essentially counting "residuals" twice)
Yes, that is correct.
Is that expression used to calculate the SSE value for the Regression Node as well? In that case, it would be different from the "classic" definition where only differences against the predicted probability of the primary event are considered.
The Regression node can fit both multiple linear regression (MLR, interval target) and binary logistic regression (BLR). For BLR, the above formula is correct for ASE.
For MLR, ASE = SSE/N.
Last point: in the course "Predictive Modeling Using Logistic Regression", on page B-11 of the course notes, the way ASE is calculated within the macro %ASSESS is not based on the expression from page 3-72 (likewise, SSE is just the sum of the squared differences between observed and fitted values): would the two approaches yield different numerical results, or would they just be equivalent for a binary outcome?
Yes, you are correct. When using PROC LOGISTIC to fit a BLR, we define the target level (whether we are modeling 1 or 0), whereas in the SAS EM Regression node we do not define whether we want to model 1 or 0. Therefore, both predicted probabilities are included in the ASE computation.
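For readers who want to check the numbers, here is a minimal sketch in Python (not SAS) of the two calculations discussed above. It assumes, based on the description in this thread, that the page 3-72 formula sums squared errors over both target levels and divides by 2N, and that the "classic" version uses only the primary event and divides by N. The function names and the data are hypothetical.

```python
def ase_both_levels(y, p):
    """ASE as described in this thread for a binary target:
    squared errors summed over both levels, divided by 2N (assumed form)."""
    n = len(y)
    sse = 0.0
    for yi, pi in zip(y, p):
        sse += (yi - pi) ** 2              # primary level (event)
        sse += ((1 - yi) - (1 - pi)) ** 2  # secondary level (non-event)
    return sse / (2 * n)


def ase_primary_only(y, p):
    """'Classic' ASE: squared error against the primary event only, divided by N."""
    n = len(y)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / n


# Hypothetical observed outcomes and predicted event probabilities.
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]

print(ase_both_levels(y, p))
print(ase_primary_only(y, p))
```

Under these assumptions the two values agree up to floating-point rounding, because ((1 - y) - (1 - p))^2 equals (y - p)^2: the numerator is doubled and so is the denominator.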
Re: Applied Analytics Using SAS Enterprise Miner
When the target variable is binary, the denominator for ASE is N x 2 (two levels: event and non-event). Please see page 3-72 of the course PDF.
In the demo data, because we make a 50:50 split between training and validation, the denominator appears to be (training + validation).
But for the training data, the calculation is actually ASE = SSE/(2N), where N is the number of training cases.
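A small Python sketch of why the 50:50 split can mislead: the divisor N x 2 only coincides with (training + validation) when the two partitions are the same size. The helper name and the case counts below are hypothetical, chosen just for illustration.

```python
def ase_divisor(n_cases, n_levels=2):
    """Divisor for ASE as described above: number of cases times number of target levels."""
    return n_cases * n_levels


n_total = 3000  # hypothetical total number of cases

for train_frac in (0.5, 0.7):
    n_train = int(round(n_total * train_frac))
    n_valid = n_total - n_train
    # Compare the training divisor (2 x N_train) with N_train + N_valid.
    print(train_frac, ase_divisor(n_train), n_train + n_valid)
```

With the 50:50 split the two numbers match; with the 70:30 split they do not, which shows that the divisor depends on the number of levels and not on the validation partition.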
Moreover, in the output from a Regression node, the Mean Square Error (MSE) should be calculated as the Sum of Squared Errors (SSE) divided by the Degrees of Freedom for Error (DFE); however, that does not seem to be the case. Here is a screenshot based on the model fitted on page 4-42 of the course notes:
My Answer:
In computing MSE for training and validation data, SAS EM does not use DFE; it uses N as the denominator. Decision trees and neural networks have no model degrees of freedom, and therefore no error degrees of freedom. Similarly, no model is fitted on the validation data. Therefore, in order to have a comparable error estimate across the decision tree, regression, and neural network models, N is used as the denominator in MSE.
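To make the difference between the two denominators concrete, here is a minimal Python sketch. It is only an illustration of the arithmetic, not SAS EM's internal code; the function name, the residuals, and the parameter count are hypothetical.

```python
def error_means(residuals, n_params):
    """Return SSE, SSE/DFE (textbook regression MSE) and SSE/N (the divisor described above)."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    dfe = n - n_params  # error degrees of freedom for a fitted regression
    return {"sse": sse, "sse_over_dfe": sse / dfe, "sse_over_n": sse / n}


# Hypothetical residuals from a regression with an intercept and two slopes.
residuals = [0.5, -1.0, 0.25, -0.75, 1.0, 0.0, -0.5, 0.5]
result = error_means(residuals, n_params=3)
print(result)
```

SSE/DFE is always larger than SSE/N for a model with at least one parameter, and the gap shrinks as N grows relative to the number of parameters. Dividing by N needs no parameter count, so the same calculation applies to models and partitions where DFE is not defined.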