<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Calculation of ASE in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647507#M774</link>
    <description>&lt;P&gt;Thank you for your explanations.&lt;/P&gt;
&lt;P&gt;Based on the formula at page 3-72, is it correct to say that the numerator part of ASE takes account of both predicted probabilities, primary and secondary class, for a given case Yi? (i.e. essentially counting "residuals" twice)&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Yes that is correct.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Is that expression used to calculate the SSE value for the Regression Node as well? In that case, it would be different from the "classic" definition where only differences against the predicted probability of the primary event are considered.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;The Regression node can fit both MLR (interval target) and Binary Logistic Regression (BLR). For BLR, the formula above is correct for ASE.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;For MLR, ASE = SSE/N.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Last point: in course "&lt;FONT&gt;Predictive Modeling Using Logistic Regression&lt;/FONT&gt;", at page B-11 of the course notes, the way ASE is calculated within macro %ASSESS is not based on the expression from page 3-72 (likewise, SSE is just the sum of the squared differences between observed and fitted values): would the two approaches yield different numerical results, or would they be equivalent for a binary outcome?&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Yes, you are correct. When using PROC LOGISTIC to fit BLR, we define the target level (whether we are modeling 1 or 0), whereas in the SAS EM Regression node we do not define whether we want to model 1 or 0. Therefore, both predicted probabilities are included in the ASE computation.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 13 May 2020 15:44:15 GMT</pubDate>
    <dc:creator>gcjfernandez</dc:creator>
    <dc:date>2020-05-13T15:44:15Z</dc:date>
    <item>
      <title>Calculation of ASE</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/646600#M754</link>
      <description>&lt;P&gt;Re:&amp;nbsp;&lt;FONT style="background-color: #ffffff;"&gt;Applied Analytics Using SAS Enterprise Miner&lt;/FONT&gt;&lt;/P&gt;
&lt;DIV&gt;Would it be possible to clarify how &lt;FONT style="background-color: #ffffff;"&gt;ASE (Average Square Error) is calculated (its definition is given at page 3.72 of the course notes)?&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT style="background-color: #ffffff;"&gt;I ask because, judging by the output from any modelling node, the denominator appears to be based on the total number of cases in the whole sample (Training + Validation), not just the Training or Validation data set (see the image at page 3.89 for output from the Decision Tree node; the same applies to the Regression node, see the example at page 4-43).&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT style="background-color: #ffffff;"&gt;Moreover, in the output from a Regression node, the Mean Square Error (MSE) should be calculated as the Sum of Squared Errors (SSE) divided by the Degrees of Freedom for Error (DFE); however, that does not seem to be the case. Here is a screenshot based on the model fitted at page 4-42 of the course notes:&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT style="background-color: #ffffff;"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="fit_statistics.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/39261i2608F3552D20FE13/image-size/large?v=v2&amp;amp;px=999" role="button" title="fit_statistics.png" alt="fit_statistics.png" /&gt;&lt;/span&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 11 May 2020 09:49:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/646600#M754</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-11T09:49:29Z</dc:date>
    </item>
    <item>
      <title>Re: Calculation of ASE</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647310#M769</link>
      <description>&lt;P&gt;Re:&amp;nbsp;&lt;FONT&gt;Applied Analytics Using SAS Enterprise Miner&lt;/FONT&gt;&lt;/P&gt;
&lt;DIV&gt;Would it be possible to clarify how&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT&gt;ASE (Average Square Error) is calculated (its definition is given at page 3.72 of the course notes)?&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT&gt;I ask because, judging by the output from any modelling node, the denominator appears to be based on the total number of cases in the whole sample (Training + Validation), not just the Training or Validation data set (see the image at page 3.89 for output from the Decision Tree node; the same applies to the Regression node, see the example at page 4-43).&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;My Answer:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;When the target variable is interval, the denominator for ASE is N (the Training or Validation sample size). Please see Course PDF 3-72.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;When the target variable is binary, the denominator for ASE is N x 2 (2 levels: event and non-event). Please see Course PDF 3-72.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;In the demo data, because we use a 50:50 split for training and validation, it appears that the denominator is (Training + Validation).&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;In fact, for the training data, ASE = SSE/(2N), where N is the training sample size.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
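&lt;P&gt;Written out as a sketch, and assuming (as the N x 2 denominator suggests) that the squared errors are summed over both target levels, the training ASE for a binary target would be as follows. The symbols y_ik (1 if case i belongs to level k, otherwise 0) and p-hat_ik (the predicted probability of level k for case i) are notation introduced here for illustration, not taken from the course notes.&lt;/P&gt;

```latex
\mathrm{SSE} = \sum_{i=1}^{N} \sum_{k=1}^{2} \left( y_{ik} - \hat{p}_{ik} \right)^{2},
\qquad
\mathrm{ASE} = \frac{\mathrm{SSE}}{2N}
```

&lt;P&gt;With a 50:50 split, the training and validation samples each contain N cases, so 2N equals the combined sample size, which is why the denominator can look like Training + Validation.&lt;/P&gt;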
&lt;P&gt;&lt;SPAN style="font-family: inherit;"&gt;Moreover, in the output from a Regression node, the Mean Square Error (MSE) should be calculated as the Sum of Squared Errors (SSE) divided by the Degrees of Freedom for Error (DFE); however, that does not seem to be the case. Here is a screenshot based on the model fitted at page 4-42 of the course notes:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;&lt;SPAN style="font-family: inherit;"&gt;My Answer:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;&lt;FONT face="inherit"&gt;SAS Enterprise Miner does not use DFE when computing MSE for the training and validation data; it uses N as the denominator. Decision Tree and Neural Network models have no model degrees of freedom, and therefore no error degrees of freedom. Similarly, no model is fitted on the validation data. Therefore, to obtain an error estimate that is comparable across the Decision Tree, Regression, and Neural Network nodes, N is used as the denominator in MSE.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
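&lt;P&gt;A minimal Python sketch of the difference between the two denominators. The data and the parameter count are made up, and the function names ase and mse_dfe are illustrative only; they are not SAS or Enterprise Miner names.&lt;/P&gt;

```python
def sse(y, y_hat):
    """Sum of squared differences between observed and fitted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def ase(y, y_hat):
    """Error estimate with N as the denominator: SSE / N."""
    return sse(y, y_hat) / len(y)


def mse_dfe(y, y_hat, n_params):
    """Textbook MSE: SSE / DFE, with DFE = N minus the number of fitted parameters."""
    return sse(y, y_hat) / (len(y) - n_params)


# Hypothetical interval target and fitted values (assumed 2-parameter model).
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_hat = [3.5, 4.5, 7.5, 8.5, 11.5, 12.5]

print(ase(y, y_hat))         # SSE / N
print(mse_dfe(y, y_hat, 2))  # SSE / (N - 2)
```

&lt;P&gt;The N-based value needs no parameter count, so it can be computed the same way for any model type and for validation data.&lt;/P&gt;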
&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Wed, 13 May 2020 05:50:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647310#M769</guid>
      <dc:creator>gcjfernandez</dc:creator>
      <dc:date>2020-05-13T05:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: Calculation of ASE</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647426#M771</link>
      <description>&lt;P&gt;Thank you for your explanations.&lt;/P&gt;
&lt;P&gt;Based on the formula at page 3-72, is it correct to say that the numerator part of ASE takes account of both predicted probabilities, primary and secondary class, for a given case Yi? (i.e. essentially counting "residuals" twice)&lt;/P&gt;
&lt;P&gt;Is that expression used to calculate the SSE value for the Regression Node as well? In that case, it would be different from the "classic" definition where only differences against the predicted probability of the primary event are considered.&lt;/P&gt;
&lt;P&gt;Last point: in course "&lt;FONT style="background-color: #ffffff;"&gt;Predictive Modeling Using Logistic Regression&lt;/FONT&gt;", at page B-11 of the course notes, the way ASE is calculated within macro %ASSESS is not based on the expression from page 3-72 (likewise, SSE is just the sum of the squared differences between observed and fitted values): would the two approaches yield different numerical results, or would they be equivalent for a binary outcome?&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2020 11:48:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647426#M771</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-13T11:48:41Z</dc:date>
    </item>
    <item>
      <title>Re: Calculation of ASE</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647507#M774</link>
      <description>&lt;P&gt;Thank you for your explanations.&lt;/P&gt;
&lt;P&gt;Based on the formula at page 3-72, is it correct to say that the numerator part of ASE takes account of both predicted probabilities, primary and secondary class, for a given case Yi? (i.e. essentially counting "residuals" twice)&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Yes that is correct.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Is that expression used to calculate the SSE value for the Regression Node as well? In that case, it would be different from the "classic" definition where only differences against the predicted probability of the primary event are considered.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;The Regression node can fit both MLR (interval target) and Binary Logistic Regression (BLR). For BLR, the formula above is correct for ASE.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;For MLR, ASE = SSE/N.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
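&lt;P&gt;For a binary target, the two levels carry the same squared residual, since (1 - y) - (1 - p) = -(y - p). The following Python sketch, with made-up data and illustrative function names, suggests that summing over both levels and dividing by 2N gives the same ASE as the single-level sum divided by N, while the SSE itself is doubled.&lt;/P&gt;

```python
def ase_both_levels(y, p):
    """ASE with squared errors summed over both target levels, divided by 2N."""
    n = len(y)
    sse = 0.0
    for yi, pi in zip(y, p):
        sse += (yi - pi) ** 2              # event level: y vs P(event)
        sse += ((1 - yi) - (1 - pi)) ** 2  # non-event level: same squared residual
    return sse / (2 * n)


def ase_single_level(y, p):
    """Classic ASE: squared error for the event level only, divided by N."""
    n = len(y)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / n


# Hypothetical data: observed 0/1 outcomes and predicted event probabilities.
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]

print(ase_both_levels(y, p))
print(ase_single_level(y, p))
```

&lt;P&gt;Both calls print approximately 0.068 for this data.&lt;/P&gt;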
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Last point: in course "&lt;FONT&gt;Predictive Modeling Using Logistic Regression&lt;/FONT&gt;", at page B-11 of the course notes, the way ASE is calculated within macro %ASSESS is not based on the expression from page 3-72 (likewise, SSE is just the sum of the squared differences between observed and fitted values): would the two approaches yield different numerical results, or would they be equivalent for a binary outcome?&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Yes, you are correct. When using PROC LOGISTIC to fit BLR, we define the target level (whether we are modeling 1 or 0), whereas in the SAS EM Regression node we do not define whether we want to model 1 or 0. Therefore, both predicted probabilities are included in the ASE computation.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2020 15:44:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Calculation-of-ASE/m-p/647507#M774</guid>
      <dc:creator>gcjfernandez</dc:creator>
      <dc:date>2020-05-13T15:44:15Z</dc:date>
    </item>
  </channel>
</rss>

