<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Misclassification rate on proc hpsplit in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/786952#M9004</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think it has to do with missing values. Could that be possible?&lt;BR /&gt;Do the cell counts for 'Test' add up to the total number of observations in your 'Test' partition? Probably not.&lt;BR /&gt;What if you use the&amp;nbsp;total number of observations in your 'Test' partition as the denominator? Do you get &lt;FONT face="courier new,courier"&gt;0.2808&lt;/FONT&gt; then?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;In your code, are you using&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class="csF09281A2"&gt;&lt;FONT face="courier new,courier"&gt;assignmissing=similar&lt;/FONT&gt;?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;If&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN class="csF09281A2"&gt;assignmissing=none&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN class="cs4FC6FE93"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is used instead, then I believe the cells of the Confusion Matrix table for the Test partition do sum to the Number of Test Observations Used.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Here are some workarounds so that you can move forward with your analysis.&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="lia-list-style-type-disc"&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Impute the missing values with a procedure (PROC STDIZE, PROC MI, PROC FASTCLUS,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="cs4FC6FE93"&gt;and so on), or with values that make sense given your subject knowledge.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN class="csF09281A2"&gt;assignmissing=none&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN class="cs4FC6FE93"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;on the PROC statement.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Remove observations that have missing values (this may not be a viable option).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Remove variables that have missing values (this may not be a viable option either).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="cs95E872D0"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Good luck,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Dec 2021 15:47:01 GMT</pubDate>
    <dc:creator>sbxkoenk</dc:creator>
    <dc:date>2021-12-21T15:47:01Z</dc:date>
    <item>
      <title>Misclassification rate on proc hpsplit</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/783287#M8998</link>
      <description>&lt;P&gt;I am using a proc hpsplit to create a decision tree. The resulting confusion matrix is below. The misclassification rate for the test data seems wrong (although it is right for training and validation). This happens on other data sets I have tried too. What could be causing this?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2021-11-30 152511.png" style="width: 627px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/66295i9C0293759A475B76/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2021-11-30 152511.png" alt="Screenshot 2021-11-30 152511.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Nov 2021 21:27:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/783287#M8998</guid>
      <dc:creator>nataliegerhart0</dc:creator>
      <dc:date>2021-11-30T21:27:26Z</dc:date>
    </item>
    <item>
      <title>Re: Misclassification rate on proc hpsplit</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/786952#M9004</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think it has to do with missing values. Could that be possible?&lt;BR /&gt;Do the cell counts for 'Test' add up to the total number of observations in your 'Test' partition? Probably not.&lt;BR /&gt;What if you use the&amp;nbsp;total number of observations in your 'Test' partition as the denominator? Do you get &lt;FONT face="courier new,courier"&gt;0.2808&lt;/FONT&gt; then?&lt;/P&gt;
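&lt;P&gt;To make the denominator question concrete, here is a minimal Python sketch (not SAS, and not what PROC HPSPLIT does internally). The cell counts and the partition size are hypothetical, made up only for illustration and not taken from the screenshot. It assumes some test observations never reach the confusion matrix, so dividing the same number of errors by the partition size gives a different rate than dividing by the cell total.&lt;/P&gt;
&lt;PRE&gt;
```python
def misclassification_rate(confusion, denominator=None):
    """Share of misclassified observations.

    confusion maps (actual, predicted) to a count. If denominator is
    None, the sum of all cells is used; otherwise the supplied value is.
    """
    wrong = sum(n for (actual, predicted), n in confusion.items()
                if actual != predicted)
    total = sum(confusion.values()) if denominator is None else denominator
    return wrong / total


# Hypothetical counts for illustration only (not from the thread).
test_cells = {("0", "0"): 60, ("0", "1"): 10, ("1", "0"): 15, ("1", "1"): 5}

# Hypothetical partition size: 10 observations are assumed to be absent
# from the table, for example because of missing values.
test_partition_size = 100

rate_from_cells = misclassification_rate(test_cells)
rate_from_partition = misclassification_rate(test_cells, test_partition_size)

print("cells in table:       ", sum(test_cells.values()))
print("rate, cell total:     ", round(rate_from_cells, 4))
print("rate, partition size: ", round(rate_from_partition, 4))
```
&lt;/PRE&gt;
&lt;P&gt;If the reported rate matches the second figure rather than the first when you plug in your own counts, that would point to the denominator as the source of the mismatch.&lt;/P&gt;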
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;In your code, are you using&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class="csF09281A2"&gt;&lt;FONT face="courier new,courier"&gt;assignmissing=similar&lt;/FONT&gt;?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;If&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN class="csF09281A2"&gt;assignmissing=none&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN class="cs4FC6FE93"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is used instead, then I believe the cells of the Confusion Matrix table for the Test partition do sum to the Number of Test Observations Used.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Here are some workarounds so that you can move forward with your analysis.&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="lia-list-style-type-disc"&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Impute the missing values with a procedure (PROC STDIZE, PROC MI, PROC FASTCLUS,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="cs4FC6FE93"&gt;and so on), or with values that make sense given your subject knowledge.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN class="csF09281A2"&gt;assignmissing=none&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN class="cs4FC6FE93"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;on the PROC statement.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Remove observations that have missing values (this may not be a viable option).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Remove variables that have missing values (this may not be a viable option either).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="cs95E872D0"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Good luck,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="cs95E872D0"&gt;&lt;SPAN class="cs4FC6FE93"&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Dec 2021 15:47:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/786952#M9004</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-12-21T15:47:01Z</dc:date>
    </item>
    <item>
      <title>Re: Misclassification rate on proc hpsplit</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/786995#M9005</link>
      <description>&lt;P&gt;Thank you for your help! SAS Support acknowledged there was a bug in the output for the same reason you identified. They offered the same temporary solutions. Thank you for your time with this!&lt;/P&gt;</description>
      <pubDate>Tue, 21 Dec 2021 18:06:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Misclassification-rate-on-proc-hpsplit/m-p/786995#M9005</guid>
      <dc:creator>nataliegerhart0</dc:creator>
      <dc:date>2021-12-21T18:06:19Z</dc:date>
    </item>
  </channel>
</rss>

