<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Evaluating clusters for optimal K in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Evaluating-clusters-for-optimal-K/m-p/353923#M5258</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I built 4 clustering models: 3 manually, stepping down from K15 -&amp;gt; K6 -&amp;gt; K4, and 1 using automatic selection with the Cluster node in SAS Enterprise Miner. &amp;nbsp;The cluster statistics for the 4 models are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/8571i92905B20E65106CA/image-size/original?v=1.0&amp;amp;px=-1" border="0" alt="2017-04-27_7-48-51.png" title="2017-04-27_7-48-51.png" /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The results are exactly the same for Clustering K4 and Clustering Auto. &amp;nbsp;I have concluded that a 4-cluster&amp;nbsp;model is optimal.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Are these the correct metrics to evaluate clusters and to determine the optimal value of K? &amp;nbsp;I also used cluster distance plots to check this visually.&lt;/LI&gt;&lt;LI&gt;Pseudo_F: &amp;nbsp;Is this the higher the better?&lt;/LI&gt;&lt;LI&gt;RSQ and RSQ_Ratio: &amp;nbsp;Are these the lower the better?&lt;/LI&gt;&lt;LI&gt;If these 4 metrics are not the best metrics to determine the optimal number of clusters, what are the appropriate ones generated from the Clustering node in SAS EM?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Lobbie&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Apr 2017 21:58:15 GMT</pubDate>
    <dc:creator>Lobbie</dc:creator>
    <dc:date>2017-04-26T21:58:15Z</dc:date>
    <item>
      <title>Evaluating clusters for optimal K</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Evaluating-clusters-for-optimal-K/m-p/353923#M5258</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I built 4 clustering models: 3 manually, stepping down from K15 -&amp;gt; K6 -&amp;gt; K4, and 1 using automatic selection with the Cluster node in SAS Enterprise Miner. &amp;nbsp;The cluster statistics for the 4 models are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/8571i92905B20E65106CA/image-size/original?v=1.0&amp;amp;px=-1" border="0" alt="2017-04-27_7-48-51.png" title="2017-04-27_7-48-51.png" /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The results are exactly the same for Clustering K4 and Clustering Auto. &amp;nbsp;I have concluded that a 4-cluster&amp;nbsp;model is optimal.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Are these the correct metrics to evaluate clusters and to determine the optimal value of K? &amp;nbsp;I also used cluster distance plots to check this visually.&lt;/LI&gt;&lt;LI&gt;Pseudo_F: &amp;nbsp;Is this the higher the better?&lt;/LI&gt;&lt;LI&gt;RSQ and RSQ_Ratio: &amp;nbsp;Are these the lower the better?&lt;/LI&gt;&lt;LI&gt;If these 4 metrics are not the best metrics to determine the optimal number of clusters, what are the appropriate ones generated from the Clustering node in SAS EM?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Lobbie&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Apr 2017 21:58:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Evaluating-clusters-for-optimal-K/m-p/353923#M5258</guid>
      <dc:creator>Lobbie</dc:creator>
      <dc:date>2017-04-26T21:58:15Z</dc:date>
    </item>
    <item>
      <title>Re: Evaluating clusters for optimal K</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Evaluating-clusters-for-optimal-K/m-p/355439#M5270</link>
      <description>&lt;P&gt;Hi Lobbie, see below for some comments around these.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;FONT color="#000000"&gt;I think these are fine as a guide, but I suggest a little trial and error here: you also want the clusters to fit the purpose, not just be the&amp;nbsp;best in a statistical sense. So you can also experiment with which variables to use, and profile the clusters to get a sense of the solution (you can&amp;nbsp;use the Segment Profile node here).&amp;nbsp; This gives some more detail on approaches to selecting the number of clusters:&amp;nbsp; &lt;A href="https://v8doc.sas.com/sashtml/stat/chap8/sect10.htm" target="_blank"&gt;https://v8doc.sas.com/sashtml/stat/chap8/sect10.htm&lt;/A&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT color="#000000"&gt;Yes, it measures the separation of the clusters, so higher is better.&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT color="#000000"&gt;Higher is better&amp;nbsp;for both. &amp;nbsp;RSQ is the proportion of variance in the data accounted for by the clusters, and RSQ_Ratio is similar but takes into account within vs between cluster variance.&amp;nbsp; Both keep increasing up to a maximum where the number of clusters = the number of cases, so you're not looking for the highest value but for the inflection point beyond which the&amp;nbsp;rate of increase&amp;nbsp;is small.&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT color="#000000"&gt;&amp;nbsp;Also try looking at the CCC plot and see if there's some levelling here.&lt;/FONT&gt;&lt;/LI&gt;
&lt;/OL&gt;
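&lt;P&gt;&lt;FONT color="#000000"&gt;To make points 2 and 3 concrete, here is a minimal Python sketch (not SAS EM output) that computes the three statistics from a set of points and their cluster labels. It assumes RSQ = 1 - WSS/TSS, RSQ_Ratio = RSQ/(1 - RSQ) and Pseudo_F = (BSS/(K-1)) / (WSS/(N-K)); I believe these match what the node reports, but treat that as an assumption. The function name cluster_stats, the toy data and the hand-made labelings are all made up for illustration.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
def cluster_stats(points, labels):
    """Return (rsq, rsq_ratio, pseudo_f) for one cluster assignment.

    Assumes at least 2 clusters, more points than clusters,
    and a non-zero within-cluster sum of squares.
    """
    n = len(points)
    dims = len(points[0])
    k = len(set(labels))

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Total sum of squares around the grand mean.
    tss = sum(sq_dist(p, mean(points)) for p in points)

    # Within-cluster sum of squares around each cluster mean.
    wss = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = mean(members)
        wss += sum(sq_dist(p, centre) for p in members)

    bss = tss - wss                      # between-cluster sum of squares
    rsq = bss / tss                      # proportion of variance explained
    rsq_ratio = bss / wss                # equals rsq / (1 - rsq)
    pseudo_f = (bss / (k - 1)) / (wss / (n - k))
    return rsq, rsq_ratio, pseudo_f


# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

# Hand-made labelings standing in for clustering runs at K = 2, 3, 4.
labelings = {
    2: [0] * 4 + [1] * 8,                  # merges two of the true groups
    3: [0] * 4 + [1] * 4 + [2] * 4,        # the true grouping
    4: [0, 0, 3, 3] + [1] * 4 + [2] * 4,   # splits one true group in half
}

results = {k: cluster_stats(points, labs) for k, labs in labelings.items()}
for k in sorted(results):
    rsq, rsq_ratio, pseudo_f = results[k]
    print(f"K={k}  RSQ={rsq:.4f}  RSQ_Ratio={rsq_ratio:.1f}  Pseudo_F={pseudo_f:.1f}")
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;On this toy data RSQ keeps rising as K goes from 2 to 4 but barely moves after K=3, while Pseudo_F is largest at K=3, which is the kind of levelling-off to look for.&lt;/FONT&gt;&lt;/P&gt;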
&lt;P&gt;&lt;FONT color="#000000"&gt;Cheers,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;Troy&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 May 2017 23:39:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Evaluating-clusters-for-optimal-K/m-p/355439#M5270</guid>
      <dc:creator>trees1</dc:creator>
      <dc:date>2017-05-02T23:39:57Z</dc:date>
    </item>
  </channel>
</rss>

