<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What should be the Optimum Number of Cluster in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663155#M8331</link>
    <description>&lt;P&gt;Thank you,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;I know _ccc_ is strictly negative.&lt;/P&gt;&lt;P&gt;I am already using PCA too. The idea is to get outliers from two different algorithms and then join the results to get the output.&lt;/P&gt;&lt;P&gt;PCA was able to handle this, but KNN needs a better selection of variables. Just dumping variables into KNN and leaving it to figure out the clusters does not seem to be the correct thing to do.&lt;/P&gt;&lt;P&gt;Thank you to the legends.&lt;/P&gt;</description>
    <pubDate>Thu, 18 Jun 2020 13:24:19 GMT</pubDate>
    <dc:creator>arpitsharma27</dc:creator>
    <dc:date>2020-06-18T13:24:19Z</dc:date>
    <item>
      <title>What should be the Optimum Number of Cluster</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/661849#M8328</link>
      <description>&lt;P&gt;Team,&lt;/P&gt;&lt;P&gt;I have this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro clustering_method(method=);
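/* Hierarchical clustering of the Sports cars with the requested linkage method; CCC is written to the OUTTREE= data set */
proc cluster data=cars method=&amp;amp;method. ccc outtree=tree_&amp;amp;method. noprint;
	where type="Sports";
	by type;
	var horsepower mpg_highway weight wheelbase;
run;
/* Keep only the rows that carry a CCC value, ordered by number of clusters */
proc sort data=tree_&amp;amp;method. out=&amp;amp;method.(keep=type _ncl_ _ccc_);
	by type _ncl_ _ccc_;
	where not missing(_ccc_);
run;

%mend;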
%clustering_method(method=Average);
%clustering_method(method=median);
%clustering_method(method=centroid);
%clustering_method(method=mcquitty);
%clustering_method(method=ward);

data Have;
set Average 
	Median 
	Centroid 
	McQuitty 
	Ward 
		indsname=source;
input_ds=scan(source,2,'.');
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Referring to the &lt;STRONG&gt;Have&lt;/STRONG&gt; dataset: what should my optimum number of clusters be, and why?&lt;/P&gt;&lt;P&gt;Please advise.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2020 17:26:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/661849#M8328</guid>
      <dc:creator>arpitsharma27</dc:creator>
      <dc:date>2020-06-17T17:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: What should be the Optimum Number of Cluster</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/661872#M8329</link>
      <description>&lt;P&gt;You are getting only negative numbers for CCC. This implies (to me) that there is no clustering. Also see&amp;nbsp;&lt;A href="https://www.researchgate.net/post/Could_someone_help_me_decide_the_ideal_noof_clusters_from_the_pseudo_t_squared_graph_in_SAS" target="_blank"&gt;https://www.researchgate.net/post/Could_someone_help_me_decide_the_ideal_noof_clusters_from_the_pseudo_t_squared_graph_in_SAS&lt;/A&gt;, which says:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;CCC is the cubic clustering criterion; the idea behind it is to compare the R squared you get with a specific number of clusters versus the R squared you would get by clustering a uniformly distributed set of points. That is, you interpret it similarly as you would R squared. You are getting STRICTLY negative values (and, in fact, they are decreasing with additional number of clusters before increasing again; I would interpret that increase as overfitting). This means that the model you are fitting to the data with X number of clusters fits worse than uniformly distributed points. This is evidence of a lack of clustering (or problems with the data).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Wed, 17 Jun 2020 18:28:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/661872#M8329</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-06-17T18:28:14Z</dc:date>
    </item>
    <item>
      <title>Re: What should be the Optimum Number of Cluster</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663133#M8330</link>
      <description>&lt;P&gt;This is an unsolved problem in general.&lt;/P&gt;
&lt;P&gt;If I were you, I would try Principal Component Analysis.&lt;/P&gt;
&lt;P&gt;Anyway,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;may have some ideas.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2020 12:19:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663133#M8330</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-06-18T12:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: What should be the Optimum Number of Cluster</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663155#M8331</link>
      <description>&lt;P&gt;Thank you,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;I know _ccc_ is strictly negative.&lt;/P&gt;&lt;P&gt;I am already using PCA too. The idea is to get outliers from two different algorithms and then join the results to get the output.&lt;/P&gt;&lt;P&gt;PCA was able to handle this, but KNN needs a better selection of variables. Just dumping variables into KNN and leaving it to figure out the clusters does not seem to be the correct thing to do.&lt;/P&gt;&lt;P&gt;Thank you to the legends.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2020 13:24:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663155#M8331</guid>
      <dc:creator>arpitsharma27</dc:creator>
      <dc:date>2020-06-18T13:24:19Z</dc:date>
    </item>
    <item>
      <title>Re: What should be the Optimum Number of Cluster</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663156#M8332</link>
      <description>&lt;P&gt;Aligned Box Criterion is available in the HP Cluster node in SAS Enterprise Miner. It will determine the optimum number of clusters. Here's a video that talks about using this option, along with using CCC and gap methods&amp;nbsp;&lt;A href="https://www.youtube.com/watch?v=NZpNTkfT47c" target="_blank" rel="noopener"&gt;https://www.youtube.com/watch?v=NZpNTkfT47c&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jun 2020 13:37:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/What-should-be-the-Optimum-Number-of-Cluster/m-p/663156#M8332</guid>
      <dc:creator>MelodieRush</dc:creator>
      <dc:date>2020-06-18T13:37:27Z</dc:date>
    </item>
  </channel>
</rss>

