<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Improve K-Means results in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287579#M4286</link>
    <description>No, the objective is to identify groups of clients based on their money movement and number of operations. Using the Replacement node, I could find 3 clusters... well, at least that's something, hahaha.</description>
    <pubDate>Wed, 27 Jul 2016 19:43:59 GMT</pubDate>
    <dc:creator>fri0</dc:creator>
    <dc:date>2016-07-27T19:43:59Z</dc:date>
    <item>
      <title>Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287348#M4274</link>
      <description>&lt;P&gt;Hi, I've run a k-means clustering in Enterprise Miner, but I get one giant cluster with 99% of the records... Any ideas for getting more balanced clusters? Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2016 21:08:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287348#M4274</guid>
      <dc:creator>fri0</dc:creator>
      <dc:date>2016-07-26T21:08:01Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287480#M4280</link>
      <description>&lt;P&gt;The HP Cluster node by default uses the Aligned Box Criterion method to pick the best K (number of clusters). But you can override this by choosing Number of Clusters: User Specify in the node properties, then use the Segment Profiler node to characterize solutions with different Ks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also keep in mind that the HP Cluster node uses only interval inputs, ignoring any nominal/binary inputs. So, depending on your dataset, some of the information that could potentially help separate observations into clusters might be ignored. One way to deal with this is to binary-encode nominal inputs that you want to use in clustering.&lt;/P&gt;
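&lt;P&gt;A minimal sketch of what binary encoding means, written in plain Python rather than Enterprise Miner. The channel values and the helper name binary_encode are made up for illustration; the idea is only that each level of a nominal input becomes its own 0/1 column.&lt;/P&gt;

```python
# Sketch: binary-encode (one-hot) a nominal input into numeric 0/1 columns.
# The helper name and the sample values are hypothetical.


def binary_encode(values):
    """Return (levels, rows) where each row holds one 0/1 flag per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical nominal input: the channel each client used.
channel = ["branch", "web", "atm", "web"]
levels, encoded = binary_encode(channel)
print(levels)   # ['atm', 'branch', 'web']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
```

&lt;P&gt;Each encoded column can then be treated like any other numeric input.&lt;/P&gt;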
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ray&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2016 14:01:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287480#M4280</guid>
      <dc:creator>rayIII</dc:creator>
      <dc:date>2016-07-27T14:01:04Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287490#M4281</link>
      <description>&lt;P&gt;Hi rayIII! Thanks for your answers. I used the default options in the Cluster node; the result is a Ward clustering with about 10 clusters, and it also has a giant cluster... I used the CCC plot to identify a possible optimal number of clusters, and from the graph I identified 4 or 5. That's why I ran k-means with 4 and 5 clusters, but the result is almost the same, with one cluster taking almost all the records. I only have 2 interval variables: total amount and number of operations. Any other ideas? I found that using the Replacement and Filter nodes can sometimes help.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2016 14:41:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287490#M4281</guid>
      <dc:creator>fri0</dc:creator>
      <dc:date>2016-07-27T14:41:49Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287497#M4282</link>
      <description>&lt;P&gt;Hi. If you only have two clustering inputs, the first thing I would do is plot them as a scatterplot. Do you see distinct clusters in the 2 dimensions or is it basically one big blob?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ray&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2016 15:04:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287497#M4282</guid>
      <dc:creator>rayIII</dc:creator>
      <dc:date>2016-07-27T15:04:06Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287513#M4283</link>
      <description>&lt;P&gt;Yes, I ran a scatterplot and I could see about 3-4 groups, but I also ran a density graph because I think the scatterplot could be a little tricky. The groups that I saw don't have as many observations as I thought... =(&lt;/P&gt;
&lt;P&gt;Could you please check the attached images and give me your opinion?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit/PS: Do you know about range standardization? When is it a good idea to use it?&lt;/P&gt;&lt;BR /&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/12782i5C64F10F121994D1/image-size/large?v=1.0&amp;amp;px=600" border="0" alt="densitygraph.png" title="densitygraph.png" /&gt;&lt;IMG src="https://communities.sas.com/t5/image/serverpage/image-id/12783i2315A96ED0B84EB7/image-size/large?v=1.0&amp;amp;px=600" border="0" alt="scatterplot.jpg" title="scatterplot.jpg" /&gt;</description>
      <pubDate>Wed, 27 Jul 2016 16:08:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287513#M4283</guid>
      <dc:creator>fri0</dc:creator>
      <dc:date>2016-07-27T16:08:10Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287574#M4285</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;K-means tends to work best with well-separated spherical (or in your case, circular) groups.&amp;nbsp;I'm not seeing that in&amp;nbsp;your plots.&lt;/P&gt;
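&lt;P&gt;To show the kind of data k-means handles well, here is a rough sketch of the basic algorithm in plain Python on two made-up, round, well-separated groups. The points, the hand-picked starting centroids, and the fixed iteration count are all assumptions for illustration; this is not what the Enterprise Miner nodes do internally.&lt;/P&gt;

```python
# Sketch of plain k-means (Lloyd's algorithm) on made-up 2-D points.
import math


def kmeans(points, start, iterations=10):
    """Alternate assignment and centroid update a fixed number of times."""
    centroids = list(start)  # copy so the caller's list is not modified
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Two compact, well-separated groups (hypothetical values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
labels, centers = kmeans(points, start=[(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [(1.5, 1.5), (8.5, 8.5)]
```

&lt;P&gt;With groups like these the assignment settles after the first pass. With one dense blob plus a few scattered points, the same procedure tends to leave most observations in a single cluster.&lt;/P&gt;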
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It might help to know a bit more about how&amp;nbsp;you plan to use the clusters. Are you trying to characterize a sample of observations? Derive an input for a predictive model?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jul 2016 19:17:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287574#M4285</guid>
      <dc:creator>rayIII</dc:creator>
      <dc:date>2016-07-27T19:17:43Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287579#M4286</link>
      <description>No, the objective is to identify groups of clients based on their money movement and number of operations. Using the Replacement node, I could find 3 clusters... well, at least that's something, hahaha.</description>
      <pubDate>Wed, 27 Jul 2016 19:43:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/287579#M4286</guid>
      <dc:creator>fri0</dc:creator>
      <dc:date>2016-07-27T19:43:59Z</dc:date>
    </item>
    <item>
      <title>Re: Improve K-Means results</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/289024#M4307</link>
      <description>Hi, 

One key aspect is the set of variables you use to cluster the records. How many continuous variables do you have? How many categorical variables do you have (you would need to quantify them somehow so they are on an interval scale, right)? Have you standardized the variables? 
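
For reference, a minimal sketch in plain Python of two common ways to standardize a variable: z-scores and range standardization. The function names and the amounts are made up for illustration, and this is not Enterprise Miner's own implementation.

```python
# Sketch: two common ways to put variables on a comparable scale before
# clustering. Function names and sample amounts are hypothetical.
import statistics


def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # a constant column would give sd == 0
    return [(v - mean) / sd for v in values]


def range_standardize(values):
    """Rescale to the interval 0..1 using the minimum and the range."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


total_amount = [100.0, 200.0, 300.0, 400.0, 500.0]
print(range_standardize(total_amount))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(total_amount))
```

Without some kind of scaling, a variable measured in large units (such as a total amount) can dominate the distance calculation over one measured in small counts.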

If you are concerned that you have too many variables to feed into the Cluster node, you may consider variable clustering to 'select' variables. Generally speaking, the more variables you add, the more likely you are to break big clusters open. On the other hand, you need to order the variables according to the variable clustering. If you don't do that and just add more variables, either you will find yourself having to add many variables to break up the clusters, or the end results, although broken up, will not be meaningful. 

Hope this helps. Thanks for the questions. Jason Xin</description>
      <pubDate>Tue, 02 Aug 2016 21:20:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Improve-K-Means-results/m-p/289024#M4307</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-08-02T21:20:16Z</dc:date>
    </item>
  </channel>
</rss>

