<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clustering by one variable in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-by-one-variable/m-p/408506#M21271</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have no experience with clustering, so I would be grateful if anyone could help me choose a suitable method.&lt;/P&gt;&lt;P&gt;I want to group about 1.5 million customers by one variable (I have more, but all of them are highly correlated). About 50% of the observations have the value 0 in the clustering variable.&lt;/P&gt;&lt;P&gt;I am using the FASTCLUS procedure:&lt;/P&gt;&lt;P&gt;proc fastclus data=wyn2 least=1 maxc=4;&lt;BR /&gt;var zasilenia_za_ost_3m;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;What I get is 4 clusters with:&lt;/P&gt;&lt;P&gt;1499995 in the first cluster&lt;/P&gt;&lt;P&gt;2 in the second cluster&lt;/P&gt;&lt;P&gt;1 in the third cluster&lt;/P&gt;&lt;P&gt;2 in the fourth cluster.&lt;/P&gt;&lt;P&gt;The result looks much the same even if I remove the observations with 0 in the clustering variable.&lt;/P&gt;&lt;P&gt;So my question is: which method/procedure would be best in this case?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Mon, 30 Oct 2017 07:57:48 GMT</pubDate>
    <dc:creator>Matt3</dc:creator>
    <dc:date>2017-10-30T07:57:48Z</dc:date>
    <item>
      <title>Clustering by one variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-by-one-variable/m-p/408506#M21271</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have no experience with clustering, so I would be grateful if anyone could help me choose a suitable method.&lt;/P&gt;&lt;P&gt;I want to group about 1.5 million customers by one variable (I have more, but all of them are highly correlated). About 50% of the observations have the value 0 in the clustering variable.&lt;/P&gt;&lt;P&gt;I am using the FASTCLUS procedure:&lt;/P&gt;&lt;P&gt;proc fastclus data=wyn2 least=1 maxc=4;&lt;BR /&gt;var zasilenia_za_ost_3m;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;What I get is 4 clusters with:&lt;/P&gt;&lt;P&gt;1499995 in the first cluster&lt;/P&gt;&lt;P&gt;2 in the second cluster&lt;/P&gt;&lt;P&gt;1 in the third cluster&lt;/P&gt;&lt;P&gt;2 in the fourth cluster.&lt;/P&gt;&lt;P&gt;The result looks much the same even if I remove the observations with 0 in the clustering variable.&lt;/P&gt;&lt;P&gt;So my question is: which method/procedure would be best in this case?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 07:57:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-by-one-variable/m-p/408506#M21271</guid>
      <dc:creator>Matt3</dc:creator>
      <dc:date>2017-10-30T07:57:48Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering by one variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Clustering-by-one-variable/m-p/408727#M21305</link>
      <description>&lt;P&gt;With a single variable, the choice of clustering method doesn't really matter. Looking at distribution plots and choosing your cut-off points with some common sense is probably a better approach.&lt;/P&gt;
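&lt;P&gt;A minimal sketch of that approach, reusing the data set wyn2 and the variable zasilenia_za_ost_3m from the question. The output data set names (pctl, wyn2_seg), the variable segment, and the cut-off values 100 and 1000 are placeholders made up for illustration, and the variable is assumed to be non-negative; pick the real cut-offs after looking at the plot and the percentiles.&lt;/P&gt;

```sas
/* Step 1: look at the distribution of the non-zero values.           */
/* The zeros (about half of the customers) are left out so that they  */
/* do not dominate the histogram.                                     */
proc univariate data=wyn2(where=(zasilenia_za_ost_3m gt 0));
   var zasilenia_za_ost_3m;
   histogram zasilenia_za_ost_3m;
   /* Save a few percentiles (p50, p90, p99) as candidate cut-offs.   */
   /* pctl is a hypothetical output data set name.                    */
   output out=pctl pctlpts=50 90 99 pctlpre=p;
run;

/* Step 2: assign segments with hand-picked cut-offs.                 */
/* 100 and 1000 are placeholders, not recommendations.                */
data wyn2_seg;
   set wyn2;
   if missing(zasilenia_za_ost_3m) then segment = .;
   /* zero values (and negatives, if any) form their own segment */
   else if zasilenia_za_ost_3m le 0 then segment = 0;
   else if zasilenia_za_ost_3m le 100 then segment = 1;
   else if zasilenia_za_ost_3m le 1000 then segment = 2;
   else segment = 3;
run;

/* Step 3: check how many customers land in each segment. */
proc freq data=wyn2_seg;
   tables segment;
run;
```

&lt;P&gt;Keeping the zeros as a separate segment and splitting only the non-zero values gives four groups, the same number as maxc=4 in the question. If the histogram is dominated by a few very large values, the saved percentiles are usually easier to read than the plot.&lt;/P&gt;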
</description>
      <pubDate>Mon, 30 Oct 2017 15:08:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Clustering-by-one-variable/m-p/408727#M21305</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-10-30T15:08:09Z</dc:date>
    </item>
  </channel>
</rss>

