<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What procedure is best for clustering? in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203390#M10904</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Do your basic univariate analysis first. It looks like you have mostly categorical variables. You show a 'result' variable: is that the result of an experiment, or an independent value that you're trying to cluster on?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'd even consider regrouping the date into months or weeks. A million rows isn't really big data.&lt;/P&gt;&lt;P&gt;Be careful with using related variables, e.g. product code and sub-product code, in the same analysis.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 24 Apr 2015 19:23:56 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2015-04-24T19:23:56Z</dc:date>
    <item>
      <title>What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203389#M10903</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a very large data set (more than a million rows), and it is not normally distributed. It has the following variables:&lt;/P&gt;&lt;P&gt;Id &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(numeric)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;form id &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;product code &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;sub-product code&lt;/P&gt;&lt;P&gt;test number &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;test code &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;collection date &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(numeric)&lt;/SPAN&gt; &lt;/P&gt;&lt;P&gt;state &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;district &lt;SPAN style="font-size: 13.3333330154419px;"&gt;(text)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;result (text)&lt;/P&gt;&lt;P&gt;result (numeric) (which has mostly negative results and very few positive results) &lt;/P&gt;&lt;P&gt;and other variables which are not used for analysis or which contain null values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My task is to analyze this data with cluster procedures (fastclus, varclus and others), but I am not sure which procedure will best suit my data. Moreover, I am working with clustering for the first time.&lt;/P&gt;&lt;P&gt;My objective is to create clusters for the positive &lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;count&lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;, based on the geographic location (state) and the product code or the collection date. From the documentation I understood that fastclus may be a good choice. Is that correct?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;Any suggestions are welcome!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;Thanks in advance.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 24 Apr 2015 17:55:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203389#M10903</guid>
      <dc:creator>pkmkart</dc:creator>
      <dc:date>2015-04-24T17:55:33Z</dc:date>
    </item>
    <item>
      <title>Re: What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203390#M10904</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Do your basic univariate analysis first. It looks like you have mostly categorical variables. You show a 'result' variable: is that the result of an experiment, or an independent value that you're trying to cluster on?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'd even consider regrouping the date into months or weeks. A million rows isn't really big data.&lt;/P&gt;&lt;P&gt;Be careful with using related variables, e.g. product code and sub-product code, in the same analysis.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 24 Apr 2015 19:23:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203390#M10904</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-04-24T19:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203391#M10905</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't understand what you are clustering.&amp;nbsp; If "ID" is, indeed, an ID number, then it seems unlikely (to me) to be usefully included in any cluster analysis, unless ID has some meaning beyond just being a code. But most ID numbers, even if they do have such meaning, don't relate linearly to anything.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But what are all the variables? What is the ID an identifier for? A customer? Is each one unique?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What are you trying to &lt;EM&gt;do?&lt;/EM&gt; Not "I'm trying to cluster these data" but, in a practical, real-world sense, what is the purpose?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I wrote a blog post called &lt;A href="http://www.statisticalanalysisconsulting.com/how-to-ask-a-statistics-question/"&gt;How to Ask a Statistics Question&lt;/A&gt; that may be useful.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Message was edited by: Peter Flom&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 28 Apr 2015 11:23:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203391#M10905</guid>
      <dc:creator>plf515</dc:creator>
      <dc:date>2015-04-28T11:23:20Z</dc:date>
    </item>
    <item>
      <title>Re: What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203392#M10906</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have to agree with &lt;A __default_attr="370497" __jive_macro_name="user" class="jive_macro jive_macro_user" href="https://communities.sas.com/"&gt;&lt;/A&gt;, this problem description seems to be very non-informative&lt;/P&gt;&lt;PRE __jive_macro_name="quote" class="jive_text_macro jive_macro_quote"&gt;
&lt;P&gt;My objective is to create clusters for&amp;nbsp; positive &lt;SPAN style="font-size: 13.33px;"&gt;count &lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;

&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What does "positive count" mean?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Clusters of what? Variables? Subjects? I don't see how this data, as you described it, can produce clusters anyway. The only real numeric variable is "result"; the others ("id" and "collection date") are not meaningful variables, just identifiers.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;People don't create clusters for no reason. Explain why you want these clusters, and how this list of variables can produce clusters that serve that purpose.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 28 Apr 2015 12:35:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203392#M10906</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2015-04-28T12:35:35Z</dc:date>
    </item>
    <item>
      <title>Re: What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203393#M10907</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A __default_attr="370497" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt; The link to your blog post is incorrect.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 28 Apr 2015 12:37:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203393#M10907</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2015-04-28T12:37:12Z</dc:date>
    </item>
    <item>
      <title>Re: What procedure is best for clustering?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203394#M10908</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Paige,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Not sure what happened. I am at SGF and a little busy, but I will try to fix it ASAP.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you Google "How to ask a statistics question" and my name, you can find it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 28 Apr 2015 12:46:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/What-procedure-is-best-for-clustering/m-p/203394#M10908</guid>
      <dc:creator>plf515</dc:creator>
      <dc:date>2015-04-28T12:46:03Z</dc:date>
    </item>
  </channel>
</rss>

