<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc cluster for mixed data in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27360#M6246</link>
    <description>FASTCLUS has a lot of limitations, and is not suitable for mixed data.&lt;BR /&gt;
&lt;BR /&gt;
I guess I will have to use PROC DISTANCE with Gower's dissimilarity. But when I run PROC CLUSTER, which distance method will be the most appropriate?&lt;BR /&gt;
&lt;BR /&gt;
Thanks,&lt;BR /&gt;
Romakanta</description>
    <pubDate>Wed, 25 Jun 2008 06:11:53 GMT</pubDate>
    <dc:creator>datalligence</dc:creator>
    <dc:date>2008-06-25T06:11:53Z</dc:date>
    <item>
      <title>proc cluster for mixed data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27358#M6244</link>
      <description>I have a data set of about 600,000 obs. The variables I would like to use for grouping observations/transactions include numeric and categorical variables.&lt;BR /&gt;
&lt;BR /&gt;
In PROC CLUSTER, which METHOD or distance measure would be the most appropriate?</description>
      <pubDate>Tue, 24 Jun 2008 12:32:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27358#M6244</guid>
      <dc:creator>datalligence</dc:creator>
      <dc:date>2008-06-24T12:32:32Z</dc:date>
    </item>
    <item>
      <title>Re: proc cluster for mixed data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27359#M6245</link>
      <description>Hi.&lt;BR /&gt;
1) CLUSTER will take a very long time on that many observations. Consider using FASTCLUS to do the job, or at least to create first-level clusters that CLUSTER then processes (I think the SAS help calls this the two-stage method).&lt;BR /&gt;
2) Use the PRINQUAL or CORRESP procedure to pre-process your data: these can create numeric (continuous) variables that summarize the information in the categorical variables. Then merge those with the existing numeric variables, and then cluster.&lt;BR /&gt;
Regards.&lt;BR /&gt;
Olivier</description>
      <pubDate>Tue, 24 Jun 2008 18:06:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27359#M6245</guid>
      <dc:creator>Olivier</dc:creator>
      <dc:date>2008-06-24T18:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: proc cluster for mixed data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27360#M6246</link>
      <description>FASTCLUS has a lot of limitations, and is not suitable for mixed data.&lt;BR /&gt;
&lt;BR /&gt;
I guess I will have to use PROC DISTANCE with Gower's dissimilarity. But when I run PROC CLUSTER, which distance method will be the most appropriate?&lt;BR /&gt;
&lt;BR /&gt;
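For reference, here is a minimal sketch of what Gower's dissimilarity computes for mixed data, written in plain Python rather than SAS. It assumes equal weights for all variables, range scaling for the numeric ones, and simple matching for the categorical ones. The function name, field names, and sample rows are all made up for illustration; this is not what PROC DISTANCE runs internally.&lt;BR /&gt;
&lt;BR /&gt;

```python
def gower_dissimilarity(a, b, numeric_ranges):
    """Gower dissimilarity between two records given as dicts.

    numeric_ranges maps each numeric field to its range (max - min) over the
    whole data set; every other field is treated as categorical.
    """
    total = 0.0
    for field, value in a.items():
        if field in numeric_ranges:
            span = numeric_ranges[field]
            # Range-scaled absolute difference, so each numeric term is in [0, 1].
            total += abs(value - b[field]) / span if span else 0.0
        else:
            # Simple matching: 0 when the categories agree, 1 otherwise.
            total += 0.0 if value == b[field] else 1.0
    # Equal weights (an assumption): average the per-variable terms.
    return total / len(a)


# Hypothetical transactions; field names and values are made up for illustration.
rows = [
    {"amount": 10.0, "items": 1, "channel": "web", "region": "north"},
    {"amount": 60.0, "items": 3, "channel": "web", "region": "south"},
    {"amount": 110.0, "items": 5, "channel": "store", "region": "south"},
]
numeric_fields = ["amount", "items"]
ranges = {
    f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in numeric_fields
}

d01 = gower_dissimilarity(rows[0], rows[1], ranges)
d02 = gower_dissimilarity(rows[0], rows[2], ranges)
print(d01, d02)
```

&lt;BR /&gt;
Note that a full pairwise matrix for 600,000 observations would hold roughly 1.8 x 10^11 distances, so a sketch like this only makes sense on a sample or on first-level clusters.&lt;BR /&gt;
&lt;BR /&gt;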
Thanks,&lt;BR /&gt;
Romakanta</description>
      <pubDate>Wed, 25 Jun 2008 06:11:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-cluster-for-mixed-data/m-p/27360#M6246</guid>
      <dc:creator>datalligence</dc:creator>
      <dc:date>2008-06-25T06:11:53Z</dc:date>
    </item>
  </channel>
</rss>

