<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to profile and interpret clusters? in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-profile-and-interpret-clusters/m-p/516870#M26353</link>
    <description>&lt;P&gt;Hello, I have 77 variables and 27,000 observations. My goal is to find meaningful clusters in the data, but I am finding it challenging to interpret the clusters.&lt;/P&gt;&lt;P&gt;So far, I performed PCA with PROC PRINCOMP, which gave me an idea of the reduced dimensionality. Then I used the relevant principal components (PCs) as input to PROC FASTCLUS. After a few iterations, I found a solution that produced the desired number of significant clusters.&lt;/P&gt;&lt;P&gt;Next, I combined the original input variables with the resulting cluster assignments. I did this because I thought it would let me interpret the clusters in terms of the original variables, even though the PCs were used to derive the clusters.&lt;/P&gt;&lt;P&gt;My problem is how to profile the clusters to understand their business significance (interpretation). I tried PROC TABULATE, but the output was hard to make sense of because I have 77 original variables to compare across my clusters.&lt;/P&gt;&lt;P&gt;What should the next step be? Should I check for multicollinearity and remove as many variables as I can, or is there an easier way? I would appreciate any feedback or tips to resolve this issue.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Kino&lt;/P&gt;</description>
    <pubDate>Wed, 28 Nov 2018 20:45:23 GMT</pubDate>
    <dc:creator>kinoo1989</dc:creator>
    <dc:date>2018-11-28T20:45:23Z</dc:date>
    <item>
      <title>How to profile and interpret clusters?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-profile-and-interpret-clusters/m-p/516870#M26353</link>
      <description>&lt;P&gt;Hello, I have 77 variables and 27,000 observations. My goal is to find meaningful clusters in the data, but I am finding it challenging to interpret the clusters.&lt;/P&gt;&lt;P&gt;So far, I performed PCA with PROC PRINCOMP, which gave me an idea of the reduced dimensionality. Then I used the relevant principal components (PCs) as input to PROC FASTCLUS. After a few iterations, I found a solution that produced the desired number of significant clusters.&lt;/P&gt;&lt;P&gt;Next, I combined the original input variables with the resulting cluster assignments. I did this because I thought it would let me interpret the clusters in terms of the original variables, even though the PCs were used to derive the clusters.&lt;/P&gt;&lt;P&gt;My problem is how to profile the clusters to understand their business significance (interpretation). I tried PROC TABULATE, but the output was hard to make sense of because I have 77 original variables to compare across my clusters.&lt;/P&gt;&lt;P&gt;What should the next step be? Should I check for multicollinearity and remove as many variables as I can, or is there an easier way? I would appreciate any feedback or tips to resolve this issue.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Kino&lt;/P&gt;</description>
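A minimal sketch of one common way to profile clusters follows. It is an illustration only, not the poster's code and not a SAS procedure. The steps are:

1. Standardize each original variable against its overall mean and standard deviation.
2. Average the standardized values within each cluster.
3. For each cluster, list the variables that deviate most from the overall average.

This replaces a 77-column PROC TABULATE table with a short list of distinguishing traits per cluster. The sketch is in Python rather than SAS. It assumes the merged data is a list of dicts (variable name to numeric value) plus one cluster label per row, such as the CLUSTER variable that PROC FASTCLUS writes to its OUT= data set. The function names cluster_profile and top_traits and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_profile(rows, labels):
    """Return {cluster: {variable: standardized cluster mean}}.

    rows: list of dicts mapping variable name to a numeric value (hypothetical layout)
    labels: one cluster label per row, in the same order as rows
    """
    variables = list(rows[0])
    # Overall mean and population standard deviation of every variable.
    overall = {}
    for v in variables:
        values = [r[v] for r in rows]
        overall[v] = (mean(values), pstdev(values))

    # Group the rows by their cluster label.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    profile = {}
    for label, members in groups.items():
        profile[label] = {}
        for v in variables:
            mu, sd = overall[v]
            cluster_mean = mean([r[v] for r in members])
            # A constant variable cannot distinguish clusters, so score it 0.
            profile[label][v] = 0.0 if sd == 0 else (cluster_mean - mu) / sd
    return profile


def top_traits(profile, k=5):
    """For each cluster, the k variables with the largest absolute standardized means."""
    return {
        label: sorted(z.items(), key=lambda item: abs(item[1]), reverse=True)[:k]
        for label, z in profile.items()
    }


# Hypothetical toy data: two clusters that differ mainly in visits.
rows = [
    {"spend": 10, "visits": 1},
    {"spend": 12, "visits": 1},
    {"spend": 30, "visits": 5},
    {"spend": 32, "visits": 5},
]
labels = [1, 1, 2, 2]
print(top_traits(cluster_profile(rows, labels), k=1))
# prints {1: [('visits', -1.0)], 2: [('visits', 1.0)]}
```

In SAS, a comparable table can be built by standardizing the original variables with PROC STDIZE and then running PROC MEANS with a CLASS statement on the cluster variable. PROC STDIZE uses the sample rather than the population standard deviation, so the values differ slightly.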
      <pubDate>Wed, 28 Nov 2018 20:45:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-profile-and-interpret-clusters/m-p/516870#M26353</guid>
      <dc:creator>kinoo1989</dc:creator>
      <dc:date>2018-11-28T20:45:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to profile and interpret clusters?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/How-to-profile-and-interpret-clusters/m-p/517029#M26362</link>
      <description>&lt;P&gt;If you want to cluster the variables, check PROC VARCLUS.&lt;/P&gt;
&lt;P&gt;If you want to pick out the most significant variables, check PROC PLS or PROC HPGENSELECT.&lt;/P&gt;</description>
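Picking out the most discriminating variables can also be approached with a simpler screening step. This is a swapped-in technique, not what PROC PLS or PROC HPGENSELECT do. Each original variable is ranked by its between-cluster R-square, which is the share of its total variance explained by cluster membership:

R-square = 1 - SSW/SST

Here SSW is the within-cluster sum of squares and SST is the total sum of squares. This is analogous to the per-variable R-square that PROC FASTCLUS reports for its clustering variables, but applied to the original variables instead of the PCs. Variables at the top of the ranking are natural candidates for profiling. This Python sketch assumes the same hypothetical layout (a list of dicts plus one cluster label per row). The names between_cluster_r2 and rank_variables and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def between_cluster_r2(values, labels):
    """Share of a variable's total variance explained by cluster membership: 1 - SSW/SST."""
    grand = mean(values)
    sst = sum((x - grand) ** 2 for x in values)  # total sum of squares
    if sst == 0:
        return 0.0  # a constant variable explains nothing
    groups = defaultdict(list)
    for x, label in zip(values, labels):
        groups[label].append(x)
    ssw = 0.0  # within-cluster sum of squares
    for members in groups.values():
        m = mean(members)
        ssw += sum((x - m) ** 2 for x in members)
    return 1.0 - ssw / sst


def rank_variables(rows, labels):
    """Sort variables by between-cluster R-square, most discriminating first."""
    scores = {v: between_cluster_r2([r[v] for r in rows], labels) for v in rows[0]}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "noise" has the same spread inside every cluster.
rows = [
    {"spend": 10, "visits": 1, "noise": 3},
    {"spend": 12, "visits": 1, "noise": 7},
    {"spend": 30, "visits": 5, "noise": 3},
    {"spend": 32, "visits": 5, "noise": 7},
]
labels = [1, 1, 2, 2]
for name, r2 in rank_variables(rows, labels):
    print(f"{name}: {r2:.3f}")
# prints visits: 1.000, then spend: 0.990, then noise: 0.000
```

Highly correlated variables tend to receive similar scores. Combining this ranking with PROC VARCLUS output, by keeping one representative per variable cluster, is one way to shorten the list further.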
      <pubDate>Thu, 29 Nov 2018 13:24:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/How-to-profile-and-interpret-clusters/m-p/517029#M26362</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-11-29T13:24:32Z</dc:date>
    </item>
  </channel>
</rss>

