<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Proc Discrim on Clustered data in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/485873#M25168</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need help &lt;SPAN&gt;generating discriminant statistics to classify observations into the clusters produced by a cluster analysis.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For the cluster analysis, I first did dimension reduction using proc factor (method=principal rotate=varimax), which gave me 6 factors. I then used proc cluster (Ward's method) and ended up with 7 clusters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is the code below correct? What value of k should I use?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC DISCRIM data=train TESTDATA=test testout=newgroups method=npar k=6 OUTSTAT=newStat;&lt;BR /&gt;var Factors1-Factors6;&lt;BR /&gt;class clusterID;&lt;BR /&gt;id dataID;&lt;BR /&gt;run;&lt;/P&gt;</description>
    <pubDate>Fri, 10 Aug 2018 17:08:03 GMT</pubDate>
    <dc:creator>Fae</dc:creator>
    <dc:date>2018-08-10T17:08:03Z</dc:date>
    <item>
      <title>Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/485873#M25168</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need help &lt;SPAN&gt;generating discriminant statistics to classify observations into the clusters produced by a cluster analysis.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For the cluster analysis, I first did dimension reduction using proc factor (method=principal rotate=varimax), which gave me 6 factors. I then used proc cluster (Ward's method) and ended up with 7 clusters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is the code below correct? What value of k should I use?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC DISCRIM data=train TESTDATA=test testout=newgroups method=npar k=6 OUTSTAT=newStat;&lt;BR /&gt;var Factors1-Factors6;&lt;BR /&gt;class clusterID;&lt;BR /&gt;id dataID;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 17:08:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/485873#M25168</guid>
      <dc:creator>Fae</dc:creator>
      <dc:date>2018-08-10T17:08:03Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/485949#M25169</link>
      <description>&lt;P&gt;Now that you have the clusters, why not perform the discriminant analysis on the original variables? And why not start with parametric methods?&lt;/P&gt;
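&lt;P&gt;A minimal sketch of that suggestion follows. To make it run as written, it uses the SAS sample data set SASHELP.IRIS as a stand-in for train. Species plays the role of clusterID, and the four measurements play the role of your original variables, so substitute your own names. POOL=TEST asks PROC DISCRIM to test whether the within-cluster covariance matrices are equal and to choose a linear or quadratic rule accordingly. With METHOD=NORMAL, the OUTSTAT= data set stores that rule so it can classify new data later.&lt;/P&gt;
&lt;PRE&gt;/* Parametric (normal-theory) discriminant analysis.
   SASHELP.IRIS stands in for train; Species stands in for clusterID. */
proc discrim data=sashelp.iris method=normal pool=test crossvalidate
             outstat=irisStat;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Reuse the stored rule: when DATA= is an OUTSTAT= data set,
   PROC DISCRIM classifies the TESTDATA= observations with it. */
proc discrim data=irisStat testdata=sashelp.iris testout=irisScored;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
&lt;/PRE&gt;
&lt;P&gt;In the TESTOUT= data set, the _INTO_ variable holds the class assigned to each observation. METHOD=NORMAL produces an explicit rule in terms of the original variables, so the rule can be stored and reused this way.&lt;/P&gt;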
&lt;P&gt;When you run a nonparametric discriminant analysis on principal components, you won't get reusable classification rules or any insight into the classification logic.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 20:55:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/485949#M25169</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-08-10T20:55:44Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486427#M25217</link>
      <description>&lt;P&gt;Can discriminant analysis handle collinearity?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your suggestion about using a parametric method. I will check the variables' distributions. Hopefully the variables, or their log-transformed versions, are normally distributed.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Aug 2018 17:45:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486427#M25217</guid>
      <dc:creator>Fae</dc:creator>
      <dc:date>2018-08-13T17:45:58Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486477#M25221</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/201239"&gt;@Fae&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Can discriminant analysis handle collinearity?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Yes, discriminant analysis can handle collinearity. When two variables are collinear, their joint distribution looks like an oblique ellipsoid. PROC DISCRIM is a multivariate procedure that handles such distributions within each cluster. Parametric discrimination assumes that the distribution within each cluster is multivariate normal. If you look at data from a multivariate normal distribution one variable at a time, you will see normal distributions, even when the variables are not independent.&lt;/P&gt;
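&lt;P&gt;A minimal sketch of how to look at both points in SAS follows. It again uses SASHELP.IRIS as a stand-in, with Species in place of clusterID and the four measurements in place of your variables. The WCORR option displays the correlations within each class; PROC DISCRIM models these correlations rather than requiring them to be zero. PROC UNIVARIATE with the NORMAL option tests each variable for normality within each class. Normal marginals are necessary but not sufficient for multivariate normality, so treat this only as a screening check.&lt;/P&gt;
&lt;PRE&gt;/* Within-class correlations: correlated predictors are allowed. */
proc discrim data=sashelp.iris method=normal wcorr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* One variable at a time, within each class: show only the normality tests. */
ods select TestsForNormality;
proc univariate data=sashelp.iris normal;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
&lt;/PRE&gt;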
&lt;P&gt;PROC DISCRIM lets you choose whether to assume that every cluster has the same covariance matrix (POOL=YES) or not (POOL=NO).&lt;/P&gt;</description>
      <pubDate>Mon, 13 Aug 2018 20:27:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486477#M25221</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-08-13T20:27:40Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486721#M25232</link>
      <description>&lt;P&gt;One quick question: if I stick with the principal factors and use the nonparametric method, how do I pick k?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For the parametric method, should I use all the variables (standardized and log-transformed) that went into the original principal component analysis, or should I screen them?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Aug 2018 15:59:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486721#M25232</guid>
      <dc:creator>Fae</dc:creator>
      <dc:date>2018-08-14T15:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Discrim on Clustered data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486765#M25240</link>
      <description>&lt;P&gt;I agree with the procedure documentation that says "&lt;EM&gt;In nearest-neighbor methods, the choice of &lt;SPAN class=" aa-mathtext"&gt;k&lt;/SPAN&gt; is usually relatively uncritical (Hand &lt;A href="http://127.0.0.1:63178/help/statug.hlp/statug_discrim_references.htm#statug_discrimhand_d82" target="_blank"&gt;1982&lt;/A&gt;). A practical approach is to try several different values of the smoothing parameters within the context of the particular application and to choose the one that gives the best cross validated estimate of the error rate.&lt;/EM&gt;"&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Use only one version of each variable, the version that looks the most normal. Use only ordinal, preferably continuous, variables.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Aug 2018 18:00:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Discrim-on-Clustered-data/m-p/486765#M25240</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-08-14T18:00:20Z</dc:date>
    </item>
  </channel>
</rss>

