<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Categorical inputs and standardisation in Cluster Analysis in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Categorical-inputs-and-standadisation-in-Cluster-Analysis/m-p/646590#M753</link>
    <description>&lt;P&gt;Re:&amp;nbsp;Applied Analytics Using SAS Enterprise Miner&lt;/P&gt;
&lt;P&gt;I have a couple of questions on Cluster Analysis (chapter 8 of the course notes):&lt;/P&gt;
&lt;P&gt;1. In what scenarios should categorical variables, via dummy indicators, be used for clustering? Or would it be better to use only interval variables, as suggested by the course notes on page 8-9 ("An interval measurement level is recommended for k-means to produce non-trivial clusters")?&lt;BR /&gt;2. In what instances would Range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?&lt;/P&gt;</description>
    <pubDate>Mon, 11 May 2020 09:10:35 GMT</pubDate>
    <dc:creator>pvareschi</dc:creator>
    <dc:date>2020-05-11T09:10:35Z</dc:date>
    <item>
      <title>Categorical inputs and standardisation in Cluster Analysis</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Categorical-inputs-and-standadisation-in-Cluster-Analysis/m-p/646590#M753</link>
      <description>&lt;P&gt;Re:&amp;nbsp;Applied Analytics Using SAS Enterprise Miner&lt;/P&gt;
&lt;P&gt;I have a couple of questions on Cluster Analysis (chapter 8 of the course notes):&lt;/P&gt;
&lt;P&gt;1. In what scenarios should categorical variables, via dummy indicators, be used for clustering? Or would it be better to use only interval variables, as suggested by the course notes on page 8-9 ("An interval measurement level is recommended for k-means to produce non-trivial clusters")?&lt;BR /&gt;2. In what instances would Range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?&lt;/P&gt;</description>
      <pubDate>Mon, 11 May 2020 09:10:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Categorical-inputs-and-standadisation-in-Cluster-Analysis/m-p/646590#M753</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-11T09:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: Categorical inputs and standardisation in Cluster Analysis</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Categorical-inputs-and-standadisation-in-Cluster-Analysis/m-p/647139#M767</link>
      <description>&lt;P&gt;I have a couple of questions on Cluster Analysis (chapter 8 of course notes):&lt;/P&gt;
&lt;P&gt;1. In what scenarios should categorical variables, via dummy indicators, be used for clustering? Or would it be better to use only interval variables, as suggested by the course notes on page 8-9 ("An interval measurement level is recommended for k-means to produce non-trivial clusters")?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;My answer:&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;For k-means and hierarchical clustering, interval variables are recommended. The SAS HP Cluster node can also perform ABC clustering based on Manhattan distance. With this option you can also include dummy variables derived from a categorical variable.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;2. In what instances would Range standardisation (with reference to the property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?&lt;/FONT&gt;&lt;/P&gt;
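&lt;P&gt;As a rough illustration of the two standardisations contrasted in question 2, here is a minimal Python sketch (not SAS code). It assumes the common textbook definitions: z-standardisation subtracts the mean and divides by the standard deviation, and range standardisation subtracts the minimum and divides by the range. The exact formula behind the Enterprise Miner "Internal Standardization" property is not stated in this thread and may differ; the function names and sample values are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
from statistics import mean, stdev


def z_standardise(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def range_standardise(values):
    """Subtract the minimum and divide by the range (assumed definition)."""
    lo = min(values)
    span = max(values) - lo
    return [(v - lo) / span for v in values]


# Hypothetical inputs: one interval variable and one 0/1 dummy indicator.
income = [20.0, 30.0, 40.0, 50.0, 60.0]
flag = [0.0, 1.0, 0.0, 0.0, 1.0]

z_income = z_standardise(income)      # mean 0, standard deviation 1
r_income = range_standardise(income)  # values between 0 and 1
r_flag = range_standardise(flag)      # a 0/1 dummy stays 0/1
```
&lt;/PRE&gt;
&lt;P&gt;Under these assumed definitions, range standardisation leaves a 0/1 dummy indicator unchanged and maps an interval input onto the same [0, 1] scale, whereas z-standardisation gives every input a standard deviation of 1.&lt;/P&gt;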
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;My answer:&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#0000FF"&gt;For k-means clustering and PCA, z-standardization is preferred. For some special NN machine learning algorithms, range normalization may be preferred.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2020 15:47:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Categorical-inputs-and-standadisation-in-Cluster-Analysis/m-p/647139#M767</guid>
      <dc:creator>gcjfernandez</dc:creator>
      <dc:date>2020-05-12T15:47:40Z</dc:date>
    </item>
  </channel>
</rss>

