<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Clustering in SAS Enterprise Miner in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-in-SAS-Enterprise-Miner/m-p/512715#M7502</link>
    <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;I have a variable with more than 1000 categories (modalities). I want to group them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I do it in SAS Enterprise Miner?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Enterprise Miner can perform observational clustering (grouping observations that are similar with respect to a set of variables) and variable clustering (grouping variables that tend to vary together). In your case, however, you are looking to group the levels within a single variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To answer this, the first question to ask is: on what basis do I want them grouped?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, I could group them based on&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;cardinality&lt;/STRONG&gt; (how often each level occurs)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;hierarchy&lt;/STRONG&gt; (how similar they are with regard to some more general categorization)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;response&lt;/STRONG&gt; (how similar they are with regard to a particular outcome of interest)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;cardinality&lt;/STRONG&gt; -- The Pareto principle often comes into play, where 80% of the data is represented by 20% of the levels. Consider keeping the levels that occur commonly enough as their own categories, and then grouping the remaining infrequently occurring levels into one or more other categories. Levels with too few observations have little impact on the full solution but can vary wildly, so grouping them together reduces cardinality while providing a more stable solution.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;hierarchy&lt;/STRONG&gt; -- If there are natural groupings of levels that make sense, you might get a better solution using that hierarchy. For example, suppose you wanted to group the SKU numbers in a grocery store. You might look to higher levels of the SKU, such as grouping all the types of grapes together. You could also look higher in the hierarchy and group all kinds of fruit or even all types of produce together. Variables with a large number of levels can sometimes be better represented by multiple variables, each representing a different level of the hierarchy.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;response&lt;/STRONG&gt; -- If you want to group the levels using an outcome variable, then you can simply fit a Decision Tree using your response variable of interest as your target variable. Each split the tree makes will partition the levels of your variable, so you can choose your final groupings based on the tree that you build.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&lt;/P&gt;</description>
    <pubDate>Tue, 13 Nov 2018 19:48:10 GMT</pubDate>
    <dc:creator>DougWielenga</dc:creator>
    <dc:date>2018-11-13T19:48:10Z</dc:date>
    <item>
      <title>Clustering in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-in-SAS-Enterprise-Miner/m-p/487548#M7313</link>
      <description>&lt;P&gt;Hi SAS,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a variable with more than 1000 categories (modalities). I want to group them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I do it in SAS Enterprise Miner?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Sonia&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 18:36:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Clustering-in-SAS-Enterprise-Miner/m-p/487548#M7313</guid>
      <dc:creator>sonia_qc</dc:creator>
      <dc:date>2018-08-16T18:36:09Z</dc:date>
    </item>
    <item>
      <title>Re: Clustering in SAS Enterprise Miner</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Clustering-in-SAS-Enterprise-Miner/m-p/512715#M7502</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;I have a variable with more than 1000 categories (modalities). I want to group them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I do it in SAS Enterprise Miner?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Enterprise Miner can perform observational clustering (grouping observations that are similar with respect to a set of variables) and variable clustering (grouping variables that tend to vary together). In your case, however, you are looking to group the levels within a single variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To answer this, the first question to ask is: on what basis do I want them grouped?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, I could group them based on&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;cardinality&lt;/STRONG&gt; (how often each level occurs)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;hierarchy&lt;/STRONG&gt; (how similar they are with regard to some more general categorization)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; * &lt;STRONG&gt;response&lt;/STRONG&gt; (how similar they are with regard to a particular outcome of interest)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;cardinality&lt;/STRONG&gt; -- The Pareto principle often comes into play, where 80% of the data is represented by 20% of the levels. Consider keeping the levels that occur commonly enough as their own categories, and then grouping the remaining infrequently occurring levels into one or more other categories. Levels with too few observations have little impact on the full solution but can vary wildly, so grouping them together reduces cardinality while providing a more stable solution.&lt;/P&gt;
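&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough illustration outside of Enterprise Miner, here is a minimal Python sketch of the frequency-based idea: keep the most common levels as they are and lump everything else into a single catch-all level. The function name lump_rare_levels, the keep_top parameter, and the OTHER label are hypothetical choices made for this example, and the right cutoff depends on your data.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter


def lump_rare_levels(values, keep_top=20, other_label="OTHER"):
    """Keep the keep_top most frequent levels; map every other level to other_label."""
    counts = Counter(values)
    # most_common returns (level, count) pairs sorted by descending count.
    kept = {level for level, _ in counts.most_common(keep_top)}
    return [v if v in kept else other_label for v in values]


# Tiny made-up sample: two common levels and two rare ones.
sample = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
grouped = lump_rare_levels(sample, keep_top=2)
```
&lt;/PRE&gt;
&lt;P&gt;You could just as well lump levels by a minimum count or a cumulative share of observations instead of a fixed number of levels.&lt;/P&gt;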
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;hierarchy&lt;/STRONG&gt; -- If there are natural groupings of levels that make sense, you might get a better solution using that hierarchy. For example, suppose you wanted to group the SKU numbers in a grocery store. You might look to higher levels of the SKU, such as grouping all the types of grapes together. You could also look higher in the hierarchy and group all kinds of fruit or even all types of produce together. Variables with a large number of levels can sometimes be better represented by multiple variables, each representing a different level of the hierarchy.&lt;/P&gt;
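&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To make the grocery example concrete, here is a small Python sketch that rolls a SKU up to a coarser level and also expands it into one variable per hierarchy level. The SKU codes, the SKU_HIERARCHY lookup, and the level names (department, category, item_type) are all made up for illustration; in practice the mapping would come from whatever product master data you have.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical lookup from SKU to its path in the product hierarchy.
SKU_HIERARCHY = {
    "SKU-1001": ("produce", "fruit", "grapes"),
    "SKU-1002": ("produce", "fruit", "grapes"),
    "SKU-1003": ("produce", "fruit", "apples"),
    "SKU-2001": ("produce", "vegetables", "carrots"),
}


def roll_up(sku, depth):
    """Return the hierarchy path of a SKU truncated to depth levels (1 = department)."""
    return SKU_HIERARCHY[sku][:depth]


def hierarchy_columns(sku):
    """Represent one high-cardinality SKU as several lower-cardinality variables."""
    department, category, item_type = SKU_HIERARCHY[sku]
    return {"department": department, "category": category, "item_type": item_type}


# Two different grape SKUs collapse to the same group at depth 3.
same_group = roll_up("SKU-1001", 3) == roll_up("SKU-1002", 3)
```
&lt;/PRE&gt;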
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding using &lt;STRONG&gt;response&lt;/STRONG&gt; -- If you want to group the levels using an outcome variable, then you can simply fit a Decision Tree using your response variable of interest as your target variable. Each split the tree makes will partition the levels of your variable, so you can choose your final groupings based on the tree that you build.&lt;/P&gt;
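&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To show what a single tree split does to the levels of a nominal variable, here is a hand-rolled Python sketch. It is not the Enterprise Miner Decision Tree node and does not reproduce its splitting criteria; it assumes a numeric (for example 0/1) target, sorts the levels by their mean target value, and picks the one cut that minimizes the within-group sum of squared errors. The function name best_level_split and the sample data are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import defaultdict


def best_level_split(levels, target):
    """Partition the levels of a nominal variable into two groups with one split.

    Assumes at least two distinct levels and a numeric target.
    """
    by_level = defaultdict(list)
    for level, y in zip(levels, target):
        by_level[level].append(y)

    # Order levels by mean response so that candidate splits are contiguous cuts.
    ordered = sorted(by_level, key=lambda lv: sum(by_level[lv]) / len(by_level[lv]))

    def sse(group):
        """Sum of squared errors of the target around the group mean."""
        ys = [y for lv in group for y in by_level[lv]]
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    cuts = range(1, len(ordered))
    best = min(cuts, key=lambda c: sse(ordered[:c]) + sse(ordered[c:]))
    return ordered[:best], ordered[best:]


# Made-up data: levels a and b mostly have outcome 0, levels c and d have outcome 1.
levels = ["a", "a", "b", "b", "c", "c", "d", "d"]
target = [0, 0, 0, 1, 1, 1, 1, 1]
low, high = best_level_split(levels, target)
```
&lt;/PRE&gt;
&lt;P&gt;A real tree would apply this kind of split recursively, so each leaf ends up holding one group of levels.&lt;/P&gt;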
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&lt;/P&gt;</description>
      <pubDate>Tue, 13 Nov 2018 19:48:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Clustering-in-SAS-Enterprise-Miner/m-p/512715#M7502</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2018-11-13T19:48:10Z</dc:date>
    </item>
  </channel>
</rss>

