<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Collapse levels of a categorical variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211764#M39191</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sure, write a macro to take the number of parameters you have, and the number you want and evaluate the rules &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It could also be considered a decision tree analysis if you have SAS OR, and there may be macros for base SAS - try lexjansen.com&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 22 Jun 2015 21:03:49 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2015-06-22T21:03:49Z</dc:date>
    <item>
      <title>Collapse levels of a categorical variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211761#M39188</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way we can collapse the levels of categorical variables in an automated manner? One way i know is to calculate the % of events falling in each category and run cluster analysis on it. Can this be automated for multiple variables considering only event rate, not cluster analysis?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For example, there is a variable called "Char A". I have calculated Event Rate for this variable, i.e. percentage of 1s appearing in dependent variable (let's say VarY). It is simply the mean of Y.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" style="width: 128px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" width="64"&gt;Char A&lt;/TD&gt;&lt;TD class="xl65" style="border-left: none;" width="64"&gt;Event Rate&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;A&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;49%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;B&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;67%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;C&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;2%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;D&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;87%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;E&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;4%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl63" height="20" style="border-top: none;"&gt;F&lt;/TD&gt;&lt;TD class="xl64" style="border-top: none; border-left: none;"&gt;3%&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Next step is to combine categories of C,E and F as they have almost similar event rate (let's say, variation within 5%).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Method 2 : If the above methodology is complicated to automate, can we make it simple taking only the number of cases falling in Char A (not considering dependent variable). For example, if a categorical level contains atmost&amp;nbsp; 5% observations, combine it with others which all have percentage less than 5%,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" width="128"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD class="xl67" height="20" width="64"&gt;Char A&lt;/TD&gt;&lt;TD class="xl67" style="border-left: none;" width="64"&gt;ColPct&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;C&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;2%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;F&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;3%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;E&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;4%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;A&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;49%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;B&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;67%&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" style="border-top: none;"&gt;D&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;87%&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;Combine categories of C,E and F as they have column percentage less than 5%.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;After using either of the 2 methodologies, we need to replace the levels with the combined category in a raw data file. The raw data file looks like below -&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" width="128"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD class="xl65" height="20" width="64"&gt;Char A&lt;/TD&gt;&lt;TD class="xl65" style="border-left: none;" width="64"&gt;Y&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;C&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;C&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;F&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;E&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;E&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;A&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;B&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;D&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;E&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;A&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;B&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;D&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD class="xl66" height="20" style="border-top: none;"&gt;C&lt;/TD&gt;&lt;TD class="xl66" style="border-top: none; border-left: none;"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Any help would be highly appreciated! Thanks in anticipation!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 22 Jun 2015 20:48:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211761#M39188</guid>
      <dc:creator>Ujjawal</dc:creator>
      <dc:date>2015-06-22T20:48:56Z</dc:date>
    </item>
    <item>
      <title>Re: Collapse levels of a categorical variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211762#M39189</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The easiest way is usually a custom format as you don't have to change data and can specify any one the formats at run time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;with your example:&lt;/P&gt;&lt;P&gt;proc format;&lt;/P&gt;&lt;P&gt;value $MyABC&lt;/P&gt;&lt;P&gt;"C","E","F" = "CEF"; /* or any string you want. */&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;This format only changes the listed values, others would appear as, from your example, A B D&lt;/P&gt;&lt;P&gt;Multiple groups are possible within one format. There are numeric range instructions but results are often odd when attempting to use those with character variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Then run your proc with the line below added:&lt;/P&gt;&lt;P&gt;format CharA $MyABC. ;&lt;/P&gt;&lt;P&gt;to the Freq or other analysis proc.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 22 Jun 2015 20:55:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211762#M39189</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-06-22T20:55:33Z</dc:date>
    </item>
    <item>
      <title>Re: Collapse levels of a categorical variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211763#M39190</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks for your response. First I need to check the categories that need to be combined in an automated manner. In the example, i know which categories need to be collapsed. I want this to be implemented via code. &lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;In a raw data file, first i need to calculate column percentage of a categorical (character) variable and then decide the categories to be collapsed and then replace them with the combined category. &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 22 Jun 2015 21:03:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211763#M39190</guid>
      <dc:creator>Ujjawal</dc:creator>
      <dc:date>2015-06-22T21:03:33Z</dc:date>
    </item>
    <item>
      <title>Re: Collapse levels of a categorical variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211764#M39191</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sure, write a macro to take the number of parameters you have, and the number you want and evaluate the rules &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It could also be considered a decision tree analysis if you have SAS OR, and there may be macros for base SAS - try lexjansen.com&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 22 Jun 2015 21:03:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapse-levels-of-a-categorical-variable/m-p/211764#M39191</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-06-22T21:03:49Z</dc:date>
    </item>
  </channel>
</rss>

