<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Removing levels in the logistic regression model in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319189#M16877</link>
    <description>&lt;P&gt;Does every level have enough observations for the model?&lt;/P&gt;
&lt;P&gt;If so, you could try PROC HPGENSELECT to pick out the most significant levels.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Dec 2016 08:20:23 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2016-12-15T08:20:23Z</dc:date>
    <item>
      <title>Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318921#M16854</link>
      <description>&lt;P&gt;Suppose we have a categorical variable, zip code, that has too many levels, which is affecting the model. What is the most appropriate method to reduce the number of zip code levels in the logistic regression model?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use greencase method?&lt;/P&gt;&lt;P&gt;Dummy variables?&lt;/P&gt;&lt;P&gt;Make it continuous by reducing levels?&lt;/P&gt;&lt;P&gt;Frequency method?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How would anyone deal with it? Is the greencase method appropriate for reducing the levels of zip code?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sameer&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 13:29:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318921#M16854</guid>
      <dc:creator>sameer112217</dc:creator>
      <dc:date>2016-12-14T13:29:02Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318929#M16856</link>
      <description>&lt;P&gt;What is the Greencase method? Google shows nothing....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Combine into spatially larger regions. Maybe counties?&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 13:50:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318929#M16856</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-12-14T13:50:10Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318954#M16859</link>
      <description>&lt;P&gt;Depending on my project, I would likely try to identify contiguous Zip codes with similar characteristics pertinent to the dependent variables and recode them. For example, I would not combine Zips that are predominantly rural with low population density with urban or suburban ones. If practical, you might replace them with Metropolitan Statistical Areas or similar.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that Proc Logistic handles categorical variables by creating internal dummy variables for the levels of the variable (minus one).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In no way should Zip codes EVER be allowed to be treated as continuous.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 15:26:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318954#M16859</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-12-14T15:26:12Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318995#M16862</link>
      <description>&lt;P&gt;Cluster by using Greenacre's method.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, I never said we can make it continuous, but rather reduce it by giving numbers to the levels.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, everyone.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 17:00:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/318995#M16862</guid>
      <dc:creator>sameer112217</dc:creator>
      <dc:date>2016-12-14T17:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319000#M16864</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/89720"&gt;@sameer112217&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Also, I never said we can make it continuous, but rather reduce it by giving numbers to the levels.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;From your original post:&lt;/P&gt;
&lt;P&gt;"Make it continuous by reducing levels?"&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 17:41:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319000#M16864</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-12-14T17:41:05Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319189#M16877</link>
      <description>&lt;P&gt;Does every level have enough observations for the model?&lt;/P&gt;
&lt;P&gt;If so, you could try PROC HPGENSELECT to pick out the most significant levels.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 08:20:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319189#M16877</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-12-15T08:20:23Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319252#M16881</link>
      <description>&lt;P&gt;Yes. For example, suppose we have a zip code for a city like Mumbai, in the Bandra region, which starts from 400064. Mumbai is in the state of Maharashtra. We could code it as 1 for Mumbai city irrespective of region, or something like that. What is the best way in the real corporate world to reduce the levels in a regression?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 13:56:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319252#M16881</guid>
      <dc:creator>sameer112217</dc:creator>
      <dc:date>2016-12-15T13:56:12Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319254#M16882</link>
      <description>&lt;P&gt;1. Combine into larger spatial areas that make sense geographically&lt;/P&gt;
&lt;P&gt;2. Combine based on similar measurements of other variables -&amp;gt; perhaps via a cluster mechanism.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Option 1 is easier, and you maintain the interpretability of your model. It can be revisited in a later revision of the model.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 13:59:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319254#M16882</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-12-15T13:59:41Z</dc:date>
    </item>
    <item>
      <title>Re: Removing levels in the logistic regression model</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319303#M16883</link>
      <description>&lt;P&gt;One way would be to create custom formats to group your codes. That way you need not actually change data and you could have multiple formats as needed. Proc Logistic will honor the formatted values to create groups. Here is a simplistic example modified from SAS online documentation to illustrate:&lt;/P&gt;
&lt;PRE&gt;Data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P  F  68   1  No   B  M  74  16  No  P  F  67  30  No
P  M  66  26  Yes  B  F  67  28  No  B  F  77  16  No
A  F  71  12  No   B  F  72  50  No  B  F  76   9  Yes
A  M  71  17  Yes  A  F  63  27  No  A  F  69  18  Yes
B  F  66  12  No   A  M  62  42  No  P  F  64   1  Yes
A  F  64  17  No   P  M  74   4  No  A  F  72  25  No
P  M  70   1  Yes  B  M  66  19  No  B  M  59  29  No
A  F  64  30  No   A  M  70  28  No  A  M  69   1  No
B  F  78   1  No   P  M  83   1  Yes B  F  69  42  No
B  M  75  30  Yes  P  M  77  29  Yes P  F  79  20  Yes
A  M  70  12  No   A  F  69  12  No  B  F  65  14  No
B  M  70   1  No   B  M  67  23  No  A  M  76  25  Yes
P  M  78  12  Yes  B  M  77   1  Yes B  F  69  24  No
P  M  66   4  Yes  P  F  65  29  No  P  M  60  26  Yes
A  M  78  15  Yes  B  M  75  21  Yes A  F  67  11  No
P  F  72  27  No   P  F  70  13  Yes A  M  75   6  Yes
B  F  65   7  No   P  F  68  27  Yes P  M  68  11  Yes
P  M  67  17  Yes  B  M  70  22  No  A  M  65  15  No
P  F  67   1  Yes  A  M  67  10  No  P  F  72  11  Yes
A  F  74   1  No   B  M  80  21  Yes A  F  69   3  No
;
run;

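/* Character format: treatments P and A are grouped as 'Alt'. */
/* Values not listed (here B) keep their original value.      */
proc format library=work;
value $alttreat
"P","A" = 'Alt'
;
run;

/* Baseline fit using the raw Treatment values */
proc logistic data=Neuralgia;
   title 'Original Treatment values';
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;

/* Same model; the FORMAT statement makes CLASS group Treatment by its formatted values */
proc logistic data=Neuralgia;
   title "Formatted treatment values";
   class Treatment Sex;
   model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
   format treatment $altTreat.;
run; title;
&lt;/PRE&gt;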
&lt;P&gt;In the case of cities that may have multiple codes, it would likely be relatively easy to use a reference data set to create one format that maps codes to city and another that maps them to province/state or similar.&lt;/P&gt;
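&lt;P&gt;A minimal sketch of building such a format from a reference data set with the CNTLIN= option of Proc Format. The data set names zip_ref and zip_fmt, the format name zip2city, and the sample codes and cities are all made up for illustration; a real reference table would come from your own source.&lt;/P&gt;
&lt;PRE&gt;/* Hypothetical reference table: one row per postal code with its city */
data zip_ref;
   length zip $6 city $20;
   input zip $ city $;
   datalines;
400050 Mumbai
400064 Mumbai
411001 Pune
411004 Pune
;
run;

/* Control data set in the layout that CNTLIN= expects: */
/* FMTNAME, TYPE, START and LABEL, plus HLO for the OTHER range */
data zip_fmt;
   set zip_ref(rename=(zip=start city=label)) end=last;
   retain fmtname 'zip2city' type 'C';   /* type C = character format */
   output;
   if last then do;
      hlo   = 'O';        /* catch-all for codes missing from the reference */
      start = ' ';
      label = 'Other';
      output;
   end;
run;

/* Create the format from the control data set */
proc format library=work cntlin=zip_fmt;
run;

/* Quick check: look up one code through the new format and write it to the log */
data _null_;
   z = '400064';
   c = put(z, $zip2city.);
   put z= c=;
run;
&lt;/PRE&gt;
&lt;P&gt;With the format in place, adding a statement such as format zip $zip2city.; to the Proc Logistic step (assuming the model data set has a character variable named zip) should group the class levels by city in the same way the $altTreat. format groups treatments above.&lt;/P&gt;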
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I do this for my data with a data source that only has a postal code, to get either city or county (a sub-region of states within the USA).&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 16:18:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Removing-levels-in-the-logistic-regression-model/m-p/319303#M16883</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-12-15T16:18:44Z</dc:date>
    </item>
  </channel>
</rss>

