<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: a question about variable transformation in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101301#M5283</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you very much for your information! It is very very enlightening!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 03 May 2012 03:24:52 GMT</pubDate>
    <dc:creator>doudou66</dc:creator>
    <dc:date>2012-05-03T03:24:52Z</dc:date>
    <item>
      <title>a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101299#M5281</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello, All&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a binary dependent variable CREDIT_RATING which takes a value of either 1 (bad) or 0 (good), and I have an independent variable INCOME which is continuous. I want to do a logistic regression&lt;/P&gt;&lt;P&gt;MODEL&amp;nbsp; CREDIT_RATING = INCOME&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was told that I should NOT use INCOME directly in the model; rather, I should group INCOME into different categories (such as $0-$20,000 as INCOME_1, $20,001-$50,000 as INCOME_2, etc.). While the suggestion makes sense intuitively, is there any statistical consideration here? What statistical knowledge was applied here? My other question is: is there any other way to transform INCOME to make it more suitable for the model?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 May 2012 17:18:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101299#M5281</guid>
      <dc:creator>doudou66</dc:creator>
      <dc:date>2012-05-02T17:18:12Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101300#M5282</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;My opinion only:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The answer is dependent on the data set that you have.&amp;nbsp; If INCOME follows the usual distribution with a long tail to the right, then it is likely that high INCOME values will influence the fit more than others.&amp;nbsp; Some have modeled it as a continuous variable, with a log transformation, which is probably close, but it is really a mixture of several different distributions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The grouping makes a lot of sense and avoids a lot of this influence of single or small groups of records, and may make interpretation easier, but...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It assumes homogeneity WITHIN the group, i.e., the probability of 0 (good) is exactly the same for all individuals within a group, no matter whether they are near the extremes of the group or not.&lt;/P&gt;&lt;P&gt;It assumes you have good reasons to set your cutpoints where you do.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you are working on developing a predictive equation with only a single predictor, take a good look at PROC TRANSREG.&amp;nbsp; This would enable you to model the dependent variable as a logit, and the independent variable in a variety of ways--class, optimal transforms, non-optimal transforms, nonlinear transforms (such as Box-Cox or penalized B-splines).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 May 2012 19:30:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101300#M5282</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-05-02T19:30:56Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101301#M5283</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you very much for your information! It is very very enlightening!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 03 May 2012 03:24:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101301#M5283</guid>
      <dc:creator>doudou66</dc:creator>
      <dc:date>2012-05-03T03:24:52Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101302#M5284</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Besides the distributional considerations (long tailed --&amp;gt; undue influence), another statistical consideration is that incomes are often reported as rounded values. That is, if you draw a histogram of your income variable, you will likely see spikes at $40k, $60k, and $100k.&amp;nbsp; Although in general I dislike converting a continuous variable to a discrete one, it seems to be a common practice for income, and "rounded incomes" are one reason.&amp;nbsp; Transformations cannot rid your data of this phenomenon.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 03 May 2012 14:00:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101302#M5284</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-05-03T14:00:34Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101303#M5285</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you very much for your suggestion.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone have any other ideas on this topic? Your input is highly appreciated.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 03 May 2012 15:35:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101303#M5285</guid>
      <dc:creator>doudou66</dc:creator>
      <dc:date>2012-05-03T15:35:55Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101304#M5286</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I can share a traditional method for deciding how best to bin continuous variables (using the best KS). First, determine the range of income and split it roughly into 8-10 bins. For example, if it ranges from 20,000 to 100,000 you can start with &lt;/P&gt;&lt;P&gt;- 20K - 30K&lt;/P&gt;&lt;P&gt;- 30K - 40K&lt;/P&gt;&lt;P&gt;--------------------&lt;/P&gt;&lt;P&gt;90K and above &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For each bin you know the actual good/bad distribution, so you can compute a KS value (the absolute difference between the cumulative good % and the cumulative bad % at that bin). The bin giving the highest KS is used as the first cut-off to split the variable into two bins. For example, if 60K has the best KS, your first split is &amp;lt;=60K and &amp;gt;60K. You can repeat the best-KS method on the &amp;lt;=60K distribution to come up with another suitable cut-off. Similarly, repeat it for &amp;gt;60K, and so on. Keep doing this until you reach a reasonable level of KS while also maintaining ranking. This is an iterative process and quite useful. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 04 May 2012 00:56:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101304#M5286</guid>
      <dc:creator>Manivini123</dc:creator>
      <dc:date>2012-05-04T00:56:24Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101305#M5287</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Classification (or decision) trees would give you optimal cutting points, i.e. the income level categories that make the most difference in credit rating. They are available as &lt;STRONG&gt;Partition analysis&lt;/STRONG&gt; in JMP and &lt;STRONG&gt;Decision trees&lt;/STRONG&gt; in SAS Enterprise Miner.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 04 May 2012 01:21:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101305#M5287</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-05-04T01:21:34Z</dc:date>
    </item>
    <item>
      <title>Re: a question about variable transformation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101306#M5288</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you all very much for the great help.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 04 May 2012 15:44:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/a-question-about-variable-transformation/m-p/101306#M5288</guid>
      <dc:creator>doudou66</dc:creator>
      <dc:date>2012-05-04T15:44:27Z</dc:date>
    </item>
  </channel>
</rss>

