<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Variable Transformation in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Transformation/m-p/388629#M5860</link>
    <description></description>
    <pubDate>Wed, 16 Aug 2017 21:21:56 GMT</pubDate>
    <dc:creator>DougWielenga</dc:creator>
    <dc:date>2017-08-16T21:21:56Z</dc:date>
    <item>
      <title>Variable Transformation</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Transformation/m-p/387708#M5792</link>
      <description>&lt;P&gt;Hi there&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am wondering about the necessity of transforming my interval-scaled input variables. My target is also interval-scaled and I perform no transformation on it. I also have class input variables.&lt;/P&gt;&lt;P&gt;Basically, I compare the outputs from 5 models using the Model Comparison node, and the Average Squared Error is roughly the same for each model regardless of whether I transformed my input variables (optimal binning) or not. Could someone explain to me the mathematical necessity of binning input variables? In case this information is important, I also intend to log-transform the inputs (on one side of my diagram) and standardize them (on another) after the binning/no-binning node. I hope that is clear.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2017 09:13:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Transformation/m-p/387708#M5792</guid>
      <dc:creator>NicolasC</dc:creator>
      <dc:date>2017-08-14T09:13:59Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Transformation</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-Transformation/m-p/388629#M5860</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I am wondering about the necessity of transforming my interval-scaled input variables.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Modeling methods differ in how sensitive they are to the scale and distribution of the input variables. Tree-based models, for instance, depend only on the ordering of the observations, not on their magnitude. In practice, you might get different split points when comparing the splits for a variable to the splits for the log of that same variable, but this should not lead to major differences if you have sufficient data.&lt;/P&gt;
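&lt;P&gt;To illustrate the point about ordering, here is a minimal Python sketch (not SAS Enterprise Miner code). The helper best_split and the toy data are assumptions made up for this example: it searches for the single split that minimizes squared error and reports which rows fall on the left side, once for the raw input and once for its log.&lt;/P&gt;

```python
import math

def best_split(xs, ys):
    """Return the set of row indices sent left by the best single split on xs.

    The split minimizes the summed squared error around each side's mean.
    (Hypothetical helper for illustration only.)
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])

    def sse(rows):
        mean = sum(ys[i] for i in rows) / len(rows)
        return sum((ys[i] - mean) ** 2 for i in rows)

    # Candidate n_left puts the n_left smallest inputs on the left side.
    candidates = range(1, len(order))
    n_left = min(candidates, key=lambda n: sse(order[:n]) + sse(order[n:]))
    return frozenset(order[:n_left])

# Toy data (made up): a skewed input and a target that jumps once.
x = [1.0, 2.0, 4.0, 8.0, 50.0, 120.0, 400.0, 900.0]
y = [3.1, 2.9, 3.3, 3.0, 7.2, 6.8, 7.1, 7.4]
log_x = [math.log(v) for v in x]

left_raw = best_split(x, y)
left_log = best_split(log_x, y)
print(left_raw == left_log)  # same rows go left either way
```

&lt;P&gt;Because the log is strictly increasing, the sort order is unchanged and the same rows land on each side of the split. Only the reported threshold value changes scale. Real tree implementations may choose candidate thresholds differently, which is one way the split points can end up differing in practice.&lt;/P&gt;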
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But is transforming the input variables 'necessary'? The short answer is that it helps more with some methods than with others. For less flexible modeling methods like regression, it can be very important in some cases, while it tends to matter less for more flexible methods like neural networks. It should have limited impact on tree-based models, as described above.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Could someone explain to me the mathematical necessity of binning input variables?&amp;nbsp;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data preparation is a way to obtain better-performing models from the same data set. As I mentioned above, the impact of those transformations can vary greatly depending on the modeling method being used and the distributions of the variables being transformed. There is no 'necessity' in that sense, but binning might be desirable. In some cases, the difference between a good model and a great one is simply how the data is prepared. Binning summarizes the data, which loses information in one sense, yet it can make the predictive model better if you are using a less flexible method like regression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For interval variables, binned versions of your inputs let you model non-linearity that might not be easily captured by the interactions and/or higher-order terms often used to make a regression model more 'flexible'. Offering both binned and raw versions of these variables to a subsequent variable selection step gives the selection routine different ways to use the same information. Binning is not necessary, but it stands to reason that considering potentially non-linear relationships should help. As a result, binned variables might have a dramatic impact on regression models but would typically have a lesser impact on nonlinear modeling approaches like trees and neural networks, unless the variables were very poorly conditioned.&lt;/P&gt;
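&lt;P&gt;As a rough illustration of why binning can help a regression, the Python sketch below (again not SAS code) compares a straight-line fit with a fit on equal-width bins. The functions linear_sse and binned_sse, the U-shaped toy data, and the choice of 5 equal-width bins are all assumptions for this example. Optimal binning in Enterprise Miner chooses its cut points differently.&lt;/P&gt;

```python
def linear_sse(xs, ys):
    """SSE of a one-input least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))

def binned_sse(xs, ys, n_bins):
    """SSE when each equal-width bin predicts the mean target of its rows.

    This equals regressing y on one dummy variable per bin.
    """
    lo = min(xs)
    width = (max(xs) - lo) / n_bins
    groups = {}
    for a, b in zip(xs, ys):
        # The largest value would index past the end, so cap it at the last bin.
        idx = min(int((a - lo) / width), n_bins - 1)
        groups.setdefault(idx, []).append(b)
    total = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

# Toy data (made up): a U-shaped relationship that no straight line fits well.
x = [i / 2 for i in range(21)]        # 0.0, 0.5, ..., 10.0
y = [(v - 5.0) ** 2 for v in x]

print(round(linear_sse(x, y), 2), round(binned_sse(x, y, 5), 2))
```

&lt;P&gt;On this toy data the binned fit leaves a much smaller squared error than the straight line, even though binning discards the within-bin detail. A tree or neural network could pick up the same curve from the raw input, which is why the gain from binning is usually smaller for those methods.&lt;/P&gt;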
&lt;P&gt;&lt;BR /&gt;Hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 21:21:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-Transformation/m-p/388629#M5860</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-08-16T21:21:56Z</dc:date>
    </item>
  </channel>
</rss>

