<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Normalization and standardization in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358557#M5302</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I wonder if anyone can help me with a few simple questions. I have a labelled dataset to which I want to apply decision tree, neural network, SVM and random forest algorithms.&lt;/P&gt;&lt;P&gt;I have applied basic normalization and standardization to all columns except three flag columns, for example read, write and execute, which can only contain 0 or 1. My main target variable, CAT, originally contained only the values 1 or 2 for two categories, say hardware = 1 and software = 2.&lt;/P&gt;&lt;P&gt;My standardization routine also changed it to -1.598497 for hardware and 0.625538 for software.&lt;/P&gt;&lt;P&gt;My first question: do I really need to convert the CAT variable to standardized values for the algorithms mentioned above, or can I skip standardization for this column and keep 1 and 2 as the values?&lt;/P&gt;&lt;P&gt;My second question: if I manually recode these two categories as 0 for hardware and 1 for software, is that bad practice, or will it give wrong results compared to using 1 and 2, or -1.598497 and 0.625538?&lt;/P&gt;&lt;P&gt;Please help me with this: which of these values is appropriate for ANN, DTs, RF and SVM?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Sun, 14 May 2017 14:31:57 GMT</pubDate>
    <dc:creator>geniusgenie</dc:creator>
    <dc:date>2017-05-14T14:31:57Z</dc:date>
    <item>
      <title>Normalization and standardization</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358557#M5302</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I wonder if anyone can help me with a few simple questions. I have a labelled dataset to which I want to apply decision tree, neural network, SVM and random forest algorithms.&lt;/P&gt;&lt;P&gt;I have applied basic normalization and standardization to all columns except three flag columns, for example read, write and execute, which can only contain 0 or 1. My main target variable, CAT, originally contained only the values 1 or 2 for two categories, say hardware = 1 and software = 2.&lt;/P&gt;&lt;P&gt;My standardization routine also changed it to -1.598497 for hardware and 0.625538 for software.&lt;/P&gt;&lt;P&gt;My first question: do I really need to convert the CAT variable to standardized values for the algorithms mentioned above, or can I skip standardization for this column and keep 1 and 2 as the values?&lt;/P&gt;&lt;P&gt;My second question: if I manually recode these two categories as 0 for hardware and 1 for software, is that bad practice, or will it give wrong results compared to using 1 and 2, or -1.598497 and 0.625538?&lt;/P&gt;&lt;P&gt;Please help me with this: which of these values is appropriate for ANN, DTs, RF and SVM?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Sun, 14 May 2017 14:31:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358557#M5302</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-05-14T14:31:57Z</dc:date>
    </item>
    <item>
      <title>Re: Normalization and standardization</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358568#M5304</link>
      <description>&lt;P&gt;Generally, these algorithms react to the variance of the input variables, so setting the variance of all input variables to 1 gives each variable equal importance &lt;EM&gt;a priori&lt;/EM&gt;. If you leave the 0/1 binary variables as 0/1, they will have a different variance and become less important (or more important) than the other variables. So a good first analysis would use the standardized values rather than 0/1.&lt;/P&gt;</description>
      <pubDate>Sun, 14 May 2017 16:04:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358568#M5304</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2017-05-14T16:04:31Z</dc:date>
    </item>
    <item>
      <title>Re: Normalization and standardization</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358587#M5305</link>
      <description>&lt;P&gt;If you search "Andrew Gelman Variable Standardization" you'll get some interesting background thoughts on standardizing variables including binary variables. The last two links are quite informative IMO.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
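&lt;P&gt;To make the idea concrete, here is a minimal sketch in plain Python (not SAS) of two rescalings: the usual z-score, and the "divide by two standard deviations" rescaling that, as I read it, the Gelman paper linked below discusses. The data, the column names and both helper functions are made up for illustration; they are not from the original question.&lt;/P&gt;

```python
from statistics import mean, stdev


def zscore(values):
    """Classic standardization: subtract the mean, divide by one sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def scale_by_two_sd(values):
    """Center, then divide by two sample SDs."""
    m, s = mean(values), stdev(values)
    return [(v - m) / (2 * s) for v in values]


# Hypothetical data, invented for illustration only.
cat_12 = [1, 2, 2, 1, 2, 2, 2, 1]      # hardware = 1, software = 2
cat_01 = [v - 1 for v in cat_12]       # the same column recoded as 0/1
file_size = [120.0, 80.0, 300.0, 45.0, 210.0, 95.0, 150.0, 60.0]

# The z-score of a two-valued column depends only on the class
# proportions, not on which pair of numbers labels the classes.
z_12 = [round(z, 6) for z in zscore(cat_12)]
z_01 = [round(z, 6) for z in zscore(cat_01)]
print(z_12 == z_01)

# After two-SD scaling, a numeric input has a sample SD of one half.
print(round(stdev(scale_by_two_sd(file_size)), 6))
```

&lt;P&gt;The first check suggests that coding the two categories as 1/2 or as 0/1 makes no difference once the column is standardized. The second shows why the two-SD version is sometimes preferred: a roughly balanced 0/1 flag already has an SD near one half, so numeric inputs scaled this way sit on a comparable footing with binary inputs left as 0/1.&lt;/P&gt;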
&lt;P&gt;&lt;A href="http://andrewgelman.com/2009/07/11/when_to_standar/" target="_blank"&gt;http://andrewgelman.com/2009/07/11/when_to_standar/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://andrewgelman.com/2012/08/18/standardizing-regression-inputs/" target="_blank"&gt;http://andrewgelman.com/2012/08/18/standardizing-regression-inputs/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf" target="_blank"&gt;http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 May 2017 19:06:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358587#M5305</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-05-14T19:06:31Z</dc:date>
    </item>
    <item>
      <title>Re: Normalization and standardization</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358589#M5306</link>
      <description>Thanks a lot, PaigeMiller and Reeza. I will follow your suggestions and hope to get good results.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;</description>
      <pubDate>Sun, 14 May 2017 19:13:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Normalization-and-standardization/m-p/358589#M5306</guid>
      <dc:creator>geniusgenie</dc:creator>
      <dc:date>2017-05-14T19:13:52Z</dc:date>
    </item>
  </channel>
</rss>

