<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Values used for imputation of missing values in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Values-used-for-imputation-of-missing-values/m-p/650513#M818</link>
    <description>&lt;P&gt;Taking as an example the process flow on page 4-20 of the course text, my understanding is that values imputed (e.g. means or medians) are calculated based on the training dataset and used on the validation/test/score datasets.&lt;/P&gt;
&lt;P&gt;However, if oversampling is used, are those values biased? If so, should they not be adjusted for oversampling or is it valid/correct to use them as they are because those are the records also used for fitting the model?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;My response:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Please see my previous responses on oversampling: oversampling doesn't interfere with model selection or model estimation. Only the estimated posterior probabilities need adjustment (an intercept shift) based on the prior.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Therefore, if missing values are imputed with the training data mean or median (a constant), there is no need to adjust for oversampling.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Several non-constant missing value imputation methods are also available in SAS Enterprise Miner (tree-based methods and weighted regression methods such as Huber and Tukey), and users can easily test these methods and pick the suitable ones based on their data.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 25 May 2020 19:35:50 GMT</pubDate>
    <dc:creator>gcjfernandez</dc:creator>
    <dc:date>2020-05-25T19:35:50Z</dc:date>
    <item>
      <title>Values used for imputation of missing values</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Values-used-for-imputation-of-missing-values/m-p/650202#M815</link>
      <description>&lt;P&gt;Re: Applied Analytics Using SAS Enterprise Miner&lt;/P&gt;
&lt;P&gt;Taking as an example the process flow on page 4-20 of the course text, my understanding is that values imputed (e.g. means or medians) are calculated based on the training dataset and used on the validation/test/score datasets.&lt;/P&gt;
&lt;P&gt;However, if oversampling is used, are those values biased? If so, should they not be adjusted for oversampling or is it valid/correct to use them as they are because those are the records also used for fitting the model?&lt;/P&gt;</description>
      <pubDate>Sun, 24 May 2020 18:16:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Values-used-for-imputation-of-missing-values/m-p/650202#M815</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-24T18:16:49Z</dc:date>
    </item>
    <item>
      <title>Re: Values used for imputation of missing values</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Values-used-for-imputation-of-missing-values/m-p/650513#M818</link>
      <description>&lt;P&gt;Taking as an example the process flow on page 4-20 of the course text, my understanding is that values imputed (e.g. means or medians) are calculated based on the training dataset and used on the validation/test/score datasets.&lt;/P&gt;
&lt;P&gt;However, if oversampling is used, are those values biased? If so, should they not be adjusted for oversampling or is it valid/correct to use them as they are because those are the records also used for fitting the model?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;My response:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Please see my previous responses on oversampling: oversampling doesn't interfere with model selection or model estimation. Only the estimated posterior probabilities need adjustment (an intercept shift) based on the prior.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Therefore, if missing values are imputed with the training data mean or median (a constant), there is no need to adjust for oversampling.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
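&lt;P&gt;As a minimal sketch of that point (written in Python rather than SAS Enterprise Miner, with hypothetical function names and made-up data), the constant is computed once from the training data and then reused unchanged on the validation data:&lt;/P&gt;

```python
from statistics import median


def fit_median(train_values):
    """Return the median of the non-missing training values."""
    observed = [v for v in train_values if v is not None]
    return median(observed)


def impute(values, fill):
    """Replace missing entries (None) with the constant learned from training."""
    return [fill if v is None else v for v in values]


# Hypothetical columns; None marks a missing value.
train = [10.0, None, 30.0, 20.0]   # training data (possibly oversampled)
validation = [None, 15.0]          # validation data

fill = fit_median(train)           # computed once, from training data only
train_imputed = impute(train, fill)
validation_imputed = impute(validation, fill)
```

&lt;P&gt;The same fill value would be reused on the test and score data; in line with the answer above, no separate oversampling adjustment is applied at this step.&lt;/P&gt;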
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;Several non-constant missing value imputation methods are also available in SAS Enterprise Miner (tree-based methods and weighted regression methods such as Huber and Tukey), and users can easily test these methods and pick the suitable ones based on their data.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
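&lt;P&gt;For the posterior probability adjustment mentioned earlier in this reply, one commonly used form rescales each predicted probability by the ratio of the population prior to the training-sample proportion; for a logistic model this is equivalent to an intercept shift. The sketch below is in Python, the function and parameter names are hypothetical, and the 50% sample rate and 5% prior are illustrative values only:&lt;/P&gt;

```python
def adjust_posterior(p, sample_rate, prior):
    """Rescale a posterior estimated on oversampled data to the population prior.

    p           : event probability predicted by the model fit on oversampled data
    sample_rate : event proportion in the oversampled training data
    prior       : event proportion in the population
    """
    # Weight each class by (population proportion / sample proportion).
    event = p * prior / sample_rate
    non_event = (1.0 - p) * (1.0 - prior) / (1.0 - sample_rate)
    # Renormalize so the two class probabilities sum to one.
    return event / (event + non_event)


# Illustrative values: events oversampled to 50% of training, 5% in the population.
adjusted = adjust_posterior(0.5, sample_rate=0.5, prior=0.05)
```

&lt;P&gt;When the sample rate equals the prior (no oversampling), the function returns the predicted probability unchanged.&lt;/P&gt;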
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 May 2020 19:35:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Values-used-for-imputation-of-missing-values/m-p/650513#M818</guid>
      <dc:creator>gcjfernandez</dc:creator>
      <dc:date>2020-05-25T19:35:50Z</dc:date>
    </item>
  </channel>
</rss>

