<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Cluster-mean imputation in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Cluster-mean-imputation/m-p/651926#M831</link>
    <description>&lt;P&gt;Re: Predictive Modeling Using Logistic Regression&lt;/P&gt;
&lt;P&gt;In cluster-mean imputation (page 3-11 and appendix B-7 of the course text), should the variables used to define the clusters be restricted to those with missing values, or can they include all variables (i.e., those with and without missing values)?&lt;/P&gt;
&lt;P&gt;Also, could you clarify the statement at the bottom of page 3-11 of the course text: “A simpler but sometimes useful alternative is to define a priori segments (for example, high, middle, low and unknown income) and then do mean or median imputation within each segment”?&lt;/P&gt;
&lt;P&gt;I am not sure I understand the benefit of creating these segments. I do understand how the example on page 3-12 works, so is the wording on page 3-11 correct?&lt;/P&gt;</description>
    <pubDate>Sat, 30 May 2020 05:06:51 GMT</pubDate>
    <dc:creator>pvareschi</dc:creator>
    <dc:date>2020-05-30T05:06:51Z</dc:date>
    <item>
      <title>Cluster-mean imputation</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Cluster-mean-imputation/m-p/651926#M831</link>
      <description>&lt;P&gt;Re: Predictive Modeling Using Logistic Regression&lt;/P&gt;
&lt;P&gt;In cluster-mean imputation (page 3-11 and appendix B-7 of the course text), should the variables used to define the clusters be restricted to those with missing values, or can they include all variables (i.e., those with and without missing values)?&lt;/P&gt;
&lt;P&gt;Also, could you clarify the statement at the bottom of page 3-11 of the course text: “A simpler but sometimes useful alternative is to define a priori segments (for example, high, middle, low and unknown income) and then do mean or median imputation within each segment”?&lt;/P&gt;
&lt;P&gt;I am not sure I understand the benefit of creating these segments. I do understand how the example on page 3-12 works, so is the wording on page 3-11 correct?&lt;/P&gt;</description>
      <pubDate>Sat, 30 May 2020 05:06:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Cluster-mean-imputation/m-p/651926#M831</guid>
      <dc:creator>pvareschi</dc:creator>
      <dc:date>2020-05-30T05:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster-mean imputation</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Cluster-mean-imputation/m-p/652311#M856</link>
      <description>&lt;P&gt;PROC FASTCLUS can replace missing values with cluster means computed on the training data set. Split the data into training and validation data sets, then run PROC FASTCLUS on the training data set to compute the cluster means and save them to an output data set. Run PROC FASTCLUS a second time to replace the missing values in the validation data set with the cluster means from the training data set. Use all the variables to define the clusters, including variables that have missing values. Specify the IMPUTE option, which requests imputation of missing values after the final assignment of observations to clusters.&lt;/P&gt;
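&lt;P&gt;The two-pass idea above can be sketched outside SAS as follows. This is a minimal Python illustration under stated assumptions, not PROC FASTCLUS itself: the function names (fit_cluster_means, impute), the toy data, the starting seeds, and the distance computed over non-missing variables are all hypothetical choices made for the example.&lt;/P&gt;

```python
# Sketch of cluster-mean imputation in plain Python (not PROC FASTCLUS itself).
# Missing values are represented as None; all names here are illustrative.

def distance(row, center):
    """Mean squared difference over the variables that are present in row."""
    pairs = [(x, c) for x, c in zip(row, center) if x is not None]
    if not pairs:
        return 0.0
    return sum((x - c) ** 2 for x, c in pairs) / len(pairs)

def nearest(row, centers):
    """Index of the closest cluster center, ignoring missing variables."""
    return min(range(len(centers)), key=lambda k: distance(row, centers[k]))

def fit_cluster_means(train, seeds, iterations=10):
    """Basic k-means on the training rows, starting from the given seeds."""
    centers = [list(s) for s in seeds]
    for _ in range(iterations):
        members = [[] for _ in centers]
        for row in train:
            members[nearest(row, centers)].append(row)
        for k, rows in enumerate(members):
            for j in range(len(centers[k])):
                present = [r[j] for r in rows if r[j] is not None]
                if present:  # keep the old value if no member has variable j
                    centers[k][j] = sum(present) / len(present)
    return centers

def impute(rows, centers):
    """Replace each None with the mean of the row's assigned cluster."""
    result = []
    for row in rows:
        center = centers[nearest(row, centers)]
        result.append([c if x is None else x for x, c in zip(row, center)])
    return result

# Toy data (assumed): two variables, all of them used to define the clusters.
train = [[1.0, 2.0], [1.2, None], [0.8, 2.2],
         [9.0, 10.0], [None, 9.5], [9.4, 10.5]]
valid = [[1.1, None], [None, 10.2]]

# Pass 1: compute cluster means on the training data only.
centers = fit_cluster_means(train, seeds=[[1.0, 2.0], [9.0, 10.0]])

# Pass 2: impute both data sets from the training cluster means.
train_imputed = impute(train, centers)
valid_imputed = impute(valid, centers)
```

&lt;P&gt;In this sketch the validation rows are filled from cluster means estimated on the training rows only, so the validation data does not influence the imputed values.&lt;/P&gt;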
&lt;P&gt;If you use PROC STDIZE without a BY statement, the imputation is unconditional. Conditional imputation within a priori segments is sometimes more helpful. For example, suppose income is related to education. Instead of imputing the unconditional mean of income, imputing the conditional mean of income within each education segment (high, middle, and low) might give more reasonable values. This might improve the predictive accuracy of the model, especially if income is related to the target.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2020 17:26:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Cluster-mean-imputation/m-p/652311#M856</guid>
      <dc:creator>sasmlp</dc:creator>
      <dc:date>2020-06-01T17:26:39Z</dc:date>
    </item>
  </channel>
</rss>

