<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Handling missing values in volume prediction models in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647102#M31062</link>
    <description>&lt;P&gt;I am looking for the best ways to handle missing values when every value for certain states is missing. I am trying to predict purchase volume by customer, and I have customers in all 50 states. For a few variables I only have data in about half of the states, but I also want to use other independent variables that are available in all states.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My two initial thoughts were to 1) include a "state_has_data" indicator in the model, or 2) build a model that estimates the volume for the states that do have data, and use that prediction when values are missing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are there better ways of handling this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've included an example below, built from the sashelp.cars data set, of something similar to what I'm trying to accomplish. In the example, "x1" where Origin='Asia' stands in for a state with missing values for an independent variable. I've also added a regression procedure at the end for context. I am still exploring other regression procedures as well (e.g. PROC COUNTREG, PROC GLM, etc.). Also, I only have SAS Enterprise Guide, without Enterprise Miner or other modeling add-ons.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;create table have as&lt;BR /&gt;select &lt;BR /&gt;make&lt;BR /&gt;,Origin&lt;BR /&gt;,avg(MSRP) as x1&lt;BR /&gt;,avg(Horsepower) as x2&lt;BR /&gt;,avg(MPG_Highway) as x3&lt;BR /&gt;,count(1) as y&lt;BR /&gt;from&lt;BR /&gt;(select &lt;BR /&gt;make&lt;BR /&gt;,Origin&lt;BR /&gt;,case when Origin = 'Asia' then . else MSRP end as MSRP&lt;BR /&gt;,Horsepower&lt;BR /&gt;,MPG_Highway&lt;BR /&gt;from sashelp.cars&lt;BR /&gt;)&lt;BR /&gt;group by make&lt;BR /&gt;,Origin&lt;BR /&gt;order by Origin&lt;BR /&gt;;quit;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;proc reg data=have;&lt;BR /&gt;model y = x1-x3;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 12 May 2020 14:17:59 GMT</pubDate>
    <dc:creator>triley</dc:creator>
    <dc:date>2020-05-12T14:17:59Z</dc:date>
    <item>
      <title>Handling missing values in volume prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647102#M31062</link>
      <description>&lt;P&gt;I am looking for the best ways to handle missing values when every value for certain states is missing. I am trying to predict purchase volume by customer, and I have customers in all 50 states. For a few variables I only have data in about half of the states, but I also want to use other independent variables that are available in all states.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My two initial thoughts were to 1) include a "state_has_data" indicator in the model, or 2) build a model that estimates the volume for the states that do have data, and use that prediction when values are missing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are there better ways of handling this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've included an example below, built from the sashelp.cars data set, of something similar to what I'm trying to accomplish. In the example, "x1" where Origin='Asia' stands in for a state with missing values for an independent variable. I've also added a regression procedure at the end for context. I am still exploring other regression procedures as well (e.g. PROC COUNTREG, PROC GLM, etc.). Also, I only have SAS Enterprise Guide, without Enterprise Miner or other modeling add-ons.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;create table have as&lt;BR /&gt;select &lt;BR /&gt;make&lt;BR /&gt;,Origin&lt;BR /&gt;,avg(MSRP) as x1&lt;BR /&gt;,avg(Horsepower) as x2&lt;BR /&gt;,avg(MPG_Highway) as x3&lt;BR /&gt;,count(1) as y&lt;BR /&gt;from&lt;BR /&gt;(select &lt;BR /&gt;make&lt;BR /&gt;,Origin&lt;BR /&gt;,case when Origin = 'Asia' then . else MSRP end as MSRP&lt;BR /&gt;,Horsepower&lt;BR /&gt;,MPG_Highway&lt;BR /&gt;from sashelp.cars&lt;BR /&gt;)&lt;BR /&gt;group by make&lt;BR /&gt;,Origin&lt;BR /&gt;order by Origin&lt;BR /&gt;;quit;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;proc reg data=have;&lt;BR /&gt;model y = x1-x3;&lt;BR /&gt;run;&lt;/P&gt;
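&lt;P&gt;For context, here is a minimal sketch of the two ideas described above, run against the "have" table from the example. The names have_ind, x1_has_data, and x1_filled are made up for illustration. Filling missing x1 with 0 is an assumption that is only reasonable when the indicator sits in the model alongside it.&lt;/P&gt;
&lt;P&gt;/* Idea 1: flag rows where x1 is observed and fill the gaps with a placeholder. */&lt;BR /&gt;
/* PROC REG drops any row with a missing regressor, so the fill keeps those rows in the fit. */&lt;BR /&gt;
data have_ind;&lt;BR /&gt;
set have;&lt;BR /&gt;
x1_has_data = not missing(x1); /* 1 when x1 is observed, 0 otherwise */&lt;BR /&gt;
x1_filled = coalesce(x1, 0); /* hypothetical placeholder value */&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc reg data=have_ind;&lt;BR /&gt;
model y = x1_has_data x1_filled x2 x3;&lt;BR /&gt;
run;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
/* Idea 2: fit one model where x1 is observed and a reduced model where it is not. */&lt;BR /&gt;
proc reg data=have;&lt;BR /&gt;
where not missing(x1);&lt;BR /&gt;
model y = x1 x2 x3;&lt;BR /&gt;
run;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
proc reg data=have;&lt;BR /&gt;
where missing(x1);&lt;BR /&gt;
model y = x2 x3;&lt;BR /&gt;
run;&lt;BR /&gt;
quit;&lt;/P&gt;
&lt;P&gt;In the first version the indicator should absorb the level shift introduced by the placeholder, so the x1_filled slope is driven by the rows where x1 is actually observed. The second version lets every coefficient differ between the two groups, at the cost of fitting each model on fewer rows.&lt;/P&gt;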
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2020 14:17:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647102#M31062</guid>
      <dc:creator>triley</dc:creator>
      <dc:date>2020-05-12T14:17:59Z</dc:date>
    </item>
    <item>
      <title>Re: Handling missing values in volume prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647115#M31064</link>
      <description>&lt;P&gt;Multiple imputation, as can be done in PROC MI, is one way of dealing with this sort of situation.&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2020 14:56:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647115#M31064</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2020-05-12T14:56:29Z</dc:date>
    </item>
    <item>
      <title>Re: Handling missing values in volume prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647119#M31065</link>
      <description>&lt;P&gt;Thanks. I am trying to see if there is anything besides basic imputation (I should have been clearer in the original question). I would like to treat the prediction separately depending on whether the state has the data, i.e. have one model for states with the extra variables and another for the states without them.&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2020 15:01:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Handling-missing-values-in-volume-prediction-models/m-p/647119#M31065</guid>
      <dc:creator>triley</dc:creator>
      <dc:date>2020-05-12T15:01:39Z</dc:date>
    </item>
  </channel>
</rss>

