<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Missing/Not Applicable Values for Interval Variable in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312869#M4705</link>
    <description>Hi Jason, &lt;BR /&gt;&lt;BR /&gt;Thanks for your reply. I am using EM for this project and my target variable is actually interval, meaning I would be using linear regression, GLMs or NN. &lt;BR /&gt;&lt;BR /&gt;Would this method work for interval targets? Another method I thought of is to convert AgeOfChild and ChildIsMarried into categorical variables, with the level "NA" for people without children.</description>
    <pubDate>Sun, 20 Nov 2016 08:23:02 GMT</pubDate>
    <dc:creator>reterberb</dc:creator>
    <dc:date>2016-11-20T08:23:02Z</dc:date>
    <item>
      <title>Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312250#M4686</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Suppose I have a dataset containing input variables&amp;nbsp;like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;HaveAChild (Binary)&lt;/TH&gt;&lt;TH&gt;AgeOfChild (Interval)&lt;/TH&gt;&lt;TH&gt;ChildIsMarried (Binary)&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;11&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In my predictive modelling, I would like to make use of models such as regression or neural networks, which require complete cases.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;However, the AgeOfChild and ChildIsMarried variables are missing for observations where HaveAChild=0, which is expected since there is no child to begin with.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this case, how can I handle these missing values without discarding them, considering that imputation wouldn't really make sense (e.g. not having a child but still having a child's age)?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 09:27:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312250#M4686</guid>
      <dc:creator>reterberb</dc:creator>
      <dc:date>2016-11-17T09:27:38Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312265#M4687</link>
      <description>&lt;P&gt;Then you should drop the observations where HaveAChild=0, since those obs don't mean anything.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 10:21:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312265#M4687</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-11-17T10:21:04Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312268#M4688</link>
      <description>Unfortunately, in my scenario, these cases with no child are still important, as there are other input variables which do not depend on whether HaveAChild=1 or 0.&lt;BR /&gt;&lt;BR /&gt;Furthermore, I would want my model to be able to score cases with no child in the future.&lt;BR /&gt;&lt;BR /&gt;Is there any other alternative?</description>
      <pubDate>Thu, 17 Nov 2016 10:28:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312268#M4688</guid>
      <dc:creator>reterberb</dc:creator>
      <dc:date>2016-11-17T10:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312270#M4689</link>
      <description>&lt;P&gt;Then I think you should drop the variable AgeOfChild, since this variable is not valid for all obs.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 10:32:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312270#M4689</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-11-17T10:32:59Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312815#M4703</link>
      <description>Hi, 

If the missing values fall exactly along the 1/0 split, then simple imputation does not work, since the model will run into total or quasi separation. Such inputs will be, and should be, rejected outright by logistic regression or NN.

It is not entirely hopeless, however, and dropping them is not the only option. If you do not have EM but have SAS/STAT, take a look at PROC MI. You may need to build your final models by group of the values MI plugs in for you. If you have a license for EM, take a look at the Distribution option under the Impute node. In some cases the Tree option may work, but depending on the other variables, you still may not be able to reduce the risk of 'quasi separation'. Tree imputation should be the secondary option to try after Distribution. Given that your target=1 is typically a very small proportion, make sure the distribution of the non-missing values is large, 'normal' or sensible enough for you.
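
As a rough illustration of what a distribution-based imputation does, here is a minimal sketch in Python. It is not the EM Impute node's actual algorithm, only the general idea: each missing value is replaced by a random draw from the observed non-missing values, so the filled-in column keeps roughly the same univariate distribution. The function name, the fixed seed and the sample data are assumptions made for illustration.

```python
import random


def impute_from_distribution(values, seed=0):
    """Replace None entries with random draws from the observed values.

    Hypothetical helper: it mimics the idea of distribution-based
    imputation, not any specific SAS procedure.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [v if v is not None else rng.choice(observed) for v in values]


# AgeOfChild column from the example in the question (None = missing)
age_of_child = [12, None, None, 20, 11, None]
imputed = impute_from_distribution(age_of_child)
print(imputed)
```

Every filled-in age here comes from a case that does have a child, which is why the share and shape of the non-missing values matter so much.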

Hope this helps. Thank you for using SAS.

Jason Xin</description>
      <pubDate>Sat, 19 Nov 2016 16:46:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312815#M4703</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-11-19T16:46:32Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312869#M4705</link>
      <description>Hi Jason, &lt;BR /&gt;&lt;BR /&gt;Thanks for your reply. I am using EM for this project and my target variable is actually interval, meaning I would be using linear regression, GLMs or NN. &lt;BR /&gt;&lt;BR /&gt;Would this method work for interval targets? Another method I thought of is to convert AgeOfChild and ChildIsMarried into categorical variables, with the level "NA" for people without children.</description>
      <pubDate>Sun, 20 Nov 2016 08:23:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/312869#M4705</guid>
      <dc:creator>reterberb</dc:creator>
      <dc:date>2016-11-20T08:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Missing/Not Applicable Values for Interval Variable</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/313084#M4715</link>
      <description>Hi, 

Yes, Distribution AND Tree should both work. You can try both and compare the results. The Tree method makes more use of the information in the other inputs, while the Distribution method remains essentially univariate. Please pay attention to the distribution inside the non-missing subgroups and to the percentage of non-missing cases. For argument's sake, if you have only 1% non-missing, I would be hard-pressed to do it.

Converting to 'flags': this idea is always intriguing, in the sense that the resulting indicators are by definition associated with the source variable. In the linear regression context, classically we 'stay away' from categorical variables, almost by instinct. But the facilities in EM or SAS/STAT are equally robust in supporting categorical variables, in variable selection and estimation, by way of, say, the CLASS statement. Chances are that if you derive indicators, you can only use one of them, if it is useful at all. You could use a decision tree in EM to run a test. Make sure all performance readings are taken from the validation data set.
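
To make the 'flags' idea concrete, here is a minimal sketch in Python (not SAS code). It turns AgeOfChild into a categorical variable with an explicit 'NA' level and then dummy-codes it, roughly what a CLASS statement would do internally. The cut point of 18, the level names and the helper functions are all assumptions made for illustration.

```python
def age_to_level(age, cut=18):
    """Map an age to a categorical level; None becomes the explicit level 'NA'.

    The cut point of 18 is an arbitrary assumption for illustration.
    """
    if age is None:
        return "NA"
    return "adult" if age >= cut else "minor"


def one_hot(levels, reference):
    """Dummy-code a list of levels, dropping the reference level."""
    names = sorted(set(levels) - {reference})
    rows = [[1 if lv == name else 0 for name in names] for lv in levels]
    return names, rows


# AgeOfChild column from the example in the question (None = missing)
ages = [12, None, None, 20, 11, None]
levels = [age_to_level(a) for a in ages]
names, rows = one_hot(levels, reference="NA")
print(levels)  # ['minor', 'NA', 'NA', 'adult', 'minor', 'NA']
print(names)   # ['adult', 'minor']
print(rows)    # [[0, 1], [0, 0], [0, 0], [1, 0], [0, 1], [0, 0]]
```

With 'NA' as the reference level, the two dummy columns add up to HaveAChild, so HaveAChild itself would be collinear with them in a regression that has an intercept. That is one way to read the remark that only one of the derived indicators can be used.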

Best Regards
Jason Xin</description>
      <pubDate>Mon, 21 Nov 2016 14:42:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missing-Not-Applicable-Values-for-Interval-Variable/m-p/313084#M4715</guid>
      <dc:creator>JasonXin</dc:creator>
      <dc:date>2016-11-21T14:42:16Z</dc:date>
    </item>
  </channel>
</rss>

