<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Alternative to HPFOREST in SAS University (data contains missing observations) in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Alternative-to-HPFOREST-in-SAS-University-data-contains-missing/m-p/724225#M8599</link>
    <description>&lt;P&gt;I have several things I'm trying to do.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;(1)&lt;/STRONG&gt; Create a predictive model based on all variables available (75 total) for a binary outcome. I have a lot of missing data that I was &lt;U&gt;told not to impute&lt;/U&gt;. To my understanding decision trees and random forests handle missing data well and will still be able to produce a decent prediction model. However, I am using SAS University, which does not seem to support HPFOREST. Is there an alternative?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; ERROR: Procedure HPFOREST not found.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;(2)&lt;/STRONG&gt; Build a logistic regression prediction model based on the subset of participants who have most of their variables filled in (at least 95%). The problem I run into is:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; WARNING: There is a complete separation of data points in Step 2. The maximum likelihood estimate does not exist.
 WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood 
          iteration. Validity of the model fit is questionable.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I assume this is a quasi-separation issue. However, the FIRTH option does not work with selection procedures. Are there other ways to remedy this? Or would it be better to go through the purposeful selection steps individually?&lt;/P&gt;
&lt;P&gt;I read that reducing explanatory variables may help, which loops back to HPFOREST. I'd like to use a random forest to narrow down my variable candidates for the logistic model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;(3)&lt;/STRONG&gt; Build a logistic regression prediction model with all participants (230), using only the variables that are at least 90% complete.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data Information:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;N = 230 total participants&lt;/P&gt;
&lt;P&gt;n = 115 participants with at least 95% of variables filled&lt;/P&gt;
&lt;P&gt;75 Total variables of interest&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Subset data created from:&lt;/P&gt;
&lt;PRE&gt;DATA CLEANED.CompleteCases95;
 set CLEANED.FilteredAnalytic;
 if cmiss(of _ALL_)/75 &amp;lt;= 0.05; /* denominator is 75: visit_date and id are not counted */
RUN; *Total rows: 115, Total columns: 77;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am not set on using a random forest. Any technique that handles a large amount of missingness well will do. Thank you in advance!&lt;/P&gt;</description>
    <pubDate>Sat, 06 Mar 2021 20:30:33 GMT</pubDate>
    <dc:creator>amarikow57</dc:creator>
    <dc:date>2021-03-06T20:30:33Z</dc:date>
    <item>
      <title>Alternative to HPFOREST in SAS University (data contains missing observations)</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Alternative-to-HPFOREST-in-SAS-University-data-contains-missing/m-p/724225#M8599</link>
      <description>&lt;P&gt;I have several things I'm trying to do.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;(1)&lt;/STRONG&gt; Create a predictive model based on all variables available (75 total) for a binary outcome. I have a lot of missing data that I was &lt;U&gt;told not to impute&lt;/U&gt;. To my understanding decision trees and random forests handle missing data well and will still be able to produce a decent prediction model. However, I am using SAS University, which does not seem to support HPFOREST. Is there an alternative?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; ERROR: Procedure HPFOREST not found.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;(2)&lt;/STRONG&gt; Build a logistic regression prediction model based on the subset of participants who have most of their variables filled in (at least 95%). The problem I run into is:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; WARNING: There is a complete separation of data points in Step 2. The maximum likelihood estimate does not exist.
 WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood 
          iteration. Validity of the model fit is questionable.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I assume this is a quasi-separation issue. However, the FIRTH option does not work with selection procedures. Are there other ways to remedy this? Or would it be better to go through the purposeful selection steps individually?&lt;/P&gt;
&lt;P&gt;I read that reducing explanatory variables may help, which loops back to HPFOREST. I'd like to use a random forest to narrow down my variable candidates for the logistic model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;(3)&lt;/STRONG&gt; Build a logistic regression prediction model with all participants (230), using only the variables that are at least 90% complete.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data Information:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;N = 230 total participants&lt;/P&gt;
&lt;P&gt;n = 115 participants with at least 95% of variables filled&lt;/P&gt;
&lt;P&gt;75 Total variables of interest&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Subset data created from:&lt;/P&gt;
&lt;PRE&gt;DATA CLEANED.CompleteCases95;
 set CLEANED.FilteredAnalytic;
 if cmiss(of _ALL_)/75 &amp;lt;= 0.05; /* denominator is 75: visit_date and id are not counted */
RUN; *Total rows: 115, Total columns: 77;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am not set on using a random forest. Any technique that handles a large amount of missingness well will do. Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Sat, 06 Mar 2021 20:30:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Alternative-to-HPFOREST-in-SAS-University-data-contains-missing/m-p/724225#M8599</guid>
      <dc:creator>amarikow57</dc:creator>
      <dc:date>2021-03-06T20:30:33Z</dc:date>
    </item>
    <item>
      <title>Re: Alternative to HPFOREST in SAS University (data contains missing observations)</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Alternative-to-HPFOREST-in-SAS-University-data-contains-missing/m-p/724710#M8604</link>
      <description>Please do variable selection and optimal binning of the interval inputs, and then try gradient boosting. Finally, compare its performance with that of a decision tree model.</description>
      <pubDate>Tue, 09 Mar 2021 00:07:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Alternative-to-HPFOREST-in-SAS-University-data-contains-missing/m-p/724710#M8604</guid>
      <dc:creator>gcjfernandez</dc:creator>
      <dc:date>2021-03-09T00:07:33Z</dc:date>
    </item>
  </channel>
</rss>

