Solved: Alternative to HPFOREST in SAS University (data contains missing observations)

amarikow57 · Posted 03-06-2021 03:30 PM

I have several things I'm trying to do.

(1) Create a predictive model for a binary outcome based on all available variables (75 total). I have a lot of missing data that I was told not to impute. To my understanding, decision trees and random forests handle missing data well and can still produce a decent prediction model. However, I am using SAS University, which does not seem to support HPFOREST. Is there an alternative?

 ERROR: Procedure HPFOREST not found.
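A possible substitute within SAS/STAT is PROC HPSPLIT. It fits a single classification tree, not a forest, and is designed to use observations with missing inputs rather than drop them; the MISSING= and NSURROGATES= options in its documentation control how those observations are routed. Below is a minimal sketch, assuming a 0/1 response named `outcome` and predictors `x1-x75`. Both names are hypothetical placeholders, and the ODS table name `VarImportance` should be confirmed with ODS TRACE ON.

```sas
/* Hypothetical names: replace outcome and x1-x75 with your own variables. */
%let inputs = x1-x75;

/* Save the variable importance table to a data set. */
ods output VarImportance=work.tree_importance;

proc hpsplit data=CLEANED.FilteredAnalytic;
   class outcome;                             /* add any categorical inputs here too */
   model outcome(event='1') = &inputs;
   grow entropy;                              /* splitting criterion */
   prune costcomplexity;                      /* choose the subtree using the validation data */
   partition fraction(validate=0.3 seed=57);  /* hold out about 30% of rows for validation */
run;
```

Sorting `work.tree_importance` by importance gives a ranked list that could serve as a shortlist of candidates for the logistic model in (2).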

(2) Build a logistic regression prediction model based on the subset of participants with the most complete information (at least 95% of variables filled). The problem I run into is:

 WARNING: There is a complete separation of data points in Step 2. The maximum likelihood estimate does not exist.
 WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood 
          iteration. Validity of the model fit is questionable.

I assume this is a complete or quasi-complete separation issue. However, the FIRTH option does not work with selection methods. Are there other ways to remedy this, or would it be better to go through the purposeful selection steps individually?
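Because FIRTH cannot be combined with SELECTION=, one workaround is to do the selection separately (for example, purposeful selection by hand) and then refit only the chosen effects with Firth's penalized likelihood, whose estimates stay finite under separation. A minimal sketch; `outcome` and the effect names in `&selected` are hypothetical.

```sas
/* Hypothetical: the candidate effects kept after purposeful selection. */
%let selected = x3 x7 x12 x20;

proc logistic data=CLEANED.CompleteCases95;
   /* FIRTH penalizes the likelihood so the estimates stay finite under separation; */
   /* CLODDS=PL requests profile-likelihood confidence limits for the odds ratios.  */
   model outcome(event='1') = &selected / firth clodds=pl;
run;
```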

I read that reducing the number of explanatory variables may help, which brings me back to HPFOREST: I'd like to use a random forest to narrow down the candidate variables for the logistic model.
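Another way to narrow the candidates is penalized selection. The sketch below assumes your SAS/STAT release supports METHOD=LASSO in the SELECTION statement of PROC HPGENSELECT; check the documentation for your version. Because LASSO shrinks the coefficients, the fit should not diverge the way unpenalized maximum likelihood does under separation. Note that HPGENSELECT, like PROC LOGISTIC, drops any row with a missing value in a model variable, so it is better suited to the 95% subset. Variable names are hypothetical.

```sas
/* Hypothetical names: outcome is the 0/1 response, x1-x75 the candidates. */
/* Add a CLASS statement for any categorical inputs.                       */
proc hpgenselect data=CLEANED.CompleteCases95;
   model outcome(event='1') = x1-x75 / distribution=binary;
   /* LASSO shrinks coefficients toward zero and drops weak effects;     */
   /* CHOOSE=SBC picks the step on the LASSO path with the smallest SBC. */
   selection method=lasso(choose=sbc);
run;
```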

(3) Build a logistic regression prediction model with all participants (230) and only the variables that are at least 90% filled.
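One way to build the list of variables that are at least 90% filled across all 230 participants is to count missing values per variable. This sketch covers numeric variables only (character variables would need a separate count), uses the STACKODSOUTPUT option of PROC MEANS (SAS 9.3 or later), and stores the result in a hypothetical macro variable `keep90`.

```sas
/* Count non-missing (N) and missing (NMiss) values for every numeric variable; */
/* STACKODSOUTPUT writes one row per variable instead of one wide row.          */
ods output Summary=work.miss_summary;
proc means data=CLEANED.FilteredAnalytic n nmiss stackodsoutput;
   var _numeric_;
run;

/* Keep variables with at most 10% missing and store their names in &keep90. */
proc sql noprint;
   select Variable into :keep90 separated by ' '
   from work.miss_summary
   where NMiss / (N + NMiss) <= 0.10;
quit;

%put NOTE: Variables at least 90% filled: &keep90;
```

If `id` or the outcome are numeric they will appear in `&keep90` and should be removed before the list is used as predictors. PROC LOGISTIC still uses only rows that are complete on every MODEL variable, so fewer than 230 participants may enter the fit.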

Data Information:

N = 230 total participants

n = 115 participants with at least 95% of variables filled

75 total variables of interest

Subset data created from:

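DATA CLEANED.CompleteCases95;
 set CLEANED.FilteredAnalytic;
 /* _ALL_ also covers id and visit_date, so subtract their missing count */
 /* to get the share missing among the 75 analysis variables only.       */
 if (cmiss(of _ALL_) - cmiss(id, visit_date)) / 75 <= 0.05;
RUN; *Total rows: 115, Total columns: 77;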

I am not set on using a random forest. Any technique that handles a large amount of missingness well will do. Thank you in advance!

gcjfernandez · Posted 03-08-2021 07:07 PM

Please do variable selection and optimal binning of the interval inputs, and then try gradient boosting. Finally, compare its performance with the decision tree model.
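As far as I know, gradient boosting procedures are not part of SAS/STAT, so in SAS University the comparison step is the part that can be sketched directly. The sketch below fits a tree with PROC HPSPLIT, a logistic model with PROC LOGISTIC, and compares their ROC curves with the ROC and ROCCONTRAST statements using NOFIT. The variable names (`outcome`, `x1-x75`, `x3 x7 x12 x20`) and the tree score variable `P_outcome1` are assumptions; check the names the score code actually creates. Both AUCs are computed on the data used for fitting, so they are optimistic.

```sas
/* Hypothetical names throughout; adjust to your data. */
%let tree_inputs  = x1-x75;
%let logit_inputs = x3 x7 x12 x20;
%let treecode     = %sysfunc(pathname(work))/tree_score.sas;

/* Decision tree: write DATA step score code to a file. */
proc hpsplit data=CLEANED.FilteredAnalytic;
   class outcome;
   model outcome(event='1') = &tree_inputs;
   prune costcomplexity;
   partition fraction(validate=0.3 seed=57);
   code file="&treecode";
run;

/* Logistic model: save the predicted probability of outcome=1 as p_logit. */
proc logistic data=CLEANED.FilteredAnalytic;
   model outcome(event='1') = &logit_inputs;
   output out=work.scored p=p_logit;
run;

/* Apply the tree score code to the same rows. */
data work.scored;
   set work.scored;
   %include "&treecode";
   p_tree = P_outcome1;   /* assumed name of the tree's probability for outcome=1 */
run;

/* Compare both ROC curves on rows where both predictions exist. */
proc logistic data=work.scored;
   model outcome(event='1') = p_logit p_tree / nofit;
   roc 'Logistic' pred=p_logit;
   roc 'Tree'     pred=p_tree;
   roccontrast reference('Logistic') / estimate e;
run;
```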
