I have several things I'm trying to do.
(1) Create a predictive model for a binary outcome using all available variables (75 total). I have a lot of missing data that I was told not to impute. To my understanding, decision trees and random forests handle missing data well and should still be able to produce a decent prediction model. However, I am using SAS University Edition, which does not seem to support HPFOREST. Is there an alternative?
ERROR: Procedure HPFOREST not found.
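One candidate that might stand in for HPFOREST is a single decision tree from PROC HPSPLIT, which is part of SAS/STAT. The sketch below is an illustration under assumptions, not a tested solution: I have not confirmed that HPSPLIT is available in this install, and `outcome`, `c1`, `c2`, and `x1`-`x3` are placeholder names, not variables from the real data. ASSIGNMISSING=BRANCH is meant to send missing values down their own branch instead of dropping those rows.

```sas
/* Sketch only: outcome, c1, c2, x1-x3 are placeholder variable names */
proc hpsplit data=CLEANED.FilteredAnalytic
             assignmissing=branch   /* missing values get their own branch */
             maxdepth=5             /* keep the tree shallow for N = 230 */
             seed=12345;
    class outcome c1 c2;            /* binary target and categorical inputs */
    model outcome = c1 c2 x1 x2 x3; /* continuous inputs are left off CLASS */
    grow entropy;                   /* splitting criterion */
    prune costcomplexity;           /* prune back to limit overfitting */
run;
```

A single tree is less stable than a forest, so any variable ranking it suggests would be a rough screen at best.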
(2) Build a logistic regression prediction model on the subset of participants who have most of their variables filled in (at least 95%). The problem I run into is:
WARNING: There is a complete separation of data points in Step 2. The maximum likelihood estimate does not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood
iteration. Validity of the model fit is questionable.
I assume this is a separation issue (complete or quasi-complete). However, FIRTH does not work with the selection methods. Are there other ways to remedy this? Or would it be better to go through the purposeful selection steps individually?
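To make the manual route concrete, here is a minimal sketch of a Firth-penalized fit where the candidate variables are listed by hand instead of being chosen by an automated selection method. The names `outcome` and `x1`-`x3` are placeholders, not variables from the actual data, and the event level '1' is an assumption about how the outcome is coded.

```sas
/* Sketch only: outcome and x1-x3 are placeholder names.               */
/* PROC LOGISTIC drops any row with a missing value in a model variable. */
proc logistic data=CLEANED.CompleteCases95;
    model outcome(event='1') = x1 x2 x3
          / firth        /* Firth penalized maximum likelihood */
            clparm=pl;   /* profile-likelihood confidence limits */
run;
```

With this form, each purposeful selection step would mean editing the variable list and refitting.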
I read that reducing the number of explanatory variables may help, which loops back to HPFOREST. I'd like to use a random forest to narrow down my variable candidates for the logistic model.
(3) Build a logistic regression prediction model using all participants (230) and only the variables that are at least 90% complete.
Data Information:
N = 230 total participants
n = 115 participants with at least 95% of variables filled
75 Total variables of interest
Subset data created from:
DATA CLEANED.CompleteCases95;
    set CLEANED.FilteredAnalytic;
    if cmiss(of _ALL_)/75 <= 0.05; *don't count visit_date or id;
RUN;
*Total rows: 115, Total columns: 77;
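For goal (3), the 90% threshold applies per variable and not per participant, so the missingness has to be tallied column by column. A minimal sketch of one common way to do that is below; the format names `missfmt` and `$missfmt` are made up for this example, and the numeric format only catches the ordinary missing value (`.`), not special missing values such as `.A`.

```sas
/* Collapse every value into Missing / Present so PROC FREQ can tally them */
proc format;
    value $missfmt ' ' = 'Missing' other = 'Present';  /* character variables */
    value  missfmt  .  = 'Missing' other = 'Present';  /* numeric variables   */
run;

/* One small table per variable; the Percent column for 'Missing'
   shows which variables exceed 10% missing */
proc freq data=CLEANED.FilteredAnalytic;
    format _character_ $missfmt. _numeric_ missfmt.;
    tables _all_ / missing nocum;
run;
```

Variables whose 'Missing' percentage is above 10 would fall outside the "at least 90% of information" rule.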
I am not set on using a random forest. Any technique that handles a large amount of missingness well will do. Thank you in advance!