Re: Clarification of statement on variables screening (SAS Academy for Data Science)

Clarification of statement on variables screening

pvareschi — Sat, 30 May 2020 08:54:54 GMT

Re: Predictive Modeling Using Logistic Regression

I would like to confirm my understanding of the following statement at the bottom of page 3.47 of the course text: "Very liberal univariate screening might be helpful when the number of clusters created in PROC VARCLUS is still relatively large."

Does "liberal univariate screening" mean that it is better to err on the side of allowing more inputs through the screening and then rely on regression selection techniques to find the best predictors?

Re: Clarification of statement on variables screening

sasmlp — Mon, 01 Jun 2020 17:05:05 GMT

If you plan to use the best subsets selection method in PROC LOGISTIC, you need to get the number of predictor variables down to around 50; otherwise the CPU time becomes excessive. Therefore, if the number of clusters obtained from PROC VARCLUS is much greater than 50, additional screening methods, such as the Spearman and Hoeffding correlation statistics, can be used to reduce the number of predictor variables passed to PROC LOGISTIC. "Very liberal univariate screening" simply means trimming the variable list down to a reasonable number.
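
To make the idea concrete, here is a minimal sketch of screening on the Spearman statistic alone. It is written in plain Python rather than SAS so that it is self-contained, and it is not the course's code: the function names (`ranks`, `spearman`, `liberal_screen`), the toy data, and the `keep` cut-off are all hypothetical. Hoeffding's D is left out for brevity; in SAS, PROC CORR can report both statistics through its SPEARMAN and HOEFFDING options.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def liberal_screen(inputs, target, keep):
    """Keep the `keep` inputs with the largest |Spearman| against the target.

    `keep` is an assumed cut-off (for example, about 50 before best subsets);
    only the weakest inputs are dropped, which is what makes it "liberal".
    """
    scored = sorted(
        inputs,
        key=lambda name: abs(spearman(inputs[name], target)),
        reverse=True,
    )
    return scored[:keep]


# Toy data (hypothetical): a binary target and three candidate inputs.
target = [0, 0, 0, 1, 1, 1, 0, 1]
inputs = {
    "x_const": [3, 3, 3, 3, 3, 3, 3, 3],   # no association at all
    "x_weak": [2, 1, 5, 3, 6, 4, 8, 7],    # weak monotone association
    "x_strong": [1, 2, 3, 6, 7, 8, 4, 9],  # strong monotone association
}

selected = liberal_screen(inputs, target, keep=2)
print(selected)  # ['x_strong', 'x_weak']
```

The cut-off is deliberately generous: the weakly associated input survives the screen, and only the input with no association is dropped. Deciding which of the survivors actually belong in the model is left to the selection method in PROC LOGISTIC.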