Solved: Data imputation before or after variable transformation

pvareschi · Posted 05-24-2020 02:39 PM

Re: Predictive Modeling Using Logistic Regression

On page 3-59 of the course text, variable transformation is suggested as a way of accounting for a nonlinear relationship between input and output. However, in the order the topics (and the related SAS logic steps) are presented in the course, missing values are imputed in an earlier step, as part of the data preparation stage. On the other hand, the course "Applied Analytics Using SAS Enterprise Miner" emphasizes throughout that data imputation should be done after transforming variables (see page 4-53 of that course text). Which order is more appropriate, or is either approach valid?

gcjfernandez · Posted 05-25-2020 03:22 PM

Re: Predictive Modeling Using Logistic Regression

My response:

The following are best-practice points for fitting regression models:

Of the three pre-processing steps that precede regression modeling (recoding categorical levels, transforming interval inputs, and imputing missing values), missing value imputation is the most significant. That is why it is introduced first in the AAEM (Applied Analytics Using SAS Enterprise Miner) training, in Chapter 4.
We also recommend that missing value imputation be the last step before fitting the regression model.
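
To make the ordering question concrete, here is a minimal sketch of the two orders. It is not taken from the course text or from SAS Enterprise Miner: it is plain Python, and the data, the log(1 + x) transformation, mean imputation, and the helper names `log_transform` and `impute_mean` are all assumptions chosen for illustration. It only shows that the value filled in for a missing observation depends on whether imputation comes before or after the transformation.

```python
import math
import statistics

def log_transform(values):
    """Apply log(1 + x) to observed values; keep None as missing (hypothetical helper)."""
    return [None if v is None else math.log1p(v) for v in values]

def impute_mean(values):
    """Replace None with the mean of the observed values (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical right-skewed interval input with one missing value.
income = [10.0, 20.0, 30.0, 1000.0, None]

# Order recommended in the reply: transform first, impute last.
transform_then_impute = impute_mean(log_transform(income))

# Alternative order: impute on the raw scale, then transform.
impute_then_transform = log_transform(impute_mean(income))

print("filled value, transform then impute:", transform_then_impute[-1])
print("filled value, impute then transform:", impute_then_transform[-1])
```

With this skewed input, imputing on the raw scale lets the single large value pull the fill upward, so the two orders assign different values to the missing observation. Imputing last means the fill is computed on the same scale the regression model actually sees.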
