How to calculate VIF using SAS Enterprise Miner?

I am trying to check for multicollinearity while building a predictive model, but I am having difficulty figuring out how to do that in SAS Enterprise Miner.

Thanks in advance.


Accepted Solutions
07-07-2017 02:06 PM
SAS Employee

Re: How to calculate VIF using SAS Enterprise Miner?

These metrics (VIF and similar diagnostics) are not generated by Enterprise Miner. Enterprise Miner is designed for processing large data sets with many variables, for which it would be impractical to evaluate these and other typical regression diagnostics. Additionally, data sets with that many variables and observations often contain a nontrivial number of missing values. Note that if an observation has a missing value for even one variable, the entire observation cannot be used. I was once given a data set with 25,000 observations (small by data mining standards) that had only 25 complete observations.
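
As a point of reference, here is a minimal sketch of what a VIF calculation involves if you compute it yourself outside Enterprise Miner (in SAS programming, PROC REG reports VIFs through the VIF option on its MODEL statement). For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors with an intercept; equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix, which is what this pure-Python sketch uses. It first applies listwise deletion, keeping only complete rows, to show how missing values shrink the usable data. The function names, the use of `None` for missing values, and the simulated data with 20% missingness are all hypothetical.

```python
import math
import random


def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of a list of complete rows."""
    n, k = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    means = [sum(c) / n for c in cols]
    # Root of each column's sum of squared deviations; the 1/n factors cancel.
    norms = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(cols, means)]
    return [
        [
            sum((x - means[a]) * (y - means[b]) for x, y in zip(cols[a], cols[b]))
            / (norms[a] * norms[b])
            for b in range(k)
        ]
        for a in range(k)
    ]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(rows):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(rows))
    return [inv[j][j] for j in range(len(inv))]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = []
    for _ in range(1000):
        x1 = rng.gauss(0, 1)
        x2 = x1 + rng.gauss(0, 0.1)  # nearly a copy of x1, so highly collinear
        x3 = rng.gauss(0, 1)         # independent of x1 and x2
        # Hypothetical missingness: each value is missing with probability 0.2.
        rows.append([None if rng.random() < 0.2 else v for v in (x1, x2, x3)])
    usable = complete_cases(rows)
    print(f"{len(usable)} of {len(rows)} rows are complete")
    print("VIFs:", [round(v, 1) for v in vif(usable)])
```

With this simulated data, x1 and x2 should show VIFs far above the common rule-of-thumb threshold of 10, while x3 stays near 1, and only about half of the rows survive listwise deletion even though no single variable is mostly missing.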

For this reason, missing values must be imputed before modeling with methods, such as regression, that require complete data. Imputation artificially creates data (for 24,975 of the 25,000 observations in my example above), which necessarily calls many of the classical regression diagnostics into question: the error estimates become suspect, and so do all of the statistical tests, confidence limits, and most of the diagnostics that depend on them. For this reason, many of these classical statistics are not produced by SAS Enterprise Miner.
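
To illustrate why imputed values undermine error estimates, here is a minimal pure-Python sketch using mean imputation, which is only one simple method (the post does not say which imputation method was used). Every filled-in value sits exactly at the column mean, so the column's apparent variance shrinks and any standard error computed from it looks more precise than the observed data justify. The function names, the use of `None` for missing values, and the simulated data are hypothetical.

```python
import random


def mean_impute(rows):
    """Fill each missing (None) value with its column's observed mean.

    Returns the filled rows and how many rows received at least one
    imputed, i.e. artificially created, value.
    """
    k = len(rows[0])
    means = []
    for j in range(k):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if r[j] is None else r[j] for j in range(k)] for r in rows]
    imputed_rows = sum(1 for r in rows if any(v is None for v in r))
    return filled, imputed_rows


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical single-column data: roughly half the values are missing.
    rows = [[None if rng.random() < 0.5 else rng.gauss(0, 1)] for _ in range(1000)]
    filled, imputed_rows = mean_impute(rows)
    observed = [r[0] for r in rows if r[0] is not None]
    print(imputed_rows, "of", len(rows), "rows contain imputed values")
    print("variance of observed values:", round(sample_variance(observed), 3))
    print("variance after mean imputation:", round(sample_variance([r[0] for r in filled]), 3))
```

Because the imputed values add no spread, the sum of squared deviations is unchanged while the count grows, so with about half the values imputed the second variance comes out roughly half the first.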

Thankfully, a large number of observations means that you will typically have enough data to hold some out and validate the model empirically. Rather than relying on statistical assumptions, you can partition your data into two or three representative samples (training, validation, and optionally test). A model that performs well on both the training data and the validation and/or test data can be trusted even when multicollinearity is present.
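
The holdout approach can be sketched in pure Python as follows, under hypothetical assumptions: simulated data in which x2 is nearly a copy of x1, an ordinary least squares fit with an intercept (via the normal equations), and a 60/20/20 random partition into training, validation, and test samples. The point is the workflow: fit on the training sample only, then compare mean squared error across the three samples. All function names and the simulated model are illustrative, not Enterprise Miner's own partitioning or regression code.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(xs, ys):
    """Least squares with an intercept, via the normal equations X'X b = X'y."""
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def mse(coef, xs, ys):
    """Mean squared prediction error of a fitted linear model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def split(rows, train_frac=0.6, valid_frac=0.2, seed=1):
    """Randomly partition rows into training, validation, and test samples."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    n_valid = int(valid_frac * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(5000):
        x1 = rng.gauss(0, 1)
        x2 = x1 + rng.gauss(0, 0.1)  # strongly collinear with x1
        x3 = rng.gauss(0, 1)
        y = 2.0 * x1 + x3 + rng.gauss(0, 1)  # hypothetical true model
        data.append(((x1, x2, x3), y))
    train, valid, test = split(data)
    coef = fit_ols([x for x, _ in train], [y for _, y in train])
    print("coefficients:", [round(c, 2) for c in coef])
    for name, part in (("training", train), ("validation", valid), ("test", test)):
        print(name, "MSE:", round(mse(coef, [x for x, _ in part], [y for _, y in part]), 3))
```

Because all three samples come from the same simulated process, the three MSE values should land close to 1, the variance of the simulated noise, even though the individual coefficients on x1 and x2 are poorly determined. Similar errors across samples are the empirical evidence the post relies on in place of VIFs.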

All Replies
PROC Star

Re: How to calculate VIF using SAS Enterprise Miner?
