Statistical measures of model performance are based on both model error and degrees of freedom. Some SAS Enterprise Miner models, such as Decision Tree models, do not output degrees of freedom and are not suitable for benchmarking using the statistical measures listed here. The information that follows pertains to Mallows’ Cq, Akaike’s Information Criterion, Bayesian Information Criterion, and Kolmogorov-Smirnov Statistic and is suitable only for specific models.
Mallows' Cq
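Mallows' Cq (Hosmer and Lemeshow, 2000) is a variant of Mallows' Cp measure (1973), which is used to assess linear regression models. Hosmer and Lemeshow derived the corresponding Cq statistic to evaluate logistic regression models that use q of the p candidate variables, using the following equation:
$$C_q = \frac{X^2 + \lambda^*}{X^2 / (n - p - 1)} + 2(q + 1) - n$$
where
* $X^2$ is the Pearson chi-square statistic for the model with p variables
* $\lambda^*$ is the multivariable Wald test statistic for the hypothesis that the coefficients of the p - q variables not in the model are zero
* n is the number of observations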
The expected value of Cq is q + 1. Models with Cq values near q + 1 are candidates for final models.
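To make the "near q + 1" rule concrete, here is a minimal Python sketch of the statistic as it is usually written in Hosmer and Lemeshow, Cq = (X² + λ*) / (X² / (n - p - 1)) + 2(q + 1) - n. The function name, argument names, and sample sizes are hypothetical, and this is not how SAS Enterprise Miner computes it internally. The check plugs in the approximate expected values X² ≈ n - p - 1 and λ* ≈ p - q, which should give back q + 1.

```python
def mallows_cq(pearson_chi2, wald_stat, n, p, q):
    """Mallows' Cq for a logistic model that keeps q of p candidate variables.

    pearson_chi2: Pearson chi-square statistic X^2 of the full p-variable model
    wald_stat:    multivariable Wald statistic for the p - q excluded coefficients
    n:            number of observations
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    # X^2 / (n - p - 1) plays the role of the error-variance estimate in Mallows' Cp.
    scale = pearson_chi2 / (n - p - 1)
    return (pearson_chi2 + wald_stat) / scale + 2 * (q + 1) - n


# Hypothetical sizes: 500 observations, 10 candidate variables, 4 kept.
n, p, q = 500, 10, 4

# Substitute the approximate expected values of X^2 and the Wald statistic.
cq_at_expectation = mallows_cq(n - p - 1, p - q, n, p, q)
print(cq_at_expectation, q + 1)
```

A subset model whose Cq lands far above q + 1 is giving up too much fit relative to the full model, which is why values near q + 1 mark the candidates.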
In general, data mining problems involve a massive number of variables, which makes missing values highly likely. Given the typical data size, several things are often true of these problems:
* missing values must be imputed
* the imputed data will have a large number of observations
* the number of usable observations will be inflated by the imputation
* the presence of imputed data makes many of the classical estimates of error more questionable
* holdout data is available to validate or test the fitted model, so empirical validation takes priority and statistical validation is less critical
For these reasons, classical statistical measures such as those described by Hosmer and Lemeshow are not routinely calculated in SAS Enterprise Miner.
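As a rough illustration of the point about inflated observation counts, the sketch below compares how many rows are usable before and after imputation. The toy data and the choice of mean imputation are assumptions made for this example only, not a description of what any particular Enterprise Miner node does.

```python
from statistics import mean

# Hypothetical toy data: each row is (age, income); None marks a missing value.
rows = [
    (34, 52000.0),
    (None, 61000.0),
    (45, None),
    (29, 48000.0),
    (None, None),
    (51, 75000.0),
]


def complete_cases(data):
    """Rows with no missing values, as a complete-case fit would use."""
    return [r for r in data if all(v is not None for v in r)]


def mean_impute(data):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(data[0])
    col_means = [
        mean(r[j] for r in data if r[j] is not None) for j in range(n_cols)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in data
    ]


n_complete = len(complete_cases(rows))
n_imputed = len(complete_cases(mean_impute(rows)))
print(n_complete, n_imputed)
```

The count n that enters a statistic like Cq doubles here even though no new values were actually observed, which is one reason the classical error estimates become harder to trust.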
I hope this helps!
Doug