07-26-2015 03:50 PM
I am trying to find information about how model selection is performed in the Scorecard node in Miner when the 'SELECTION CRITERION' is not 'DEFAULT' but, for example, 'AIC'. I don't believe that SAS Miner is running all possible combinations of the predictors and choosing the model with the smallest AIC value. Does anybody know where I can read about what is happening?
07-28-2015 09:05 AM
This is the value used for the CHOOSE= option in the MODEL statement of PROC DMREG. It is the criterion used to select among the models that are created during the different steps of the model selection. For example, with Forward selection, each of the models created by adding an effect during the forward selection process is evaluated, and the procedure selects the one that is best in terms of the specified criterion.
Hope that helps,
07-29-2015 07:39 AM
I have checked the output log and I have a better understanding now. So what the node does is add variables with significant coefficients and stop when adding another one is not optimal according to the chosen model selection criterion. Does a variable's IV decide the order in which it enters this loop? I mean, is the variable with the highest IV the first to be tested after the intercept?
What I think is not optimal is that it might be possible to find a model as 'good' as the one chosen by the Scorecard node by adding variables in a different order than highest IV first: different variables, but the same performance. It is also possible to find a model 'as good as' the one suggested by the node but with fewer parameters, i.e. less complex.
07-29-2015 08:43 AM
Effects are entered into the model based on the most significant p-value from the score chi-square statistic. The process is repeated until none of the remaining effects meet the specified level for entry or until the STOP= value is reached. Then the criterion you are asking about is used to select which step in the selection process is used for the final model. So you should see something like this in the Output window for the DMREG procedure (but note that it won't always be the final step that is selected):
Summary of Forward Selection

      Effect          Number       Score              Information
Step  Entered     DF      In  Chi-Square  Pr > ChiSq    Criterion
   1  WOE_PROF     1       1   2273.7985      <.0001      58378.2
   2  WOE_STATUS   1       2   1475.0792      <.0001      56842.9
   3  WOE_TMJOB1   1       3    842.4397      <.0001      56022.4
   4  WOE_TMADD    1       4    274.8227      <.0001      55752.0
The selected model, based on the Akaike information criterion, is the model trained in Step 4. It consists of the following effects:
Intercept WOE_STATUS WOE_TMJOB1 WOE_PROF WOE_TMADD
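To make the two stages concrete, here is a rough Python sketch of the logic described above: effects enter one at a time by most significant p-value, and the criterion then picks one of the resulting steps. This is only an illustration, not how PROC DMREG is implemented. The function `forward_select`, its `entry_pvalue` and `criterion` callbacks, and the parameter names `slentry` and `stop` are hypothetical names chosen to mirror the options mentioned in this thread. The criterion values in the demo are copied from the listing above; the p-values are made-up placeholders (the listing only shows <.0001) and exist solely to reproduce the entry order.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    entry_pvalue: Callable[[List[str], str], float],
    criterion: Callable[[List[str]], float],
    slentry: float = 0.05,
    stop: Optional[int] = None,
) -> Tuple[List[List[str]], int]:
    """Return the model built at each step and the index of the chosen step."""
    model: List[str] = []
    remaining = list(candidates)
    steps: List[List[str]] = []

    # Stage 1: add the most significant remaining effect until none
    # meets the entry level or the stop size is reached.
    while remaining and (stop is None or len(model) < stop):
        best = min(remaining, key=lambda effect: entry_pvalue(model, effect))
        if entry_pvalue(model, best) > slentry:
            break
        model = model + [best]
        remaining.remove(best)
        steps.append(model)

    if not steps:
        raise ValueError("no effect met the entry level")

    # Stage 2: among the steps actually built, keep the one with the
    # smallest criterion value (smaller is better for AIC).
    scores = [criterion(m) for m in steps]
    chosen = min(range(len(steps)), key=scores.__getitem__)
    return steps, chosen


# Placeholder p-values (assumed, not from the listing), fixed per effect.
toy_pvalues = {
    "WOE_TMADD": 1e-6,
    "WOE_PROF": 1e-9,
    "WOE_TMJOB1": 1e-7,
    "WOE_STATUS": 1e-8,
}
# Criterion values from the listing above, keyed by number of effects in the model.
aic_by_size = {1: 58378.2, 2: 56842.9, 3: 56022.4, 4: 55752.0}

steps, chosen = forward_select(
    list(toy_pvalues),
    entry_pvalue=lambda model, effect: toy_pvalues[effect],
    criterion=lambda model: aic_by_size[len(model)],
)
print(f"Step {chosen + 1}: {steps[chosen]}")
```

Note that only the models along the forward path are compared, so a model reached through a different entry order is never evaluated. That is why the chosen step is the best of the steps built, not necessarily the best of all possible subsets.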