scorecard node and model selection options

Contributor
Posts: 21

Hi,

I am trying to find information about how model selection is performed in the Scorecard node in Enterprise Miner when the 'SELECTION CRITERION' is not 'DEFAULT' but, for example, 'AIC'. I don't believe that Enterprise Miner runs all possible combinations of the predictors and chooses the model with the smallest AIC value. Does anybody know where I can read about what is happening?

Thanks

SAS Super FREQ
Posts: 306

Re: scorecard node and model selection options

Posted in reply to rogelio_mancisidor

This is the value used for the CHOOSE= option in the MODEL statement of PROC DMREG. It is the criterion used to select among the models that are created at the different steps of the model selection. For example, with forward selection, each model created by adding an effect during the forward selection process is evaluated, and the procedure selects the one that is best in terms of the specified criterion.
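
As a rough illustration of that "choose among the steps" idea, here is a minimal Python sketch (not SAS code, and not what PROC DMREG runs internally). The effect names and AIC values are made up for the example, and `choose_step` is a hypothetical helper; it simply picks the step with the smallest criterion value, which need not be the last step.

```python
# Hypothetical summary of a forward selection: one entry per step.
# Effect names and AIC values are invented for illustration only.
steps = [
    {"step": 1, "entered": "X1", "aic": 1040.2},
    {"step": 2, "entered": "X2", "aic": 1012.7},
    {"step": 3, "entered": "X3", "aic": 1009.9},
    {"step": 4, "entered": "X4", "aic": 1011.3},  # AIC gets worse at this step
]


def choose_step(steps, criterion="aic"):
    """Return the step with the smallest criterion value (smaller AIC is better)."""
    return min(steps, key=lambda s: s[criterion])


best = choose_step(steps)

# The chosen model contains every effect entered up to and including that step.
selected_effects = [s["entered"] for s in steps if s["step"] <= best["step"]]

print(best["step"], selected_effects)
```

In this made-up case the model from step 3 is chosen even though a fourth effect was entered afterwards.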

Hope that helps,

Wendy

Contributor
Posts: 21

Re: scorecard node and model selection options

Posted in reply to rogelio_mancisidor

Thanks Wendy.

I have checked the output log and I have a better understanding now. So what the node does is add variables with significant coefficients, and it stops when adding an extra coefficient is not optimal according to the chosen model selection criterion. Does a variable's IV decide the order in which the variables enter this loop? I mean, is the variable with the highest IV the first to be tested after the intercept?

What I think is not optimal is that it might be possible to find a model as 'good' as the one chosen by the Scorecard node by adding variables in a different order than by highest IV: different variables, but the same performance. It might also be possible to find a model 'as good as' the one suggested by the node but with fewer parameters, i.e. less complex.

SAS Super FREQ
Posts: 306

Re: scorecard node and model selection options

Posted in reply to rogelio_mancisidor

Effects are entered into the model based on the most significant p-value from the score chi-square statistic. The process is repeated until none of the remaining effects meet the specified level for entry or until the STOP= value is reached. Then the criterion you are asking about is used to select which step in the selection process is used for the final model.  So you should see something like this in the Output window for the DMREG procedure (but note that it won't always be the final step that is selected):

                        Summary of Forward Selection

                                                                       Akaike
        Effect              Number         Score                  Information
Step    Entered       DF        In    Chi-Square    Pr > ChiSq      Criterion

   1    WOE_PROF       1         1     2273.7985        <.0001        58378.2
   2    WOE_STATUS     1         2     1475.0792        <.0001        56842.9
   3    WOE_TMJOB1     1         3      842.4397        <.0001        56022.4
   4    WOE_TMADD      1         4      274.8227        <.0001        55752.0

The selected model, based on the Akaike information criterion, is the model trained in Step 4. It consists of the following effects:

Intercept  WOE_STATUS  WOE_TMJOB1  WOE_PROF  WOE_TMADD
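
To make the two phases concrete (entry by score-test p-value, then choosing a step by the criterion), here is a minimal Python sketch of the logic described above. It is only an illustration under stated assumptions, not the DMREG implementation: `forward_select`, `score_pvalue` and `criterion` are hypothetical names, `slentry` stands for the entry significance level, `stop` is assumed to cap the number of effects in the model, and the toy p-values and AIC values are invented. In a real fit the p-values would be recomputed from the data at each step.

```python
def forward_select(candidates, score_pvalue, criterion, slentry=0.05, stop=None):
    """Forward selection sketch.

    score_pvalue(effect, in_model) -> p-value of the score test for adding `effect`
    criterion(in_model)            -> criterion value for that model (smaller is better)
    Returns (chosen_effects, history), where history holds one entry per step.
    """
    in_model, history = [], []
    remaining = list(candidates)
    while remaining and (stop is None or len(in_model) < stop):
        # Phase 1: find the remaining effect with the most significant p-value.
        pvals = {e: score_pvalue(e, in_model) for e in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > slentry:
            break  # no remaining effect meets the entry level
        in_model.append(best)
        remaining.remove(best)
        history.append((list(in_model), criterion(in_model)))
    if not history:
        return [], history
    # Phase 2: pick the step whose model has the smallest criterion value.
    chosen, _ = min(history, key=lambda h: h[1])
    return chosen, history


# Toy inputs (invented): fixed p-values and an AIC that depends only on model size.
toy_p = {"A": 0.0001, "B": 0.001, "C": 0.03, "D": 0.40}
toy_aic = {1: 500.0, 2: 480.0, 3: 481.5, 4: 483.0}

chosen, history = forward_select(
    candidates=["A", "B", "C", "D"],
    score_pvalue=lambda effect, in_model: toy_p[effect],
    criterion=lambda in_model: toy_aic[len(in_model)],
)
print(chosen)
```

With these toy numbers three effects are entered (D never meets the entry level), but the model from the second step is the one chosen because its AIC is the smallest.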
