rogelio_mancisidor
Calcite | Level 5

Hi,

I am trying to find information about how model selection is performed in the Scorecard node in SAS Enterprise Miner when 'SELECTION CRITERION' is set not to 'DEFAULT' but to 'AIC', for example. I don't believe that Enterprise Miner runs all possible combinations of the predictors and chooses the model with the smallest AIC value. Does anybody know where I can read about what is happening?

Thanks

3 REPLIES
WendyCzika
SAS Employee

This is the value used for the CHOOSE= option in the MODEL statement of PROC DMREG. It is the criterion used to select among the models that are created at the different steps of the model selection. For example, with forward selection, each model created by adding an effect during the selection process is evaluated, and the procedure selects the one that is best in terms of the specified criterion.
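
As a rough illustration of that idea (not DMREG's actual implementation), the sketch below picks one step out of a list of candidate steps by taking the smallest criterion value, which is the right direction for AIC. The function name `choose_step`, the effect names, and the AIC numbers are all hypothetical.

```python
def choose_step(steps, criterion_key="aic"):
    """Return the step whose criterion value is smallest (lower AIC is better)."""
    return min(steps, key=lambda s: s[criterion_key])


# Hypothetical steps from a selection run; the values are made up for illustration.
steps = [
    {"step": 1, "effects": ["X1"], "aic": 120.4},
    {"step": 2, "effects": ["X1", "X2"], "aic": 101.9},
    {"step": 3, "effects": ["X1", "X2", "X3"], "aic": 103.2},
]

best = choose_step(steps)
print(best["step"], best["effects"])
```

With these made-up numbers the second step is chosen, even though a third step was fitted.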

Hope that helps,

Wendy

rogelio_mancisidor
Calcite | Level 5

Thanks Wendy.

I have checked the output log and I have a better understanding now. So what the node does is add variables with significant coefficients, and it stops when adding an extra coefficient is not optimal according to the chosen model selection criterion. Does a variable's IV decide the order in which variables enter this loop? I mean, is the variable with the highest IV the first to be tested after the intercept?

What I think is not optimal is that it might be possible to find a model as 'good' as the one chosen by the Scorecard node by adding variables in a different order than by highest IV; that is, different variables but the same performance. It is also possible to find a model 'as good as' the one suggested by the node but with fewer parameters, i.e. less complex.

WendyCzika
SAS Employee

Effects are entered into the model based on the most significant p-value from the score chi-square statistic. The process is repeated until none of the remaining effects meet the specified level for entry or until the STOP= value is reached. Then the criterion you are asking about is used to select which step in the selection process is used for the final model.  So you should see something like this in the Output window for the DMREG procedure (but note that it won't always be the final step that is selected):

                       Summary of Forward Selection

                                                                     Akaike
        Effect            Number         Score                  Information
Step    Entered     DF        In    Chi-Square    Pr > ChiSq      Criterion
   1    WOE_PROF     1         1     2273.7985        <.0001        58378.2
   2    WOE_STATUS   1         2     1475.0792        <.0001        56842.9
   3    WOE_TMJOB1   1         3      842.4397        <.0001        56022.4
   4    WOE_TMADD    1         4      274.8227        <.0001        55752.0

The selected model, based on the Akaike information criterion, is the model trained in Step 4. It consists of the following effects:

Intercept  WOE_STATUS  WOE_TMJOB1  WOE_PROF  WOE_TMADD
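
The two-stage logic described above (enter effects by smallest p-value until none qualifies or a step limit is hit, then pick the step with the best criterion) can be sketched roughly as follows. This is only a simplified illustration under stated assumptions, not the procedure's code: `forward_select`, `entry_p_value`, `criterion`, `slentry`, and `stop` are hypothetical names, the toy p-values are fixed per effect rather than recomputed from a score test at each step, and all numbers are made up.

```python
def forward_select(candidates, entry_p_value, criterion, slentry, stop=None):
    """Toy forward selection followed by criterion-based choice of a step.

    entry_p_value(selected, candidate) -> p-value for adding `candidate`
    criterion(selected)                -> value to minimize (e.g. AIC)
    slentry                            -> significance level required for entry
    stop                               -> optional maximum number of effects
    """
    selected, history = [], []
    remaining = list(candidates)
    while remaining and (stop is None or len(selected) < stop):
        # Evaluate every remaining effect against the current model.
        p_values = {c: entry_p_value(selected, c) for c in remaining}
        best = min(p_values, key=p_values.get)
        if p_values[best] > slentry:
            break  # no remaining effect meets the entry level
        selected.append(best)
        remaining.remove(best)
        history.append({
            "step": len(selected),
            "effects": list(selected),
            "criterion": criterion(selected),
        })
    # Second stage: choose the step with the smallest criterion value.
    chosen = min(history, key=lambda h: h["criterion"]) if history else None
    return chosen, history


# Made-up inputs: fixed p-values per effect and a criterion indexed by model size.
TOY_P = {"A": 0.001, "B": 0.01, "C": 0.20, "D": 0.60}
TOY_CRITERION = {1: 500.0, 2: 480.0, 3: 480.4, 4: 482.1}

chosen, history = forward_select(
    ["A", "B", "C", "D"],
    entry_p_value=lambda selected, c: TOY_P[c],
    criterion=lambda selected: TOY_CRITERION[len(selected)],
    slentry=0.25,
)
print(len(history), chosen["step"], chosen["effects"])
```

With these toy inputs three effects enter (D fails the entry level), but the criterion is smallest at step 2, so the chosen model is not the final step. In the output shown above the AIC keeps decreasing, which is why Step 4 is selected there.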
