Scorecard node and model selection options

07-26-2015 03:50 PM

Hi,

I am trying to find information about how model selection is performed in the Scorecard node in Miner when the 'SELECTION CRITERION' is not 'DEFAULT' but, for example, 'AIC'. I don't believe that SAS Miner is running all possible combinations of the predictors and choosing the model with the smallest AIC value. Does anybody know where I can read about what is happening?

Thanks

07-28-2015 09:05 AM

This is the value used for the CHOOSE= option in the MODEL statement of PROC DMREG. It is the criterion used to select among the models created at the different steps of the model selection. For example, with Forward selection, each model created by adding an effect during the forward selection process is evaluated, and the procedure selects the one that is best in terms of the specified criterion.

Hope that helps,

Wendy
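As a rough illustration of the "choose among the step models" idea described above, here is a minimal Python sketch. It is not the DMREG implementation: it assumes the textbook definitions of AIC and SBC, the helper names (`aic`, `sbc`, `choose_step`) are hypothetical, and the log-likelihood values are made up purely for the example.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion, -2*logL + 2*k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion, -2*logL + k*ln(n) (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_step(criterion_by_step):
    """Return the 1-based step whose criterion value is smallest."""
    best_index = min(range(len(criterion_by_step)),
                     key=criterion_by_step.__getitem__)
    return best_index + 1


# Made-up log-likelihoods for four forward-selection steps (illustration only).
# Step s has the intercept plus s one-parameter effects, so k = s + 1.
log_likelihoods = [-520.0, -505.0, -503.8, -503.5]
aic_values = [aic(ll, step + 1)
              for step, ll in enumerate(log_likelihoods, start=1)]

chosen = choose_step(aic_values)
print(aic_values)
print("chosen step:", chosen)
```

With these made-up numbers the last step improves the likelihood too little to pay for its extra parameter, so the chosen step is an earlier one rather than the final one.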

07-29-2015 07:39 AM

Thanks Wendy.

I have checked the output log and I have a better understanding now. So what the node does is add variables with significant coefficients and stop when adding an extra coefficient is not optimal according to the chosen model selection criterion. Does a variable's IV decide the order in which it enters this loop? I mean, is the variable with the highest IV the first to be tested after the intercept?

What I think is not optimal is that it might be possible to find a model as 'good' as the one chosen by the Scorecard node by adding variables in a different order than by highest IV: different variables but the same performance. It is also possible to find a model 'as good as' the one suggested by the node but with fewer parameters, i.e. less complex.

07-29-2015 08:43 AM

Effects are entered into the model based on the most significant p-value from the score chi-square statistic. The process is repeated until none of the remaining effects meet the specified level for entry or until the STOP= value is reached. Then the criterion you are asking about is used to select which step in the selection process is used for the final model. So you should see something like this in the Output window for the DMREG procedure (but note that it won't always be the final step that is selected):

**Summary of Forward Selection**

| Step | Effect Entered | DF | Number In | Score Chi-Square | Pr > ChiSq | Akaike Information Criterion |
|---|---|---|---|---|---|---|
| 1 | WOE_PROF | 1 | 1 | 2273.7985 | <.0001 | 58378.2 |
| 2 | WOE_STATUS | 1 | 2 | 1475.0792 | <.0001 | 56842.9 |
| 3 | WOE_TMJOB1 | 1 | 3 | 842.4397 | <.0001 | 56022.4 |
| 4 | WOE_TMADD | 1 | 4 | 274.8227 | <.0001 | 55752.0 |

**The selected model, based on the Akaike information criterion, is the model trained in Step 4. It consists of the following effects:**

**Intercept WOE_STATUS WOE_TMJOB1 WOE_PROF WOE_TMADD**
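The entry phase described before this output (add the effect with the most significant score p-value, repeat until nothing meets the entry level or the STOP= value is reached) can be sketched in Python as follows. This is only a sketch under stated assumptions, not what DMREG actually runs: `forward_entry_order` and `score_pvalue` are hypothetical names, the 0.05 entry level and the strict `<` comparison are assumptions, and the toy p-values are invented and kept fixed, whereas a real score test is recomputed against the current model at every step.

```python
def forward_entry_order(candidates, score_pvalue, slentry=0.05, stop=None):
    """Greedy forward entry.

    At each step, add the remaining candidate with the smallest score-test
    p-value. Stop when no candidate has a p-value below `slentry`, or when
    `stop` effects have been entered (if `stop` is given).
    """
    entered = []
    remaining = list(candidates)
    while remaining and (stop is None or len(entered) < stop):
        # Hypothetical callback: p-value for adding `c` to the current model.
        pvalues = {c: score_pvalue(entered, c) for c in remaining}
        best = min(remaining, key=pvalues.__getitem__)
        if pvalues[best] >= slentry:
            break  # nothing left meets the entry level
        entered.append(best)
        remaining.remove(best)
    return entered


# Toy p-values (invented, and independent of the current model for simplicity).
toy_pvalues = {"x1": 0.001, "x2": 0.20, "x3": 0.0001}


def toy_score_pvalue(entered, candidate):
    return toy_pvalues[candidate]


order = forward_entry_order(["x1", "x2", "x3"], toy_score_pvalue)
order_with_stop = forward_entry_order(["x1", "x2", "x3"], toy_score_pvalue, stop=1)
print(order)            # entry order without a STOP-style limit
print(order_with_stop)  # entry order when at most one effect may enter
```

Note that in this sketch the entry order is driven only by the p-values, not by any ranking such as IV; the selection criterion (AIC in the output above) is applied afterwards, to pick one of the resulting steps.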