Your questions are largely addressed in the SAS Enterprise Miner help utility. You can access this utility by opening SAS Enterprise Miner and clicking on Help --> Contents and then navigating in the panel on the left to
Node Reference
Model
Regression Node
From there, click the link to Regression Node Model Selection Criteria, which states the following:
Validation Error — chooses the model that has the smallest error rate for the validation data set. For logistic regression models, the error is the negative log-likelihood. For linear regression, the error is the error sum of squares (SSE). This option is grayed out if a validation predecessor data set is not input to the Regression node.
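To make the two error measures concrete, here is a minimal Python sketch. It is not SAS code and not Enterprise Miner's internal implementation. It computes the negative log-likelihood for a binary target and the SSE for an interval target, then picks the candidate with the smallest value. The function names are hypothetical, and the probability clipping is an assumption added only to avoid log(0).

```python
import math


def logistic_validation_error(y_true, p_hat):
    """Negative log-likelihood of a binary target (y in {0, 1})."""
    eps = 1e-15  # assumption: clip probabilities so log() never sees 0
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def linear_validation_error(y_true, y_hat):
    """Error sum of squares (SSE) of an interval target."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_hat))


def pick_smallest_error(errors_by_model):
    """Return the model name whose validation error is smallest."""
    return min(errors_by_model, key=errors_by_model.get)


errors = {
    "model_a": linear_validation_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),  # SSE 1.0
    "model_b": linear_validation_error([1.0, 2.0, 3.0], [1.5, 2.0, 3.0]),  # SSE 0.25
}
print(pick_smallest_error(errors))  # model_b
```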
In practice, you can select the Validation Error option, but it is ignored. So in your first case, where no validation data set is present, the node simply uses the Stepwise settings described in the section just above Regression Node Model Selection Criteria, where it says:
Stepwise — As in the Forward method, Stepwise selection begins, by default, with no candidate effects in the model and then systematically adds effects that are significantly associated with the target. However, after an effect is added to the model, Stepwise may remove any effect already in the model that is not significantly associated with the target.
This stepwise process continues until one of the following occurs:
No other effect in the model meets the Stay Significance Level.
The Max Steps criterion is met. If you choose the Stepwise selection method, then you can specify a Max Steps to put a limit on the number of steps before the effect selection process stops. The default value is set to the number of effects in the model. If you add interactions via the Interaction Builder, the Max Steps is automatically updated to include these terms.
An effect added in one step is the only effect deleted in the next step.
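The stepwise loop and its three stopping rules can be sketched as follows. This is a generic textbook version in Python, not Enterprise Miner's code. It assumes a hypothetical p_value(effect, model_effects) callable that returns the significance of an effect in a model containing those effects. Enterprise Miner computes its own test statistics internally, and the order of the remove and add checks within a step is an assumption. The entry and stay arguments play the role of the entry and Stay Significance Levels.

```python
def stepwise(candidates, p_value, entry=0.05, stay=0.05, max_steps=None):
    """Generic stepwise selection; p_value(effect, model_effects) is hypothetical."""
    if max_steps is None:
        max_steps = len(candidates)  # mirrors the documented Max Steps default
    model, history = [], []
    last_added = None
    for _ in range(max_steps):  # stopping rule: the Max Steps criterion is met
        # Backward check: drop the least significant effect if it fails Stay.
        if model:
            worst = max(model, key=lambda e: p_value(e, model))
            if p_value(worst, model) > stay:
                model.remove(worst)
                history.append(list(model))
                if worst == last_added:
                    break  # stopping rule: effect added in one step deleted in the next
                last_added = None
                continue
        # Forward check: add the most significant effect not yet in the model.
        outside = [e for e in candidates if e not in model]
        if not outside:
            break
        best = min(outside, key=lambda e: p_value(e, model + [e]))
        if p_value(best, model + [best]) >= entry:
            break  # nothing qualifies to enter and nothing fails Stay
        model.append(best)
        history.append(list(model))
        last_added = best
    return model, history


base = {"x1": 0.001, "x2": 0.01, "x3": 0.5}


def fixed(effect, model_effects):
    """Toy scorer: each effect has a fixed p-value regardless of the model."""
    return base[effect]


print(stepwise(list(base), fixed))  # (['x1', 'x2'], [['x1'], ['x1', 'x2']])
```

The add-then-delete rule matters when Stay is stricter than entry: with entry=0.05 and stay=0.01, an effect with p = 0.03 enters on one step and is removed on the next, and without that rule the loop would cycle until Max Steps.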
In your second scenario, you have a validation data set, so the validation partition created by the Data Partition node is scored and assessed with the model trained at each step of the stepwise selection process. The selected model is identified in the Output window with a message like the following:
The selected model, based on the error rate for the validation data, is the model trained in Step 3. It consists of the following effects:
If there is no validation data set, the output instead reads something like the following, indicating that the model from the last step was selected:
The selected model is the model trained in the last step (Step 7). It consists of the following effects:
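The choice between these two messages amounts to a simple rule, sketched here in Python. The sketch assumes you already have one validation error per step. The function name is hypothetical, and sending ties to the earliest step is an assumption, not documented behavior.

```python
def select_step(num_steps, validation_errors=None):
    """Return the 1-based step whose model is selected."""
    if validation_errors is None:
        return num_steps  # no validation data: keep the last step's model
    if len(validation_errors) != num_steps:
        raise ValueError("expected one validation error per step")
    # Smallest validation error wins; min() keeps the earliest step on ties.
    best_index = min(range(num_steps), key=lambda i: validation_errors[i])
    return best_index + 1


print(select_step(7, [0.90, 0.70, 0.55, 0.56, 0.60, 0.61, 0.62]))  # 3
print(select_step(7))  # 7
```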
For your final scenario, review the help for the Model Comparison node, which you can reach by clicking Help --> Contents and then navigating in the panel on the left to
Node Reference
Assess Nodes
Model Comparison Node
and then clicking Model Comparison Node Train Properties: Model Selection Properties. That page describes many possible outcomes, depending on the settings you choose. Here is an excerpt from the help:
Selection Statistic — Use the Selection Statistic property of the Model Comparison node to specify the fit statistic that you want to use to select the model. Depending on the availability, different fit statistics are used.
When Selection Statistic is set to DEFAULT, the average profit statistic from the validation data (_VAPROF_) is used for model selection. If the _VAPROF_ statistic is not present, the average loss statistic from the validation data (_VALOSS_) is used.
If no validation data set is present, the associated training statistic for average profit (_APROF_) or average loss (_ALOSS_) is used.
If no Selection Statistic is specified, the proportion of misclassified data in the validation data set (_VMISC_) is used for model selection. If the _VMISC_ statistic is not present, the average squared error statistic from the validation data set (_VASE_) is used. If no validation data set is present, the associated training statistic for misclassified data (_MISC_) or average squared error (_ASE_) is used.
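The fallback order in this excerpt can be read as a search through a ranked list of statistics. Here is a minimal Python sketch of that reading. It assumes `available` is the set of fit statistic names computed for the candidate models and that the two paragraphs describe two separate orderings. The flag use_default_setting and the function name are hypothetical and are not Model Comparison node properties.

```python
# Orderings taken from the help excerpt (validation statistic first).
PROFIT_LOSS_ORDER = ["_VAPROF_", "_VALOSS_", "_APROF_", "_ALOSS_"]
MISC_ASE_ORDER = ["_VMISC_", "_VASE_", "_MISC_", "_ASE_"]


def pick_selection_statistic(available, use_default_setting=True):
    """Return the first statistic in the applicable order that is present."""
    order = PROFIT_LOSS_ORDER if use_default_setting else MISC_ASE_ORDER
    for name in order:
        if name in available:
            return name
    return None  # none of the listed statistics was computed


print(pick_selection_statistic({"_VALOSS_", "_ALOSS_"}))  # _VALOSS_
print(pick_selection_statistic({"_APROF_", "_ALOSS_"}))  # _APROF_ (no validation data)
```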
I hope this helps!
Doug