Shailesh2
Calcite | Level 5

Hi,

I am pretty new to Enterprise Miner and have been struggling a bit to understand why the Stepwise Regression procedure terminates after a variable gets dropped based on significance criteria. 

 

I ran the Regression node with 88 variables, none of which showed QCS at any point. Among the Regression node options, I changed only the "selection model" to stepwise. I have not set any stopping criteria and cannot find a way to alter them either. However, the procedure terminates as soon as a variable is dropped at any step, so the remaining variables do not get a chance to participate.

 

Can you please help me get the node to run without any stopping criteria so that all the variables get included in the stepwise method?

 

Thanks in advance!

Shailesh

 

Accepted Solutions
DougWielenga
SAS Employee

Shailesh2,


The "Stepwise" selection option performs forward stepwise selection with optional backwards steps after each forward step.  The process continues until no variable meets the criteria for being added (via forward selection) or removed (via backward selection).  All variables not in the model are considered at each step.  As Wendy indicated, you can modify the default options but there are stopping criteria or it would not be a selection process at all.  Simply using the default Regression node options, for instance, would generate a model using every input variable as a main effect.  This is likely to yield a model with several unnecessary variables included which is why selection options are often considered.  It is not surprising selection ends with a backwards step (where a variable is removed) because it suggests the variable which has been removed does not substantially improve the fit over the simpler model with that variable dropped. The fact no other variables are brought into the model indicates that the remaining variables cannot improve the fit enough to warrant their inclusion in the model based on the criteria being used.    
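
To make the stopping rules concrete, here is a minimal Python sketch of a generic stepwise loop. It is not the Regression node's actual implementation: the `stepwise` function, the `entry_p` and `stay_p` callables, the default levels of 0.05, and the p-value tables are all hypothetical and made up for illustration. It assumes, as is common for logistic stepwise selection, that entry and removal are judged by different tests, which is how an effect can pass the entry test and still be removed right afterwards.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callables (not a SAS API):
#   entry_p(model, candidate) -> p-value for adding `candidate` to `model`
#   stay_p(model)             -> p-value of each effect currently in `model`
EntryFn = Callable[[Sequence[str], str], float]
StayFn = Callable[[Sequence[str]], Dict[str, float]]


def stepwise(candidates: Sequence[str], entry_p: EntryFn, stay_p: StayFn,
             sl_entry: float = 0.05, sl_stay: float = 0.05
             ) -> Tuple[List[str], List[str]]:
    """Forward selection with backward checks after each entry."""
    model: List[str] = []
    log: List[str] = []
    while True:
        # Forward step: every effect not in the model is considered.
        best, best_p = None, sl_entry
        for cand in candidates:
            if cand in model:
                continue
            p = entry_p(model, cand)
            if p < best_p:
                best, best_p = cand, p
        if best is None:
            log.append("stop: no remaining effect meets the entry level")
            return model, log
        model.append(best)
        log.append(f"enter {best}")

        # Backward steps: drop the least significant effect while it
        # fails the stay level.
        while model:
            p_now = stay_p(model)
            worst = max(model, key=lambda e: p_now[e])
            if p_now[worst] <= sl_stay:
                break
            model.remove(worst)
            log.append(f"remove {worst}")
            if worst == best:
                # Re-running the forward step would pick the same effect
                # again, so the search ends here.
                log.append("stop: the last effect entered was removed")
                return model, log


# Made-up p-values, keyed by the sorted effects currently in the model.
ENTRY = {
    (): {"x1": 0.001, "x2": 0.010, "x3": 0.040},
    ("x1",): {"x2": 0.030, "x3": 0.200},
}
STAY = {
    ("x1",): {"x1": 0.001},
    ("x1", "x2"): {"x1": 0.002, "x2": 0.080},
}


def entry_p(model: Sequence[str], cand: str) -> float:
    return ENTRY[tuple(sorted(model))][cand]


def stay_p(model: Sequence[str]) -> Dict[str, float]:
    return STAY[tuple(sorted(model))]


final_model, steps = stepwise(["x1", "x2", "x3"], entry_p, stay_p)
print(final_model)
print(steps)
```

In this toy run, x2 passes the entry test but fails the stay test once it is in the model. The loop stops there because the next forward step would simply select x2 again, so x3 never gets another look.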

 

You might also consider using the Variable Selection node prior to the Regression node.  This node allows you to bin interval variables allowing for more complex relationships between input and response variables, possibly improving over a model where interval inputs are treated only as numeric variables.  It also bins together groups of levels for each categorical variable based on their relationship to the response.  These transformations can produce a superior model to the original data.  You can still use stepwise selection in the modeling node as desired and/or investigate interactions among inputs which is more challenging when all 88 variables are considered.

 

Hope this helps!

Doug

 

7 REPLIES
WendyCzika
SAS Employee

I'm not sure why that would be happening, but you can change the various criteria for the stepwise selection in the Regression node:

 

- You can change the Selection Criterion property

- Change the Use Selection Defaults property to No, then click on the ellipsis next to Selection Options to customize those values

Shailesh2
Calcite | Level 5

Hi Wendy,

Thanks for the response. However, when I change selection defaults to "No", I do not find a suitable option that I could alter to keep it going.

Let me know if you are suggesting any specific "selection option" for this.

 

Thanks,
Shailesh

Shailesh2
Calcite | Level 5

Hi Doug!

Thanks for your detailed response. 

1. When I run the forward selection method, it gives me the result without any note about model termination.

2. After variable selection, which reduced the number of variables from 88 to 27, stepwise runs fine without terminating.

 

However, I wanted to see the coefficients of all the significant variables, starting from all 88 input variables, to understand their impact on the event through stepwise regression. When I do this in EG, it runs fine to the end, so the issue still persists in EM :(

 

Why? For example, if var1 gave me the max/min coefficient, I would like to keep this variable and then train the model again with other variables added to the list. The idea was to build up a few (5-6) variables, starting with var1, to arrive at the final set.

 

 

Thanks,
Shailesh

DougWielenga
SAS Employee

Shailesh,

 

I'm still not sure I understand what you are doing in SAS Enterprise Guide and how it differs from what is happening in SAS Enterprise Miner. I'm not sure what you mean by model termination.  The purpose of doing selection is to avoid having to use all of the variables.  If you are trying to force certain variables into the model in SAS Enterprise Miner and then want the selection to proceed, you can follow the steps in

 

   Usage Note 24334: How to force a variable to be included in a Regression node model selection

 

which can be found at 

 

    http://support.sas.com/kb/24/334.html

 

If you want all of the effects to be fit, you can get a model with all main effects fit by default without specifying a selection method.  When you specify a selection method, it uses the criteria (default or the ones you specified) to determine how variables are added and/or removed from the model.  Using Forward selection, you will never 'remove' a variable from the model, but you are not likely to need all 88 variables.

You should note that the p-value for a variable in a given step measures the significance of that variable conditional on the other effects currently in the model.  Adding additional effects can increase this significance, decrease it, or leave it unchanged.
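
As a rough illustration of that point, the Python sketch below fits an ordinary least squares model twice on made-up data and compares the t statistic of `x1` alone with its t statistic once `x2` is also in the model. This is a simplification: it uses linear regression rather than the logistic model behind the Regression node, and the data and the helper names `solve` and `t_statistics` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def t_statistics(columns, y):
    """Fit OLS with an intercept; return one t statistic per input column."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
              for r in range(n))
    s2 = sse / (n - k)  # residual variance estimate
    stats = []
    for j in range(1, k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = s2 * solve(xtx, unit)[j]  # j-th diagonal entry of (X'X)^-1
        stats.append(beta[j] / math.sqrt(var_j))
    return stats


# Made-up data: y depends on both x1 and x2, plus a little noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [4.1, -1.2, 6.05, 1.1, 7.9, 3.2, 9.95, 4.9]

t_alone = t_statistics([x1], y)[0]     # x1 is the only input
t_cond = t_statistics([x1, x2], y)[0]  # x1 given that x2 is in the model
print(f"t for x1 alone:    {t_alone:.2f}")
print(f"t for x1 given x2: {t_cond:.2f}")
```

Here `x1` looks weak on its own because the variation driven by `x2` ends up in the error term. Once `x2` is in the model, the same variable is strongly significant. The opposite can also happen when inputs are correlated.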

 

Additionally, the procedure you are using in SAS Enterprise Guide differs from that used in SAS Enterprise Miner.  For a binary target, you would be using the LOGISTIC procedure in SAS Enterprise Guide but the DMREG procedure underlies the Regression node in SAS Enterprise Miner.   The direct use of non-HP procedures is not supported by SAS Tech Support, but there is documentation available on request to licensed users of SAS Enterprise Miner.  Here is an excerpt of the documentation for DMREG which describes how it differs from LOGISTIC.

 

The DMREG and LOGISTIC procedures fit the same models for a categorical target. Both procedures have the CLASS statement to specify categorical input variables and both use the deviation from the mean coding as the default parameterization for a CLASS input variable. However, there are many differences between the two procedures, both in syntax and in features. For example, to specify the GLM parameterization of CLASS variables, you specify the MODEL statement option CODING=GLM in the DMREG procedure. But, in the LOGISTIC procedure, you specify the CLASS statement option PARAM=GLM. You are required to specify a DMDB catalog of input data in the DMREG procedure, but not in the LOGISTIC procedure. The DMREG procedure produces DATA step scoring code, but the LOGISTIC procedure does not. In terms of training a model, you might expect the estimates from both procedures to be identical. Often the estimates between the two procedures are very close but not necessarily identical for a number of reasons. The DMREG and LOGISTIC procedures do not use the same routines to carry out the optimization, and the convergence criterion and optimization technique used might not be the same. However, discrepancies of the parameter estimates between the two procedures would not make any difference in prediction.
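
For readers unfamiliar with the two parameterizations named in the excerpt, the following Python sketch shows the design columns each one would produce for a three-level class variable. The function names are hypothetical, and the sketch assumes the last level in sorted order is the reference level for deviation coding.

```python
def glm_coding(levels, value):
    """GLM coding: one 0/1 indicator column per level."""
    return [1 if value == lv else 0 for lv in levels]


def deviation_coding(levels, value):
    """Deviation-from-the-mean (effect) coding: one column per
    non-reference level; the reference level is -1 in every column."""
    if value == levels[-1]:  # assumed reference level: the last one
        return [-1] * (len(levels) - 1)
    return [1 if value == lv else 0 for lv in levels[:-1]]


levels = ["A", "B", "C"]
for value in levels:
    print(value, glm_coding(levels, value), deviation_coding(levels, value))
```

With deviation coding, each coefficient compares a level with the average over all levels. With GLM coding, each coefficient compares a level with the last level. This is why the printed estimates change between parameterizations even though the fitted model is the same.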

 

Let me know what you think.

 

Cordially,
Doug

 

 

Shailesh2
Calcite | Level 5

Hi Doug,

I do not want to force a variable in the model. By termination I mean that it is not allowing any other variable (which can be induced as per codes on EG on same dataset). Whenever a variable is removed in a step then it terminates not allowing subsequent variables to participate. The message is " Note: Model building terminates because the last effect entered is removed by the Wald test criterion". 

The Variable Selection node can reduce the number of variables, but the issue is still there for the reduced set of variables.

 

I am not sure what to change in the "selection options" in the model selection settings, or anywhere else in EM, as none seems to apply to stepwise.

I hope this brings more clarity to you. Let me know!

 

Thanks,
Shailesh

 

DougWielenga
SAS Employee

I do not want to force a variable in the model. By termination I mean that it is not allowing any other variable (which can be induced as per codes on EG on same dataset). Whenever a variable is removed in a step then it terminates not allowing subsequent variables to participate. The message is " Note: Model building terminates because the last effect entered is removed by the Wald test criterion". 

 

Are you using the Regression node in SAS Enterprise Miner or are you running code?  I don't see the Wald criterion as an option in the SAS Enterprise Miner Regression node.  Can you post the code you are running or attach the log file from the node that you are running when you get this message?

 

Thanks!

Doug
