06-27-2016 05:51 AM
Dear all,
I would like to test the strength of the explanatory variables in the regression model.
In other words, I would like to check whether the model always selects the same variables, with the same level of importance, when it is refit.
How can I examine this?
06-28-2016 11:24 AM
If you are using Enterprise Miner, you could try running your logistic regression flow using different sampling seeds and eyeball the selected variables in the regression node results. That's the "low-tech" approach.
If you want more confidence, look into the Group Processing nodes (Start Groups and End Groups) in EM. They allow you to run a flow "i" times. You would use a different sample (drawn with replacement) at each iteration and would need to accumulate the selected variables across iterations. The end result would be a frequency distribution of selected variables: the more often a variable is selected, the more stable it is.
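As a language-neutral illustration of that resampling idea (this is not Enterprise Miner code), here is a minimal Python sketch. The correlation-threshold rule in `select_inputs` is only a stand-in for whatever selection the regression node performs, and the toy data, the variable names (`x1`, `x2`, `noise`, `y`), and the threshold are all made up for the example.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (sx * sy))


def select_inputs(rows, inputs, target, threshold=0.2):
    """Stand-in selection rule (an assumption, not EM's method):
    keep inputs whose |correlation| with the target meets the threshold."""
    ys = [r[target] for r in rows]
    return [v for v in inputs
            if abs_corr([r[v] for r in rows], ys) >= threshold]


def selection_frequency(rows, inputs, target, n_iter=200, seed=12345):
    """Resample with replacement n_iter times and return, for each input,
    the share of iterations in which it was selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        sample = rng.choices(rows, k=len(rows))  # bootstrap sample
        counts.update(select_inputs(sample, inputs, target))
    return {v: counts[v] / n_iter for v in inputs}


def make_toy_data(n=300, seed=1):
    """Made-up data: y depends strongly on x1, weakly on x2, not on noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, noise = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if x1 + 0.3 * x2 + rng.gauss(0, 1) > 0 else 0
        rows.append({"x1": x1, "x2": x2, "noise": noise, "y": y})
    return rows


freq = selection_frequency(make_toy_data(), ["x1", "x2", "noise"], "y")
for name, share in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"{name}: selected in {share:.0%} of iterations")
```

An input that is selected in nearly every iteration can be considered stable; one that appears only occasionally is sensitive to the particular sample drawn.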
This site has some tips on how to use the group processing nodes.
I hope this helps.
06-29-2016 02:12 AM
Dear Ray,
Thank you so much for your reply.
I know that there is a solution using the Start Groups and End Groups nodes.
I tried it, but I have not found a way to see which independent variables were selected at each iteration.
Also, I didn't understand the difference between bagging and boosting in the Start Groups node.
Any help will be very much appreciated.
06-29-2016 09:27 AM
Take a look at Index mode (not boosting or bagging). This tip may help: it is not exactly what you are trying to do, but it uses Index mode to iterate over the data and accumulate data for a chart.
I see a couple of datasets on the SAS server (reg_effects, reg_outterms) that appear to contain the selected inputs for a particular run. You'll probably need some SAS code to accumulate the selected effects across iterations. PROC APPEND may come in handy.
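The accumulation logic itself is simple. Here is a rough Python sketch of it (not SAS, and not tied to the actual layout of reg_effects or reg_outterms, which I have not checked): each iteration appends its selected effects to one long table tagged with the iteration index, and the table is summarized at the end. The effect names and the three runs are hypothetical.

```python
from collections import Counter, defaultdict


def append_run(accumulated, iteration, selected_effects):
    """Append one row per selected effect, tagged with its iteration index
    (the same role an append step would play after each run)."""
    for effect in selected_effects:
        accumulated.append({"iteration": iteration, "effect": effect})


def summarize(accumulated, n_iterations):
    """Return (effects selected in each iteration, share of iterations
    in which each effect was selected)."""
    by_iteration = defaultdict(list)
    for row in accumulated:
        by_iteration[row["iteration"]].append(row["effect"])
    counts = Counter(row["effect"] for row in accumulated)
    share = {effect: n / n_iterations for effect, n in counts.items()}
    return dict(by_iteration), share


# Hypothetical selections from three iterations of the flow.
runs = {1: ["age", "income"], 2: ["age"], 3: ["age", "income", "region"]}

table = []
for iteration, effects in runs.items():
    append_run(table, iteration, effects)

per_iteration, share = summarize(table, n_iterations=len(runs))
for iteration, effects in per_iteration.items():
    print(iteration, effects)
for effect, s in share.items():
    print(f"{effect}: {s:.0%}")
```

Passing the number of iterations explicitly keeps the shares correct even if some iteration selects no effects and therefore contributes no rows.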
Once you get a simple flow working with Index mode it should be straightforward to customize it for your purpose.