Mike90
Quartz | Level 8

Apart from the "inest=" issue, there is a lot of good information in your post. Thanks for posting.

 

Mike90
Quartz | Level 8

The usage note does not suggest building a model on a small sample and then just using that. It gives a technique for speeding up the fitting of a logistic regression model on the full data set.

 

From http://support.sas.com/kb/22/607.html

Usage Note 22607: Preventing excessive time or memory use by PROC LOGISTIC

 

Initial parameters too far from final parameters

When the initial parameters are far from the final parameters, the procedure may need many iterations to reach the solution. While this generally isn't a problem when the model has a small number of parameters and the data set is not large, the time needed can become large when this is not true.

You may be able to reduce the number of iterations needed, and therefore the time to fit the model, by using the following strategy: Fit the desired model using a relatively small, random subset of the data. Make sure that all levels of all CLASS variables appear in the subset. Use the OUTEST= option to save the parameter estimates. Using a small, random subset of the data that can be held in memory should require little time and allow you to get good starting values. Now you can run PROC LOGISTIC on your full data set with the INEST= option to use the saved estimates as starting values. By doing this, you are hopefully starting close to the solution so that fewer iterations will be necessary.

Note that it is not possible to know in advance the number of iterations that will be needed to find the solution for any given data set and model.
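The warm-start idea in the usage note can be sketched outside SAS. The following is only an illustration in plain Python, not PROC LOGISTIC itself: `fit_logistic`, the simulated data, and the subset size of 500 are all assumptions made for the example. Estimates from a small random subset (standing in for OUTEST=) are passed as starting values (standing in for INEST=) for the fit on the full data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, start=(0.0, 0.0), tol=1e-8, max_iter=50):
    """Newton-Raphson for logit(p) = b0 + b1*x.

    Hypothetical helper for illustration. Returns ((b0, b1), iterations used).
    """
    b0, b1 = start
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return (b0, b1), iteration
    return (b0, b1), max_iter

# Simulated "full" data set (assumed true parameters: b0 = -1, b1 = 2).
random.seed(0)
n = 20000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if random.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

# Step 1: fit on a small random subset and keep the estimates.
subset = random.sample(range(n), 500)
beta_sub, _ = fit_logistic([xs[i] for i in subset], [ys[i] for i in subset])

# Step 2: fit the full data from the default start and from the subset estimates.
beta_cold, iters_cold = fit_logistic(xs, ys)
beta_warm, iters_warm = fit_logistic(xs, ys, start=beta_sub)

print("cold start iterations:", iters_cold)
print("warm start iterations:", iters_warm)
```

Both runs should arrive at the same estimates; only the number of iterations is expected to differ. As the usage note says, how much is saved cannot be known in advance for a given data set and model.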

DougWielenga
SAS Employee

Short answer -- there is not a way to do this in SAS Enterprise Miner unless you want to write the code to call the LOGISTIC procedure itself in a SAS Code node.

 

It is true that starting with parameters closer to optimal values, when such values exist, might lead to a solution in fewer iterations, but the problem you are attempting to solve is not (in my experience) an issue in SAS Enterprise Miner when it is used as intended. As you have already found, you are able to build far superior models (in this case) using other, more flexible modeling strategies. I have also seen situations where regression models have done as well as or better than far more flexible models.

 

In either case, your question was how to do this in SAS Enterprise Miner, and my answer was to try and explain why...

... the problem addressed by the Usage Note should not be an issue in SAS Enterprise Miner

... if the problem occurs, there are better ways to address it in SAS Enterprise Miner

... for this reason and others (see discussion about scoring, assessment, etc...) there is no functionality to pass the parameters from one Regression node to a subsequent Regression node

... the technique proposed for the LOGISTIC procedure is problematic for the data mining data sets that SAS Enterprise Miner was designed for, since there is a much higher likelihood of quasi-separation when the number of levels of a categorical variable increases, particularly when such variables are considered in interactions.
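As a rough illustration of the quasi-separation mentioned in the last point, here is a toy sketch in plain Python. The seven observations and the `log_likelihood` helper are made up for the example. In the sample, one level of a categorical variable (level B) happens to contain only events, so the log-likelihood keeps rising as that level's coefficient grows and no finite estimate exists. With many levels and a small random subset, this situation becomes more likely.

```python
import math

def log_likelihood(b0, b1, data):
    """Log-likelihood for logit(p) = b0 + b1 * is_level_b.

    data is a list of (is_level_b, y) pairs; hypothetical helper for illustration.
    """
    total = 0.0
    for is_b, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * is_b)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Level A (is_level_b = 0) has both outcomes; level B (is_level_b = 1) has only events.
data = [(0, 1), (0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 1)]

# The log-likelihood increases with every larger value of the level-B coefficient.
candidates = (1.0, 5.0, 10.0, 20.0)
lls = [log_likelihood(0.0, b1, data) for b1 in candidates]
for b1, ll in zip(candidates, lls):
    print(b1, ll)
```

Starting values saved from a subset with this problem would not be useful starting points for the full-data fit.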

 

Hope this helps!

Doug

 
