EM Gradient Boosting unable to produce a model

Contributor
Posts: 57
EM Gradient Boosting unable to produce a model

I ran Gradient Boosting in EM and only get a single-value prediction, p_sales = 0.8, but Regression, NN, and DT all produce a model. What is going wrong?


Accepted Solutions
Solution
2 weeks ago
SAS Employee
Posts: 121

Re: EM Gradient Boosting unable to produce a model

Open the Gradient Boosting node results and click on

 

     View --> SAS Results --> Log

 

to view the log from your Gradient Boosting node and look for notes similar to the following:
 
...
NOTE: Will not search for split on variable A.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply.
NOTE: Will not search for split on variable B.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply
....
 
We have seen that message appear when some of the samples had an insufficient number of events and non-events, so Gradient Boosting was unable to iterate. Sampling is used at different points to determine split values, and then the model is fit to the whole data set. If there are not enough events, then SAS Enterprise Miner cannot determine where the splits should occur.
 
Are other models, such as a regression model or decision tree, able to run? If so, examine your regression results to see whether there are many near-zero standard errors. Some customers have the opposite problem: infinite standard errors. For more information about this problem, please review
 
   Usage Note 22599: Understanding and correcting complete or quasi-complete separation problems
   http://support.sas.com/kb/22/599.html
 
Another possibility is that there are too many missing values in the data, or that the missing values are distributed in such a way that no splits can be found with the existing settings. Some options, depending on what you think is appropriate, might include:
    - examining your imputed data to see whether it looks the way you expect
    - in the Gradient Boosting node, trying one of the following:
        - lowering the Minimum Categorical Size property to 2 or 3
        - changing the Missing Values property
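
To see whether the MINCATSIZE note is likely to apply to a given input, it can help to count cases per category level outside of Enterprise Miner. The following is only a rough Python sketch, not what the node does internally: the function name, the `region` variable, and its values are hypothetical, and it assumes the option is a minimum count of non-missing cases per level.

```python
from collections import Counter


def levels_below_min_size(values, min_cat_size=5):
    """Return the category levels that have fewer than min_cat_size non-missing cases."""
    counts = Counter(v for v in values if v is not None)
    return {level: n for level, n in counts.items() if n < min_cat_size}


# Hypothetical categorical input: one common level, two rare levels, some missing values.
region = ["east"] * 40 + ["west"] * 3 + ["north"] * 2 + [None] * 5

# Levels that would be too small with a minimum size of 5.
rare_at_5 = levels_below_min_size(region, min_cat_size=5)
# The same check after lowering the minimum size to 2.
rare_at_2 = levels_below_min_size(region, min_cat_size=2)

print("Too small at 5:", rare_at_5)
print("Too small at 2:", rare_at_2)
```

If most levels of an input show up in the first result, lowering the Minimum Categorical Size property is a reasonable thing to try.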

 

I hope this helps!

Doug


All Replies
Super Contributor
Posts: 336

Re: EM Gradient Boosting unable to produce a model

Hey Aha,

What does the subseries plot for the Gradient Boosting node look like?

When you say Reg, NN, and DT produce a model, how good are those models?

You might need to tweak the splitting rule or node options of the Gradient Boosting. Are you using the defaults? What have you moved around?

I hope it helps,

Miguel

Contributor
Posts: 57

Re: EM Gradient Boosting unable to produce a model

The subseries plot contains just one point.

The Reg, NN, and DT models all look normal and good.

I used the defaults.

Contributor
Posts: 57

Re: EM Gradient Boosting unable to produce a model

After I changed Leaf Fraction from 0.1 to 0.05, it was able to produce a model instead of a single-value prediction. In the Variable Importance window, I see that the most important variables have a high number of splitting rules, such as 98. Will this change cause overfitting?
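
For readers wondering why this change matters, here is a rough arithmetic sketch in Python. It assumes that Leaf Fraction sets the minimum branch size as a proportion of the training observations; the function name and the counts (1000 training cases, a candidate split that isolates 80 of them) are hypothetical.

```python
def split_allowed(n_left, n_right, n_train, leaf_fraction):
    """Check whether both branches meet the assumed minimum size n_train * leaf_fraction."""
    minimum = n_train * leaf_fraction
    return n_left >= minimum and n_right >= minimum


n_train = 1000  # hypothetical number of training observations

# A candidate split that sends 80 cases one way and 920 the other.
allowed_at_010 = split_allowed(80, 920, n_train, leaf_fraction=0.1)   # needs 100 per branch
allowed_at_005 = split_allowed(80, 920, n_train, leaf_fraction=0.05)  # needs 50 per branch

print("Leaf Fraction 0.10:", allowed_at_010)
print("Leaf Fraction 0.05:", allowed_at_005)
```

Under that assumption, a split that is rejected at 0.1 becomes acceptable at 0.05, which would explain why the node went from a single value to an actual model.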

Super Contributor
Posts: 336

Re: EM Gradient Boosting unable to produce a model

Great job on tweaking your Gradient Boosting model!

The best way to confirm there is no overfitting is to take a look at the subseries plot. If the selected model is at a low point, no overfitting is expected.

Contributor
Posts: 57

Re: EM Gradient Boosting unable to produce a model

Can you elaborate on what you mean by "the selected model is at a low point"?

Super Contributor
Posts: 336

Re: EM Gradient Boosting unable to produce a model

In my experience with real data, I have found it difficult to get overtrained models with gradient boosting. My guess is that the stochastic gradient descent function somehow prevents your model from getting overtrained the way you can overtrain a decision tree or other models.

What I run into a lot of the time is something like the plot below. Increasing the number of steps does not necessarily mean lower ASE or misclassification. Adding steps does not necessarily get you a better model, but you are not getting a worse one either.

Look at the training ASE: it keeps going lower and lower, while the validation ASE does not decrease as you add iteration steps.

[Image: trytoovertrainGB.png]

By "selecting a model at a low point" I mean that a good model seems to be in the region before the curve levels off into a monotonic line. In my personal opinion, adding a large number of iteration steps does not seem to hurt the model, but I haven't seen big gains from it either.
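
One way to make "a low point" concrete is to pick the iteration with the smallest validation ASE. This is only a small Python sketch with made-up ASE values, not output from Enterprise Miner, and the function name is hypothetical.

```python
def best_iteration(validation_ase):
    """Return the 1-based iteration with the smallest validation ASE (earliest on ties)."""
    best_index = min(range(len(validation_ase)), key=lambda i: validation_ase[i])
    return best_index + 1


# Illustrative values only: training ASE keeps falling, validation ASE flattens out.
train_ase = [0.250, 0.210, 0.180, 0.160, 0.145, 0.135, 0.128, 0.122]
valid_ase = [0.252, 0.220, 0.200, 0.192, 0.190, 0.190, 0.191, 0.190]

chosen = best_iteration(valid_ase)
print("Selected iteration:", chosen)
```

With these numbers the training ASE would suggest using every step, while the validation ASE stops improving partway through, so the extra steps add run time without a visible gain.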

Now that you got your GB node to split, experiment a bit with and without a Partition node. Although you are not getting exactly the same models you should see a similar behavior.

Gradient Boosting models take a while to run, but I have found that they really pay off: they are robust in the presence of new data, and they are also well suited for data with rare target events.

I hope it helps,

Miguel

Contributor
Posts: 57

Re: EM Gradient Boosting unable to produce a model

Thanks for the clarification!

I am not familiar with all the theory behind Gradient Boosting. Can I create several Gradient Boosting nodes, tweak the parameters for each one, and pick the best model? Will this approach really give me the best model? Will using an Ensemble node to combine these GB nodes produce an even better model?

Super Contributor
Posts: 336

Re: EM Gradient Boosting unable to produce a model

A great book that can catch you up on decision tree ensembles (including boosting and gradient boosting) is Decision Trees for Analytics Using SAS® Enterprise Miner™. In just a few pages you get most of the information you need on tree ensembles. By far my favorite SAS book of 2013!

Ensembles will give you a better model as long as the models are discordant, since probabilities get averaged. There is one example in this paper (http://support.sas.com/resources/papers/proceedings14/SAS133-2014.pdf) that talks about combining discordant models into a better ensemble model. Discordance is mentioned in the Decision Tree book too. Unfortunately, there is no way to know whether the ensemble will help other than running it.
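
To illustrate why discordance matters when probabilities get averaged, here is a minimal Python sketch. The targets and the two sets of posterior probabilities are made up, and simple averaging is an assumption about how the models are combined.

```python
def ase(probs, targets):
    """Average squared error between predicted probabilities and 0/1 targets."""
    return sum((p - t) ** 2 for p, t in zip(probs, targets)) / len(targets)


def average_posteriors(*models):
    """Average the predicted probabilities of several models, case by case."""
    return [sum(ps) / len(ps) for ps in zip(*models)]


# Illustrative values: the two models are weak on different cases (discordant).
targets = [1, 0, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.1]
model_b = [0.6, 0.1, 0.9, 0.4]

ensemble = average_posteriors(model_a, model_b)
same_model_twice = average_posteriors(model_a, model_a)

print("ASE model A:        ", ase(model_a, targets))
print("ASE model B:        ", ase(model_b, targets))
print("ASE ensemble A+B:   ", ase(ensemble, targets))
print("ASE ensemble A+A:   ", ase(same_model_twice, targets))
```

Averaging the two discordant models gives a lower ASE than either one alone, while averaging a model with an identical copy of itself changes nothing.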

Sounds like you have made good progress with your EM diagram. Keep up the good work!

Best,

Miguel

Regular Contributor
Posts: 188

Re: EM Gradient Boosting unable to produce a model

I am facing the same problem. The Gradient Boosting node ran successfully, but it did not produce a model: there are no plots and no variable importance.

 

I changed the following in default options:

Max Depth: 10

Surrogate rule: 2

 

 

The decision tree is running fine, but not gradient boosting.
