munitech4u
Quartz | Level 8

I have a dataset whose target has the following event rates:

'1': 5.14%
'2': 2.92%
'3': 3.68%

I used decision nodes with inverse priors. Random Forest and Neural Network are able to produce output, but Gradient Boosting is not.

I changed the defaults to shrinkage: 0.01, leaf fraction: 0.05, depth: 5, branches: 3, but still no luck.

1 ACCEPTED SOLUTION

DougWielenga
SAS Employee

You appear to have some extremely small categories. Open the Gradient Boosting node results and click

     View --> SAS Results --> Log

to view the log from your Gradient Boosting node, then look for notes similar to the following:
 
...
NOTE: Will not search for split on variable A.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply.
NOTE: Will not search for split on variable B.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply.
...
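
If the log is long, it can help to pull out just the variable names from those notes. Below is a minimal Python sketch, assuming you have saved the log as text and that the notes are worded like the ones quoted above. The function name `skipped_variables` and the sample log are illustrative, not part of SAS Enterprise Miner.

```python
import re

# Pattern for the note quoted above; the exact wording in your log may differ.
SKIP_NOTE = re.compile(r"NOTE: Will not search for split on variable (\S+?)\.?\s*$")

def skipped_variables(log_text):
    """Return variable names from 'Will not search for split' notes, in order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = SKIP_NOTE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# Sample text copied from the notes above (variable names A and B are placeholders).
sample_log = """\
NOTE: Will not search for split on variable A.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply.
NOTE: Will not search for split on variable B.
NOTE: Too few acceptable cases.
NOTE: Option MINCATSIZE=5 may apply.
"""
print(skipped_variables(sample_log))
```

A long list of skipped variables suggests the problem is with the data or the node settings in general rather than with one input.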
 
We have seen those notes appear when some of the samples contained too few events and non-events, so that Gradient Boosting was unable to iterate. Sampling is used at different points to determine split values, and the model is then fit to the whole data set. If a sample does not contain enough events, SAS Enterprise Miner cannot determine where the splits should occur.
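
To get a feel for how easily a sample can run short of events at rates like the ones in the question, here is a rough Python sketch. It assumes simple random sampling with independent rows, which may not match how the node actually samples. The sample sizes are hypothetical, and the threshold of 5 is borrowed from the MINCATSIZE=5 note purely for illustration.

```python
from math import comb

def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p): chance that a sample of n rows has fewer than k events."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Event rates from the question.
rates = {"1": 0.0514, "2": 0.0292, "3": 0.0368}

# Hypothetical sample sizes; the sizes Enterprise Miner actually uses are not stated here.
for n in (50, 200, 1000):
    for level, p in rates.items():
        prob = prob_fewer_than(5, n, p)
        print(f"n={n:5d}  level {level}: P(fewer than 5 events) = {prob:.3f}")
```

The smaller the sample and the rarer the level, the more likely it is that too few events land in the sample to support a split.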
 
You mention that other models are able to run. Examine your regression results to see whether there are many near-zero standard errors. Such values would support the suggestion that there is not enough variability in the covariates with respect to the number of events and non-events. Some customers have the opposite problem, infinite standard errors. For more information about this problem, please review
 
   Usage Note 22599: Understanding and correcting complete or quasi-complete separation problems
   http://support.sas.com/kb/22/599.html
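
One simple symptom of separation that you can check outside the regression is a categorical input with a level where every row has the same target value. The Python sketch below flags such levels. It is only an illustration: the function name `one_sided_levels` and the toy data are made up, and it covers categorical inputs only, not every form of separation the usage note describes.

```python
from collections import defaultdict

def one_sided_levels(levels, targets):
    """Return {level: target} for levels whose rows all share a single target value."""
    seen = defaultdict(set)
    for level, target in zip(levels, targets):
        seen[level].add(target)
    return {level: next(iter(vals)) for level, vals in seen.items() if len(vals) == 1}

# Toy data (hypothetical): level "b" only ever occurs with non-events.
levels = ["a", "a", "a", "b", "b", "c", "c", "c"]
targets = [0, 1, 0, 0, 0, 1, 0, 1]
print(one_sided_levels(levels, targets))  # {'b': 0}
```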
 
Another possibility is that there are too many missing values in the data, or that the missing values are distributed in such a way that no splits can be found with the existing settings. Depending on what you think is appropriate, some options include:
    - lowering the Minimum Categorical Size property to 2 or 3
    - changing the Missing Values property
    - lowering the Leaf Fraction property value
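
Before changing those properties, it may be worth checking how small your categories actually are and how much of each input is missing. Here is a minimal Python sketch, assuming the column has been exported as a plain list with missing values represented as `None`. The helper names and the `region` data are hypothetical, and whether the node counts category sizes in exactly this way is an assumption.

```python
from collections import Counter

def small_levels(values, min_size=5):
    """Return {level: count} for non-missing levels with fewer than min_size rows."""
    counts = Counter(v for v in values if v is not None)
    return {level: n for level, n in counts.items() if n < min_size}

def missing_fraction(values):
    """Return the share of rows that are missing (None)."""
    return sum(v is None for v in values) / len(values)

# Hypothetical categorical input with two rare levels and some missing values.
region = ["N"] * 40 + ["S"] * 12 + ["E"] * 3 + ["W"] * 2 + [None] * 6

print(small_levels(region))              # levels below the default threshold of 5
print(small_levels(region, min_size=3))  # levels still too small at a threshold of 3
print(round(missing_fraction(region), 3))
```

Levels that stay below even a lowered threshold are candidates for being combined with other levels before modeling.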

 

I hope this helps!

Doug
