I am trying gradient boosting on a dataset of 3.4 million observations with an event rate of 1.2%. I ran it with the following parameters:
Max depth: 10
Surrogate rules: 2
Leaf fraction: 0.05
But it fails to produce any result: the output is blank, with no variable importance.
Hi, can you please check that your flow is treating your target as binary, as opposed to interval? From your output, it looks like the target is interval, which doesn't seem to fit with your problem description.
The results you are seeing can be reproduced with randomly generated x and y variables, so most likely none of your predictors is related to the target.
Ray
OK, I changed the target variable to binary. It produced the lift chart, but no variable importance or tree. Is it even meaningful?
Believe it or not you are now a little closer. 🙂
If you build a decision tree, do you actually get a tree or just a root node?
Your event level is quite rare. This is very common in predictive modeling, but you will probably need to use sampling techniques like oversampling in order to get a good model.
Here's a really good article (see the section 'inadequate or excessive input data') that talks through sampling and adjusting your target profile in Enterprise Miner.
http://www2.sas.com/proceedings/forum2007/073-2007.pdf
Also, since this topic comes up from time to time, take a look at some of the other Community posts on rare events.
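To make the sampling idea concrete outside Enterprise Miner, here is a minimal Python sketch. It assumes "oversampling" means keeping every event row plus a random subset of non-events, and it uses the standard prior-correction formula to map probabilities from the sample back to the original event rate. The function names, the `target` key, and the 50/50 mix are illustrative assumptions, not Enterprise Miner's internals.

```python
import random

def oversample(rows, target_key="target", event_share=0.5, seed=0):
    """Keep every event row and randomly sample non-events so that
    events make up roughly `event_share` of the returned sample."""
    rng = random.Random(seed)
    events = [r for r in rows if r[target_key] == 1]
    non_events = [r for r in rows if r[target_key] == 0]
    n_non = int(round(len(events) * (1 - event_share) / event_share))
    n_non = min(n_non, len(non_events))
    return events + rng.sample(non_events, n_non)

def adjust_for_prior(p_sample, true_rate, sample_rate):
    """Map a probability estimated on the oversampled data back to
    the event rate of the original population."""
    num = p_sample * true_rate / sample_rate
    den = num + (1 - p_sample) * (1 - true_rate) / (1 - sample_rate)
    return num / den

# Toy data with a 1.2% event rate, as in this thread (hypothetical rows).
rows = [{"id": i, "target": 1 if i < 120 else 0} for i in range(10_000)]
sample = oversample(rows)
sample_rate = sum(r["target"] for r in sample) / len(sample)

# A score of 0.5 on the balanced sample corresponds to the base rate
# once it is adjusted back to the original population.
adjusted = adjust_for_prior(0.5, true_rate=0.012, sample_rate=sample_rate)
```

The point of the adjustment is that a model trained on a balanced sample will overstate event probabilities unless the original prior is put back in.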
I hope this helps,
Ray
Yes, the decision tree gives a tree with the specified depth. I'm not sure why gradient boosting fails to produce anything.
Try adding a Decisions (Assess menu) node to your flow between your input data source and your boosting node.
Use the following property settings:
Apply Decisions = Yes
Decisions = Custom
In the custom editor, specify Yes in the Decisions tab.
Assuming your target is binary with 1 representing the event value, specify a matrix like this in the Decision Weights tab:
100 0
0 1
Then press OK and rerun your flow.
This will change the probability cutoff for predicting 0 and 1. You will know it worked if you see two rows for 'profit' in your boosting node results. If you don't, then the decision matrix wasn't actually picked up and you will need to check your Decisions settings.
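As a rough illustration of why that matrix moves the cutoff, here is a small Python sketch. It assumes the first row and column refer to the event (target = 1, decision = 1) and that the chosen decision is the one with the highest expected profit; the function names are hypothetical and this is not Enterprise Miner code.

```python
def expected_profits(p_event, weights):
    """weights[target][decision] is the profit when the actual target
    level is `target` and the chosen decision is `decision`."""
    p = {1: p_event, 0: 1.0 - p_event}
    return {d: sum(p[t] * weights[t][d] for t in (1, 0)) for d in (1, 0)}

def best_decision(p_event, weights):
    """Pick the decision with the highest expected profit."""
    profits = expected_profits(p_event, weights)
    return max(profits, key=profits.get)

# The matrix from the post: 100 for a correctly predicted event,
# 1 for a correctly predicted non-event, 0 otherwise.
weights = {1: {1: 100, 0: 0}, 0: {1: 0, 0: 1}}

# With zero off-diagonal weights, decision 1 wins when
# 100 * p > 1 * (1 - p), i.e. when p > 1 / 101.
cutoff = weights[0][0] / (weights[1][1] + weights[0][0])
```

Under these assumptions the cutoff drops from 0.5 to about 0.0099, which is below the 1.2% event rate, so some observations can actually be classified as events.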
Remember to also check out the article mentioned above for some background.
You can also tweak some of the GB properties. The 'traditional favorites' are N iterations and Shrinkage.
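For intuition about what those two properties control, here is a toy gradient boosting loop in plain Python. It is a simplified sketch under stated assumptions: squared-error loss, one numeric input, and one-split stumps. It is not the algorithm used by the Enterprise Miner node, and all names are made up for the example.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split regression stump on a single input."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def boost(xs, ys, n_iterations, shrinkage):
    """Each iteration fits a stump to the current residuals and adds
    only a `shrinkage` fraction of it to the running prediction."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + shrinkage * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data: the target is 1 only inside a band of x values.
xs = [i / 10 for i in range(50)]
ys = [1.0 if 1.5 < x < 3.5 else 0.0 for x in xs]

few = mse(boost(xs, ys, n_iterations=5, shrinkage=0.1), xs, ys)
many = mse(boost(xs, ys, n_iterations=100, shrinkage=0.1), xs, ys)
```

With a small shrinkage value each tree contributes little, so more iterations are needed before the fit improves noticeably; that is why the two settings are usually tuned together.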
Well, I added the Decisions node, but it doesn't seem to change the results. It shows the lift chart, but not much else. How do I access the model accuracy? It doesn't even show the variable importance. But I remember that we can access the variable importance and ROC values in R and Python packages.
Attaching the log
Are you suggesting that I should try removing the surrogate rules? Well, that was what I had tried initially.
Or are you suggesting that gradient boosting should not be used with high-dimensional data that might have a lot of noise?
The only time it worked on my data was when I had around 30-40 variables and had oversampled, as my event rate was about 1.22%.
But some people say that boosting should work.
Well, I am having trouble running it again:
Now the target variable is different, with 4 classes: 0, 1, 2, 3, with 0 being the reference.
I'm not sure why it is not producing any output. I have attached the log.