munitech4u
Quartz | Level 8

I am trying gradient boosting on a dataset of 3.4M rows with an event rate of 1.2%. I ran it with the following parameters:

Max depth: 10

Surrogate rules: 2

Leaf fraction: 0.05

But it fails to produce any results: blank output, no variable importance.


(attachment: gradient_boost.jpg)
rayIII
SAS Employee

Hi, can you please check that your flow is treating your target as binary, as opposed to interval?  From your output, it looks like the target is interval, which doesn't seem to fit with your problem description.

 

The results you are seeing can be reproduced with randomly generated x and y variables, so most likely none of your predictors is related to the target.

 

Ray

munitech4u
Quartz | Level 8

OK, I changed the target variable to binary. It produced the lift chart, but no variable importance or tree. Is it even meaningful?


(attachment: gradient_boost1.jpg)
rayIII
SAS Employee

Believe it or not you are now a little closer. 🙂

 

If you build a decision tree, do you actually get a tree or just a root node?

 

Your event level is quite rare. This is very common in predictive modeling, but  you will probably need to use sampling techniques like oversampling in order to get a good model. 
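To illustrate the oversampling idea on synthetic data (a minimal sketch, not Enterprise Miner code; the sample size and the 10% target rate below are assumptions for the example, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with a rare event rate similar to the original post (~1.2%).
n = 100_000
y = (rng.random(n) < 0.012).astype(int)
X = rng.normal(size=(n, 5))

# Oversample: keep every event, downsample non-events to a 1:9 ratio,
# giving roughly a 10% event rate in the training sample.
event_idx = np.flatnonzero(y == 1)
nonevent_idx = np.flatnonzero(y == 0)
keep = np.concatenate([
    event_idx,
    rng.choice(nonevent_idx, event_idx.size * 9, replace=False),
])
rng.shuffle(keep)

X_train, y_train = X[keep], y[keep]
print(round(y_train.mean(), 2))  # event rate in the training sample
```

In Enterprise Miner the equivalent step would be done with a Sample node; the sketch only shows the arithmetic of the split.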

 

Here's a really good article (see the section 'inadequate or excessive input data') that talks through sampling and adjusting your target profile in Enterprise Miner. 

 

http://www2.sas.com/proceedings/forum2007/073-2007.pdf

 

Also, since this topic comes up from time to time, take a look at some of the other Community posts on rare events. 

 

I hope this helps, 

 

Ray

munitech4u
Quartz | Level 8

Yes, the decision tree gives a tree with the specified depth. Not sure why gradient boosting fails to produce one.

rayIII
SAS Employee

Try adding a Decisions (Assess menu) node to your flow between your input data source and your boosting node.

 

Use the following property settings: 

 

Apply Decisions = Yes

Decisions = Custom

 

In the custom editor, specify Yes in the Decisions tab.

 

Assuming your target is binary with 1 representing the event value, specify a matrix like this in the Decision Weights tab: 

 

100   0
  0   1

 

Then press OK and rerun your flow.

 

This will change the probability cutoff for predicting 0 and 1. You will know it worked if you see two rows for 'profit' in your boosting node results. If you don't, then the decision matrix wasn't actually picked up and you will need to check your Decisions settings.
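As background on why this moves the cutoff: the decision with the highest expected profit wins, and weighting the event at 100 pushes the break-even probability down to 1/101 ≈ 0.0099, below the 1.2% event rate. A minimal sketch of the arithmetic (plain Python, not EM internals; the probabilities are made-up examples):

```python
import numpy as np

# Decision weights from the reply (rows = actual 1/0, columns = decide 1/0):
W = np.array([[100.0, 0.0],
              [0.0, 1.0]])

def decide(p_event, W):
    """Pick the decision with the highest expected profit."""
    profit_1 = W[0, 0] * p_event + W[1, 0] * (1 - p_event)  # decide 1
    profit_0 = W[0, 1] * p_event + W[1, 1] * (1 - p_event)  # decide 0
    return 1 if profit_1 > profit_0 else 0

# Break-even: 100*p = 1 - p  =>  p = 1/101 ≈ 0.0099, so a case at the
# 1.2% base rate is already predicted as an event.
print(decide(0.012, W), decide(0.005, W))  # → 1 0
```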

 

Remember to also check out the article mentioned above for some background. 

 

You can also tweak some of the GB properties. The 'traditional favorites' are N iterations and Shrinkage. 

munitech4u
Quartz | Level 8

Well, I added the Decisions node, but it doesn't seem to change the results. It shows the lift chart, but nothing much else. How do I access the model accuracy? It doesn't even show the variable importance. But I remember we can access variable importance and ROC values in the R and Python packages.
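For comparison, here is roughly how those diagnostics look on the Python side, assuming scikit-learn is available (synthetic data; the model settings are illustrative, not tuned):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
# Rare-ish synthetic target driven mostly by the first two columns.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 2.5).astype(int)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
gb.fit(X, y)

print(gb.feature_importances_.round(2))                    # per-variable importance
print(round(roc_auc_score(y, gb.predict_proba(X)[:, 1]), 3))  # in-sample ROC AUC
```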


(attachment: gradient_boost1.jpg)
munitech4u
Quartz | Level 8

Attaching the log

JasonXin
SAS Employee
Hi, "Surrogate Rule=" interacts with Missing Value option in either DT or the GB node. When surrogate rule =0 (default in the GB node), Missing value's default kicks in, which is "Use in Search" for both DT and the GB nodes. "Use in Search" does not mean carrying the missing value forward. Instead, it imputes it, implicitly and legitimately, with rules like "grouping it with the branch that maximizes the worth of split". Now, when you set surrogate rule=2, in both DT and GB nodes, you tell EM to override the "Missing Value" default (="Use in Search"). When surrogating kicks in (data driven, no guarantee it will), AND it cannot find non-missing surrogates up to 2 levels, you get a missing value assigned to the non-leaf node (in the 'middle stream'). That missing status in DT does not create problem. If the DT gives you a terminal node that has missing value in it, so what? In (most) GB cases where max branches are relatively large (assuming other stopping rules do not stop spliting prematurely), say 100, a non-leaf node born at, say, 7th branch/level, carrying missing value, may get great chance to be re-surrogated-2 successfully at depth 78 for example (although carrying missing values down the path from branch 7 all the way down to 78 may very well be deemed as analytically unacceptable). However, the chance (see you only give us max depth=10) is the missing value gets to the very bottom layer of the trees. Again, this does not pose problem for DT. However, GB needs to build loss function to re-iterate. How are we supposed to build a loss function with missing input? We don't put rules in GB engine to tell it to drop the OBS due to missing loss function (that is sample violation). So the GB halts without a model. The software developer, however, cannot flag it as error because there is no error from design and quality of the product perspectives. Because you told it to surrogate 2. 
Surrogating should be applied often when the depth is deep where the surrogates are most similar to the missing spot for better info quality, or you have a great 'visual' of the surrounding of the missing. Surrogating rule= is retained today due to historical usage of GB. Historically GB was often used for universes with transparent /small research data, not industrial or noise as we often have today. --Jason Xin
munitech4u
Quartz | Level 8

Are you suggesting that I should try removing the surrogate rules? Well, that is what I had tried initially.

 

Or are you suggesting that gradient boosting should not be used with high-dimensional data that might have a lot of noise?

 

The only time it worked on my data was when I had around 30-40 variables and had oversampled, since my event rate was about 1.22%.

But some people say that boosting should work.

JasonXin
SAS Employee
Hi (I am sorry for the delayed response; I don't come here often). I was not suggesting anything; I was just trying to help explain a symptom you described in your original post.

I would concur that boosting should work on oversampled data. However, use frequency or weighting carefully in building and judging the resulting model: it is unusual to use oversampling without somehow 'putting it back', for various reasons.

It is not obvious to many people that going from surrogating under regular decision tree modeling to surrogating under GB is actually a leap forward, one that requires some fundamentally different expectations of surrogating. When building a regular DT, you are looking at a painted, vivid tree graph, while in GB-style modeling the trees where surrogating happens are not supposed to be 'seen'. In other words, in GB (and in random forests) you are supposed to run trees in a black box; asking "can I take a look at one of the trees?" is meaningless in GB or RF, unlike how you would normally approach a regular DT. Against this general backdrop, how are we supposed to engage surrogating? This is one of the dim spots of machine learning methods: you don't have diagnostic tools and facilities. We may have some new versions of GB in the future.
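One common way of 'putting it back' is the standard prior correction applied to predicted probabilities after training on an oversampled sample. A minimal sketch (the 10% training rate and 1.2% population rate echo the figures in this thread; the function name is made up):

```python
def correct_for_oversampling(p_hat, pi1, rho1):
    """Adjust a predicted event probability from an oversampled training
    sample back to the population scale.

    p_hat : probability predicted by the model trained on the sample
    pi1   : true population event rate
    rho1  : event rate in the oversampled training sample
    """
    num = p_hat * pi1 / rho1
    den = num + (1 - p_hat) * (1 - pi1) / (1 - rho1)
    return num / den

# Model trained at a 10% event rate, true rate 1.2%:
print(round(correct_for_oversampling(0.5, 0.012, 0.10), 4))  # → 0.0985
```

As a sanity check, a prediction equal to the training-sample rate maps back to the population rate.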
munitech4u
Quartz | Level 8

Well, I am having trouble running it again:

Now the target variable is different, with 4 classes (0, 1, 2, 3), 0 being the reference level.

Not sure why it is not producing any output. I have attached the log.

 

 


Discussion stats
  • 11 replies
  • 2794 views
  • 2 likes
  • 3 in conversation