Solved: Running GBM in SAS eminer

rbhadra · Posted 12-14-2017 01:10 AM

Hi,

I am very new to SAS Eminer and am trying to figure out how to run the GBM node. The diagram I have created looks like this

In both the data sets I have set the target variable which is a continuous variable. In the gbm node I have set the options as follows

I have set the score options as follows

These are the GBM results I get when I run the diagram. None of the nodes have any importance value, which suggests that the GBM is not running properly. I have uploaded the model output as well

This is the SAS code being generated in the score node

*------------------------------------------------------------*;
* EM SCORE CODE;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Input Data Source;
* TYPE: SAMPLE;
* NODE: Ids;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Extension Class;
* TYPE: MODEL;
* NODE: Boost;
*------------------------------------------------------------*;
****************************************************************;
****** DECISION TREE SCORING CODE ******;
****************************************************************;

****** LENGTHS OF NEW CHARACTER VARIABLES ******;
LENGTH _WARN_ $ 4;

****** LABELS FOR NEW VARIABLES ******;
label P_SALES_QTY_MODEL = 'Predicted: SALES_QTY_MODEL' ;
P_SALES_QTY_MODEL = 2.2083471984;
label _WARN_ = 'Warnings' ;

 


****************************************************************;
****** END OF DECISION TREE SCORING CODE ******;
****************************************************************;
*------------------------------------------------------------*;
* TOOL: Score Node;
* TYPE: ASSESS;
* NODE: Score;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* Score: Creating Fixed Names;
*------------------------------------------------------------*;
LABEL EM_PREDICTION= "Prediction for SALES_QTY_MODEL";
EM_PREDICTION = P_SALES_QTY_MODEL;

It would be great if anyone could help me understand where I am going wrong.

WendyCzika · Posted 12-18-2017 02:06 PM

The first thing I'd suggest trying is a smaller value for the Leaf Fraction property in order to get some splits in your model. There are also different options for handling missing values, so you can try different settings for the Missing Values property. If you have many more than 20K observations in your training data, you can also try increasing the Node Sample property.
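
Before changing those properties, it can help to know how many observations and missing values the training data actually contains. A minimal sketch, assuming the training table is available as WORK.TRAIN (a hypothetical name, substitute your own library and table):

```sas
/* Hypothetical table name: WORK.TRAIN. Replace with your training data. */
/* N is the count of non-missing values and NMISS the count of missing   */
/* values for every numeric variable, including the interval target.     */
proc means data=work.train n nmiss min max;
  var _numeric_;
run;
```

If N is well above 20,000, raising Node Sample as suggested above is worth trying; a large NMISS on key inputs is a reason to experiment with the Missing Values property.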

DougWielenga · Posted 01-05-2018 10:51 AM

If you were using a binary target, I would have suggested you look at Usage Note 47965 which describes some approaches to modeling a rare target level (http://support.sas.com/kb/47/965.html) which is often the reason why a default Tree or Gradient Boosting model does not find any splits. Since the target is interval, consider using some of the following approaches to identifying possible issues:

1 - Review the log - sometimes the log explains why splits were not found for certain variables

2 - Run a Variable Selection node to identify which if any of the variables appear to be useful for splitting

NOTE: The Variable Selection node generates grouped versions of the nominal/ordinal variables, and it optionally can create grouped versions of interval variables.

3 - Evaluate the distribution/relationship between the target variable and the grouped and ungrouped versions of the variables identified as important to determine if there are any potential concerns (e.g. is the target heavily skewed? are there too many missing values in the input? are there too many class levels for a nominal variable? is there too little support for the target in large ranges of the interval variable? etc...)

4 - Run a Decision Tree against the data to see if it generates any splits, and review the results to see if there are any apparent issues that might point to a cause.

5 - Create a binary target out of your interval target by creating a threshold (e.g. flag those observations that are in the top 10% of the data) as "Yes" and the rest as "No". You can then try the approaches given in Usage Note 47965 for a binary target.
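
Step 5 could be sketched roughly as follows. This is only an illustration under stated assumptions: WORK.TRAIN, WORK.CUTOFF, WORK.TRAIN_FLAG and TOP10_FLAG are hypothetical names, while SALES_QTY_MODEL is the target name that appears in the score code earlier in the thread.

```sas
/* Hypothetical names: WORK.TRAIN, WORK.CUTOFF, WORK.TRAIN_FLAG, TOP10_FLAG. */

/* Compute the 90th percentile of the interval target; the OUTPUT statement */
/* stores it in a one-row table as the variable P_90.                       */
proc univariate data=work.train noprint;
  var SALES_QTY_MODEL;
  output out=work.cutoff pctlpts=90 pctlpre=P_;
run;

/* Flag observations strictly above the cutoff as "Yes" and the rest as "No". */
/* Observations with a missing target are left with a blank flag.            */
data work.train_flag;
  if _n_ = 1 then set work.cutoff;   /* P_90 is retained for every row */
  set work.train;
  length TOP10_FLAG $ 3;
  if missing(SALES_QTY_MODEL) then TOP10_FLAG = ' ';
  else if SALES_QTY_MODEL > P_90 then TOP10_FLAG = 'Yes';
  else TOP10_FLAG = 'No';
  drop P_90;
run;

/* Check how many observations ended up in each level. */
proc freq data=work.train_flag;
  tables TOP10_FLAG / missing;
run;
```

Because the comparison is strict, ties at the cutoff fall into "No", so the "Yes" group can be somewhat smaller than 10%. In the diagram you would then give the new flag the Target role and reject the original interval target so it is not used as an input.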

Hope this helps!

Cordially,

Doug
