rbhadra

Hi,

 

I am very new to SAS Enterprise Miner and am trying to figure out how to run the GBM node. The diagram I have created looks like this:

diagram.PNG

In both data sets I have set the target variable, which is continuous. In the GBM node I have set the options as follows:

gbm_2.PNG
gbm_1.PNG

I have set the score options as follows:

score.PNG

These are the GBM results I get when I run the diagram. None of the nodes has an importance value, which suggests that the GBM is not running properly. I have uploaded the model output as well.

results_gbm.PNG

This is the SAS code generated by the Score node:

*------------------------------------------------------------*;
* EM SCORE CODE;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Input Data Source;
* TYPE: SAMPLE;
* NODE: Ids;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Extension Class;
* TYPE: MODEL;
* NODE: Boost;
*------------------------------------------------------------*;
****************************************************************;
****** DECISION TREE SCORING CODE ******;
****************************************************************;

****** LENGTHS OF NEW CHARACTER VARIABLES ******;
LENGTH _WARN_ $ 4;

****** LABELS FOR NEW VARIABLES ******;
label P_SALES_QTY_MODEL = 'Predicted: SALES_QTY_MODEL' ;
P_SALES_QTY_MODEL = 2.2083471984;
label _WARN_ = 'Warnings' ;

 


****************************************************************;
****** END OF DECISION TREE SCORING CODE ******;
****************************************************************;
*------------------------------------------------------------*;
* TOOL: Score Node;
* TYPE: ASSESS;
* NODE: Score;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* Score: Creating Fixed Names;
*------------------------------------------------------------*;
LABEL EM_PREDICTION= "Prediction for SALES_QTY_MODEL";
EM_PREDICTION = P_SALES_QTY_MODEL;

It would be great if anyone could help me understand where I am going wrong.

Accepted Solutions
WendyCzika
SAS Employee

The first thing I'd suggest trying is a smaller value for the Leaf Fraction property, so that the model can find some splits. There are also different options for handling missing values, so you can try different settings for the Missing Values property. If you have much more than 20K observations in your training data, you can also try increasing the Node Sample property.
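
As a rough illustration of why a large Leaf Fraction can block every split, the sketch below computes the minimum leaf size implied by a few candidate fractions. It assumes that the property is read as a proportion of the training observations with a non-missing target and that the result is rounded up; both are assumptions, not confirmed behavior of the node. WORK.TRAIN is a hypothetical name for the training table, and SALES_QTY_MODEL is the target from the score code above.

```sas
/* Count training rows with a non-missing target.                */
/* WORK.TRAIN is a hypothetical table name used for illustration. */
proc sql noprint;
   select count(SALES_QTY_MODEL) into :n_train trimmed
   from work.train;          /* count(column) ignores missing values */
quit;

/* Minimum observations per leaf implied by each candidate fraction. */
data work.leaf_sizes;
   do leaf_fraction = 0.1, 0.01, 0.001;
      /* Rounding up is an assumption about how the node applies the fraction. */
      min_leaf_obs = ceil(leaf_fraction * &n_train);
      output;
   end;
run;

proc print data=work.leaf_sizes noobs;
run;
```

If the implied leaf size is close to half of the available observations or more, very few candidate splits can satisfy it, which would be consistent with score code that assigns a single constant prediction.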

DougWielenga
SAS Employee

If you were using a binary target, I would have suggested you look at Usage Note 47965 which describes some approaches to modeling a rare target level (http://support.sas.com/kb/47/965.html) which is often the reason why a default Tree or Gradient Boosting model does not find any splits. Since the target is interval, consider using some of the following approaches to identifying possible issues:

 

1 - Review the log. Sometimes the log explains why splits were not found for certain variables.

2 - Run a Variable Selection node to identify which, if any, of the variables appear to be useful for splitting.

NOTE: The Variable Selection node generates grouped versions of the nominal/ordinal variables, and it can optionally create grouped versions of interval variables.

3 - Evaluate the distribution of the target variable and its relationship with the grouped and ungrouped versions of the variables identified as important, to determine whether there are any potential concerns (e.g. is the target heavily skewed? are there too many missing values in the input? are there too many class levels for a nominal variable? is there too little support for the target in large ranges of the interval variable? etc.).

4 - Run a Decision Tree against the data to see if it generates any splits, and review the results to see if there are any apparent issues that might point to a cause.

5 - Create a binary target out of your interval target by choosing a threshold (e.g. flag the observations in the top 10% of the data as "Yes" and the rest as "No"). You can then try the approaches given in Usage Note 47965 for a binary target.
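
Steps 3 and 5 above can also be sketched in Base SAS outside the diagram. This is a minimal sketch under stated assumptions, not the only way to do it: WORK.TRAIN is a hypothetical copy of the training data, TARGET_FLAG is a made-up name for the new binary target, and SALES_QTY_MODEL is the interval target named in the score code.

```sas
/* Assumptions: WORK.TRAIN is a hypothetical copy of the training data; */
/* SALES_QTY_MODEL is the interval target named in the score code.      */

/* Step 3: skewness, range, and missing counts for all numeric variables */
proc means data=work.train n nmiss mean median skewness min max;
   var _numeric_;
run;

/* Step 3: number of levels for each character (class) input */
proc freq data=work.train nlevels;
   tables _character_ / noprint;   /* keep only the NLevels summary */
run;

/* Step 5: find the 90th percentile of the target */
proc univariate data=work.train noprint;
   var SALES_QTY_MODEL;
   output out=work.cutoff pctlpts=90 pctlpre=P;   /* creates variable P90 */
run;

/* Step 5: flag the top 10% as "Yes" and the rest as "No" */
data work.train_flag;
   if _n_ = 1 then set work.cutoff;   /* P90 is retained on every row */
   set work.train;
   length TARGET_FLAG $ 3;            /* hypothetical name for the binary target */
   if missing(SALES_QTY_MODEL) then TARGET_FLAG = ' ';
   else if SALES_QTY_MODEL >= P90 then TARGET_FLAG = 'Yes';
   else TARGET_FLAG = 'No';
   drop P90;
run;
```

To use the flagged table in Enterprise Miner, it would need to be registered as a data source with TARGET_FLAG given the target role and SALES_QTY_MODEL rejected, so that the original target is not used as an input.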

 

Hope this helps!

 

Cordially,

Doug
