jmrg2300
Calcite | Level 5

I'm on SAS Enterprise Miner v 14.1. I have a fairly simple Decision tree model. The target is a binary variable (1,0) and I have about 380K observations. The data source is a SAS data set, and I have about 40 or so variables (Interval, Binary, Nominal).

 

Enterprise Miner is behaving intermittently. Sometimes I get a full-blown decision tree, and then when I make a small change to a Decision Tree property (e.g. changing the maximum depth from 6 to 4), I get just a single node in the output. It basically does not produce any results whatsoever. The same model worked great before, until I changed the underlying data, and now it behaves erratically.

 

I'm attaching the properties of the Decision tree node. Most of them are the default. 

 

Anyone encounter this before? Know how to solve it?

Thanks.


Miner options for decision tree.png
Accepted Solution
DougWielenga
SAS Employee

Keep in mind that Tree models are 'unstable' in the sense that slight changes in the input data can result in very different looking trees since differences in one split can result in differences in all subsequent splits.  A tree with only one node is essentially the null model (all observations receive the same prediction) and can occur when you have weak predictors and/or when you have a rare event with no decision profile.  By default, SAS Enterprise Miner assigns observations to the most likely outcome.  If one group is far more likely than the other and you have not specified any additional weight to correctly choosing the rare event, you might end up with a one node model since no splitting results in a node with more events than non-events.  This is explained in SAS Note 47965 available at

 

  http://support.sas.com/kb/47/965.html
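
To make the reasoning above concrete, here is a minimal Python sketch. It is not Enterprise Miner code, and the counts are made up for illustration. Each leaf is a hypothetical (observations, events) pair, every leaf predicts its majority class, and the two helper functions are simplified stand-ins for the Misclassification and Average Square Error assessment measures rather than the exact formulas the node uses.

```python
# Illustration only: hypothetical counts, simplified assessment measures.

def misclassification(leaves):
    """Misclassification rate when each leaf predicts its majority class."""
    total = sum(n for n, _ in leaves)
    errors = sum(min(events, n - events) for n, events in leaves)
    return errors / total

def average_squared_error(leaves):
    """Squared error when each leaf predicts its own event proportion."""
    total = sum(n for n, _ in leaves)
    sse = 0.0
    for n, events in leaves:
        p = events / n
        sse += events * (1 - p) ** 2 + (n - events) * p ** 2
    return sse / total

# Root only: 1000 observations, 50 events (5% event rate).
root = [(1000, 50)]
# One candidate split: a 20% event-rate leaf and a 1.25% event-rate leaf.
split = [(200, 40), (800, 10)]

print("misclassification:", misclassification(root), misclassification(split))
print("avg squared error:", average_squared_error(root), average_squared_error(split))
```

In this made-up example both leaves still have more non-events than events, so the misclassification rate is identical with and without the split, and a subtree chosen on misclassification has no reason to keep it. The squared error does drop, which is why switching the assessment measure can bring the tree back.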

You have a few options:

   1 - choose an alternate setting for Assessment Measure such as Average Square Error or Lift (under the Subtree section of properties for the Decision Tree node) 

   2 - build a target profile following the steps in Usage Note 47965 from the link above

   3 - build the Tree interactively by clicking on the ... to the right of Interactive in the Decision Tree properties

 

Any of these approaches should give you a model. If you have sufficient data, you should also consider using a Data Partition node to avoid overfitting.
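
As a rough illustration of why a target profile (option 2) changes the outcome, here is a small Python sketch. The profit values and leaf event rates are assumptions chosen for the example, not values from Usage Note 47965 or from Enterprise Miner. The idea is only that once correctly identifying the rare event carries extra weight, the best decision can differ between leaves, so a split becomes worth keeping.

```python
# Illustration only: hypothetical profit matrix and leaf event rates.

# PROFIT[decision][actual]: gain for making `decision` when the target is `actual`.
PROFIT = {
    1: {1: 10.0, 0: -1.0},  # act on the case: big gain if event, small cost if not
    0: {1: 0.0, 0: 0.0},    # do nothing
}

def best_decision(event_rate):
    """Return the decision with the highest expected profit for a leaf."""
    expected = {
        decision: event_rate * gains[1] + (1 - event_rate) * gains[0]
        for decision, gains in PROFIT.items()
    }
    return max(expected, key=expected.get)

# Without a profile, both of these leaves would predict 0 (majority class).
high_rate_leaf = best_decision(0.20)    # 20% events
low_rate_leaf = best_decision(0.0125)   # 1.25% events

print(high_rate_leaf, low_rate_leaf)
```

With these assumed weights the 20% leaf is assigned decision 1 and the 1.25% leaf decision 0, so the split now separates cases that are treated differently.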

 

Hope this helps!

Doug

Reeza
Super User

@jmrg2300 wrote:

The same model worked great before, until I changed the underlying data and now it behaves erratically. 

 

 


How does the data compare to before?

When it produces the one node, how predictive is it? Is that one variable correct? If only one variable is significant, it's going to be highly predictive, which could mean it's a measurement error or something that's only known after the fact.

 

I wonder if your priors are severely skewed, i.e. how many 1s vs 0s do you have in your 380K observations?

 

 

jmrg2300
Calcite | Level 5

No, it's not a data issue and not what you're thinking. There isn't one variable that is predictive. In fact, the output is just one node showing my target variable distribution, with none of the input variables. This happens when I reduce the depth of the tree from 6 to 5 or 4.

 

That one property change results in no output. Can't figure out why when it worked yesterday. The underlying data changes slightly. The same variables/features/target are used here too. Just the data itself changed and not that drastically.

RalphAbbey
SAS Employee

Have you also tried the HP Tree node? You lose some flexibility, such as the interactive decision tree, but it might also be something to explore.
