Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- SAS Communities Library
- /
- Decision Tree in Layman’s Terms

Options

- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

Views
4,840

A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions. Tree models where the target variable can take a finite set of values are calledclassification treesand target variable can take continuous values (numbers) are calledregression trees.

Consider the above picture, whether I should accept a new job offer or not? For that, we need to create a decision tree starting with the base condition or the root node (in blue colour) was that the minimum salary should be $ 100,000 if it has not $ 100,000 then you are not accepting the offer. So, if your salary is greater than 100,000 then you will further check whether the company is giving one-month vacation or not? If they are not giving then you are declining the offer. If they are giving a vacation, then you will further check whether the company is offering a free gym? If they are not providing free gym then you are declining the offer. If they are providing free gym then you are happily accepting the offer. This is just an example of a decision tree.

There are many specific decision-tree algorithms are available. Notable ones include:

- ID3 (Iterative Dichotomiser 3)
- CART (Classification And Regression Tree)
- Chi-square automatic interaction detection (CHAID). Performs multi-level splits when computing classification trees.

In this article, we will see ID3. There are three commonly used impurity measures used in decision trees: **Entropy**, **Gini index**, and **Classification Error**. Decision tree algorithms use **information gain **to split a node. Gini index or entropy is the criterion for calculating information gain. Gini index used by CART algorithm and Entropy used by ID3 algorithm. Before getting into the details, let’s read about impurity.

Suppose if you have a basket full of apples and another bowl contains full of the same label called Apple. If you are asked to pick one item from each basket and bowl, then the probability of getting the Apple and it’s correct label is 1 so in this case, you can say that impurity is 0

Suppose now there are three different fruits in the basket and three different labels in the bowl then the probability of matching the fruit to a label is obviously not 1, it’s something less than that. It could be possible that if we pick a banana from the basket and randomly pick a label from the bowl it says grapes. So here, any random permutation and the combination can be possible. In this case, we can say that impurity is not zero.

*Entropy is an indicator of how messy your data is.*

Entropy is the measure of randomness or unpredictability in the dataset. In other terms, it controls how a decision tree decides to split the data. Entropy is the measure of homogeneity in the data. Its value ranges from 0 to 1. The entropy is 0 if all samples of a node belong to the same class (not good for training dataset), and the entropy is maximal if we have a uniform class distribution (good for training dataset). The equation of entropy is

*Information gain (IG)** measures how much “information” a feature gives us about the class. *The information gain is based on the decrease in entropy after a dataset is split on an attribute. It is the main parameter used to construct a Decision Tree. **An attribute with the highest Information gain will be tested/split first.**

**Information gain = base entropy — new entropy**

Let’s take an example of below cartoon dataset. Thanks to Minsuk Heo for sharing this example. You can check his youtube channel here

The dataset has a cartoon, winter, >1 attributes. Family winter photo is our target. Total 8 pictures. We need to teach baby to pick the correct winter family vacation photo.

We have to frame the conditions that split the data in such a way that the information gain is the highest. Please note gain is the measure of the decrease in entropy after splitting. First, will calculate the entropy for the above dataset.

Total of 8 photos. Winter family photo — 1 (Yes), Now winter family photo — 7 (No). If we substitute in the above entropy formula,

= -(1/8) * log2(1/8) — (7/8) * log2(7/8)

**Entropy = 0.543**

We got three attributes namely cartoon, winter and >1. So which attribute is a best for building a decision tree? We need to calculate information gain for all three attributes in order to choose the best one or root node. Our base entropy is 0.543

Information Gain for a cartoon, winter,

Cartoon has high information gain, so the root node is a cartoon character.

The root node is a cartoon character. We need to split again based on the other two attributes winter or >1. Calculate again information gain and choose the highest one for selecting the next split.

> 1 attribute has a high information gain, split the tree accordingly. Final tree as follows

- Decision trees are easy to visualize and interpret.
- It can easily capture non — linear patterns.
- It can handle both numerical and categorical data.
- Little effort required for data preparation. (example, no need to normalize the data)

- Overfitting is one of the most practical difficulties for decision tree models.
- Low accuracy for continuous variables: While working with continuous numerical variables, decision tree loses information when it categorizes variables in different categories.
- It is unstable, meaning that a small change in the data can lead to a large change in the structure of the optimal decision tree.
- Decision trees are biased with imbalance dataset, so it is recommended that balance out the dataset before creating the decision tree.

I will explain the CART algorithm and overfitting issues in my next article.

Keep learning and stay tuned for more!

If you find any mistakes or improvements required, please feel free to comment below.

Comments

07-08-2019
05:49 PM

- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content

07-08-2019
05:49 PM

11-20-2019
10:26 AM

- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content

11-20-2019
10:26 AM

Hi!

Where can I find the next article in this series?

Thank you.

Rain

11-20-2019
11:08 AM

- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content

11-20-2019
11:08 AM

**Don't miss out on SAS Innovate - Register now for the FREE Livestream!**

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

Data Literacy is for **all**, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.

Article Labels

Article Tags