Decision Tree Tutorial
I think I have to buy a car
How do I decide which one to buy?
Is mileage > 20?
Is price < 15 lakhs?
Will it be sufficient for 6 people?
Number of airbags = 4
Anti-lock brakes?
This seems good
What’s in it for you?
What is Machine Learning?
What is Decision Tree?
Problems in Machine Learning
What are the problems a Decision Tree solves?
Advantages of Decision Tree
Disadvantages of Decision Tree
How does Decision Tree work?
Use Case – Loan repayment prediction
Types of Machine Learning
What is Machine Learning?
I wish I was smarter
Artificial Intelligence
I can think in newer ways now
Machine Learning: Learn, Analyze, Decide, Remember, Recognize, Predict
Machine Learning is an application of Artificial Intelligence wherein the system gets the ability to automatically learn and improve based on experience
An ordinary system with Artificial Intelligence gains the ability to learn and improve on its own: Machine Learning
Types of Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Problems in Machine Learning
Classification: problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’, ‘1’ or ‘0’
Regression: problems wherein a continuous value needs to be predicted, like ‘Product Prices’ or ‘Profit’
Clustering: problems wherein the data needs to be organized to find specific patterns, as in the case of ‘Product Recommendation’
Algorithms for classification: Decision Tree, Random Forest, Logistic Regression, Naïve Bayes
What is a Decision Tree?
A Decision Tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence or reaction.
How do I identify a random vegetable from a shopping bag?
Which vegetable?
Is color = red?
False True
diameter > 2
False True
So it’s a capsicum
Problems that Decision Tree can solve
Classification
A classification tree determines a set of logical if-then conditions to classify the data. For example, discriminating between three types of flowers based on certain features.
Regression
A regression tree is used when the target variable is numerical or continuous in nature. We fit a regression model to the target variable using each of the independent variables. Each split is made based on the sum of squared error.
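The "sum of squared error" rule for a regression split can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `sse` and `best_split` and the small dataset are hypothetical, and the search covers a single numerical feature rather than a full tree.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the split x <= threshold with the lowest SSE."""
    best = None
    # The largest value is skipped because it would leave the right side empty
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: one feature and a numerical target
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, total_sse = best_split(xs, ys)
print(threshold, total_sse)  # 3 4.0
```

Splitting at 3 separates the low targets from the high ones, so each side sits close to its own mean and the combined error is smallest there.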
Advantages of Decision Tree
Simple to understand, interpret and visualize
Little effort required for data preparation
Can handle both numerical and categorical data
Non-linear parameters don’t affect its performance
Disadvantages of Decision Tree
Overfitting: overfitting occurs when the algorithm captures noise in the data
High variance: the model can get unstable due to small variations in the data
Low-bias tree: a highly complicated decision tree tends to have a low bias, which makes it difficult for the model to work with new data
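One way to see overfitting in practice is to compare an unconstrained tree with a depth-limited one on noisy data. The sketch below assumes scikit-learn is installed; the synthetic dataset from `make_classification` and the variable names are assumptions for illustration, and the limits on the second tree mirror the `max_depth=3, min_samples_leaf=5` settings used in the use case of this deck.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of the labels flipped at random to act as noise
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree keeps splitting until every training sample is fitted
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A constrained tree stops early and cannot memorize the noise
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                 random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("shallow", shallow)]:
    print(name,
          "depth:", model.get_depth(),
          "train accuracy:", model.score(X_train, y_train),
          "test accuracy:", model.score(X_test, y_test))
```

The unconstrained tree reaches a perfect score on the training set because it also fits the flipped labels; the gap between its training and test accuracy is the symptom to look for.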
Decision Tree – Important Terms
Example tree used for the terms below:
Color == yellow?
True False
Height>=10? Height<10?
True False True False
Entropy
Entropy is the measure of randomness or unpredictability in the dataset. The dataset before any split has a very high entropy (E1). After the split, each branch has a lower entropy (E2), and a branch that holds a single class has zero entropy.
Information gain
It is the measure of the decrease in entropy after the dataset is split: Gain = E1 - E2
Leaf node
A leaf node carries the classification or the decision
Decision node
A decision node has two or more branches
Root node
The topmost decision node is known as the root node
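The three node types can be made concrete with a small data structure. This is a minimal sketch, not the way any particular library stores a tree: the `Node` class, the `predict` helper and the leaf labels "class A" to "class D" are placeholders, since the slides do not name the leaves.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    question: Optional[Callable[[dict], bool]] = None  # set on decision nodes
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    label: Optional[str] = None  # set on leaf nodes only

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, sample: dict) -> str:
    """Walk from the root down to a leaf and return the leaf's label."""
    while not node.is_leaf():
        node = node.if_true if node.question(sample) else node.if_false
    return node.label


# The root node is the topmost decision node; leaf labels are placeholders
root = Node(
    question=lambda s: s["color"] == "yellow",
    if_true=Node(question=lambda s: s["height"] >= 10,
                 if_true=Node(label="class A"), if_false=Node(label="class B")),
    if_false=Node(question=lambda s: s["height"] < 10,
                  if_true=Node(label="class C"), if_false=Node(label="class D")),
)

print(predict(root, {"color": "yellow", "height": 10}))  # class A
print(predict(root, {"color": "grey", "height": 10}))    # class D
```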
How does a Decision Tree work?
Wonder what kind of animals I’ll get in the jungle today
Let’s try to classify different types of animals based on their features using a Decision Tree
Problem statement
To classify the different types of animals based on their features using a decision tree
The dataset is looking quite messy and the entropy is high in this case
Training Dataset
Color: grey, yellow, brown, grey, yellow
Height: 10, 3, 10, 10, 4
Label: elephant, elephant, giraffe, tiger, monkey
How does a Decision Tree work?
How to split the data
We have to frame the conditions that split the data in such a way that the information gain is the highest
Note: gain is the measure of the decrease in entropy after splitting
How does a Decision Tree work?
Formula for entropy
$$\text{Entropy} = -\sum_{i=1}^{k} P(\text{value}_i)\,\log_2 P(\text{value}_i)$$
Let’s try to calculate the entropy for the current dataset
How does a Decision Tree work?
Counts per class in the current dataset: 3, 2, 1, 2 (total 8)
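How does a Decision Tree work?
Let’s use the formula
$$\text{Entropy} = -\sum_{i=1}^{k} P(\text{value}_i)\,\log_2 P(\text{value}_i)$$
$$\text{Entropy} = -\left[\tfrac{3}{8}\log_2\tfrac{3}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8}\right] \approx 1.906$$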
We will calculate the entropy
of the dataset similarly after
every split to calculate the
gain
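The entropy calculation is easy to check numerically. The sketch below assumes the class counts 3, 2, 1 and 2 out of 8 from the slides and uses a hypothetical helper named `entropy`; with base-2 logarithms the result is in bits.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as counts."""
    total = sum(counts)
    # Each class contributes -p * log2(p), where p is its share of the samples
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


print(round(entropy([3, 2, 1, 2]), 3))  # 1.906
```

A node that holds a single class gives an entropy of zero, which is the stopping point the splitting process aims for.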
How does a Decision Tree work?
Gain can be calculated by finding the difference between the entropy before the split and the entropy after it
Now we will try to choose a condition that gives us the highest gain
We will do that by splitting the data using each condition and checking the gain that we get out of them
The condition that gives us the highest gain will be used to make the first split
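Choosing the first split can be sketched as a loop over candidate conditions. Everything in this example is an assumption for illustration: the eight rows are hypothetical (chosen so that the class counts are 3, 2, 1 and 2), the helper names are invented, and the entropy after a split is taken as the size-weighted average of the two branches, which is the usual definition.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, condition):
    """Entropy before the split minus the size-weighted entropy after it."""
    labels = [r["label"] for r in rows]
    true_side = [r["label"] for r in rows if condition(r)]
    false_side = [r["label"] for r in rows if not condition(r)]
    after = sum(len(side) / len(rows) * entropy(side)
                for side in (true_side, false_side) if side)
    return entropy(labels) - after


# Hypothetical rows for illustration, not the slide's exact dataset
rows = [
    {"color": "yellow", "height": 10, "label": "giraffe"},
    {"color": "yellow", "height": 10, "label": "giraffe"},
    {"color": "yellow", "height": 4, "label": "tiger"},
    {"color": "yellow", "height": 4, "label": "tiger"},
    {"color": "grey", "height": 10, "label": "elephant"},
    {"color": "grey", "height": 10, "label": "elephant"},
    {"color": "grey", "height": 10, "label": "elephant"},
    {"color": "brown", "height": 3, "label": "monkey"},
]
conditions = {
    "Color == yellow": lambda r: r["color"] == "yellow",
    "Height >= 10": lambda r: r["height"] >= 10,
    "Color == brown": lambda r: r["color"] == "brown",
    "Color == grey": lambda r: r["color"] == "grey",
}
gains = {name: information_gain(rows, cond) for name, cond in conditions.items()}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # Color == yellow 1.0
```

On these rows the color condition wins because it divides the data into two equal halves that share no class, which removes a full bit of uncertainty.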
How does a Decision Tree work?
Conditions
Color == Yellow?
Height >= 10?
Color == Brown?
Color == Grey?
Height < 10?
Let’s say the condition Color == Yellow? gives us the maximum gain
How does a Decision Tree work?
We split the data
How does a Decision Tree work?
Color == yellow?
True False
The entropy after splitting has decreased considerably. However, we still need some splitting at both the branches to attain an entropy value equal to zero
So, we decide to split both the nodes using ‘height’ as the condition
Color == yellow?
True False
Height>=10? Height<10?
True False True False
Since every branch now contains a single label type, we can say that the entropy in this case has reached the least value
This tree can now predict all the classes of animals present in the dataset with 100% accuracy
That was easy
Use Case – Loan Repayment Prediction
I need to find out if my customers are going to return the loan they took from my bank or not
Use Case – Problem Statement
Problem statement
To predict if a customer will repay the loan amount or not using the Decision Tree algorithm in Python
Use Case – Implementation
#import the necessary packages
import numpy as np
import pandas as pd
# train_test_split lives in sklearn.model_selection (scikit-learn 0.18+)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
#Loading data file
balance_data = pd.read_csv('C:/Users/anirban.dey/Desktop/data_2.csv',
                           sep=',', header=0)
Use Case – Implementation
#Inspect the size of the dataset
print("Dataset Length:", len(balance_data))
print("Dataset Shape:", balance_data.shape)
Use Case – Implementation
print("Dataset:")
print(balance_data.head())
Use Case – Implementation
#Separating the target variable (column 0) from the features (columns 1 to 4)
X = balance_data.values[:, 1:5]
Y = balance_data.values[:, 0]
#Splitting the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3,
                                                    random_state=100)
#Training a decision tree classifier with the entropy criterion
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)
Use Case – Implementation
#Making predictions on the test set
y_pred_en = clf_entropy.predict(X_test)
y_pred_en
Use Case – Implementation
#Checking Accuracy
print("Accuracy is", accuracy_score(y_test, y_pred_en) * 100)
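For intuition, the accuracy figure is simply the share of test predictions that match the true labels, scaled to a percentage. The helper `accuracy_percent` and the labels below are hypothetical and stand in for `accuracy_score(...) * 100` on real predictions.

```python
def accuracy_percent(y_true, y_pred):
    """Share of predictions that match the true labels, as a percentage."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Hypothetical labels for illustration
y_true = ["Yes", "No", "Yes", "Yes"]
y_pred = ["Yes", "No", "No", "Yes"]
print(accuracy_percent(y_true, y_pred))  # 75.0
```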
Use Case
So, we have created a model that uses the decision tree algorithm to predict whether a customer will repay the loan or not
The accuracy of the model is 94.6%
The bank can use this model to decide whether it should approve a loan request from a particular customer or not
Key takeaways
So what’s your next step?
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Data Science |Simplilearn

  • 3. Decision Tree Tutorial I think I have to buy a car
  • 4. Decision Tree Tutorial How do I decide which one to buy?
  • 5. Decision Tree Tutorial Is mileage > 20? Is Price < 15 Lakhs? Will it be sufficient for 6 people? Number of airbags = 4 Anti-lock brakes?
  • 7. What’s in it for you? What is Machine Learning? What is Decision Tree? Problems in Machine Learning What are the problems a Decision Tree solves? Advantages of Decision Tree Disadvantages of Decision Tree How does Decision Tree work? Use Case – Loan repayment prediction Types of Machine Learning
  • 8. What is Machine Learning?
  • 9. What is Machine Learning? I wish I was smarter
  • 10. What is Machine Learning? I wish I was smarter
  • 11. What is Machine Learning? Artificial Intelligence
  • 12. What is Machine Learning? Artificial Intelligence
  • 13. What is Machine Learning? I can think in newer ways now
  • 14. What is Machine Learning? Learn Analyze Decide Remember Machine Learning Recognize Predict
  • 15. What is Machine Learning? Machine Learning is an application of Artificial Intelligence wherein the system gets the ability to automatically learn and improve based on experience Ordinary system
  • 16. What is Machine Learning? Machine Learning is an application of Artificial Intelligence wherein the system gets the ability to automatically learn and improve based on experience Ordinary system With Artificial Intelligence
  • 17. What is Machine Learning? Machine Learning is an application of Artificial Intelligence wherein the system gets the ability to automatically learn and improve based on experience Ordinary system Ability to learn and improve on its own
  • 18. What is Machine Learning? Machine Learning is an application of Artificial Intelligence wherein the system gets the ability to automatically learn and improve based on experience Ordinary system Ability to learn and improve on its own Machine Learning
  • 19. Types of Machine Learning
  • 20. Types of Machine Learning Supervised Learning
  • 21. Types of Machine Learning Supervised Learning Unsupervised Learning
  • 22. Types of Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning
  • 23. Problems in Machine Learning?
  • 24. Problems in Machine Learning Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’
  • 25. Problems in Machine Learning Classification Regression Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’ Problems wherein continuous value needs to be predicted like ‘Product Prices’, ‘Profit’
  • 26. Problems in Machine Learning Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’ Regression Clustering Problems wherein continuous value needs to be predicted like ‘Product Prices’, ‘Profit’ Problems wherein the data needs to be organized to find specific patterns like in the case of ‘Product Recommendation’
  • 27. Problems in Machine Learning Regression Clustering Problems wherein continuous value needs to be predicted like ‘Product Prices’, ‘Profit’ Problems wherein the data needs to be organized to find specific patterns like in the case of ‘Product Recommendation’ Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’
  • 28. Problems in Machine Learning Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’ Decision Tree
  • 29. Problems in Machine Learning Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’ Decision Tree Random Forest Logistic Regression Naïve Bayes
  • 30. Problems in Machine Learning Classification Problems with categorical solutions like ‘Yes’ or ‘No’, ‘True’ or ‘False’,’1’ or ‘0’ Decision Tree Random Forest Logistic Regression Naïve Bayes
  • 32. What is a Decision Tree? A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction.
  • 33. What is a Decision Tree? A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. How do I identify a random vegetable from a shopping bag?
  • 34. What is a Decision Tree? A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. Example tree: Which vegetable? Is color = red? (True / False), then diameter > 2? (True / False).
  • 35. What is a Decision Tree? A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. Example tree: Which vegetable? Is color = red? (True / False), then diameter > 2? (True / False). So it’s a capsicum.
  • 36. Problems that Decision Tree can solve
  • 37. Problems that Decision Tree can solve. Classification and Regression.
  • 38. Problems that Decision Tree can solve. Classification: a classification tree determines a set of logical if-then conditions to classify problems, for example discriminating between three types of flowers based on certain features.
  • 39. Problems that Decision Tree can solve. Classification: a classification tree determines a set of logical if-then conditions to classify problems, for example discriminating between three types of flowers based on certain features. Regression: a regression tree is used when the target variable is numerical or continuous in nature. We fit a regression model to the target variable using each of the independent variables. Each split is made based on the sum of squared errors.
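To make the "sum of squared errors" criterion for regression trees concrete, here is a minimal sketch that tries every threshold on a single feature and keeps the one with the lowest total error. The data and the helper names `sse` and `best_threshold` are hypothetical and only serve as illustration; a real library would also handle multiple features and stopping rules.

```python
def sse(values):
    """Sum of squared errors of values around their mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs, ys):
    """Return (threshold, total_sse) for the split x <= threshold with the lowest SSE."""
    best = (None, float("inf"))
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: one feature (xs) and a numeric target (ys).
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]

threshold, total = best_threshold(xs, ys)
print(threshold, total)  # the split x <= 3 separates the two groups of targets
```

Each side of the chosen split would then predict the mean of its own target values, and the same search would be repeated inside each side.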
  • 42. Advantages of Decision Tree. Simple to understand, interpret, and visualize.
  • 43. Advantages of Decision Tree. Little effort is required for data preparation.
  • 44. Advantages of Decision Tree. Can handle both numerical and categorical data.
  • 45. Advantages of Decision Tree. Nonlinear parameters don’t affect its performance.
  • 47. Disadvantages of Decision Tree. Overfitting: overfitting occurs when the algorithm captures noise in the data.
  • 48. Disadvantages of Decision Tree. Overfitting: overfitting occurs when the algorithm captures noise in the data. High variance: the model can become unstable due to small variations in the data.
  • 49. Disadvantages of Decision Tree. Overfitting: overfitting occurs when the algorithm captures noise in the data. High variance: the model can become unstable due to small variations in the data. Low-bias tree: a highly complicated decision tree tends to have low bias, fitting the training data so closely that it becomes difficult for the model to work with new data.
  • 50. Decision Tree – Important Terms
  • 51. Decision Tree – Important Terms
  • 52. Decision Tree – Important Terms. Entropy: the measure of randomness or unpredictability in the dataset. Example: this dataset has very high entropy.
  • 53. Decision Tree – Important Terms. Entropy: the measure of randomness or unpredictability in the dataset. Example: high entropy (E1) before the split; lower entropy (E2) after the split on Color == yellow? (True / False); zero entropy after the further splits Height=10? and Height<10? (True / False).
  • 54. Decision Tree – Important Terms. Information gain: the measure of the decrease in entropy after the dataset is split. Example: high entropy (E1) before the split on Color == yellow?, lower entropy (E2) after the split, so Gain = E1 - E2.
  • 55. Decision Tree – Important Terms. Leaf node: a leaf node carries the classification or the decision. Example: the tree with Color == yellow?, Height=10? and Height<10? (True / False), with a leaf node marked.
  • 56. Decision Tree – Important Terms. Decision node: a decision node has two or more branches. Example: the tree with Color == yellow?, Height=10? and Height<10? (True / False), with a decision node marked.
  • 57. Decision Tree – Important Terms. Root node: the topmost decision node is known as the root node. Example: Color == yellow? in the tree with Color == yellow?, Height=10? and Height<10? (True / False).
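The three node types above can be sketched as a small data structure. This is an illustrative sketch only: the `Node` class, the `predict` helper, and the leaf labels attached to each branch are assumptions, since the slides do not state which animal sits at which leaf.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # A leaf node carries a label; a decision node carries a test and two children.
    label: Optional[str] = None
    question: Optional[str] = None
    test: Optional[Callable[[dict], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, sample: dict) -> str:
    """Walk from the root down to a leaf, following the answer to each question."""
    while not node.is_leaf():
        node = node.if_true if node.test(sample) else node.if_false
    return node.label


# Root node: the topmost decision node. Leaf labels here are hypothetical.
root = Node(
    question="Color == yellow?",
    test=lambda s: s["color"] == "yellow",
    if_true=Node(
        question="Height >= 10?",
        test=lambda s: s["height"] >= 10,
        if_true=Node(label="giraffe"),
        if_false=Node(label="tiger"),
    ),
    if_false=Node(
        question="Height < 10?",
        test=lambda s: s["height"] < 10,
        if_true=Node(label="monkey"),
        if_false=Node(label="elephant"),
    ),
)

print(predict(root, {"color": "yellow", "height": 12}))
print(predict(root, {"color": "grey", "height": 10}))
```

Keeping the question text next to the test function makes the tree easy to print or explain, which is the interpretability advantage the slides mention.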
  • 58. How does Decision Tree work?
  • 59. How does a Decision Tree work? “Wonder what kind of animals I’ll get in the jungle today.”
  • 60. How does a Decision Tree work?
  • 61. How does a Decision Tree work? Let’s try to classify different types of animals based on their features using a decision tree.
  • 62. How does a Decision Tree work? Problem statement: to classify the different types of animals based on their features using a decision tree.
  • 63. How does a Decision Tree work? Problem statement: to classify the different types of animals based on their features using a decision tree. The dataset looks quite messy, and the entropy is high in this case.
  • 64. How does a Decision Tree work? The dataset looks quite messy, and the entropy is high in this case. Training dataset, listed by column. Color: grey, Yellow, brown, grey, Yellow. Height: 10, 3, 10, 10, 4. Label: elephant, elephant, giraffe, Tiger, Monkey.
  • 65. How does a Decision Tree work? How to split the data: we have to frame the conditions that split the data in such a way that the information gain is the highest.
  • 66. How does a Decision Tree work? How to split the data: we have to frame the conditions that split the data in such a way that the information gain is the highest. Note: gain is the measure of the decrease in entropy after splitting.
  • 67. How does a Decision Tree work? Formula for entropy: $\text{Entropy} = -\sum_{i=1}^{k} P(value_i) \log_2 P(value_i)$. Let’s try to calculate the entropy for the current dataset.
  • 68. How does a Decision Tree work? Class counts in the dataset: 3, 2, 1, 2 (total 8).
  • 69. How does a Decision Tree work? Let’s use the formula: $\text{Entropy} = -\sum_{i=1}^{k} P(value_i) \log_2 P(value_i)$.
  • 70. How does a Decision Tree work? Let’s use the formula: $\text{Entropy} = -\sum_{i=1}^{k} P(value_i) \log_2 P(value_i)$. $\text{Entropy} = -\left[\tfrac{3}{8}\log_2\tfrac{3}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8}\right] \approx 1.906$.
  • 71. How does a Decision Tree work? Let’s use the formula: $\text{Entropy} = -\sum_{i=1}^{k} P(value_i) \log_2 P(value_i)$. $\text{Entropy} = -\left[\tfrac{3}{8}\log_2\tfrac{3}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8}\right] \approx 1.906$. We will calculate the entropy of the dataset in the same way after every split in order to calculate the gain.
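The entropy calculation for the class counts 3, 2, 1, 2 can be checked with a few lines of code. This is a minimal sketch; the function name `entropy` and its `base` parameter are illustrative choices. Note that the numeric result depends on the logarithm base: with base 2 and the leading minus sign the counts give about 1.906, while base 10 gives about 0.574.

```python
import math


def entropy(counts, base=2):
    """Entropy of a class distribution given as a list of per-class counts."""
    total = sum(counts)
    # Classes with a count of zero contribute nothing and are skipped.
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


counts = [3, 2, 1, 2]  # class counts from the slide, 8 samples in total

h2 = entropy(counts)            # base-2 logarithm, as in the formula
h10 = entropy(counts, base=10)  # base-10 logarithm, for comparison

print(round(h2, 3))   # about 1.906
print(round(h10, 3))  # about 0.574
```

A node that contains a single class has entropy 0, which is the target the splitting procedure works toward.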
  • 72. How does a Decision Tree work? Gain is calculated as the difference between the entropy before a split and the entropy after it.
  • 73. How does a Decision Tree work? Now we will try to choose the condition that gives us the highest gain.
  • 74. How does a Decision Tree work? Now we will try to choose the condition that gives us the highest gain. We will do that by splitting the data using each condition and checking the gain that we get out of them.
  • 75. How does a Decision Tree work? We will do that by splitting the data using each condition and checking the gain that we get out of them. The condition that gives us the highest gain will be used to make the first split.
  • 76. How does a Decision Tree work? Training dataset, listed by column. Color: grey, Yellow, brown, grey, Yellow. Height: 10, 3, 10, 10, 4. Label: elephant, elephant, giraffe, Tiger, Monkey. Conditions: Color==Yellow?, Height>=10, Color==Brown?, Color==Grey, Diameter<10.
  • 77. How does a Decision Tree work? Let’s say this condition gives us the maximum gain. Conditions: Color==Yellow?, Height>=10, Color==Brown?, Color==Grey, Diameter<10. Training dataset, listed by column. Color: grey, Yellow, brown, grey, Yellow. Height: 10, 3, 10, 10, 4. Label: elephant, elephant, giraffe, Tiger, Monkey.
  • 78. How does a Decision Tree work? We split the data. Tree: Color == yellow? (True / False).
  • 79. How does a Decision Tree work? The entropy after splitting has decreased considerably. Tree: Color == yellow? (True / False).
  • 80. How does a Decision Tree work? The entropy after splitting has decreased considerably; however, we still need some splitting at both branches to attain an entropy value equal to zero. Tree: Color == yellow? (True / False).
  • 81. How does a Decision Tree work? So, we decide to split both the nodes using ‘height’ as the condition. Tree: Color == yellow? (True / False), then Height>=10? and Height<10? (True / False).
  • 82. How does a Decision Tree work? Since every branch now contains a single label type, we can say that the entropy in this case has reached its lowest value. Tree: Color == yellow? (True / False), then Height>=10? and Height<10? (True / False).
  • 83. How does a Decision Tree work? Tree: Color == yellow? (True / False), then Height>=10? and Height<10? (True / False). This tree can now predict all the classes of animals present in the dataset with 100% accuracy.
  • 84. How does a Decision Tree work? Tree: Color == yellow? (True / False), then Height>=10? and Height<10? (True / False). This tree can now predict all the classes of animals present in the dataset with 100% accuracy. That was easy.
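The "try each condition and keep the one with the highest gain" step can be sketched as follows. The rows are hypothetical: they match the class counts 3, 2, 1, 2 from the walkthrough, but the assignment of colors and heights to animals is an assumption, as are the helper names. Gain is computed as the parent entropy minus the size-weighted average entropy of the two branches.

```python
import math
from collections import Counter


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, condition):
    """Parent entropy minus the size-weighted entropy of the True/False branches."""
    labels = [r["label"] for r in rows]
    true_side = [r["label"] for r in rows if condition(r)]
    false_side = [r["label"] for r in rows if not condition(r)]
    weighted = sum(
        len(side) / len(rows) * entropy(side)
        for side in (true_side, false_side)
        if side  # an empty branch contributes nothing
    )
    return entropy(labels) - weighted


# Hypothetical training rows (8 animals, class counts 3, 2, 1, 2).
rows = (
    [{"color": "grey", "height": 10, "label": "elephant"}] * 3
    + [{"color": "yellow", "height": 10, "label": "giraffe"}] * 2
    + [{"color": "brown", "height": 3, "label": "monkey"}]
    + [{"color": "yellow", "height": 4, "label": "tiger"}] * 2
)

# Candidate conditions, named after those listed on the slide.
conditions = {
    "Color == yellow": lambda r: r["color"] == "yellow",
    "Height >= 10": lambda r: r["height"] >= 10,
    "Color == brown": lambda r: r["color"] == "brown",
    "Color == grey": lambda r: r["color"] == "grey",
}

gains = {name: information_gain(rows, cond) for name, cond in conditions.items()}
best_name = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: gain = {gain:.3f}")
print("First split:", best_name)
```

With these assumed rows, the color test on yellow gives the highest gain, and the same procedure would then be repeated inside each branch until every branch holds a single label.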
  • 85. Use Case – Loan Repayment Prediction
  • 86. Use Case – Loan Repayment Prediction. “I need to find out whether my customers are going to repay the loans they took from my bank or not.”
  • 87. Use Case – Problem Statement. Problem statement: to predict whether a customer will repay the loan amount or not, using the decision tree algorithm in Python.
  • 88. Use Case – Implementation
        # Import the necessary packages
        import numpy as np
        import pandas as pd
        # train_test_split lives in sklearn.model_selection
        # (the old sklearn.cross_validation module has been removed)
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import accuracy_score
        from sklearn import tree

        # Load the data file
        balance_data = pd.read_csv('C:/Users/anirban.dey/Desktop/data_2.csv', sep=',', header=0)
  • 89. Use Case – Implementation
        # Inspect the size of the dataset
        print("Dataset Length:", len(balance_data))
        print("Dataset Shape:", balance_data.shape)
  • 90. Use Case – Implementation
        # Show the first few rows
        print("Dataset:")
        print(balance_data.head())
  • 91. Use Case – Implementation
        # Separate the features (columns 1 to 4) from the target variable (column 0)
        X = balance_data.values[:, 1:5]
        Y = balance_data.values[:, 0]

        # Split the dataset into training (70%) and test (30%) sets
        X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

        # Train a decision tree using entropy as the split criterion
        clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100, max_depth=3, min_samples_leaf=5)
        clf_entropy.fit(X_train, y_train)
  • 92. Use Case – Implementation
        # Make predictions on the test set
        y_pred_en = clf_entropy.predict(X_test)
        print(y_pred_en)
  • 93. Use Case – Implementation
        # Check the accuracy (as a percentage) of the predictions made above
        print("Accuracy is", accuracy_score(y_test, y_pred_en) * 100)
  • 94. Use Case. So, we have created a model that uses the decision tree algorithm to predict whether a customer will repay the loan or not.
  • 95. Use Case. The accuracy of the model is 94.6%.
  • 96. Use Case. The bank can use this model to decide whether it should approve a loan request from a particular customer or not.
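Because the loan CSV file from the slides is not available, the following self-contained sketch runs the same steps on synthetic data so the workflow can be tried end to end. The synthetic dataset from `make_classification` and the feature names are assumptions standing in for the real loan data, so the accuracy it prints will not match the 94.6% reported above. It also prints the learned rules with `export_text`, which shows the if-then structure of the fitted tree.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the loan data: 4 numeric features, binary target.
X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    random_state=100,
)

# Same split and hyperparameters as in the use case.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)
clf = DecisionTreeClassifier(
    criterion="entropy", random_state=100, max_depth=3, min_samples_leaf=5
)
clf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy is {accuracy * 100:.1f}%")

# Print the learned if-then rules; the feature names are hypothetical.
feature_names = ["feature_0", "feature_1", "feature_2", "feature_3"]
print(export_text(clf, feature_names=feature_names))
```

Limiting `max_depth` and setting `min_samples_leaf` keeps the tree small, which is one common way to reduce the overfitting described under the disadvantages.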

Editor's Notes