www.edureka.co/data-scienceEDUREKA DATA SCIENCE CERTIFICATION TRAINING
What to expect?
• What is Machine Learning?
• Introduction to Classification
• Classification Algorithms
• What is Naive Bayes?
• Use Cases of Naive Bayes
• Demo – Employee Salary Prediction
What is Machine Learning?
• Machine Learning is the study and construction of algorithms that can learn from data and make predictions on it.
• It is closely related to computational statistics.
• It is used to build models and algorithms that lend themselves to prediction; in commercial use this is known as predictive analytics.
Example applications: Speech Recognition, Face Recognition, Anti Virus, Weather Prediction
Supervised vs Unsupervised Learning
Supervised Learning
Classification is a supervised learning task: there is a known label that you want the system to generate.
E.g. If you built a fruit classifier, the labels would be "this is an orange, this is an apple and this is a banana", learned by showing the classifier examples of apples, oranges and bananas.
Unsupervised Learning
Clustering is an unsupervised learning task: you have seen lots of examples, but don't have labels for them.
E.g. In the same example, a fruit clustering would group the fruits as "fruits with soft skin and lots of dimples", "fruits with shiny hard skin" and "elongated yellow fruits".
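The contrast between the two settings can be sketched in a few lines of Python. This is only a toy illustration: the fruit records and the feature names (skin, shape) are invented here and do not come from the slides, and grouping identical descriptions stands in for a real clustering algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical fruit descriptions as (skin, shape) tuples; not from the slides.
labelled = [
    (("dimpled", "round"), "orange"),
    (("dimpled", "round"), "orange"),
    (("shiny", "round"), "apple"),
    (("smooth", "elongated"), "banana"),
]

def train_classifier(examples):
    """Supervised: remember the most common label seen for each description."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def cluster(items):
    """Unsupervised: group identical descriptions; the groups carry no names."""
    groups = defaultdict(list)
    for index, features in enumerate(items):
        groups[features].append(index)
    return list(groups.values())

model = train_classifier(labelled)
prediction = model[("shiny", "round")]            # a known label comes back
unlabelled = [features for features, _ in labelled]
groups = cluster(unlabelled)                      # only index groups come back
print(prediction, groups)
```

The classifier returns a name because names were supplied during training; the clustering step can only say which items belong together.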
Introduction to Classification
• Classification is the problem of identifying which of a set of categories a new observation belongs to.
• The decision is based on a training set of observations whose categories are already known.
Figure: Examples of Classification
Classification Algorithms
Classifier families shown in the diagram: Linear, Quadratic, SVM, Logistic Regression, Naive Bayes, Neural Networks, Decision Trees, Kernel Estimation, Perceptron.
This tutorial focuses on Naive Bayes.
What is Naive Bayes?
Let us understand Naive Bayes with the help of an example.
"Hi! I just cannot seem to figure out which are the best days to play football with my friends. Can you help me out?"
All possible weather combinations are built from three attributes:
• Season: Summer, Monsoon, Winter
• Sunny: Sunny, No Sun
• Windy: Windy, No Wind
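As a quick check on how many weather combinations the example has to cover, the three attributes can be enumerated in Python. The variable names are chosen here for illustration, and the Sunny and Windy attributes are written as Yes/No values.

```python
from itertools import product

seasons = ["Summer", "Monsoon", "Winter"]
sunny = ["Yes", "No"]
windy = ["Yes", "No"]

# Every (Season, Sunny, Windy) combination a day can fall into.
combinations = list(product(seasons, sunny, windy))
print(len(combinations))  # 3 * 2 * 2 = 12
```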
"I have noted down all the days it was good/bad to play football and the combination of weather metrics on that particular day."
"That is perfect. We will be using the Naive Bayes algorithm to predict if you should play on a particular day or not."
Case 1 – Sunny
Moving further, we can draw charts based on the probabilities of days favouring games.
[Chart: Season (Summer, Monsoon, Winter) against Sunny (Yes, No)]
• We have categorized the probability to play into "High" (P > 0.5) and "Low" (P < 0.5).
• Big circles represent "High", i.e. probability greater than 0.5.
• Small circles represent "Low", i.e. probability less than 0.5.
Case 2 – Windy
• The second attribute is the wind speed on a particular day.
• Let us look at how wind affects the chances of playing football on a particular day.
Here, we look at the days on which there was wind and whether it was good to play.
[Chart: Season (Summer, Monsoon, Winter) against Windy (Yes, No)]
Here, we have the complete set of attributes and whether to play on that day or not.
[Charts: one panel for Sunny = No and one for Sunny = Yes, each showing Season (Summer, Monsoon, Winter) for Windy = Yes and Windy = No]
Notice that in Summer it looks advisable to play when there is no sun, but the second graph shows a different picture. This is because a Summer day which is not Sunny might have P > 0.5 on its own, but once we also know that there is no wind, the posterior probability is P < 0.5.
• The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
• Bayes' theorem is stated mathematically as the following equation:
P(A | B) = P(B | A) * P(A) / P(B)
where A and B are events and P(B) ≠ 0.
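To make the "naive" independence assumption concrete, the theorem can be written for a class c and features x_1, ..., x_n. The notation c for the class and x for the predictors is the one the slides use for the football example; the subscripted x_i and the product form are the standard textbook statement rather than something shown on the slides. Assuming the features are independent given the class, the joint likelihood factorises into one term per feature:

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(x_1, \dots, x_n \mid c)\, P(c)}{P(x_1, \dots, x_n)},
\qquad
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
```

Each factor P(x_i | c) can then be estimated from a simple per-attribute frequency table, which is what keeps the method cheap.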
Understanding Bayes' Theorem
• Let us understand how Bayes' Theorem can be used in the Naive Bayes classifier:
P(c | x) = P(x | c) * P(c) / P(x)
where
• P(c | x) is the posterior probability
• P(x | c) is the likelihood
• P(c) is the class prior probability
• P(x) is the predictor prior probability
In Figure 1, we have the posterior probability for Sunny across seasons, excluding wind speed.
In Figure 2, we have the posterior probabilities for the full attribute combinations (e.g. Sunny = No, Windy = Yes and Season = Summer).
We can use the Naive Bayes classifier to predict whether to play football on (Season = Winter, Sunny = No, Windy = Yes).
Our demo will help you clearly understand Naive Bayes.
From the dataset we have obtained, we will populate a frequency table for each of the attributes.

Frequency Table: Season vs Play
Season     Play = Yes   Play = No
Summer     3            2
Monsoon    4            0
Winter     2            3

Frequency Table: Sunny vs Play
Sunny      Play = Yes   Play = No
Yes        3            4
No         6            1

Frequency Table: Windy vs Play
Windy      Play = Yes   Play = No
Yes        6            2
No         3            3
For each of the frequency tables, we will find the likelihoods for each of the cases.

Likelihood Table: Season vs Play
Season     Play = Yes   Play = No   P(Season)
Summer     3/9          2/5         5/14
Monsoon    4/9          0/5         4/14
Winter     2/9          3/5         5/14
P(Play)    9/14         5/14

Here, c = Play and x = variables like Season, Sunny and Windy.

The posterior probability of 'Yes' given Summer is:
P(x | c) = P(Summer | Yes) = 3/9 = 0.33
P(c) = P(Yes) = 9/14 = 0.64
P(x) = P(Summer) = 5/14 = 0.36
P(c | x) = P(Yes | Summer) = P(Summer | Yes) * P(Yes) / P(Summer) = (3/9 x 9/14) / (5/14) = 0.60
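The single-attribute calculation can be reproduced with exact fractions. A minimal sketch, using only the Season counts from the slides; the function and variable names are chosen here for illustration.

```python
from fractions import Fraction

# Season frequency table from the slides: value -> (days played, days not played).
season_counts = {"Summer": (3, 2), "Monsoon": (4, 0), "Winter": (2, 3)}

total_yes = sum(yes for yes, _ in season_counts.values())   # 9
total_no = sum(no for _, no in season_counts.values())      # 5
total_days = total_yes + total_no                           # 14

def posterior_yes(season):
    """P(Yes | season) = P(season | Yes) * P(Yes) / P(season)."""
    yes, no = season_counts[season]
    likelihood = Fraction(yes, total_yes)        # P(season | Yes)
    prior = Fraction(total_yes, total_days)      # P(Yes)
    evidence = Fraction(yes + no, total_days)    # P(season)
    return likelihood * prior / evidence

print(posterior_yes("Summer"))  # 3/5
```

Working in fractions avoids the rounding that creeps in when 0.33, 0.64 and 0.36 are multiplied by hand.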
Let us use the likelihood tables to predict whether to play football on (Season = Winter, Sunny = No, Windy = Yes).
P(c | x) = P(Play = Yes | Winter, Sunny = No, Windy = Yes)
= [P(Winter | Yes) * P(Sunny = No | Yes) * P(Windy = Yes | Yes) * P(Yes)] / [P(Winter) * P(Sunny = No) * P(Windy = Yes)]
= [(2/9) * (6/9) * (6/9) * (9/14)] / [(5/14) * (7/14) * (8/14)] = 0.6222
Since the probability is greater than 0.5, we should play football on that day.
"Yayiee!!"
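The three-attribute prediction can be checked the same way. The sketch below uses the frequency tables from the slides and computes the value two ways: with the denominator used on the slide (the product of the marginals), and, as an addition not shown on the slides, by normalising over both classes, which is a common alternative. Function names are illustrative.

```python
from fractions import Fraction

# Frequency tables from the slides: value -> (count when Play = Yes, count when Play = No).
counts = {
    "Season": {"Summer": (3, 2), "Monsoon": (4, 0), "Winter": (2, 3)},
    "Sunny": {"Yes": (3, 4), "No": (6, 1)},
    "Windy": {"Yes": (6, 2), "No": (3, 3)},
}
N_YES, N_NO = 9, 5
N = N_YES + N_NO

def class_score(day, play):
    """P(play) times the product of P(value | play) over the attributes."""
    idx, n_class = (0, N_YES) if play == "Yes" else (1, N_NO)
    score = Fraction(n_class, N)
    for attribute, value in day.items():
        score *= Fraction(counts[attribute][value][idx], n_class)
    return score

def evidence_product(day):
    """Product of the marginals P(value): the denominator used on the slide."""
    result = Fraction(1)
    for attribute, value in day.items():
        result *= Fraction(sum(counts[attribute][value]), N)
    return result

day = {"Season": "Winter", "Sunny": "No", "Windy": "Yes"}
slide_value = class_score(day, "Yes") / evidence_product(day)
normalized = class_score(day, "Yes") / (class_score(day, "Yes") + class_score(day, "No"))
print(float(slide_value), float(normalized))
```

The slide-style value is exactly 28/45 (about 0.622) and the normalised value is 100/127 (about 0.787). Both are above 0.5, so the decision to play is the same either way.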
Use Cases of Naive Bayes
• Email Spam Detection
• Categorizing News
• Face Recognition
• Sentiment Analysis
• Weather Prediction
• Digit Recognition
• Medical Diagnosis
Demo – Employee Salary Prediction
Demo – Problem Statement
Problem Statement: To devise a model that predicts an employee's salary from a given set of attributes using the Naive Bayes classifier.
• We have an Employee Dataset with 14 attributes; the output variable is the employee's salary.
• We will use the Naive Bayes classifier to predict an employee's salary as high (>50k) or low (<50k) by finding the probabilities for the given attribute combination.
Demo steps: Data Acquisition → Feature Selection → Divide Dataset → Implement Model → Optimize Model → Model Validation → Prediction
Demo – Employee Salary Prediction: Data Acquisition
Field               Description
Age_Of_emp          Age of the employee
Emp_Stat_type       Type of the employment industry
srnumber            Serial number of the employee
Edu_of_Emp          Employee education details
Edu_Cat             Employee's education category
marital_Status      Employee marital status
Occ_Of_Emp          Job description of the employee
Emp_rel_status      Employee relationship status
Emp_race_type       Race of the employee
sex_of_emp          Sex of the employee
capital_gain        Income from investment sources apart from wages/salary
capital_loss        Losses from investment sources apart from wages/salary
Work_hour_in_week   Number of weekly working hours
country_of_res      Country of residence
Emp_sal             Employee's salary
Demo – Employee Salary Prediction: Feature Selection
• From the fields listed above, we need to filter out the unnecessary columns that will not affect the employee's salary.
• We will remove the fields srnumber, marital_Status, Emp_rel_status, Emp_race_type, sex_of_emp, capital_gain and capital_loss, because the demo treats them as factors that do not affect a person's salary.
• The remaining fields will be used to build our model.
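The column filtering step can be sketched in Python as follows. The field names are the ones listed in the slides; the record values are invented for illustration, and the helper name select_features is hypothetical.

```python
# Columns the slides remove before modelling.
DROPPED = {
    "srnumber", "marital_Status", "Emp_rel_status", "Emp_race_type",
    "sex_of_emp", "capital_gain", "capital_loss",
}

def select_features(record):
    """Return a copy of the record without the dropped columns."""
    return {name: value for name, value in record.items() if name not in DROPPED}

# Hypothetical record: every value below is made up for this example.
row = {
    "Age_Of_emp": 39, "Emp_Stat_type": "Private", "srnumber": 1,
    "Edu_of_Emp": "Bachelors", "Edu_Cat": 13, "marital_Status": "Single",
    "Occ_Of_Emp": "Sales", "Emp_rel_status": "Unmarried", "Emp_race_type": "A",
    "sex_of_emp": "F", "capital_gain": 0, "capital_loss": 0,
    "Work_hour_in_week": 40, "country_of_res": "India", "Emp_sal": "Low",
}

selected = select_features(row)
print(list(selected))
```

Seven predictors plus the target Emp_sal remain after the filter.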
Demo – Employee Salary Prediction: Divide Dataset
We will divide our entire dataset into two subsets:
• Training dataset: used to train the model
• Testing dataset: used to validate the model and make predictions
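A minimal sketch of such a split in Python. The slides do not state the split ratio or how rows are sampled, so the 70/30 ratio, the random shuffle and the fixed seed are all assumptions made here.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into training and testing lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))          # stand-in for 100 employee records
train, test = train_test_split(rows)
print(len(train), len(test))     # 70 30
```

Fixing the seed makes the split repeatable, which helps when comparing models trained on the same partition.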
Demo – Employee Salary Prediction: Implement Model
• We build the Naive Bayes model with the R library 'e1071' on the training dataset that we just created.
• The model is called emp_nb.
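To show what fitting such a model involves, here is a minimal categorical Naive Bayes in plain Python. It is a sketch of the counting idea only, not the e1071 implementation: it handles categorical attributes only and applies no smoothing. The training rows and their category values are hypothetical; only the column names and the model name emp_nb come from the slides.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> Counter of values
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                value_counts[(attribute, row[target])][value] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return the class with the largest P(class) * product of P(value | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for attribute, value in row.items():
            score *= value_counts[(attribute, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training rows; the values are invented for this example.
training = [
    {"Edu_Cat": "Graduate", "Occ_Of_Emp": "Manager", "Emp_sal": "High"},
    {"Edu_Cat": "Graduate", "Occ_Of_Emp": "Clerk", "Emp_sal": "Low"},
    {"Edu_Cat": "School", "Occ_Of_Emp": "Clerk", "Emp_sal": "Low"},
    {"Edu_Cat": "Graduate", "Occ_Of_Emp": "Manager", "Emp_sal": "High"},
]
emp_nb = fit_naive_bayes(training, target="Emp_sal")
print(predict(emp_nb, {"Edu_Cat": "School", "Occ_Of_Emp": "Clerk"}))
```

The fitted model is nothing more than the class counts and the per-class value counts, which correspond to the frequency and likelihood tables of the football example.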
The following is the output from the emp_nb model:
• Likelihood of High and Low salaries
• Likelihood of employee department against High and Low salaries
Demo – Employee Salary Prediction: Optimize Model
Optimizing a model means modifying it so as to achieve the highest accuracy.
If the P-value is greater than 0.05, we should reject the model. Our P-value is less than 0.05, so our model is acceptable.
Kappa is the value obtained by:
Kappa = (totalAccuracy - randomAccuracy) / (1 - randomAccuracy)
The Naive Bayes classifier can be further improved using the following steps:
• Include Laplace correction
• Normalization
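Both the Kappa formula and the Laplace correction are short enough to sketch directly. The accuracy values passed to kappa below are hypothetical. The Laplace example reuses the zero count P(Monsoon | No) = 0/5 from the football likelihood table, with three possible seasons; add-one smoothing (alpha = 1) is the usual form of the correction and is assumed here.

```python
from fractions import Fraction

def kappa(total_accuracy, random_accuracy):
    """Kappa = (totalAccuracy - randomAccuracy) / (1 - randomAccuracy)."""
    return (total_accuracy - random_accuracy) / (1 - random_accuracy)

def laplace_likelihood(count, class_total, n_values, alpha=1):
    """Smoothed P(value | class): add alpha to every count before dividing."""
    return Fraction(count + alpha, class_total + alpha * n_values)

print(kappa(Fraction(4, 5), Fraction(1, 2)))   # 3/5
print(laplace_likelihood(0, 5, 3))             # 1/8 instead of 0/5
```

Without the correction, a single zero likelihood forces the whole product for that class to zero, whatever the other attributes say.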
Demo – Employee Salary Prediction: Model Validation
• We can now validate the predictions.
• We will populate the confusion matrix, which reports metrics such as accuracy, sensitivity, specificity and prevalence.
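The metrics named above follow directly from the four cells of a two-class confusion matrix. A minimal sketch with hypothetical counts; which salary class is treated as "positive" is an assumption that has to be fixed before reading sensitivity and specificity.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive common metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,       # share of correct predictions
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
        "prevalence": (tp + fn) / total,     # share of actual positives
    }

# Hypothetical counts, not taken from the demo output.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```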
Demo – Employee Salary Prediction: Prediction
• The final step in our project is to predict the salary of an employee using the Naive Bayes model that we have created.
• The prediction for our specific input is Low.
Summary
• What is Machine Learning?
• Introduction to Classification
• Classification Algorithms
• What is Naive Bayes?
• Use Cases of Naive Bayes
• Demo
Thank You …
Questions/Queries/Feedback


Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Bayes in R | Edureka
