2. Introduction to Probability-based Learning
Definition:
Probability-based learning is a practical learning method that combines prior knowledge
(or prior probabilities) with observed data.
Concept:
It uses probability theory to model randomness, uncertainty, and noise for predicting future
events.
Application:
Useful for handling large datasets and for making inferences using Bayes' Rule.
Probabilistic Models:
• Involve randomness and provide solutions in the form of probability distributions.
• Can handle uncertain and noisy data effectively.
Deterministic Models:
• Do not involve randomness.
• Produce the same output every time for the same initial conditions.
• Result in a single, fixed outcome.
3. Introduction to Probability-based Learning
Bayesian Learning
Difference from General Probabilistic Learning:
• Bayesian learning uses subjective probabilities, based on an individual’s belief or interpretation.
• These probabilities can change over time with new information.
Key Algorithms:
• Naïve Bayes Learning
• Bayesian Belief Network (BBN)
These use prior probabilities and apply Bayes’ Rule to draw conclusions and make
predictions.
4. Fundamentals of Bayes Theorem
Bayes Theorem
• Goal: To determine the most probable hypothesis, given the data D plus any initial knowledge
about the prior probabilities of the various hypotheses in H.
• P(h|D): Posterior probability of hypothesis h given data D
→ Updated belief after seeing the data
• P(D|h): Likelihood of data D given hypothesis h
→ How well h explains the data
• P(h): Prior probability of h
→ Initial belief before seeing the data
• P(D): Evidence or marginal likelihood
→ Total probability of the data under all hypotheses
5. Fundamentals of Bayes Theorem
Bayes Theorem
• Prior probability of h, P(h): it reflects any background knowledge we have about the chance that h is a
correct hypothesis (before having observed the data).
• Prior probability of D, P(D): it reflects the probability that training data D will be observed given no
knowledge about which hypothesis h holds.
• Conditional Probability of observation D, P(D|h): it denotes the probability of observing data D given
some world in which hypothesis h holds.
• Posterior probability of h, P(h|D): it represents the probability that h holds given the observed training
data D. It reflects our confidence that h holds after we have seen the training data D and it is the
quantity that Machine Learning researchers are interested in.
Bayes Theorem allows us to compute P(h|D):
P(h|D) = [P(D|h) × P(h)] / P(D)
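The formula above is a one-line computation; a minimal sketch (the helper name `bayes_posterior` is my own, and the example numbers are the P(B|A), P(A), P(B) values from the worked example on the next slides):

```python
def bayes_posterior(p_d_given_h, p_h, p_d):
    """Posterior P(h|D) = P(D|h) * P(h) / P(D)."""
    return (p_d_given_h * p_h) / p_d

# Example: P(D|h) = 1/2, P(h) = 4/7, P(D) = 3/7  ->  posterior = 2/3
print(bayes_posterior(0.5, 4 / 7, 3 / 7))
```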
6. Bayes Theorem - Example
Fundamentals of Bayes Theorem
Step 1: Calculated Probabilities
P(A) = 4/7
P(B) = 3/7
P(B | A) = 2/4 = 1/2
P(A | B) = 2/3
Is Bayes’ Theorem correct?
7. Bayes Theorem - Example
Fundamentals of Bayes Theorem
Step 2: Bayes’ Theorem Verification
Bayes’ Theorem: P(A|B) = [P(B|A) × P(A)] / P(B)
Substituting values:
P(A|B) = [(1/2) × (4/7)] / (3/7) = (2/7) / (3/7) = 2/3
P(B|A) = [(2/3) × (3/7)] / (4/7) = (2/7) / (4/7) = 1/2
✅ Results match the values computed directly in Step 1 → Bayes’ Theorem holds true
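The same check can be done numerically. A sketch, assuming the underlying counts behind the stated probabilities (7 outcomes in total, |A| = 4, |B| = 3, |A ∩ B| = 2, which is what P(B|A) = 2/4 and P(A|B) = 2/3 imply):

```python
# Counts inferred from the stated probabilities (an assumption, not shown on the slide)
total, n_a, n_b, n_ab = 7, 4, 3, 2

p_a, p_b = n_a / total, n_b / total
p_b_given_a = n_ab / n_a          # 2/4 = 1/2
p_a_given_b_direct = n_ab / n_b   # 2/3, computed directly from the counts

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

# Both routes give 2/3, confirming the theorem on this data
print(p_a_given_b_direct, p_a_given_b_bayes)
```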
8. Consider a boy who has a volleyball tournament the next day, but today he feels sick. Since he is a healthy boy, there is only a 40% chance that he would fall sick. Find the probability of the boy participating in the tournament. The boy is very much interested in volleyball, so there is a 90% probability that he would participate in tournaments, and a 20% probability that he will fall sick given that he participates in the tournament.
9. Consider a boy who has a volleyball tournament the next day, but today he feels sick. Since he is a healthy boy, there is only a 40% chance that he would fall sick. Find the probability of the boy participating in the tournament. The boy is very much interested in volleyball, so there is a 90% probability that he would participate in tournaments, and a 20% probability that he will fall sick given that he participates in the tournament.
The probability of the boy participating in the tournament given that he is sick is:
P (Boy participating in the tournament | He is sick)
= P (Boy participating in the tournament) × P (He is sick | Boy participating in the tournament)/P (He is Sick)
Solution:
P (Boy participating in the tournament) = 90%
P (He is sick | Boy participating in the tournament) = 20%
P (He is Sick) = 40%
P (Boy participating in the tournament | He is sick) = (0.9 × 0.2)/0.4 = 0.45
Hence, 45% is the probability that the boy will participate in the tournament given that he is sick.
10. Assume the following probabilities: the probability of a person having malaria is 0.02%; the probability of the test being positive, given that the person has malaria, is 98%; and the probability of the test being negative, given that the person does not have malaria, is 95%. Find the probability of a person having malaria, given that the test result is positive.
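The slide leaves the computation to the reader; a sketch of it, using the law of total probability for the denominator P(positive):

```python
p_malaria = 0.0002            # 0.02% of people have malaria
p_pos_given_malaria = 0.98    # test is positive given malaria (sensitivity)
p_neg_given_healthy = 0.95    # test is negative given no malaria (specificity)
p_pos_given_healthy = 1 - p_neg_given_healthy  # 5% false-positive rate

# Evidence P(positive) via the law of total probability
p_pos = (p_pos_given_malaria * p_malaria
         + p_pos_given_healthy * (1 - p_malaria))

# Bayes' Rule
p_malaria_given_pos = p_pos_given_malaria * p_malaria / p_pos
print(round(p_malaria_given_pos, 4))  # 0.0039 -> about 0.39%
```

Despite the positive test, the posterior is tiny because the disease is so rare: the 5% false positives among healthy people vastly outnumber the true positives.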
11. Classification Using Bayes Model
Maximum A Posteriori (MAP) Hypothesis, hMAP
The learner considers some set of candidate hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D.
Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis, hMAP:
hMAP = argmax h∈H P(h|D) = argmax h∈H P(D|h) P(h)
(The denominator P(D) can be dropped because it is the same for every hypothesis.)
We can determine the MAP hypothesis by using Bayes theorem to calculate the posterior probability of each candidate hypothesis.
12. Maximum Likelihood (ML) Hypothesis, hML
If we assume that every hypothesis in H is equally probable,
i.e. P(hi) = P(hj) for all hi and hj in H,
we need only consider P(D|h) to find the most probable hypothesis.
P(D|h) is often called the likelihood of the data D given h.
Any hypothesis that maximizes P(D|h) is called a maximum likelihood (ML) hypothesis:
hML = argmax h∈H P(D|h)
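Both definitions reduce to an argmax over hypotheses; a minimal sketch (the three hypotheses and all the numbers below are made up purely for illustration):

```python
# Hypothetical priors P(h) and likelihoods P(D|h) for three candidate hypotheses
priors      = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihoods = {"h1": 0.1, "h2": 0.6, "h3": 0.9}

# MAP: maximize P(D|h) * P(h); P(D) is the same for all h, so it is dropped
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: equal priors assumed, so maximize the likelihood P(D|h) alone
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map, h_ml)  # h2 h3
```

Note how the two disagree here: the strong prior on h1 is not enough to save it, but it does let h2 (0.6 × 0.2 = 0.12) beat h3 (0.9 × 0.1 = 0.09) under MAP, while ML picks h3 outright.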
13. NAÏVE BAYES ALGORITHM
It is a supervised classification algorithm for binary or multi-class problems that works on the principle of Bayes’ theorem.
There is a family of Naïve Bayes classifiers based on a common principle.
These algorithms assume that the features of the dataset are independent and that each feature carries equal weight.
The algorithm works particularly well on large datasets and is very fast. It is one of the simplest and most effective classification algorithms. It treats all features as independent of each other even when they are individually dependent on the classified object.
Each feature contributes a probability value independently during classification, and hence the algorithm is called “naïve”.
Some important applications of these algorithms are text classification, recommendation systems, and face recognition.
17. Problem 1:
Consider the new instance (Sunny, Cool, High, Strong). Yes or No?
Consider the new instance (Overcast, Hot, High, Strong). Yes or No?
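The training table for Problem 1 is not reproduced in these notes; the sketch below assumes the standard 14-day PlayTennis dataset (Mitchell), which is what this instance format matches. Class-conditional probabilities are simple frequency counts:

```python
from collections import Counter

# Assumed training data: (Outlook, Temperature, Humidity, Wind) -> PlayTennis
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def classify(instance):
    labels = Counter(row[-1] for row in data)
    scores = {}
    for label, n in labels.items():
        score = n / len(data)  # prior P(label)
        rows = [row for row in data if row[-1] == label]
        for i, value in enumerate(instance):
            # P(feature_i = value | label) by frequency count
            score *= sum(1 for row in rows if row[i] == value) / n
        scores[label] = score
    return max(scores, key=scores.get)

print(classify(("Sunny", "Cool", "High", "Strong")))    # No
print(classify(("Overcast", "Hot", "High", "Strong")))  # Yes
```

Under this dataset the first instance scores No (5/14 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206 versus ≈ 0.0053 for Yes); the second scores Yes, since Overcast never occurs with No, driving the No score to zero.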
23. Problem 2: Assess student performance using the Naïve Bayes algorithm with the given dataset.
Predict whether a student gets a job offer or not in the final year of the course: CGPA ≥ 9,
Interactiveness = Yes, Practical Knowledge = Average, Communication Skills = Average.
26. Bayes Optimal Classifier
• Normally we consider:
What is the most probable hypothesis given the training data?
• We can also consider:
what is the most probable classification of the new instance given the training
data?
27. Consider
• Three possible hypotheses:
P(h1|D) = 0.4, P(h2|D) = 0.3, P(h3|D) = 0.3
• Given a new instance x:
h1(x) = +, h2(x) = −, h3(x) = −
• What’s most probable classification of x?
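Using the numbers on this slide: the MAP hypothesis is h1, which votes +, yet the most probable classification is −, because h2 and h3 together carry more posterior mass. A sketch of that weighted vote:

```python
# Posterior over hypotheses and each hypothesis's vote for instance x
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
votes = {"h1": "+", "h2": "-", "h3": "-"}

# Bayes optimal: weight each possible label by the posterior mass behind it
label_mass = {}
for h, p in posteriors.items():
    label_mass[votes[h]] = label_mass.get(votes[h], 0.0) + p

print(label_mass)                           # +: 0.4, -: 0.6
print(max(label_mass, key=label_mass.get))  # -
```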
30. Given a hypothesis space with four hypotheses h1, h2, h3, and h4, determine whether the patient is diagnosed as COVID positive or COVID negative using the Bayes optimal classifier.
31. Naïve Bayes Algorithm for Continuous Attributes (Gaussian Naïve Bayes)
Naïve Bayes classifiers work on the principle of Bayes’ Theorem, assuming conditional independence between features.
For continuous attributes, we cannot use frequency counts directly as with categorical attributes.
Instead, we assume that the values follow a Gaussian (Normal) distribution.
32. Based on the numerical data below, find the gender of a person with Height = 6 ft, Weight = 130 lbs, and FootSize = 8 inches using the Naïve Bayes algorithm.
Gender   Height   Weight   FootSize
Male     6.00     180      12
Male     5.92     190      11
Male     5.58     170      12
Male     5.92     165      10
Female   5.00     100      6
Female   5.50     150      8
Female   5.42     130      7
Female   5.75     150      9
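A sketch of Gaussian Naïve Bayes on this table: per class and per attribute we estimate a mean and a sample variance, then score the test instance with the Gaussian density in place of a frequency count (with four rows per class, the priors are 0.5 each):

```python
import math

# Training data from the table: (Height, Weight, FootSize) per class
data = {
    "Male":   [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "Female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    return m, v

def gaussian_pdf(x, m, v):
    # Normal density with mean m and variance v
    return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

def classify(instance):
    total = sum(len(rows) for rows in data.values())
    scores = {}
    for label, rows in data.items():
        score = len(rows) / total  # prior P(label)
        for i, x in enumerate(instance):
            m, v = mean_var([row[i] for row in rows])
            score *= gaussian_pdf(x, m, v)  # class-conditional density
        scores[label] = score
    return max(scores, key=scores.get)

print(classify((6.00, 130, 8)))  # Female
```

Although the height 6 ft looks male, the weight and foot size are far more probable under the Female densities, so the Female score dominates by several orders of magnitude.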
39. Analyze the student performance using the Naïve Bayes algorithm for continuous attributes. Predict whether a student with test instance (CGPA = 8.5, Interactiveness = Yes) will get a job offer or not in the final year.