Probability Distributions
Probability distribution
• A probability distribution is a function that gives the
likelihood of occurrence of each possible outcome of an
experiment.
• Categories: -
• Discrete probability distribution
• Continuous probability distribution
• Functions used to describe a probability distribution: -
• Probability mass function (Discrete)
• Probability density function (Continuous)
A random variable is a variable that represents a numerical
outcome of a random experiment. Hence a probability
distribution function gives the probability of all the possible
values that a random variable can take.
A random variable may be discrete or continuous.
Why is probability distribution
significant?
• They show all the possible values for a set of data and how often they
occur.
• Distributions of data display the spread and shape of data
• Helps in standardized comparisons/analysis.
• Data exhibiting a defined distribution have predefined statistical
attributes. For example, for a normal distribution:
Mean = Median = Mode
Probability Distribution Function
• The probability distribution function is also known as the
cumulative distribution function (CDF).
• If there is a random variable, X, and its value is evaluated at
a point, x, then the probability distribution function gives
the probability that X will take a value less than or equal to
x. It can be written as
F(x) = P(X ≤ x)
The probability distribution function can be used for both discrete
and continuous random variables.
Probability Distribution Function
(Example)
• Let the random variable X represent the number of heads obtained in
two tosses of a coin.
• Sample space: {HH, HT, TH, TT}
• Probability distribution of X:
No. of heads, x     0    1    2    Sum
PMF, P(X = x)       ¼    ½    ¼    1
• Probability of obtaining less than or equal to one head:
P(X ≤ 1) = P(X = 0) + P(X = 1)
         = ¼ + ½
         = ¾
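The coin-toss example can be checked by enumerating the sample space. This is a minimal sketch, assuming a fair coin; the names `pmf` and `cdf` are illustrative and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the sample space of two tosses: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# X = number of heads; every outcome is equally likely (fair coin assumed).
counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

def cdf(x):
    """F(x) = P(X <= x): add up the PMF over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(pmf)
print(cdf(1))  # 3/4
```

Using `Fraction` keeps the probabilities exact, so the result matches the hand calculation without rounding.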
Probability distribution of a
discrete random variable
• A discrete random variable can be
defined as a variable that takes countably
many distinct values, such as 0, 1, 2, 3, ...
• Probability Mass Function: p(x) = P(X = x)
• Probability Distribution Function: F(x) = P(X ≤ x)
• Examples of discrete probability
distribution: -
• Binomial distribution
• Bernoulli distribution
• Poisson distribution
Probability distribution of a discrete random
variable
https://www.youtube.com/watch?v=YXLVjCKVP7U&ab_channel=zedstatistics
Probability Distribution of a
Continuous Random Variable
• A continuous random variable can be
defined as a variable that can take any value
in a continuous range (uncountably many values).
• The probability that a continuous random
variable will take on an exact value is 0.
• Probability Distribution Function: F(x) = P(X ≤ x)
• Probability Density Function: f(x) = d/dx (F(x))
• Examples of continuous probability
distribution: -
• Normal distribution
• Uniform distribution
• Exponential distribution
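The relation f(x) = d/dx (F(x)) can be illustrated numerically. This is a minimal sketch, assuming an exponential distribution with a hypothetical rate of 2.0; the function names are illustrative.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution

def cdf(x):
    """F(x) = 1 - exp(-RATE * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def pdf(x):
    """f(x) = RATE * exp(-RATE * x) for x >= 0, and 0 for x < 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of d/dx func(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.5
# The slope of the CDF at x should agree with the density at x.
print(pdf(x), numerical_derivative(cdf, x))

# Probabilities come from intervals: P(a <= X <= b) = F(b) - F(a).
print(cdf(1.0) - cdf(0.5))
# A single exact value is an interval of zero width, so its probability is 0.
print(cdf(0.5) - cdf(0.5))  # 0.0
```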
Bernoulli Distribution
• A Bernoulli distribution has only two possible outcomes, namely 1
(success) and 0 (failure), and a single trial.
• The random variable X can take the following values: -
• 1 with the probability of success, p
• 0 with the probability of failure, q = 1 – p
• Probability mass function (PMF): P(X = x) = p^x × q^(1 - x), for x = 0 or 1
• Expected value or mean = p
• Variance = p.q
Bernoulli Distribution
• P(X = 1) = p (probability of success) and P(X = 0) = q (probability of failure).
• Note: p and q need not be equal.
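The mean and variance of a Bernoulli variable can be recovered directly from its PMF. This is a minimal sketch, assuming a hypothetical success probability of 0.3; `bernoulli_pmf` is an illustrative name.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}; 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)

p = 0.3    # hypothetical probability of success
q = 1 - p  # probability of failure

# Expected value: sum of x * P(X = x) over both outcomes.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
# Variance: sum of (x - mean)**2 * P(X = x) over both outcomes.
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, p)          # the two values agree: mean = p
print(variance, p * q)  # the two values agree up to rounding: variance = p.q
```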
Binomial distribution
• When multiple independent trials of an experiment that yields a
success/failure (Bernoulli trial) are conducted, the number of
successes follows a binomial distribution.
PMF: P(X = x) = C(n, x) × p^x × q^(n - x)
where, n = number of trials
x = number of successes
p = probability of success
q = probability of failure = 1 - p
C(n, x) = n! / (x! (n - x)!)
• Expected value = n.p
• Variance = n.p.q
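The binomial PMF, expected value, and variance can be checked numerically. This is a minimal sketch, assuming hypothetical values n = 10 and p = 0.4; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.4  # hypothetical number of trials and success probability
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probabilities)
mean = sum(x * prob for x, prob in enumerate(probabilities))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(probabilities))

print(total)                      # the probabilities add up to 1 (up to rounding)
print(mean, n * p)                # expected value = n.p
print(variance, n * p * (1 - p))  # variance = n.p.q
```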
Binomial distribution (Example)
A store manager estimates the probability of a customer making a
purchase as 0.30. What is the probability that two of the next three
customers will make a purchase?
Solution:
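The above exhibits a binomial distribution as there are three customers (3
trials) with every customer either making a purchase (success) or not
making a purchase (failure).
Probability that two of the next three customers will make a purchase,
P(X = 2) = C(3, 2) × (0.30)^2 × (0.70)^1 = 3 × 0.09 × 0.70 = 0.189
where C(3, 2) = 3 is the number of ways to choose which two customers buy.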
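The store example can be computed with the binomial formula and cross-checked by listing every possible sequence of purchases. This is a minimal sketch, assuming the three customers decide independently; the variable names are illustrative.

```python
from itertools import product
from math import comb

p = 0.30     # probability that a customer makes a purchase (from the example)
q = 1 - p    # probability that a customer does not
n, x = 3, 2  # three customers, exactly two purchases

# Closed form: C(n, x) * p**x * q**(n - x).
closed_form = comb(n, x) * p**x * q ** (n - x)

# Cross-check: enumerate all 2**3 outcomes (1 = purchase, 0 = no purchase)
# and add the probabilities of those with exactly two purchases.
brute_force = 0.0
for outcome in product((0, 1), repeat=n):
    if sum(outcome) == x:
        probability = 1.0
        for bought in outcome:
            probability *= p if bought else q
        brute_force += probability

print(round(closed_form, 3))  # 0.189
print(round(brute_force, 3))  # 0.189
```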
Normal distribution
• In a normal distribution the data
tends to be around a central value
with no bias left or right.
• Also called a bell curve as it looks
like a bell.
• Many things approximately follow a
normal distribution, such as heights of
people and marks scored in a test.
Normal distribution
Mean = Median = Mode
About 68% of data lie within one standard deviation of the mean
About 95% of data lie within two standard deviations of the mean
https://www.mathsisfun.com/data/standard-normal-distribution.html
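For a normal distribution, about 68% of values fall within one standard deviation of the mean and about 95% within two. This can be confirmed from the normal CDF with a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper `within` is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Roughly 0.68, 0.95 and 0.997 for k = 1, 2, 3.
    print(k, round(within(k), 4))
```

The result does not depend on the particular mean and standard deviation, because any normal variable can be standardized to this one.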
Skewness
Negative skew: The long tail is on the negative side of the peak
Positive skew: The long tail is on the positive side of the peak
https://www.mathsisfun.com/data/skewness.html
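The sign of the skew can be computed from data. This is a minimal sketch, assuming the moment coefficient of skewness (third central moment divided by the 1.5 power of the second) as the measure, which the slides do not specify; the data sets are hypothetical.

```python
from statistics import fmean

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    mean = fmean(data)
    m2 = fmean([(x - mean) ** 2 for x in data])  # second central moment
    m3 = fmean([(x - mean) ** 3 for x in data])  # third central moment
    return m3 / m2**1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]     # long tail on the positive side
left_tailed = [-x for x in right_tailed]     # mirror image: long tail on the negative side
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive skew
print(skewness(left_tailed))   # negative skew
print(skewness(symmetric))     # 0.0
```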
Uniform distribution
• In a uniform distribution, all values of the random variable between
a and b are equally likely. For a continuous variable the density is
constant: f(x) = 1/(b - a) for a ≤ x ≤ b.
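A continuous uniform distribution on [a, b] has constant density 1/(b - a), so the probability of an interval depends only on its length. This is a minimal sketch, assuming hypothetical endpoints a = 2 and b = 10; the function names are illustrative.

```python
def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) for a <= x <= b, and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a) on [a, b], clipped to 0 below a and 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 10.0  # hypothetical endpoints

print(uniform_pdf(5.0, a, b))                           # 0.125
print(uniform_cdf(6.0, a, b) - uniform_cdf(4.0, a, b))  # 0.25
print(uniform_cdf(9.0, a, b) - uniform_cdf(7.0, a, b))  # 0.25
```

The last two lines give the same probability because both intervals have length 2.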
Relationship between two variables
• Covariance and correlation are two statistical measures
that describe the relationship between two variables.
• They both quantify how two variables change together, but
they differ in scale, interpretation, and units.
Covariance
• Covariance measures the direction of the linear relationship between
two variables.
• It tells you whether the variables move in the same direction (positive
covariance) or in opposite directions (negative covariance).
Covariance (Example)
Covariance between temperature and ice cream sales
Cov(X, Y) = 243
• A positive value indicates that ice cream
sales tend to increase as temperature
increases.
• However, it does not specify the strength of
the relationship, because its magnitude
depends on the units of the variables.
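The following minimal sketch shows how a covariance is computed and why its size alone says little about strength. It assumes the sample covariance (division by n - 1) and uses hypothetical temperature and sales figures, so it does not reproduce the value 243 from the slide.

```python
def sample_covariance(xs, ys):
    """Cov(X, Y) = sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

temperature_c = [14, 16, 19, 22, 25]       # hypothetical temperatures
sales = [210, 330, 410, 520, 610]          # hypothetical ice cream sales

cov = sample_covariance(temperature_c, sales)

# Expressing sales in hundreds changes the covariance by the same factor,
# although the relationship between the variables is unchanged.
sales_in_hundreds = [s / 100 for s in sales]
cov_rescaled = sample_covariance(temperature_c, sales_in_hundreds)

print(cov)           # positive: the variables move in the same direction
print(cov_rescaled)  # 100 times smaller
```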
Correlation
• Correlation measures both the strength and direction of the linear
relationship between two variables.
• It lies within a standardized range, from -1 to 1.
• 1 – perfect positive correlation
• -1 – perfect negative correlation
• 0 – no linear correlation
Figure: perfect positive correlation
Correlation (Example)
Correlation = 0.9575 (close to 1, which indicates a strong positive correlation)
Correlation
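• Correlation only measures linear relationships.
• A nonlinear relationship (for example, a symmetric curve) can have a
correlation of 0 even though the variables are related.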
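The properties of correlation can be sketched in a few lines: it equals 1 or -1 for an exact straight line, it is unaffected by a change of units, and it can be 0 for a relationship that is not linear. The sketch assumes the Pearson correlation coefficient and hypothetical data; `pearson_correlation` is an illustrative name.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx, dy deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]

r_positive = pearson_correlation(xs, [2 * x + 1 for x in xs])  # exact line, upward slope
r_negative = pearson_correlation(xs, [-3 * x for x in xs])     # exact line, downward slope
r_parabola = pearson_correlation(xs, [x**2 for x in xs])       # y depends on x, but not linearly

# Changing the units of x leaves r unchanged, unlike covariance.
noisy_ys = [1.2, 2.9, 5.1, 7.0, 9.2, 10.8, 13.1]               # hypothetical measurements
r_original = pearson_correlation(xs, noisy_ys)
r_rescaled = pearson_correlation([100 * x for x in xs], noisy_ys)

print(r_positive, r_negative, r_parabola)
print(r_original, r_rescaled)
```

In the parabola case y is completely determined by x, yet the correlation is 0, which is why a scatter plot should accompany the coefficient.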
Exploratory Data Analysis (EDA)
Exploratory Data Analysis refers to the critical process of performing initial
investigations on data so as to discover patterns, spot anomalies, test
hypotheses, and check assumptions with the help of summary statistics
and graphical representations.
Key Objectives of EDA:
• Understand the data structure: Gain insights into the data's size, types,
and completeness.
• Identify patterns: Detect trends, correlations, and groupings.
• Find anomalies: Spot outliers and inconsistencies in the data.
• Generate hypotheses: Form initial ideas for models, statistical testing, or
predictions.
• Refine data: Clean, transform, or filter the data for further analysis.
Steps in EDA
1. Data loading and inspection
2. Univariate analysis
3. Bivariate analysis
4. Multivariate analysis
5. Identifying missing values and outliers
6. Data transformation
7. Feature engineering
8. Hypothesis engineering
Data loading and inspection
Step 1. Load data into the workspace
The df.head() command displays the first few records.
Step 2. Data preview and summary
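The two steps can be sketched with pandas, which is assumed here because `df.head()` is a pandas DataFrame method. The data set is hypothetical and built in memory; with a real file, a call such as `pd.read_csv` on the file path would be the usual way to load it.

```python
import pandas as pd

# Step 1: load data. A small hypothetical table stands in for a real file.
df = pd.DataFrame(
    {
        "temperature_c": [14.2, 16.4, None, 22.1, 25.0, 18.5],
        "ice_cream_sales": [215, 325, 185, 522, 614, 406],
        "weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    }
)

# Step 2: preview and summarize.
print(df.head())        # first five records by default
print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # count of missing values per column
print(df.describe())    # summary statistics for the numerical columns
```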
Univariate analysis
• Involves analyzing each variable individually to understand its
distribution, central tendency, and spread.
• Numerical variables: histograms, box plots, and summary statistics
(mean, median, standard deviation)
• Categorical variables: bar charts, pie charts
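The numbers behind these summaries and charts can be computed with the Python standard library. This is a minimal sketch on hypothetical data; the quartiles are the values a box plot would draw, and the frequency counts are the bar heights of a bar chart.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

ages = [23, 25, 25, 29, 31, 34, 35, 41, 44, 62]          # hypothetical numerical variable
colors = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical categorical variable

# Numerical variable: central tendency and spread.
print(mean(ages), median(ages), stdev(ages))

# Quartiles and interquartile range, as summarized by a box plot.
q1, q2, q3 = quantiles(ages, n=4)
iqr = q3 - q1
print(q1, q3, iqr)

# Categorical variable: frequency of each category, shown as a text bar chart.
frequency = Counter(colors)
for category, count in frequency.most_common():
    print(category, "#" * count)
```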

Fundamentals of Data Science Probability Distributions
