Backpropagation And Gradient Descent In Neural Networks | Neural Network Tutorial | Simplilearn
This simple neural network must be trained to recognize the handwritten letters 'a', 'b', and 'c'.
[Diagram: inputs x1…xn feed a network whose outputs y1, y2, y3 correspond to 'a', 'b', and 'c']
The handwritten letters are presented as images of 28×28 pixels.
The 784 pixels (28 × 28 = 784) are fed as input to the first layer of our neural network, so the first layer has 784 neurons.
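As a concrete illustration, here is how a 28×28 image would be flattened into that 784-value input vector. This is a minimal sketch; the use of NumPy and a blank placeholder image are our assumptions, not something the slides specify.

```python
import numpy as np

image = np.zeros((28, 28))   # a placeholder 28x28 grayscale image
x = image.reshape(-1)        # flatten row by row into a single vector
print(x.shape)               # (784,) -- one input value per pixel
```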
The initial prediction is made using the random weights assigned to each channel.
[Diagram: with random weights, the network outputs 0.3 for 'a', 0.5 for 'b', and 0.2 for 'c']
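A minimal sketch of this initial forward pass, assuming a single dense layer with softmax outputs; the layer structure, weight scale, and random stand-in input are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_classes = 784, 3          # 784 pixels in, 3 letters out

W = rng.normal(0.0, 0.01, (n_inputs, n_classes))  # random initial weights
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max()                   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = rng.random(n_inputs)              # stand-in for one flattened image
probs = softmax(x @ W + b)            # the initial, essentially random guess
print(dict(zip("abc", probs.round(2))))
```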
Our network predicts the input to be 'b', with a probability of 0.5.
The predicted probabilities are compared against the actual probabilities, and the error is calculated:

Letter   Predicted   Actual   Error (actual - prediction)
a        0.3         0.9      +0.6
b        0.5         0.0      -0.5
c        0.2         0.0      -0.2
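The same comparison takes only a few lines of Python, using the probability values from the slide:

```python
predicted = {"a": 0.3, "b": 0.5, "c": 0.2}
actual    = {"a": 0.9, "b": 0.0, "c": 0.0}

# error = actual - prediction, computed per output neuron
error = {k: round(actual[k] - predicted[k], 1) for k in predicted}
print(error)   # {'a': 0.6, 'b': -0.5, 'c': -0.2}
```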
The magnitude of the error indicates how much the weights need to change, while its sign indicates whether they should increase or decrease.
This error information is transmitted back through the network.
Weights throughout the network are adjusted in order to reduce the loss in prediction. After this adjustment the prediction becomes 0.6 for 'a', 0.2 for 'b', and 0.0 for 'c', shrinking the error to +0.3, -0.2, and 0.0.
In this manner, we keep training the network with multiple inputs until it is able to predict with high accuracy: over successive passes the prediction for 'a' rises from 0.7 to 0.9 and finally to 1.0, and the errors shrink toward zero.
Similarly, our network is trained with the images for ‘b’ and ‘c’ too
Here’s a straightforward dataset. Let’s build a neural network to
predict the outputs, given the inputs
Input Output
0
1
2
3
4
0
6
1
2
1
8
2
4
[Diagram: a box labeled 'Neural Network' maps input x to output y = x*w]
This box represents our neural network, and 'w' is the weight.
The network starts training itself by choosing a random value for w.
Our first model starts with w = 3.
Our second model has w = 6.
And finally, our third model has w = 9.
We, as humans, can tell just by looking at the data that the weight should be 6. But how does the machine come to this conclusion?
Loss function

The loss function is a measure of the error between the predicted output and the actual output:

loss = [(actual output) - (predicted output)]²
Let's apply the loss function to the input value 2:

Input   Actual output   w=3            w=6            w=9
2       12              6              12             18
Loss    --              (12-6)² = 36   (12-12)² = 0   (12-18)² = 36
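These numbers are easy to verify in code. A small sketch, assuming the model is simply y = x*w:

```python
def loss(actual, predicted):
    """Squared loss: [(actual output) - (predicted output)]^2."""
    return (actual - predicted) ** 2

x, actual = 2, 12
for w in (3, 6, 9):
    predicted = x * w
    print(f"w={w}: prediction={predicted}, loss={loss(actual, predicted)}")
# w=3: prediction=6, loss=36
# w=6: prediction=12, loss=0
# w=9: prediction=18, loss=36
```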
We now plot a graph of weight versus loss.
Gradient descent

This method of following the slope of the curve down to the minimum of the function is called gradient descent.
A random point on this curve is chosen, and the slope at this point is calculated.
Here the slope is positive, so the weight must be decreased: a point further to the left is chosen.
This time the slope is negative, so the weight must be increased: a point further to the right is chosen.
We continue checking slopes at various points in this manner. Our aim is to reach the point where the slope is zero:
A positive slope indicates the weight should decrease.
A negative slope indicates the weight should increase.
A zero slope indicates the appropriate weight.
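Here is a minimal sketch of gradient descent on the toy dataset above. The learning rate and the analytic derivative of the squared loss are our assumptions; the slides only describe the idea of following the slope.

```python
# Toy dataset where the true relationship is y = 6x
data = [(0, 0), (1, 6), (2, 12), (3, 18), (4, 24)]

w = 3.0      # a random starting weight
lr = 0.01    # learning rate (step size), an assumed hyperparameter

for step in range(100):
    # Slope of the total squared loss with respect to w:
    # d/dw sum (y - x*w)^2 = sum -2*x*(y - x*w)
    slope = sum(-2 * x * (y - x * w) for x, y in data)
    w -= lr * slope   # positive slope -> decrease w; negative -> increase w

print(round(w, 4))    # converges to 6.0, where the slope is zero
```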
Backpropagation

Backpropagation is the process of updating the weights of the network in order to reduce the error in prediction.
The magnitude of the loss at any point on our graph, combined with the slope, is fed back through the network.
Suppose a random point on the graph gives a loss value of 36 with a positive slope. 36 is quite a large number, so the current weight needs to change by a large amount, and the positive slope indicates that the change must be a decrease.
Similarly, another random point on the graph gives a loss value of 10 with a negative slope. 10 is a smaller number, so the weight needs only a small adjustment, and the negative slope indicates that the weight should be increased rather than decreased.
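In standard gradient descent both observations fall out of a single rule, w_new = w - learning_rate × slope: the step size comes from the slope (which itself grows as the loss grows), and the step direction always opposes the slope's sign. A sketch with assumed slope values:

```python
lr = 0.1                       # assumed learning rate
for slope in (+12.0, -4.0):    # slopes at two sample points (assumed values)
    dw = -lr * slope           # the update always opposes the slope
    print(f"slope={slope:+.1f} -> weight change {dw:+.1f}")
# slope=+12.0 -> weight change -1.2  (steep slope: large step, weight decreases)
# slope=-4.0  -> weight change +0.4  (gentle slope: small step, weight increases)
```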
After multiple iterations of backpropagation, our weights are assigned the appropriate value: the trained model computes y = x*6.
At this point, our network is trained and can be used to make predictions.
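Once w has settled at 6, prediction is just a multiplication:

```python
def predict(x, w=6):
    return x * w   # the trained model: y = x * 6

print(predict(5))  # 30
```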
Let's now get back to our first example and see where backpropagation and gradient descent fall into place.
As mentioned earlier, our predicted output is compared against the actual output, and the loss for each output is calculated.

1st iteration:
Letter   Predicted   Actual   Error    Loss = error²
a        0.3         1.0      +0.7     0.7² = 0.49
b        0.5         0.0      -0.5     0.5² = 0.25
c        0.2         0.0      -0.2     0.2² = 0.04
Weights throughout the network are adjusted in order to reduce the loss in prediction.

2nd iteration:
Letter   Predicted   Actual   Error    Loss = error²
a        0.6         1.0      +0.4     0.4² = 0.16
b        0.2         0.0      -0.2     0.2² = 0.04
c        0.1         0.0      -0.1     0.1² = 0.01
The weights are adjusted once more:

3rd iteration:
Letter   Predicted   Actual   Error    Loss = error²
a        0.8         1.0      +0.2     0.2² = 0.04
b        0.1         0.0      -0.1     0.1² = 0.01
c        0.1         0.0      -0.1     0.1² = 0.01
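Recomputing the per-letter losses across the three iterations shows them shrinking, which is the whole point of the weight updates (error values taken from the slides):

```python
errors = {
    "1st": {"a": +0.7, "b": -0.5, "c": -0.2},
    "2nd": {"a": +0.4, "b": -0.2, "c": -0.1},
    "3rd": {"a": +0.2, "b": -0.1, "c": -0.1},
}
for iteration, errs in errors.items():
    losses = {k: round(e ** 2, 2) for k, e in errs.items()}
    print(iteration, losses)
# 1st {'a': 0.49, 'b': 0.25, 'c': 0.04}
# 2nd {'a': 0.16, 'b': 0.04, 'c': 0.01}
# 3rd {'a': 0.04, 'b': 0.01, 'c': 0.01}
```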
Let’s focus on finding the
minimum loss for our
variable ‘a’
Weights through out the network are adjusted in order to reduce
the loss in prediction
0.8
0.1
0.1
x
1
x2
xn
a
b
c
28
28
28*28=784
actual probabilities
error = actual - prediction
0.2
0.
2
1.
20.
3
1.
2
0.6
0.3
6
0.4
0.9
0.3
Neural Network
1.
0
0.0
0.0
loss(a) 0.72 = 0.49
loss(b) 0.52 = 0.25
loss(c) 0.22 = 0.04
1st iteration 2nd iteration
loss(a) 0.42 = 0.16
loss(b) 0.22 =
0.04
loss(c) 0.12 = 0.01
+0.2
-0.1
-0.1
3rd iteration
loss(a) 0.22 = 0.04
loss(b) 0.12 =
0.01
loss(c) 0.12 =
0.01
Let’s focus on finding the
minimum loss for our
variable ‘a’
Weights through out the network are adjusted in order to reduce
the loss in prediction
0.8
0.1
0.1
x
1
x2
xn
a
b
c
28
28
28*28=784
actual probabilities
error = actual - prediction
0.2
0.
2
1.
20.
3
1.
2
0.6
0.3
6
0.4
0.9
0.3
Neural Network
1.
0
0.0
0.0
loss(a) 0.72 = 0.49
loss(b) 0.52 = 0.25
loss(c) 0.22 = 0.04
1st iteration 2nd iteration
loss(a) 0.42 = 0.16
loss(b) 0.22 =
0.04
loss(c) 0.12 = 0.01
+0.2
-0.1
-0.1
3rd iteration
loss(a) 0.22 = 0.04
loss(b) 0.12 =
0.01
loss(c) 0.12 =
0.01
And here is where gradient
descent comes into the
picture
Let's assume the graph below shows the loss of the prediction for 'a' as a function of the weights contributing to it from the second-to-last layer.
[Graph: loss versus weight, with loss values 0.51, 0.17, and 0.34 at weights w1, w2, and w3]
The slope and loss at randomly chosen points on this graph are now backpropagated through the network in order to adjust the weights.
The network is run once again with the new weights, and the prediction for 'a' improves.
This process is repeated multiple times until the network provides accurate predictions; eventually this image yields 1.0 for 'a' and 0.0 for 'b' and 'c'.
The weights are further adjusted to identify 'b' and 'c' too.
Thus, through gradient descent and backpropagation, our network is completely trained.
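Putting everything together, here is a compact sketch of the whole training loop: a single dense layer with softmax outputs, trained by gradient descent with the gradient backpropagated from the output error. The random stand-in "images", layer shape, learning rate, and epoch count are all illustrative assumptions, not the slides' actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for flattened 28x28 letter images, one per class.
# A real run would use an actual dataset of handwritten 'a', 'b', 'c'.
X = rng.random((3, 784))
Y = np.eye(3)                        # one-hot "actual probabilities"

W = rng.normal(0.0, 0.01, (784, 3))  # random initial weights
b = np.zeros(3)
lr = 0.005                           # assumed learning rate

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(500):
    P = softmax(X @ W + b)           # forward pass: predicted probabilities
    error = Y - P                    # error = actual - prediction
    # Backpropagation for a softmax + cross-entropy output layer: the
    # gradient of the loss w.r.t. the pre-activations is (P - Y), so
    # stepping against the gradient means stepping along (Y - P).
    W += lr * X.T @ error / len(X)
    b += lr * error.mean(axis=0)

print(softmax(X @ W + b).round(2))   # close to the identity matrix:
                                     # each image now maps to its own letter
```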