NumPy Gradient Descent Optimizer of Neural Networks
Last Updated: 29 Mar, 2023
NumPy Gradient Descent Optimizer is a commonly used optimization algorithm in neural network training that is based on the gradient descent algorithm. It minimizes the cost function of a neural network model by adjusting the model's weights and biases through a series of iterations.
The basic steps of NumPy Gradient Descent Optimizer are as follows:
- Initialize the model's weights and biases to small random values.
- Calculate the output of the model for a given input using the forward propagation algorithm.
- Calculate the error between the predicted output and the actual output using a cost function.
- Calculate the gradient of the cost function with respect to the weights and biases using the backpropagation algorithm.
- Update the weights and biases using the gradient and a learning rate parameter.
- Repeat steps 2-5 for a number of iterations or until convergence (a minimal end-to-end sketch follows this list).
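To make steps 1-5 concrete, here is a minimal sketch that trains a single linear neuron with batch gradient descent on synthetic data. The dataset, the model, and all variable names are assumptions chosen for illustration, not part of a specific library API:
Python3
import numpy as np

# 1. Initialize the weights and bias to small random values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # synthetic inputs (assumption)
y = X @ np.array([1.5, -2.0, 0.5]) + 1.0   # synthetic targets (assumption)
w = rng.normal(scale=0.01, size=3)
b = 0.0
lr = 0.1

for _ in range(200):
    # 2. Forward propagation: compute the model's output.
    y_hat = X @ w + b
    # 3. Cost: mean squared error between prediction and target.
    err = y_hat - y
    # 4. Gradients of the cost with respect to w and b.
    grad_w = 2 * X.T @ err / len(y)
    grad_b = 2 * err.mean()
    # 5. Update the parameters using the learning rate.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # w approaches [1.5, -2.0, 0.5] and b approaches 1.0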
The advantage of using the NumPy Gradient Descent Optimizer is that it is a simple yet effective algorithm for minimizing the cost function of a neural network model. It can handle large amounts of data, is easy to implement, and can be applied to different types of neural network models.
However, there are some potential disadvantages to using this algorithm. One is that it can be slow to converge, especially if the learning rate is set too low. Another is that it may get stuck in local minima, resulting in suboptimal solutions. To mitigate these issues, several variations of the gradient descent algorithm, such as Stochastic Gradient Descent, Mini-batch Gradient Descent, and Adam optimization, have been developed; a mini-batch sketch follows this paragraph.
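As an example of one such variation, a mini-batch variant updates the parameters on small random subsets of the data rather than on the full dataset at once. The sketch below assumes a user-supplied grad_fn(params, X_batch, y_batch) that returns the gradient of the cost on a batch; the function name and the batch size are illustrative assumptions:
Python3
import numpy as np

def minibatch_sgd(grad_fn, params, X, y, lr=0.01, batch_size=32, n_epochs=10):
    # Shuffle the data each epoch and update on one mini-batch at a time.
    rng = np.random.default_rng(0)
    n = len(X)
    for _ in range(n_epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            params = params - lr * grad_fn(params, X[batch], y[batch])
    return params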
Advantages of NumPy Gradient Descent Optimizer:
- Simple and easy to implement: NumPy Gradient Descent Optimizer is a simple algorithm that is easy to implement, making it a popular choice for optimizing neural network models.
- Can handle large datasets: NumPy Gradient Descent Optimizer is efficient at handling large datasets, making it suitable for training deep neural network models with a large number of parameters.
- Can be applied to different neural network architectures: NumPy Gradient Descent Optimizer can be applied to different neural network architectures, including feedforward, convolutional, and recurrent neural networks.
- Can be parallelized: The computation involved in NumPy Gradient Descent Optimizer can be easily parallelized, allowing for faster training on multi-core CPUs and GPUs.
Disadvantages of NumPy Gradient Descent Optimizer:
- Can be slow to converge: NumPy Gradient Descent Optimizer can be slow to converge, especially if the learning rate is set too low, which can result in longer training times.
- May get stuck in local minima: NumPy Gradient Descent Optimizer can get stuck in local minima, which can result in suboptimal solutions.
- Requires careful hyperparameter tuning: The performance of NumPy Gradient Descent Optimizer depends on the choice of hyperparameters, such as the learning rate, batch size, and number of iterations, which requires careful tuning.
- Sensitive to feature scaling: NumPy Gradient Descent Optimizer can be sensitive to feature scaling, so input features usually need normalization or standardization to improve the convergence speed and accuracy of the algorithm (a standardization sketch follows this list).
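To address the feature-scaling point above, input features are commonly standardized before training. A minimal sketch, assuming X is a NumPy feature matrix with one column per feature:
Python3
import numpy as np

def standardize(X):
    # Rescale each feature column to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std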
In differential calculus, the derivative of a function tells us how much the output changes with a small nudge in the input variable. This idea extends to multivariable functions as well. This article shows the implementation of the Gradient Descent Algorithm using NumPy. The idea is simple: start from an arbitrary starting point, repeatedly move towards the minimum (that is, in the direction of the negative gradient), and return a point that is as close to the minimum as possible.
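For a function with a known analytic gradient, this idea fits in a few lines. The quadratic f(x) = x**2, whose gradient is 2*x, is an assumption chosen purely for illustration:
Python3
# One-variable gradient descent on f(x) = x**2.
x = 10.0   # arbitrary starting point
lr = 0.2   # learning rate
for _ in range(50):
    x -= lr * 2 * x   # step along the negative gradient
print(x)   # ends up very close to the minimum at x = 0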
GD() is a user-defined function employed for this purpose. It takes the following parameters:
- f is a NumPy array of sampled values of the function we are trying to minimize; np.gradient(f) estimates the gradient from these samples.
- start is the arbitrary starting point we give to the function. It is a single independent variable, but it can also be a list or NumPy array in the multivariable case.
- lr is the learning rate, which controls the magnitude by which the vector gets updated on each iteration.
- n_iter is the number of iterations the operation should run for.
- tol is the tolerance level that specifies the minimum movement required in each iteration.
Given below is the implementation that produces the required functionality.
Example:
Python3
import numpy as np

def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start
    for _ in range(n_iter):
        # The gradient is estimated from the sampled values of f
        # using the np.gradient function.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        res += new_val
    # We return a vector because the gradient can come from a
    # multivariable function; if the function has one dependent
    # variable, a scalar value is returned.
    return res

# Example 1
f = np.array([1, 2, 4, 7, 11, 16], dtype=float)
print(f"The vector notation of global minima:{GD(f, 10, 0.01)}")

# Example 2
f = np.array([2, 4], dtype=float)
print(f'The vector notation of global minima: {GD(f, 10, 0.1)}')
Output:
The vector notation of global minima:[9.5  9.25 8.75 8.25 7.75 7.5 ]
The vector notation of global minima: [2.0539126e-15 2.0539126e-15]
Let's look at the relevant concepts used in this function in detail.
Tolerance Level Application
The line of code below enables GD() to terminate early and return before n_iter completes if the update is less than or equal to the tolerance level. This particularly speeds things up near a local minimum or a saddle point, where the movement per iteration is very small because the gradient is very low, so stopping early improves the convergence rate.
Python3
if np.all(np.abs(new_val) <= tol):
    break
Learning Rate Usage (Hyper-parameter)
- The learning rate is a very crucial hyper-parameter, as it affects the behavior of the gradient descent algorithm. For example, if we change the learning rate from 0.2 to 0.7, we still get a solution that is very close to 0, but because of the high learning rate there is a large change in x on every step: the iterate passes the minimum value multiple times and oscillates before settling to zero. This oscillation increases the convergence time of the entire algorithm (demonstrated in the sketch after this list).
- A small learning rate can lead to slow convergence, and to make matters worse, if the number of iterations is also small, the algorithm might return before it finds the minimum.
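The oscillation described in the first point can be demonstrated with the same illustrative quadratic f(x) = x**2 (gradient 2*x) used earlier; the step count and rounding below are assumptions chosen for readability:
Python3
for lr in (0.2, 0.7):
    x, path = 10.0, []
    for _ in range(10):
        x -= lr * 2 * x
        path.append(round(x, 4))
    print(lr, path)
With lr = 0.2 the iterate decays monotonically toward 0, while with lr = 0.7 it flips sign on every step, repeatedly overshooting the minimum before settling.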
Given below is an example to show how the learning rate affects our result.
Example:
Python3
import numpy as np

def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start
    for _ in range(n_iter):
        # The gradient is estimated using the np.gradient function.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        res += new_val
    # We return a vector because the gradient can come from a
    # multivariable function; if the function has one dependent
    # variable, a scalar value is returned.
    return res

f = np.array([2, 4], dtype=float)
# A low learning rate doesn't allow convergence to the global minimum
# within the iteration budget.
print(f'The vector notation of global minima: {GD(f, 10, 0.001)}')
Output:
The vector notation of global minima: [9.9 9.9]
The value returned by the algorithm is not even close to 0, which indicates that the algorithm returned before converging to the global minimum.
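As a quick check, giving the same call a larger iteration budget lets it reach the minimum despite the low learning rate; the value 5000 below is an illustrative choice:
Python3
# Hypothetical follow-up using the GD() and f defined above.
print(GD(f, 10, 0.001, n_iter=5000))  # effectively zero, up to float error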