Gated Recurrent Unit Networks
Last Updated: 12 Jul, 2025
In machine learning, Recurrent Neural Networks (RNNs) are essential for tasks involving sequential data such as text, speech and time-series analysis. While traditional RNNs struggle to capture long-term dependencies due to the vanishing gradient problem, architectures like Long Short-Term Memory (LSTM) networks were developed to overcome this limitation.
However, LSTMs have a complex structure with a higher computational cost. To address this, Gated Recurrent Units (GRUs) were introduced: they simplify the LSTM architecture by merging some of its gating mechanisms, offering a more efficient solution for many sequential tasks without sacrificing performance. In this article we'll learn more about them.
What are Gated Recurrent Units (GRUs)?
Gated Recurrent Units (GRUs) are a type of RNN introduced by Cho et al. in 2014. The core idea behind GRUs is to use gating mechanisms to selectively update the hidden state at each time step, allowing them to remember important information while discarding irrelevant details. GRUs aim to simplify the LSTM architecture by merging some of its components and focusing on just two main gates: the update gate and the reset gate.
Structure of GRUs
The GRU consists of two main gates:
- Update Gate (z_t): This gate decides how much information from the previous hidden state should be retained for the next time step.
- Reset Gate (r_t): This gate determines how much of the past hidden state should be forgotten.
These gates allow the GRU to control the flow of information more efficiently than traditional RNNs, which rely solely on the hidden state.
Equations for GRU Operations
The internal workings of a GRU can be described using the following equations:
1. Reset gate:
r_t = \sigma \left( W_r \cdot [h_{t-1}, x_t] \right)
The reset gate determines how much of the previous hidden state h_{t-1} should be forgotten.
2. Update gate:
z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
The update gate controls how much of the new information x_t should be used to update the hidden state.
(Figure: Architecture of GRUs)
3. Candidate hidden state:
h_t' = \tanh(W_h \cdot [r_t \cdot h_{t-1}, x_t])
This is the potential new hidden state calculated based on the current input and the previous hidden state.
4. Hidden state:
h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot h_t'
The final hidden state is a weighted average of the previous hidden state h_{t-1} and the candidate hidden state h_t' based on the update gate z_t.
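The four equations above can be traced directly in a few lines of NumPy. The following is a minimal sketch of a single GRU step with made-up dimensions and randomly initialized weights; the names W_r, W_z, W_h and the sizes are illustrative assumptions, and bias terms are omitted to match the equations above.
Python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_r = rng.normal(size=(hidden_size, hidden_size + input_size))
W_z = rng.normal(size=(hidden_size, hidden_size + input_size))
W_h = rng.normal(size=(hidden_size, hidden_size + input_size))

def gru_step(h_prev, x_t):
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ concat)  # reset gate
    z_t = sigmoid(W_z @ concat)  # update gate
    # candidate hidden state, computed from the reset-gated previous state
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # final hidden state: blend of old state and candidate
    return (1 - z_t) * h_prev + z_t * h_cand

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a toy sequence of 5 steps
    h = gru_step(h, x)
print(h)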
How GRUs Solve the Vanishing Gradient Problem
Like LSTMs, GRUs were designed to address the vanishing gradient problem which is common in traditional RNNs. GRUs help mitigate this issue by using gates that regulate the flow of gradients during training ensuring that important information is preserved and that gradients do not shrink excessively over time. By using these gates, GRUs maintain a balance between remembering important past information and learning new, relevant data.
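A rough way to see this: treating the gates and the candidate state as locally constant (a simplification), differentiating the hidden-state update with respect to the previous hidden state gives

\frac{\partial h_t}{\partial h_{t-1}} \approx (1 - z_t)

So when the update gate z_t is close to 0, this factor is close to 1 and the gradient can flow through many time steps largely unchanged, instead of being repeatedly squashed through a \tanh as in a plain RNN.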
GRU vs LSTM
GRUs are more computationally efficient because they combine the forget and input gates into a single update gate. Unlike LSTMs, GRUs do not maintain a separate internal cell state; they store information directly in the hidden state, making them simpler and faster.
| Feature | LSTM (Long Short-Term Memory) | GRU (Gated Recurrent Unit) |
|---|---|---|
| Gates | 3 (Input, Forget, Output) | 2 (Update, Reset) |
| Cell State | Yes, separate cell state | No (hidden state only) |
| Training Speed | Slower due to complexity | Faster due to simpler architecture |
| Computational Load | Higher due to more gates and parameters | Lower due to fewer gates and parameters |
| Performance | Often better in tasks requiring long-term memory | Performs similarly in many tasks with less complexity |
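The difference in parameter count is easy to verify in Keras. The sketch below builds one LSTM layer and one GRU layer of the same width and prints their parameter counts; the layer size and input shape are arbitrary choices, and exact counts depend on the Keras version's bias conventions (e.g. the reset_after option in GRU).
Python
from tensorflow.keras.layers import LSTM, GRU
from tensorflow.keras.models import Sequential

for name, layer in [("LSTM", LSTM(50)), ("GRU", GRU(50))]:
    model = Sequential([layer])
    model.build(input_shape=(None, 100, 1))  # (batch, time steps, features)
    print(name, model.count_params())

# LSTM keeps 4 weight sets (3 gates + cell candidate) while GRU keeps 3
# (2 gates + candidate), so the GRU comes out roughly a quarter smaller.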
Implementation in Python
Now let's implement a simple GRU model in Python using Keras. We'll start by importing the necessary libraries and loading the dataset.
1. Importing Libraries
We will import the following libraries for implementing our GRU model.
- numpy: For handling numerical data and array manipulations.
- pandas: For data manipulation and reading datasets (CSV files).
- MinMaxScaler: For normalizing the dataset.
- TensorFlow: For building and training the GRU model.
- Adam: An optimization algorithm used during training.
Python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
from tensorflow.keras.optimizers import Adam
2. Loading the Dataset
The dataset we're using is a time-series dataset containing daily temperature data, i.e. a forecasting dataset. It spans 8,000 days starting from January 1, 2010. You can download the dataset from here.
- pd.read_csv(): Reads a CSV file into a pandas DataFrame. Here the dataset has a Date column, which is set as the index of the DataFrame.
- parse_dates=['Date']: Ensures that pandas parses the 'Date' column as datetime.
Python
df = pd.read_csv('data.csv', parse_dates=['Date'], index_col='Date')
print(df.head())
Output:
(Figure: Loading the Dataset)
3. Preprocessing the Data
We will scale our data to ensure all features have equal weight and avoid any bias. In this example, we will use MinMaxScaler, which scales the data to a range between 0 and 1. Proper scaling is important because neural networks tend to perform better when input features are normalized.
Python
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df.values)
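As a quick sanity check (an optional step, not part of the pipeline itself), the scaled array should now span the [0, 1] range:
Python
print(scaled_data.min(), scaled_data.max())  # expect values at or very near 0.0 and 1.0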
4. Preparing Data for GRU
We will define a function to prepare our data for training our model.
- create_dataset(): Prepares the dataset for time-series forecasting. It creates sliding windows of time_step length to predict the next time step.
- X.reshape(): Reshapes the input data to fit the expected shape for the GRU which is 3D: [samples, time steps, features].
Python
def create_dataset(data, time_step=1):
    X, y = [], []
    # slide a window of `time_step` values and use the next value as the target
    for i in range(len(data) - time_step - 1):
        X.append(data[i:(i + time_step), 0])
        y.append(data[i + time_step, 0])
    return np.array(X), np.array(y)

time_step = 100
X, y = create_dataset(scaled_data, time_step)
X = X.reshape(X.shape[0], X.shape[1], 1)  # [samples, time steps, features]
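Printing the shapes is a quick optional check that the windowing worked. For the 8,000-row dataset described above with time_step=100, the loop stops time_step + 1 rows before the end, so we'd expect about 7,899 windows:
Python
print(X.shape, y.shape)  # expect roughly (7899, 100, 1) and (7899,)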
5. Building the GRU Model
We will define our GRU model with the following components:
- GRU(units=50): Adds a GRU layer with 50 units (neurons).
- return_sequences=True: Ensures that the GRU layer returns the entire sequence (required for stacking multiple GRU layers).
- Dense(units=1): The output layer which predicts a single value for the next time step.
- Adam(): An adaptive optimizer commonly used in deep learning.
Python
model = Sequential()
model.add(GRU(units=50, return_sequences=True, input_shape=(X.shape[1], 1)))
model.add(GRU(units=50))
model.add(Dense(units=1))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
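# Optionally inspect the layer stack and parameter counts; this is
# presumably what the "GRU Model" output below shows.
model.summary()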
Output:
(Figure: GRU Model)
6. Training the Model
model.fit() trains the model on the prepared dataset. epochs=10 specifies the number of passes over the entire dataset, and batch_size=32 defines the number of samples per batch.
Python
model.fit(X, y, epochs=10, batch_size=32)
Output:
(Figure: Training the Model)
7. Making Predictions
We will now make predictions using our trained GRU model.
- Input Sequence: The code takes the last 100 temperature values from the dataset (scaled_data[-time_step:]) as an input sequence.
- Reshaping the Input Sequence: The input sequence is reshaped into the shape (1, time_step, 1) because the GRU model expects a 3D input: [samples, time_steps, features]. Here samples=1 because we are making one prediction, time_steps=100 (the length of the input sequence) and features=1 because we are predicting only the temperature value.
- model.predict(): Uses the trained model to predict future values based on the input data.
Python
input_sequence = scaled_data[-time_step:].reshape(1, time_step, 1)
predicted_values = model.predict(input_sequence)
Output:
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step
8. Inverse Transforming the Predictions
This refers to converting the scaled (normalized) predictions back to their original scale.
- scaler.inverse_transform(): Converts the normalized predictions back to their original scale.
Python
predicted_values = scaler.inverse_transform(predicted_values)
print(f"The predicted temperature for the next day is: {predicted_values[0][0]:.2f}°C")
Output:
The predicted temperature for the next day is: 25.03°C
The output 25.03°C is the GRU model's prediction for the next day's temperature based on the past 100 days of data. The model uses historical patterns to forecast future values and converts the prediction back to the original temperature scale.
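The model above predicts a single step ahead. A common way to forecast several days is to feed each prediction back in as the newest input, as in the rough sketch below; it reuses the model, scaler and scaled_data defined above, and n_future = 7 is an illustrative choice. Note that errors compound with each step, so longer horizons should be treated with caution.
Python
n_future = 7  # hypothetical horizon: predict one week ahead
window = scaled_data[-time_step:].reshape(1, time_step, 1)
future_scaled = []
for _ in range(n_future):
    next_val = model.predict(window, verbose=0)  # shape (1, 1)
    future_scaled.append(next_val[0, 0])
    # slide the window: drop the oldest value, append the new prediction
    window = np.append(window[:, 1:, :], next_val.reshape(1, 1, 1), axis=1)

# convert the scaled predictions back to degrees Celsius
future = scaler.inverse_transform(np.array(future_scaled).reshape(-1, 1))
print(future.ravel())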