Recurrent Neural Network
What’s in it for you?
• What is a Neural Network?
• Popular Neural Networks
• Why Recurrent Neural Network?
• What is a Recurrent Neural Network?
• How does a RNN work?
• Vanishing and Exploding Gradient Problem
• Long Short Term Memory (LSTM)
• Use case implementation of LSTM
Introduction to RNN
Do you know how Google’s autocomplete feature predicts the rest of the words a user is typing?
When a user types “What is the best food to eat in Las” into Google search, autocomplete suggests the next word: “Vegas”. Behind the scenes:
• A collection of large volumes of the most frequently occurring consecutive words is gathered.
• This data is fed to a Recurrent Neural Network.
• The network analyses the data by finding the sequences of words that occur frequently and builds a model to predict the next word in the sentence.
[Diagram: three unrolled RNN cells, each with input x, hidden state h, output y and parameters A, B, C]
What is a Neural Network?
Neural networks, as used in deep learning, consist of different layers connected to each other and are modeled on the structure and functions of the human brain. A neural network learns from huge volumes of data and uses complex algorithms to train itself.
[Diagram: image pixels of 2 different breeds of dog pass through the Input Layer, Hidden Layers and Output Layer, and the network identifies the dog’s breed as German Shepherd or Labrador]
Such networks do not require memorizing the past output.
Popular Neural Networks
• Feed Forward Neural Network: used for general regression and classification problems
• Convolutional Neural Network: used for image recognition
• Deep Neural Network: used for acoustic modeling
• Deep Belief Network: used for cancer detection
• Recurrent Neural Network: used for speech recognition
Feed Forward Neural Network
In a feed-forward network, information flows in only one direction: from the input nodes, through the hidden layers (if any), to the output nodes. There are no cycles or loops in the network.
[Simplified presentation: input x flows through the Input Layer, Hidden Layers and Output Layer to the predicted output ŷ]
• Decisions are based on the current input only
• No memory about the past
• No notion of future inputs
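To make this concrete, here is a minimal feed-forward classifier sketched in Keras. The layer sizes, the 784-pixel input and the two-class softmax output (the two dog breeds) are illustrative choices, not fixed by the slides:

from keras.models import Sequential
from keras.layers import Dense

# Information flows strictly forward: input -> hidden -> hidden -> output,
# with no loops, so each prediction depends only on the current input.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),  # hidden layer 1
    Dense(64, activation='relu'),                      # hidden layer 2
    Dense(2, activation='softmax'),                    # output layer
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])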
Why Recurrent Neural Network?
Issues in a Feed Forward Neural Network:
01 Cannot handle sequential data
02 Cannot memorize previous inputs
03 Considers only the current input
Why Recurrent Neural Network?
h
y
x
A
B
C
Recurrent Neural
Network
01
02
03
can handle
sequential data
can memorize previous
inputs due to its internal
memory
considers the current input and
also the previously received
inputs
Solution to Feed Forward Neural Network
Applications of RNN
Image captioning: an RNN can be used to caption an image by analyzing the activities present in it, e.g. “A dog catching a ball in mid air”.
Time series prediction: any time series problem, like predicting the prices of stocks in a particular month, can be solved using an RNN.
Natural Language Processing: text mining and sentiment analysis can be carried out using an RNN, e.g. classifying “When it rains, look for rainbows. When it’s dark, look for stars.” as positive sentiment.
Machine Translation: given an input in one language, an RNN can be used to translate it into different languages as output. Here the person is speaking in English, and the speech is translated into Chinese, Italian, French, German and Spanish.
What is a Recurrent Neural Network?
A Recurrent Neural Network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer.
[Diagram: a feed-forward network with Input Layer (x), Hidden Layers (h) and Output Layer (y), rolled up into a single recurrent cell with a feedback loop and parameters A, B, C]
What does a RNN look like?
A, B and C are the parameters of the network.
How does a RNN work?
The same cell is applied at every time step: at step t it takes the input vector x(t) and the old state h(t-1), and produces the new state

h(t) = f_c(h(t-1), x(t))

where:
• h(t) = new state
• h(t-1) = old state
• x(t) = input vector at time step t
• f_c = function with parameter c

[Diagram: the network unrolled over three time steps, with inputs x(t-1), x(t), x(t+1), states h(t-1), h(t), h(t+1), outputs y(t-1), y(t), y(t+1), and the parameters A, B, C shared across all steps]
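A minimal sketch of this recurrence in NumPy, taking f to be the common tanh choice; the weight names Wh, Wx and the sizes are illustrative:

import numpy as np

def rnn_step(h_prev, x_t, Wh, Wx, b):
    """One recurrence step: h(t) = f(h(t-1), x(t)), here with f = tanh."""
    return np.tanh(Wh @ h_prev + Wx @ x_t + b)

# The same parameters (Wh, Wx, b) are reused at every time step.
rng = np.random.default_rng(0)
hidden_size, input_size, steps = 4, 3, 5
Wh = rng.normal(size=(hidden_size, hidden_size))
Wx = rng.normal(size=(hidden_size, input_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                    # initial state
for x_t in rng.normal(size=(steps, input_size)):
    h = rnn_step(h, x_t, Wh, Wx, b)          # the state carries context forward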
Types of Recurrent Neural Network
one to one: single input, single output. A one-to-one network is known as the Vanilla Neural Network and is used for regular machine learning problems.
one to many: single input, multiple outputs. A one-to-many network generates a sequence of outputs. Example: image captioning.
many to one: multiple inputs, single output. A many-to-one network takes in a sequence of inputs. Example: sentiment analysis, where a given sentence can be classified as expressing positive or negative sentiment.
many to many: multiple inputs, multiple outputs. A many-to-many network takes in a sequence of inputs and generates a sequence of outputs. Example: machine translation.
Vanishing Gradient Problem
While training a RNN, your slope can be either too
small or very large and this makes training difficult.
When the slope is too small, the problem is know as
Vanishing gradient.
Change in Y
Change in X
Y
X
s0
E0
x0
s1
E1
s2
E2
x2x1
s3
E3
x3
Loss of Information through time
Backpropagate the error
Exploding Gradient Problem
When the slope tends to grow exponentially instead of decaying, the problem is called the exploding gradient.
[Diagram: the error is backpropagated through states s3…s0, losing information through time]
Issues caused by the gradient problems:
• Long training time
• Poor performance
• Bad accuracy
Explaining Gradient Problem
[Diagram: an RNN unrolled through states s0, s1, s2, s3, …, st, with inputs x1…xt and outputs y1…yt]
Consider the following 2 examples to understand what should be the next word in the sequence:
The person who took my bike and …, ____ a thief. (was)
The students who got into Engineering with …, ____ from Asia. (were)
In order to work out the next word in the sequence, the RNN must memorize the previous context: whether the subject was a singular noun or a plural noun.
It might sometimes be difficult for the error to backpropagate to the beginning of the sequence to predict what the output should be.
Solution to Gradient Problem
Exploding Gradient:
1. Identity Initialization
2. Truncated Backpropagation
3. Gradient Clipping
Vanishing Gradient:
1. Weight Initialization
2. Choosing the right Activation Function
3. Long Short-Term Memory Networks (LSTMs)
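Of these remedies, gradient clipping is the simplest to show in code. In Keras it is an optimizer option; the clipnorm threshold of 1.0 and the toy model are illustrative:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from keras.optimizers import Adam

model = Sequential([SimpleRNN(32, input_shape=(60, 1)), Dense(1)])

# clipnorm rescales any gradient whose norm exceeds 1.0, a direct
# countermeasure to the exploding gradient problem.
model.compile(optimizer=Adam(learning_rate=0.001, clipnorm=1.0),
              loss='mean_squared_error')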
Long-Term Dependencies
Suppose we try to predict the last word in the text: “The clouds are in the ___.”
Here we do not need any further context. It’s pretty clear the last word is going to be “sky”.
Long-Term Dependencies
Now suppose we try to predict the last word in the text: “I have been staying in Spain for the last 10 years… I can speak fluent ______.”
The word we predict (“Spanish”) will depend on words much earlier in the context:
• Here we need the context of Spain to predict the last word in the text.
• It’s possible for the gap between the relevant information and the point where it is needed to become very large.
• LSTMs help us solve this problem.
Long Short-Term Memory Networks
LSTMs are a special kind of Recurrent Neural Network, capable of learning long-term dependencies. Remembering information for long periods of time is their default behavior.
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
[Diagram: a chain of repeating modules A over inputs x(t-1), x(t), x(t+1) and outputs h(t-1), h(t), h(t+1), each module containing a single tanh layer]
Long Short-Term Memory Networks
LSTMs also have a chain-like structure, but the repeating module is different. Instead of a single neural network layer, there are four interacting layers communicating in a very special way.
[Diagram: the same chain of modules, but each module contains four interacting layers of sigmoid and tanh units, combined through pointwise multiplication (×) and addition (+)]
Long Short-Term Memory Networks
The 3-step process of LSTMs, transforming the old cell state C(t-1) into the new cell state C(t):
Step 1: Forget irrelevant parts of the previous state
Step 2: Selectively update cell state values
Step 3: Output certain parts of the cell state
Working of LSTMs
Step 1: Decides how much of the past it should remember.
The first step in the LSTM is to decide which information is to be omitted from the cell in that particular time step. This is decided by the sigmoid function: it looks at the previous state h(t-1) and the current input x(t) and computes the forget gate f(t) (in the standard formulation, f(t) = sigmoid(Wf · [h(t-1), x(t)] + bf)). The forget gate decides which information from the previous time step is not important and should be deleted.
Consider an LSTM fed with the following inputs from the previous and present time steps:
Previous output h(t-1): “Alice is good in Physics. John on the other hand is good in Chemistry.”
Current input x(t): “John plays football well. He told me yesterday over phone that he had served as the captain of his college football team.”
• The forget gate realizes there might be a change in context after encountering the first full stop.
• It compares this with the current input sentence at x(t).
• The next sentence talks about John, so the information on Alice is deleted.
• The position of the subject is vacated and assigned to John.
Working of LSTMs
Step 2: Decides how much this unit should add to the current state.
The second layer has 2 parts: a sigmoid function and a tanh function. The sigmoid function decides which values to let through (0 or 1); the tanh function gives weightage to the values that are passed, deciding their level of importance (-1 to 1). Together they form the input gate i(t), which determines which information to let through based on its significance in the current time step (in the standard formulation, i(t) = sigmoid(Wi · [h(t-1), x(t)] + bi), with candidate values C~(t) = tanh(Wc · [h(t-1), x(t)] + bc)).
Consider the current input at x(t): “John plays football well. He told me yesterday over phone that he had served as the captain of his college football team.”
• The input gate analyses the important information.
• “John plays football” and “he was the captain of his college team” are important.
• “He told me yesterday over phone” is less important, hence it is forgotten.
• This process of adding some new information is done via the input gate.
Working of LSTMs
Step 3: Decides what part of the current cell state makes it to the output.
The third step is to decide what the output will be. First, we run a sigmoid layer, which decides what parts of the cell state make it to the output: this is the output gate o(t), which allows the passed-in information to impact the output in the current time step. Then we put the cell state through tanh, to push the values to be between -1 and 1, and multiply it by the output of the sigmoid gate, so that h(t) = o(t) × tanh(C(t)).
Consider this example of predicting the next word in the sentence: “John played tremendously well against the opponent and won for his team. For his contributions, brave ____ was awarded player of the match.”
• There could be a lot of choices for the empty space.
• The current input “brave” is an adjective, and adjectives describe a noun.
• “John” could therefore be the best output after “brave”.
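Putting the three steps together, here is a single LSTM cell update sketched in NumPy, following the standard formulation; the stacked weight layout and the sizes are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W stacks the four layers' weights row-wise."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f_t = sigmoid(z[0:H])          # Step 1: forget gate, drops irrelevant parts of C(t-1)
    i_t = sigmoid(z[H:2*H])        # Step 2: input gate, chooses what to let through
    c_tilde = np.tanh(z[2*H:3*H])  #         candidate values, weighted between -1 and 1
    o_t = sigmoid(z[3*H:4*H])      # Step 3: output gate
    c_t = f_t * c_prev + i_t * c_tilde   # selectively update the cell state
    h_t = o_t * np.tanh(c_t)             # output certain parts of the cell state
    return h_t, c_t

# Tiny usage example with random weights (sizes are illustrative).
rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, b)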
Use case implementation of LSTM
Let’s predict the prices of stocks using an LSTM network: based on the stock price data for 2012-2016, we predict the stock prices of 2017.
Use case implementation of LSTM
1. Import the Libraries
2. Import the training dataset
3. Feature Scaling
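A sketch of steps 1-3, assuming the 2012-2016 prices sit in a CSV file; the filename and the price-column index are assumptions:

# 1. Import the libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# 2. Import the training dataset (filename and column are assumed)
dataset_train = pd.read_csv('stock_price_train.csv')
training_set = dataset_train.iloc[:, 1:2].values   # e.g. the opening-price column

# 3. Feature scaling: squash the prices into [0, 1] before feeding the LSTM
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)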
Use case implementation of LSTM
4. Create a data structure with 60 timesteps and 1 output
5. Import the Keras libraries and packages
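A sketch of steps 4 and 5; the 60-step window follows the slide, while the variable names are illustrative:

# 4. Create a data structure with 60 timesteps and 1 output: each training
#    sample is the previous 60 scaled prices, the target is the next price
X_train, y_train = [], []
for i in range(60, len(training_set_scaled)):
    X_train.append(training_set_scaled[i - 60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
# Reshape to (samples, timesteps, features) as Keras recurrent layers expect
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

# 5. Import the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout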
Use case implementation of LSTM
6. Initialize the RNN
7. Adding the LSTM layers and some Dropout regularization
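A sketch of steps 6 and 7; the number of LSTM layers, the 50 units and the 0.2 dropout rate are typical choices for this kind of tutorial setup, not dictated by the slide text:

# 6. Initialize the RNN as a sequential stack of layers
regressor = Sequential()

# 7. Add stacked LSTM layers with Dropout regularization;
#    return_sequences=True feeds the full sequence to the next LSTM layer
regressor.add(LSTM(units=50, return_sequences=True,
                   input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))   # final LSTM layer returns only the last state
regressor.add(Dropout(0.2))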
Use case implementation of LSTM
8. Adding the output layer
9. Compile the RNN
10. Fit the RNN to the training set
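A sketch of steps 8-10; the optimizer, epoch count and batch size are illustrative:

# 8. Add the output layer: a single unit for the predicted price
regressor.add(Dense(units=1))

# 9. Compile the RNN; mean squared error is the usual regression loss
regressor.compile(optimizer='adam', loss='mean_squared_error')

# 10. Fit the RNN to the training set
regressor.fit(X_train, y_train, epochs=100, batch_size=32)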
Use case implementation of LSTM
11. Load the stock price test data for 2017
12. Get the predicted stock price of 2017
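A sketch of steps 11 and 12, assuming a matching test CSV (filename and column assumed); it prepends the last 60 training days so every 2017 day has a full 60-step window:

# 11. Load the stock price test data for 2017
dataset_test = pd.read_csv('stock_price_test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

dataset_total = pd.concat((dataset_train.iloc[:, 1], dataset_test.iloc[:, 1]),
                          axis=0)
inputs = dataset_total.values[len(dataset_total) - len(dataset_test) - 60:]
inputs = sc.transform(inputs.reshape(-1, 1))   # reuse the training-set scaler

X_test = []
for i in range(60, 60 + len(dataset_test)):
    X_test.append(inputs[i - 60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# 12. Get the predicted stock price of 2017 and undo the scaling
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)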
Use case implementation of LSTM
13. Visualize the results of predicted and real stock price
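A sketch of step 13 using matplotlib:

# 13. Visualize the real vs. predicted 2017 stock price
import matplotlib.pyplot as plt

plt.plot(real_stock_price, color='red', label='Real Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()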
Key Takeaways