Neural Networks
Dr. Randa Elanwar
Lecture 9
Lecture Content
• Mapping networks:
– Back-propagation neural network
– Self-organizing map
– Counter propagation network
• Spatiotemporal Network
• Stochastic Networks
– Boltzmann machine
• Neurocognition network
Mapping networks
• When the problem is nonlinear and no straight line could ever separate the samples in the feature space, we need multilayer perceptrons (having hidden layer(s)) to achieve nonlinearity.
• The idea is that we map/transform/translate our data
to another feature space that is linearly separable. Thus
we call them mapping networks.
• We will discuss three types of mapping networks: the
back-propagation neural network, self-organizing map,
counter propagation network.
Mapping networks
• Networks without hidden units are very limited in the input-output
mappings they can model.
– More layers of linear units do not help. It's still linear.
– Fixed output non-linearities are not enough
• We need multiple layers of adaptive non-linear hidden units.
• But how can we train such nets?
– We need an efficient way of adapting all the weights, not just those of the last layer, i.e., learning the weights going into hidden units. This is hard.
– Why?
– Because: Nobody is telling us directly what hidden units should
do.
– Solution: This can be achieved using ‘Backpropagation’ learning
Learning with hidden layers
• Mathematically, the learning process is an optimization problem. We initialize the NN system with some parameters (weights) and use known examples to find the optimal values of such weights.
• Generally, the solution of an optimization problem is to find the parameter value that leads to the minimum value of an optimization function.
(Figure: a generic optimization function G(t) plotted against t; the minimum lies where the derivative is zero.)
In our case, the optimization function that we need to minimize to get the final weights is the error function:
E = ydes − yact
E = ydes − f(W·X)
To get the minimum value mathematically, we differentiate the error function with respect to the parameter we need to find, which we call W; that is, we compute ∂E/∂W.
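To connect this derivative to the update rule on the next slide, here is the standard chain-rule step, sketched under the common assumption of the squared-error form E = ½(ydes − yact)² and a differentiable activation f:

```latex
\frac{\partial E}{\partial W}
  = \frac{\partial E}{\partial y_{act}} \cdot \frac{\partial y_{act}}{\partial W}
  = -\,(y_{des} - y_{act})\, f'(W \cdot X)\, X
```

Writing δ = (ydes − yact) f′(W·X) and stepping opposite to the gradient with learning rate η yields exactly the update Δw = η·δ·X defined next.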
Learning with hidden layers
• We define the “gradient”: Δw = η · δ · X
• If δ is +ve, this means that the current values of W make the differentiation result +ve, which is wrong. We want the differentiation result to be = 0 (minimum point), so we must move in the opposite direction of the gradient (subtract). The opposite is also true.
• If δ = 0, this means that the current values of W make the differentiation result = 0, which is right. These weights are the optimal values (solution), Δw = 0, and the algorithm should stop. The network is now trained and ready for use.
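As a concrete illustration, a minimal sketch of this gradient-sign stopping logic for a single linear neuron; the name `delta_rule_train` and all parameter values are illustrative, not from the lecture:

```python
import numpy as np

def delta_rule_train(X, y_des, lr=0.1, epochs=100, tol=1e-6):
    """Train a single linear neuron y = w . x with the delta rule."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        max_step = 0.0
        for x, t in zip(X, y_des):
            y_act = w @ x                       # forward pass
            delta = t - y_act                   # error term delta
            w += lr * delta * x                 # move opposite to the gradient
            max_step = max(max_step, np.abs(lr * delta * x).max())
        if max_step < tol:                      # delta ~ 0 everywhere: stop
            break
    return w

# Usage: recover the weights of y = 2*x1 - x2 from four samples
X = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 1.]])
print(delta_rule_train(X, X @ np.array([2., -1.])))   # -> approx [ 2. -1.]
```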
The back propagation algorithm
• The backpropagation learning algorithm can be divided into two
phases: propagation and weight update.
Phase 1: Propagation
1.Forward propagation of a training
pattern's input through the neural
network in order to generate the
propagation's output activations
(yact).
2.Backward propagation of the
propagation's output activations
through the neural network using
the training pattern's target (ydes) in
order to generate the deltas (δ) of
all output and hidden neurons.
Phase 2: Weight update
For each weight, follow these steps:
1.Multiply its output delta (δ) and input activation (x) and the learning rate (η) to get the gradient of the weight (Δw).
2.Bring the weight in the opposite direction
of the gradient by subtracting it from the
weight.
- The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.
- Repeat phases 1 and 2 until the performance of the network is satisfactory.
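A compact sketch of the two phases for a one-hidden-layer network; the sigmoid activations, squared-error measure, and layer sizes are assumptions of this example, since the slides do not fix a specific architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y_des, W1, W2, lr=0.5):
    """One propagation (phase 1) and weight update (phase 2)."""
    x = np.append(x, 1.0)                    # input plus bias term
    # Phase 1a: forward propagation -> output activations (yact)
    h = np.append(sigmoid(W1 @ x), 1.0)      # hidden activations plus bias
    y_act = sigmoid(W2 @ h)
    # Phase 1b: backward propagation -> deltas of output and hidden neurons
    d_out = (y_des - y_act) * y_act * (1 - y_act)
    d_hid = (W2.T @ d_out)[:-1] * h[:-1] * (1 - h[:-1])
    # Phase 2: update each weight opposite to its gradient
    W2 += lr * np.outer(d_out, h)
    W1 += lr * np.outer(d_hid, x)
    return np.sum((y_des - y_act) ** 2)      # error, to monitor performance

# Usage: learn XOR, a mapping no single-layer net can represent
rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 3))                 # 3 hidden units, 2 inputs + bias
W2 = rng.normal(size=(1, 4))                 # 1 output, 3 hidden + bias
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(5000):                        # repeat until satisfactory
    for x, t in data:
        train_step(np.array(x, float), np.array(t, float), W1, W2)
```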
Backpropagation Networks
• They are the nonlinear (mapping) neural networks using the
backpropagation supervised learning technique.
• Modes of learning of nonlinear nets:
• There are three modes of learning to choose from: on-line
(pattern), batch and stochastic.
• In on-line and stochastic learning, each propagation is followed
immediately by a weight update.
• In batch learning, many propagations occur before updating the
weights.
• Batch learning requires more memory capacity, but on-line and
stochastic learning require more updates.
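Schematically, the three modes differ only in where the weight update sits relative to the loop over patterns. A sketch with a single linear unit standing in for the full network (all function names here are mine):

```python
import numpy as np

def grad(w, x, t):                 # per-pattern gradient of squared error
    return -(t - w @ x) * x        # for a single linear unit, for brevity

def train_online(patterns, w, lr=0.1):
    for x, t in patterns:          # update follows every propagation
        w = w - lr * grad(w, x, t)
    return w

def train_stochastic(patterns, w, lr=0.1, seed=0):
    order = np.random.default_rng(seed).permutation(len(patterns))
    for i in order:                # same, but in random order
        x, t = patterns[i]
        w = w - lr * grad(w, x, t)
    return w

def train_batch(patterns, w, lr=0.1):
    g = np.mean([grad(w, x, t) for x, t in patterns], axis=0)
    return w - lr * g              # all gradients held in memory, one update
```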
Backpropagation Networks
• On-line learning is used for dynamic environments that
provide a continuous stream of new patterns.
• Stochastic learning and batch learning both make use
of a training set of static patterns. Stochastic goes
through the data set in a random order in order to
reduce its chances of getting stuck in local minima.
• Stochastic learning is also much faster than batch
learning since weights are updated immediately after
each propagation. Yet batch learning will yield a much
more stable descent to a local minimum since each
update is performed based on all patterns.
Backpropagation Networks
• Applications of supervised learning (Backpropagation NN)
include
• Pattern recognition
• Credit approval
• Target marketing
• Medical diagnosis
• Defective parts identification in manufacturing
• Crime zoning
• Treatment effectiveness analysis
• Etc
Self-organizing map
• We can also train networks where there is no teacher. This is called
unsupervised learning. The network learns a prototype based on the
distribution of patterns in the training data. Such networks allow us
to:
– Discover underlying structure of the data
– Encode or compress the data
– Transform the data
• Self-organizing maps (SOMs) are a data visualization technique
invented by Professor Teuvo Kohonen
– Also called Kohonen Networks, Competitive Learning, Winner-Take-All
Learning
– Generally reduces the dimensions of data through the use of self-organizing neural networks
– Useful for data visualization; humans cannot visualize high dimensional
data so this is often a useful technique to make sense of large data sets
Self-organizing map
• SOM structure:
1. The weights in each neuron must represent a class of patterns. We have one neuron for each class.
2. The input pattern is presented to all neurons and each produces an output. Output: a measure of the match between the input pattern and the pattern stored by the neuron.
3. A competitive learning strategy selects the neuron with the largest response.
4. A method of reinforcing the largest response.
Self-organizing map
• Unsupervised classification learning is based on clustering of input data. No a priori knowledge is available about an input’s membership in a particular class.
• Instead, gradually detected characteristics and a history of
training will be used to assist the network in defining classes
and possible boundaries between them.
• Clustering is understood to be the grouping of similar objects
and separating of dissimilar ones.
• We discuss Kohonen’s network, which classifies input vectors into one of a specified number m of categories, according to the clusters detected in the training set.
Kohonen’s Network
(Figure: the Kohonen network, a 2D grid of neurons that all receive the input vector X.)
•The Kohonen network is a self-organising
network with the following
characteristics:
1. Neurons are arranged on a 2D grid
2. Inputs are sent to all neurons
3. There are no connections between
neurons
4. For a neuron j, the output is the weighted sum (dot product) of the x and w vectors, where x is the input and w is the weight vector
5. There is no threshold or bias
6. Input values and weights are
normalized
Self-organizing map
Learning in Kohonen networks:
• Initially the weights in each neuron are random
• Input values are sent to all the neurons
• The outputs of each neuron are compared
• The “winner” is the neuron with the largest output value
• Having found the winner, the weights of the winning neuron are
adjusted
• Weights of neurons in a surrounding neighbourhood are also
adjusted
• As training progresses the neighbourhood gets smaller
• Weights are adjusted according to the following formula:
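The formula itself did not survive the slide export; the update being referred to is the standard Kohonen rule, where α(t) is the learning coefficient discussed on the next slide:

```latex
w_j(t+1) = w_j(t) + \alpha(t)\,\bigl(x(t) - w_j(t)\bigr),
\quad j \in \text{neighbourhood of the winner}
```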
Self-organizing map
• The learning coefficient (alpha) starts with a value of 1 and
gradually reduces to 0
• This has the effect of making big changes to the weights initially,
but no changes at the end
• The weights are adjusted so that they more closely resemble
the input patterns
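Putting the whole procedure together, a minimal SOM sketch; the 1-D neuron grid, linear decay schedules, and use of Euclidean distance to pick the winner (equivalent to the largest dot-product response when vectors are normalized) are choices of this example:

```python
import numpy as np

def train_som(data, n_neurons=10, epochs=50, rng=np.random.default_rng(0)):
    """Train a 1-D Kohonen map: weights end up resembling input clusters."""
    w = rng.random((n_neurons, data.shape[1]))       # random initial weights
    for epoch in range(epochs):
        alpha = 1.0 - epoch / epochs                 # learning rate: 1 -> 0
        radius = int((n_neurons / 2) * (1 - epoch / epochs))  # shrinking hood
        for x in data:
            winner = np.argmin(np.linalg.norm(w - x, axis=1))  # best match
            lo = max(0, winner - radius)
            hi = min(n_neurons, winner + radius + 1)
            w[lo:hi] += alpha * (x - w[lo:hi])       # pull neighbourhood to x
    return w

# Usage: two clusters in 2-D; nearby neurons specialize on nearby data
data = np.vstack([np.random.default_rng(1).normal(0, .1, (50, 2)),
                  np.random.default_rng(2).normal(1, .1, (50, 2))])
print(train_som(data).round(2))
```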
Applications of unsupervised learning (Kohonen’s NN) include
• Clustering
• Vector quantization
• Data compression
• Feature extraction
Counter propagation network
• The counterpropagation network (CPN) is a fast-learning
combination of unsupervised and supervised learning.
• Although this network uses linear neurons, it can learn nonlinear
functions by means of a hidden layer of competitive units.
• Moreover, the network is able to learn a function and its inverse
at the same time.
• However, to simplify things, we will only consider the
feedforward mechanism of the CPN.
Counter propagation network
• Training:
1.Randomly select a vector pair (x, y) from the training set.
2.Measure the similarity between the input vector and the
activation of the hidden-layer units.
3.In the hidden (competitive) layer, determine the unit with the
largest activation (the winner). I.e., the neuron whose weight
vector is most similar to the current input vector is the “winner.”
4.Adjust the connection weights in between (i.e., the winner’s weight vector).
5.Repeat until each input pattern is consistently associated with
the same competitive unit.
Counter propagation network
• After the first phase of the training, each hidden-layer neuron is
associated with a subset of input vectors (class of patterns).
• In the second phase of the training, we adjust the weights in the
network’s output layer in such a way that, for any winning hidden-layer unit, the network’s output is as close as possible to the desired
output for the winning unit’s associated input vectors.
• The idea is that when we later use the network to compute
functions, the output of the winning hidden-layer unit is 1, and the
output of all other hidden-layer units is 0.
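A sketch of the feedforward CPN with both training phases; layer sizes, learning rates, and names such as `cpn_train` are illustrative assumptions:

```python
import numpy as np

def cpn_train(pairs, n_hidden=5, a=0.1, b=0.1, epochs=100,
              rng=np.random.default_rng(0)):
    """Feedforward counterpropagation: competitive layer + outstar layer."""
    dim_x, dim_y = len(pairs[0][0]), len(pairs[0][1])
    W = rng.random((n_hidden, dim_x))      # input -> hidden (Kohonen) weights
    V = rng.random((dim_y, n_hidden))      # hidden -> output (outstar) weights
    for _ in range(epochs):
        for x, y in pairs:
            # Phase 1: find the winner, pull its weights toward the input
            j = np.argmin(np.linalg.norm(W - x, axis=1))
            W[j] += a * (x - W[j])
            # Phase 2: pull the winner's output weights toward the target
            V[:, j] += b * (y - V[:, j])
    return W, V

def cpn_predict(x, W, V):
    j = np.argmin(np.linalg.norm(W - x, axis=1))  # winner outputs 1, rest 0
    return V[:, j]

# Usage: associate 2-D inputs with 1-D targets
pairs = [(np.array([0., 0.]), np.array([0.])),
         (np.array([1., 1.]), np.array([1.]))]
W, V = cpn_train(pairs)
print(cpn_predict(np.array([0.9, 1.1]), W, V))  # close to the target [1.]
```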
Spatiotemporal Networks
•A spatio-temporal neural net differs from other neural networks in two ways:
1. Neurons have recurrent links with different propagation delays
2. The state of the network depends not only on which nodes are firing, but also on the relative firing times of nodes, i.e., the significance of a node varies with time and depends on the firing state of other nodes.
•The use of recurrence and multiple links with variable propagation delays provides a rich mechanism for feature extraction and pattern recognition (sketched below):
1. Recurrent links enable nodes to integrate and differentiate inputs, i.e., detect features
2. Multiple links with variable propagation delays between nodes serve as a short-term memory.
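A minimal sketch of the delay mechanism: one neuron fed by two links with different propagation delays, so its response depends on relative timing (all names and values are illustrative):

```python
import numpy as np

def delayed_neuron_response(signal, weights, delays):
    """Output at time t is a weighted sum of inputs at t - d for each link."""
    T = len(signal)
    out = np.zeros(T)
    for w, d in zip(weights, delays):
        out[d:] += w * signal[:T - d]   # a link with delay d acts as memory
    return out

# Usage: this weight/delay pair makes the neuron respond to rising and
# falling edges (differentiation), a simple temporal feature detector
signal = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float)
print(delayed_neuron_response(signal, weights=[1.0, -1.0], delays=[0, 1]))
# -> [ 0.  0.  1.  0.  0. -1.  0.]: nonzero only where the input changes
```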
Spatiotemporal Networks
• Applications:
• Problems such as speech recognition and time series prediction where the
input signal has an explicit temporal aspect.
• Tasks like image recognition do not have an explicit temporal aspect, but can also be done by converting static patterns into time-varying (spatio-temporal) signals via scanning the image. This leads to a number of significant advantages:
– The recognition system becomes ‘shift invariant’
– The spatio-temporal approach preserves the image geometry, since the local spatial relationships in the image are expressed as local temporal variations in the scanned input.
– Reduction of complexity (from 2D to 1D)
– The scanning approach allows a visual pattern recognition system to deal with inputs of arbitrary extent (not only static fixed 2D patterns)
Stochastic neural networks
• Stochastic neural networks are a type of artificial neural network. They are built by introducing random variations into the network, either by giving the network's neurons stochastic transfer functions or by giving them stochastic weights. This makes them useful tools for optimization problems, since the random fluctuations help the network escape from local minima.
• Stochastic neural networks that are built by using stochastic
transfer functions are often called Boltzmann machines.
• Stochastic neural networks have found applications in risk management, oncology, bioinformatics, and other similar fields.
Stochastic Networks: Boltzmann machine
• The neurons are stochastic: at any time there is a probability attached to whether the neuron fires.
• Used for solving constrained optimization problems.
• Typical Boltzmann Machine:
– Weights are fixed to represent the constraints of the problem and the function to be optimized.
– The net seeks the solution by changing the activations of the units (0 or
1) based on a probability distribution and the effect that the change
would have on the energy function or consensus function for the net.
• May use either supervised or unsupervised learning.
• Learning in a Boltzmann Machine is accomplished by using a Simulated Annealing technique, which has a stochastic nature. This is used to reduce the probability of the net becoming trapped in a local minimum which is not a global minimum.
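A sketch of the stochastic unit update with an annealing schedule; the quadratic energy function and the geometric temperature schedule are generic choices for illustration:

```python
import numpy as np

def boltzmann_step(s, W, T, rng):
    """Flip one randomly chosen unit with the Boltzmann acceptance probability."""
    i = rng.integers(len(s))              # next unit is selected randomly
    dE = 2 * s[i] * (W[i] @ s)            # energy change if s[i] flips
    if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
        s[i] = -s[i]                      # probabilistic, not deterministic
    return s

# Usage: anneal from high to low temperature on a random symmetric net
rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                         # symmetric connections
np.fill_diagonal(W, 0)                    # no self-feedback
s = rng.choice([-1, 1], size=n)           # bipolar neuron states
for T in np.geomspace(10.0, 0.1, 500):    # simulated annealing schedule
    s = boltzmann_step(s, W, T, rng)
print(s)
```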
Stochastic Networks: Boltzmann machine
• Learning characteristics:
– Each neuron fires with bipolar values.
– All connections are symmetric.
– In activation passing, the next neuron whose state we
wish to update is selected randomly.
– There is no self-feedback (no connections from a neuron to itself)
Stochastic Networks: Boltzmann machine
• There are three phases in operation of the network:
– The clamped phase in which the input and output of visible
neurons are held fixed, while the hidden neurons are allowed
to vary.
– The free running phase in which only the inputs are held fixed
and other neurons are allowed to vary.
– The learning phase.
• These phases iterate until learning has created a Boltzmann Machine which can be said to have learned the input patterns, and which will converge to the learned patterns when a noisy or incomplete pattern is presented.
Stochastic Networks: Boltzmann machine
• For unsupervised learning, the initial weights of the net are generally set randomly to values in a small range, e.g., -0.5 to +0.5.
• Then an input pattern is presented to the net and clamped to the visible neurons.
• A hidden neuron is chosen at random and its state is flipped from sj to –sj according to a certain probability distribution.
• The activation passing can continue until the hidden neurons of the net reach equilibrium.
• During the free running phase, after presentation of the input patterns, all neurons can update their states.
• In the learning phase, whether weights are changed depends on the difference between the "real" distribution (neuron states) in the clamped phase and the one which will be produced (eventually) by the machine in free mode.
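The classical rule implementing this clamped-minus-free comparison (not spelled out on the slide, but standard for Boltzmann machines) is:

```latex
\Delta w_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{clamped}}
                          - \langle s_i s_j \rangle_{\text{free}} \right)
```

where the angle brackets are correlations of the bipolar states si and sj, averaged once each phase reaches equilibrium.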
Stochastic Networks: Boltzmann machine
• For supervised learning, the set of visible neurons is split into input and output neurons, and the machine will be used to associate an input pattern with an output pattern.
• During the clamped phase, the input and output patterns are
clamped to the appropriate units.
• The hidden neurons’ activations can settle at various values.
• During the free running phase, only the input neurons are clamped – both the output neurons and the hidden neurons can pass activation around until the activations in the network settle.
• The learning rule here is the same as before, but it must be modulated (multiplied) by the probability of the input patterns.
Neurocognition network
• Neurocognitive networks are large-scale systems of
distributed and interconnected neuronal populations in
the central nervous system organized to perform
cognitive functions.
• Many computer scientists try to simulate human
cognition with computers. This line of research can be
roughly split into two types: research seeking to create
machines as adept as humans (or more so), and
research attempting to figure out the computational
basis of human cognition — that is, how the brain
actually carries out its computations. This latter branch
of research can be called computational modeling
(while the former is often called artificial intelligence or
AI).