Improving Performance of Back propagation Learning Algorithm
Harikrishna B. Jethva (Ph.D. Scholar, Department of Computer Science and Engineering, Bhagwant University, Sikar Road, Ajmer, Rajasthan)
Dr. V. M. Chavda (SVICS, Kadi, Gujarat)
IJSRD - International Journal for Scientific Research & Development | Vol. 1, Issue 6, 2013 | ISSN (online): 2321-0613
Abstract— The standard back-propagation algorithm is one of the most widely used algorithms for training feed-forward neural networks. Its major drawbacks are that it may fall into local minima and that its convergence rate is slow. Natural gradient descent, a principled method for minimizing nonlinear functions, is presented and combined with a modified back-propagation algorithm, yielding a new fast training algorithm for multilayer networks. This paper describes a new approach to natural gradient learning in which the number of parameters required is much smaller than in the standard natural gradient algorithm. The new method exploits the algebraic structure of the parameter space to reduce the space and time complexity of the algorithm and improve its performance.
I. INTRODUCTION
The back-propagation (BP) training algorithm is a supervised learning method for multi-layered feed-forward neural networks. It is essentially a gradient descent local optimization technique that involves backward error correction of the network weights. Despite the general success of the back-propagation method in the learning process, several major deficiencies still need to be solved. The convergence rate of back-propagation is very low, which makes it unsuitable for large problems. Furthermore, the convergence behavior of the back-propagation algorithm depends on the choice of the initial values of the connection weights and of other parameters used in the algorithm, such as the learning rate and the momentum term.
Amari has developed natural gradient learning for multilayer perceptrons [18], which uses a Quasi-Newton method [6] instead of the steepest descent direction. The Fisher information matrix is a tool for estimating hidden parameters in terms of observed random variables, and it fits very nicely into the Quasi-Newton optimization framework.
This paper suggests a simple modification to the initial search direction in the above algorithm, i.e. changing the gradient of the error with respect to the weights, to improve training efficiency. It has been shown that if the gradient-based search direction is locally modified by a gain value used in the activation function of the corresponding node, significant improvements in the convergence rate can be achieved [24].
II. BACKPROPAGATION LEARNING ALGORITHM
An artificial neural network receives an input vector x and produces an output y. When the network has m hidden units, the output of the hidden layer is φ(wα · x), α = 1, . . . , m, where wα is an n-dimensional connection weight vector from the input to the α-th hidden unit and φ is a sigmoidal output function. Let vα be the connection weight from the α-th hidden unit to the linear output unit and let ζ be a bias. Then the output of the neural network is written as
y = f(x, θ) = Σα vα φ(wα · x) + ζ, α = 1, . . . , m   (1)
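As a concrete illustration, a minimal NumPy sketch of this forward pass follows; the function and variable names (forward, W, v, zeta) are illustrative and not taken from the paper, and tanh stands in for the sigmoidal φ.

import numpy as np

def forward(x, W, v, zeta, phi=np.tanh):
    # Eq. (1): f(x, θ) = Σα vα φ(wα · x) + ζ
    # W is an m-by-n matrix whose rows are the wα, v is an m-vector, zeta is a scalar bias.
    h = phi(W @ x)               # hidden-layer outputs φ(wα · x)
    return float(v @ h + zeta)   # linear output unit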
Any such perceptron is specified by the parameters {w1, . . . , wm; v}. We summarize them into a single m(n+1)-dimensional vector θ, so that one vector θ of dimension m(n+1) represents one network. We call S the space consisting of all such multilayer networks; the parameter θ plays the role of a coordinate system of S.
The output y is a random variable that depends on the input x. Hence the input-output relation of the network with parameter θ is described by the conditional probability of the output y given the input x,
p(y | x, θ) = (1/√(2π)) exp[ −½ {y − f(x, θ)}² ]   (2)
where f(x, θ) = Σα vα φ(wα · x) + ζ   (3)
is the mean value of y given input x. Its logarithm is
log p(y | x, θ) = −½ {y − f(x, θ)}² − log(√(2π))   (4)
This can be regarded as the negative of the squared error when y is a target value and f(x, θ) is the output of the network.
Hence, the maximization of the likelihood is equivalent to the minimization of the squared error
l(x, y; θ) = ½ {y − f(x, θ)}²   (5)
The conventional on-line learning method modifies the current parameter θt by using the gradient ∇l of the loss function, such that
θt+1 = θt − ηt ∇l(xt, y*t; θt)   (6)
where ηt is a learning rate and
∇l(x, y*; θ) = (∂/∂θ) ½{y* − f(x, θ)}²   (7)
is the gradient of the loss function and y*t is the desired output signal given by the teacher.
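For illustration, one on-line steepest-descent step of Eq. (6) for the squared-error loss, assuming φ = tanh and continuing the NumPy sketch above (all names are illustrative):

import numpy as np

def sgd_step(x, y_star, W, v, zeta, eta=0.05):
    # forward pass
    h = np.tanh(W @ x)
    f = float(v @ h + zeta)
    err = f - y_star                               # ∂l/∂f for l = ½(y* − f)²
    # gradients of the loss with respect to each parameter group
    grad_v = err * h
    grad_zeta = err
    grad_W = err * np.outer(v * (1.0 - h**2), x)   # tanh'(u) = 1 − tanh(u)²
    # steepest-descent update of Eq. (6)
    return W - eta * grad_W, v - eta * grad_v, zeta - eta * grad_zeta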
The steepest descent direction of the loss function l*(θ) in a Riemannian space is given [18] by
∇̃l*(θ) = G⁻¹(θ) ∇l*(θ)   (8)
where G⁻¹ is the inverse of the matrix G = (gij), called the Riemannian metric tensor. This gradient is called the natural gradient of the loss function l*(θ) in the Riemannian space.
III. NATURAL GRADIENT LEARNING ALGORITHM
In the multilayer neural network, the Riemannian metric tensor G(θ) = (gij(θ)) is given by the Fisher information matrix [18],
gij(θ) = E[ (∂ log p(y | x, θ)/∂θi) (∂ log p(y | x, θ)/∂θj) ]   (9)
where E denotes the expectation with respect to the input-output pair (x, y) given by Eq. (2).
The natural gradient learning algorithm updates the current
θt by
θt+1 = θt − ηt G⁻¹(θt) ∇l(xt, yt; θt)   (10)
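A minimal sketch of one natural gradient step, Eq. (10), in which G is approximated by the empirical Fisher matrix (the average of outer products of per-sample gradients over a small batch) and a small ridge term is added to keep the inversion stable; grad_fn, ridge and the other parameter names are assumptions made for illustration. This explicit inversion is exactly the cost that the adaptive scheme of Section IV avoids.

import numpy as np

def natural_gradient_step(theta, grad_fn, batch, eta=0.1, ridge=1e-6):
    # grad_fn(x, y, theta) returns the flat gradient of the loss, Eq. (7)
    grads = np.stack([grad_fn(x, y, theta) for x, y in batch])
    G = grads.T @ grads / len(batch)                    # empirical Fisher matrix, cf. Eq. (9)
    G += ridge * np.eye(theta.size)                     # regularize before inverting
    nat_grad = np.linalg.solve(G, grads.mean(axis=0))   # G⁻¹∇l, the natural gradient of Eq. (8)
    return theta - eta * nat_grad                       # update of Eq. (10)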
IV. ADAPTIVE IMPLEMENTATION OF NATURAL GRADIENT
LEARNING
The Fisher information matrix G(θ) depends on the probability distribution of x, which is usually unknown. Hence, it is difficult to obtain G(θ), and moreover its inversion is costly. Here, we show an adaptive method of directly estimating G⁻¹(θ) [5].
The Fisher information of Eq. (9) can be rewritten, by using Eq. (4), as
Gt = E[ (∂ log p(y | x, θ)/∂θ)(∂ log p(y | x, θ)/∂θ)′ ]
   = E[ {y − f(x, θ)}² (∂f(x, θ)/∂θ)(∂f(x, θ)/∂θ)′ ]
   = E[ (∂f(x, θ)/∂θ)(∂f(x, θ)/∂θ)′ ]   (11)
where ′ denotes transposition of a vector or matrix.
We then have the following recursive estimate of G⁻¹ [23]:
Ĝt+1⁻¹ = (1 + εt) Ĝt⁻¹ − εt (Ĝt⁻¹ ∇ft)(Ĝt⁻¹ ∇ft)′   (12)
where εt is a small learning rate, ∇ft = ∂f(xt, θt)/∂θ and ft = f(xt, θt). Together with
θt+1 = θt − ηt Ĝt+1⁻¹ ∇l(xt, yt; θt)   (13)
this gives the adaptive method of natural gradient learning.
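The recursion of Eq. (12) together with the update of Eq. (13), as reconstructed above, can be sketched as a single step in NumPy; grad_f_fn and grad_l_fn are assumed helper functions returning ∂f/∂θ and ∇l respectively, and the coefficient choices simply follow that reconstructed form.

import numpy as np

def adaptive_ngl_step(x_t, y_t, theta, G_inv, grad_f_fn, grad_l_fn, eta=0.01, eps=1e-3):
    grad_f = grad_f_fn(x_t, theta)                            # ∇ft = ∂f(xt, θt)/∂θ
    u = G_inv @ grad_f
    G_inv = (1.0 + eps) * G_inv - eps * np.outer(u, u)        # recursive estimate of G⁻¹, Eq. (12)
    theta = theta - eta * G_inv @ grad_l_fn(x_t, y_t, theta)  # parameter update, Eq. (13)
    return theta, G_inv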
This is different from the Newton method, but it can be regarded as an adaptive version of the Gauss-Newton method. Moreover, information geometry suggests important geometric properties of hierarchical statistical models in general.
V. EXPERIMENTAL RESULTS
We conducted an experiment comparing the convergence speeds of the conventional Natural Gradient Learning (NGL) algorithm and the Adaptive Natural Gradient Learning (ANGL) algorithm.
We take the XOR problem because it is not linearly separable. We use a network architecture with two hidden units and a hyperbolic tangent transfer function for both the hidden units and the output unit.
The inputs and the corresponding targets are
X0 = [ ], X1 = [ ], X2 = [ ], X3 = [ ]
Y0 = -1, Y1 = 1, Y2 = 1, Y3 = -1,
respectively.
Thus the error for each pattern is
εn = {yn − tanh(W2 tanh(W1 xn + b1) + b2)}²   (14)
There are two hidden units and each layer has a bias. Hence W1 is a 2-by-2 matrix and W2 is a 1-by-2 matrix.
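A sketch of this XOR setup and of the error of Eq. (14) summed over the four patterns; the ±1 input encoding is an assumption (the input vectors are not reproduced above), and the weights would be initialized randomly and then trained by NGL or ANGL.

import numpy as np

X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])  # assumed ±1 encoding of X0..X3
Y = np.array([-1.0, 1.0, 1.0, -1.0])                                # targets Y0..Y3

def sum_squared_error(W1, b1, W2, b2):
    # W1 is 2-by-2, b1 a length-2 bias vector, W2 is 1-by-2, b2 a scalar bias.
    # εn = {yn − tanh(W2 tanh(W1 xn + b1) + b2)}², cf. Eq. (14), summed over the four patterns.
    outputs = np.tanh(W2 @ np.tanh(W1 @ X.T + b1[:, None]) + b2)
    return float(np.sum((Y - outputs) ** 2))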
Performance is compared using the sum squared error (SSE) metric. Neural network training algorithms are very sensitive to the learning rate, so we use a step size of the form η‖·‖ for the NGL algorithm. An interesting point of comparison is the relative step size of the two algorithms. For ANGL, the effective learning rate is the product of the learning rate η and the largest eigenvalue of G⁻¹.
Figures 1 and 2 show the sum squared error at each learning epoch for NGL and ANGL, respectively. Table 1 shows the parameters used in the two learning algorithms and some of the results of the experiment.
Parameter                          NGL       ANGL
Hidden units                       2         2
Learning rate                      0.25      0.25
Adaptation rate                    N.A.      0.1
Learning epoch when SSE < 0.02     10000     320
Final SSE                          0.0817    3.55e-4
Final learning rate                1e-4      0.144
Table 1: Parameters used and results of the XOR experiment
VI. CONCLUSION
Natural gradient descent learning works well for many problems. Amari [18] developed an algorithm that avoids local minima by following the curvature of the manifold formed by the parameter space of the network. By using a recursive estimate of the inverse of the Fisher information matrix of the parameters, the algorithm is able to accelerate learning in the direction of descent.
The experiment has shown that the performance of the natural gradient algorithm is improved by using the adaptive method of natural gradient learning.
This work can be applied in many areas of research, such as speech recognition.
REFERENCES
[1] D. E. Rumelhart and J. L. McClelland, Parallel
Distributed Processing. Cambridge, MA: MIT Press,
1986.
[2] D. O. Hebb, The Organization of Behavior. New
York: John Wiley & Sons, 1949.
[3] D. J. C. MacKay, Information Theory, Inference, and
Learning Algorithms. New York: Cambridge
University Press, 2003.
[4] F. Rosenblatt, Principles of Neurodynamics:
Perceptrons and the Theory of Brain Mechanisms.
Washington DC: Spartan Books, 1962.
[5] H. Park, S. Amari, and K. Fukumizu, "Adaptive natural gradient learning algorithms for various stochastic models," Neural Networks, vol. 13, no. 7, pp. 755–764, 2000.
[6] James A. Freeman and David M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques. Addison-Wesley Publishing Company, 1991.
[7] Jinwook Go, Gunhee Han, and Hagbae Kim, "Multigradient: A new neural network learning algorithm for pattern classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 39, no. 5, May 2001.
[8] Kenji Fukumizu and Shun-ichi Amari, "Local minima and plateaus in hierarchical structures of multilayer perceptrons," Brain Science Institute, The Institute of Physical and Chemical Research (RIKEN), Oct. 22, 1999.
[9] Kavita Burse, Manish Manoria, and Vishnu P. S. Kirar, "Improved back propagation algorithm to avoid local minima in multiplicative neuron model," World Academy of Science, Engineering and Technology, vol. 72, 2010.
[10] M. Abramowitz and I. A. Stegun, Eds., Handbook of
Mathematical Functions with Formulas, Graphs, and
Mathematical Tables. Washington, DC: US
Government Printing Office, 1972.
[11] M. Biehl and H. Schwarze, "Learning by on-line gradient descent," Journal of Physics A, vol. 28, pp. 643–656, 1995.
[12] N. Murata, "A statistical study of on-line learning," in On-line Learning in Neural Networks, D. Saad, Ed., pp. 63–92. New York: Cambridge University Press, 1999.
[13] N. M. Nawi, M. R. Ransing, and R. S. Ransing, "An improved learning algorithm based on the conjugate gradient method for back propagation neural networks," International Journal of Applied Science, Engineering and Technology, www.waset.org, Spring 2006.
[14] R. Rojas, Neural Networks, ch. 7. New York:
Springer-Verlag, 1996.
[15] RIKEN Brain Science Institute(RIKEN BSI) Japan
http://www.brain.riken.jp/
[16] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Philosophical Transactions of the Royal Society of London, vol. 222, pp. 309–368, 1922.
[17] S. Amari, "Neural learning in structured parameter spaces – natural Riemannian gradient," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds., vol. 9, p. 127. Cambridge, MA: The MIT Press, 1997.
[18] S. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.
[19] S. Amari, H. Park, and T. Ozeki, Geometrical
singularities in the neuromanifold of multilayer
perceptrons, no. 14. Cambridge, MA: MIT Press,
2002.
[20] S. Amari and H. Nagaoka, Methods of Information
Geometry, Translations of Mathematical
Monographs, vol. 191. New York: Oxford University
Press, 2000.
[21] Simon Haykin, Neural Networks: A Comprehensive Foundation. Pearson Education, seventh edition, 2007.
[22] T. Heskes and B. Kappen, "On-line learning processes in artificial neural networks," in Mathematical Foundations of Neural Networks, J. Taylor, Ed., pp. 199–233. Amsterdam, Netherlands: Elsevier, 1993.
[23] Todd K. Moon and Wynn Stirling, Mathematical Methods and Algorithms for Signal Processing. Prentice Hall, 1999.
[24] Weixing, Xugang, and Zheng Tang, "Avoiding the local minima problem in backpropagation algorithm with modified error function," IEICE Trans. Fundamentals, vol. E88-A, no. 12, December 2005.
Fig. 1: The Sum Squared Error of NGL.
Fig. 2: The Sum Squared Error of ANGL.