International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.5, October 2011
DOI : 10.5121/ijcsea.2011.1503
Adaptive modified backpropagation algorithm
based on differential errors
S.Jeyaseeli Subavathi (a) and T.Kathirvalavakumar (b)
(a) Department of Information Technology, Sri Kaliswari College, Sivakasi – 626130, Tamilnadu, India
(b) Department of Computer Science, V.H.N.S.N. College, Virudhunagar – 626001, Tamilnadu, India
Abstract
A new efficient modified backpropagation algorithm with an adaptive learning rate is proposed to increase the convergence speed and to minimize the error. The method eliminates the initial fixing of the learning rate through trial and error and replaces it with an adaptive learning rate. In each iteration, the adaptive learning rates for the output and hidden layers are determined by calculating the differential linear and nonlinear errors of the output layer and the hidden layer separately. In this method, each layer has a different learning rate in each iteration. The performance of the proposed algorithm is verified by simulation results.
Keywords
Adaptive learning rate, Differential error, Linear error, Modified standard back propagation, Nonlinear
error.
1. Introduction
The classical method for training feedforward neural network (FNN) is the backpropagation
algorithm (BP) [9] which is based on the steepest descent optimization technique. Training is
usually carried out by iterative updating of weights based on the error signal. BP is a
descent algorithm which attempts to minimize the error at each iteration. The weights of
the network are adjusted by the algorithm such that the error is decreased along a descent
direction [18]. Traditionally, two parameters called the learning rate and the momentum factor are used for controlling the weight adjustment along the descent direction. The initial learning rate and a fixed learning rate must be chosen with great care. If the learning rate is very large, the learning may become unstable. If it is small, learning is often too slow for practical applications, which has led to the search for fast learning algorithms [13].
Many techniques have been proposed to increase the convergence speed. Abid et al. [1] described a modified BP algorithm (MBP) based on the sum of the linear and nonlinear errors of the output neurons to improve the speed of convergence in a minimum number of iterations. The algorithm converges faster than the standard BP algorithm. Some researchers focused on the selection of a better energy function [2,14] and on the selection of a suitable learning rate and momentum [6,9,16,17]. Learning rate adaptation by sign changes adapts the step size by having a separate learning rate for each connection [12].
A problem with all of these techniques is their convergence to local minima. To solve this problem, global search algorithms like genetic algorithms have to be applied [4]. But the search for the global minimum may still be trapped at local minima during gradient descent. Also, if the network is trained with disturbances in the input, the global minimum point cannot be found. So fast convergence and strong robustness may not be guaranteed. To solve these problems, adaptive learning algorithms have been developed recently.
Jeong and Lee [7] have proposed an adaptive algorithm based on first and second order
derivatives of neural activation at hidden layers which results in hybrid learning rules. Sha and
Bajic [13] have proposed an adaptive learning rate algorithm for I/O identification based on two
ANNs using convergence analysis of the conventional gradient descent method. Xie and Zhang
[15] have proposed a variable learning rate LMS algorithm using the Lyapunov method, especially when there is noise in the input signal. Behera et al. [3] have described new learning algorithms LF I and LF II based on a Lyapunov function for the training of feedforward neural networks. In this algorithm fixed learning parameters are replaced with adaptive learning parameters using a convergence theorem based on Lyapunov stability theory. Zhihong Man et al. [19] proposed a new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks. In their method, a candidate Lyapunov function of the tracking error between the output of the neural network and the desired reference signal is chosen first, and the weights of the network are then updated from the output layer to the input layer.
Our previous work [8] describes a modified backpropagation algorithm in a neighborhood based network, replacing the fixed learning parameters with adaptive learning parameters. There the parameters are calculated using a convergence theorem based on Lyapunov stability theory.
Iranmanesh and Mahdavi [11] have proposed a learning method using a differential adaptive learning rate. In each iteration, the learning rate is updated according to the error of the output layer. The learning rate of the output layer is computed by differentiating the error of the output layer. For the hidden layer, the errors of the output layer neurons are multiplied by the corresponding weights, summed and divided by the number of hidden neurons; the derivative of the sigmoidal function of this value is used as the adaptive learning rate of the hidden layer.
We propose a new adaptive learning rate algorithm to speed up the learning process of the neural
network. In the proposed algorithm, a separate adaptive learning rate is used in each of the hidden and output layers. The linear and nonlinear errors of each neuron in the output layer are multiplied by the derivative of the corresponding neuron's activation function, added, and then passed through the derivative of the sigmoidal function to get the adaptive learning rate for the output layer. The linear and nonlinear errors of each hidden neuron are multiplied by the corresponding output layer weights separately and then added; this value is divided by the number of hidden neurons, and the derivative of the sigmoidal function of this value is used as the learning rate for the hidden layer. The efficiency of the proposed algorithm in terms of time and epochs is shown by simulating benchmark problems such as XOR, 3-bit parity, a nonlinear function approximation problem and the Iris data set.
The remainder of the paper is organized as follows: Section 2 describes the adaptive learning rate algorithm, Section 3 presents the proposed training procedure and Section 4 discusses the simulation results.
2. Training of Neural network
Consider a single hidden layer feedforward neural network shown in Figure 1. A bias node is
included in the input layer. Let $X = (x_i)$ be the input vector, $Y = (y_j)$ be the output vector and $w_{ji}^{[s]}$ be the weight from the $i$th unit in the $(s-1)$th layer to the $j$th unit in the $s$th layer. The activation functions of both hidden and output layer neurons are assumed to be sigmoidal. Sequential mode training is
applied here.
Figure 1. Single hidden layer neural network
Standard BP (SBP)
For each input pattern, the nonlinear output of the $j$th neuron of the output layer $s$ of the network is calculated as follows:

$u_j^s = \sum_{i=1}^{n^{(s-1)}} w_{ji}^s\, y_i^{(s-1)}$    (1)

$y_j^s = f(u_j^s) = \dfrac{1}{1 + e^{-u_j^s}}$    (2)

where $n^{(s-1)}$ represents the number of neurons in the $(s-1)$th layer.
SBP minimizes the following criterion, equal to the sum of the squares of the errors between the actual outputs $y_j^s$ and the desired outputs $d_j^s$ for a pattern $p$:

$E_p = \frac{1}{2} \sum_{j=1}^{n^s} \left(e_{1j}^s\right)^2$    (3)

where the nonlinear error signal is

$e_{1j}^s = (y_j^s - d_j^s)$    (4)
The weight update rule is
$\Delta w_{ji}^s = -\mu \dfrac{\partial E_p}{\partial w_{ji}^s}$    (5)
where µ is the fixed learning rate selected by trial and error. Substituting (3) in (5), the weight
update rule becomes,
$\Delta w_{ji}^s = \mu\, e_{1j}^s \dfrac{\partial y_j^s}{\partial w_{ji}^s}$

$\Delta w_{ji}^s = \mu\, e_{1j}^s \dfrac{\partial y_j^s}{\partial u_j^s} \dfrac{\partial u_j^s}{\partial w_{ji}^s}$

$\Delta w_{ji}^s = \mu\, e_{1j}^s\, f'(u_j^s)\, y_i^{(s-1)}$    (6)
The estimated nonlinear error of the hidden layer (s-1) is as follows:
$e_{1j}^{(s-1)} = f'\!\left(u_j^{(s-1)}\right) \sum_{r=1}^{n^s} e_{1r}^s\, w_{rj}^s$    (7)
The weight update rule of the hidden layer is
$\Delta w_{ji}^{(s-1)} = -\mu \dfrac{\partial E_p}{\partial w_{ji}^{(s-1)}}$    (8)

$\Delta w_{ji}^{(s-1)} = \mu\, e_{1j}^{(s-1)}\, f'\!\left(u_j^{(s-1)}\right) y_i^{(s-2)}$    (9)
Now the weights of both hidden and output layers are updated using

$w_{ji}(t) = w_{ji}(t-1) + \Delta w_{ji}$    (10)
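Since the paper reports a C implementation for its simulations, the following minimal C sketch illustrates how the SBP forward pass and the weight updates of Eqs. (1), (2), (4), (6), (7), (9) and (10) could be coded for a single hidden layer network. It is an illustration under stated assumptions, not the authors' code: the function names, the flattened weight layout and the sign convention (error taken as desired minus actual, so that the "+" update decreases the squared error) are assumptions of this sketch.

```c
/* Minimal SBP sketch (illustrative only, not the authors' code).
 * Weights are stored row-major: w[j * n_in + i]. */
#include <math.h>

static double sigmoid(double u)  { return 1.0 / (1.0 + exp(-u)); }
static double dsigmoid(double y) { return y * (1.0 - y); }   /* f'(u) given y = f(u) */

/* Eqs. (1)-(2): net input u and nonlinear output y of one layer */
static void layer_forward(const double *x, int n_in,
                          const double *w, int n_out,
                          double *u, double *y)
{
    for (int j = 0; j < n_out; j++) {
        u[j] = 0.0;
        for (int i = 0; i < n_in; i++)
            u[j] += w[j * n_in + i] * x[i];
        y[j] = sigmoid(u[j]);
    }
}

/* Eqs. (4), (6), (10): output-layer error and in-place weight update */
static void sbp_update_output(double *w, int n_in, int n_out,
                              const double *y_prev, const double *y,
                              const double *d, double mu, double *e1)
{
    for (int j = 0; j < n_out; j++) {
        e1[j] = d[j] - y[j];                     /* error, sign chosen for descent */
        double delta = e1[j] * dsigmoid(y[j]);
        for (int i = 0; i < n_in; i++)
            w[j * n_in + i] += mu * delta * y_prev[i];
    }
}

/* Eqs. (7), (9): estimated hidden error and hidden-layer update */
static void sbp_update_hidden(double *w_hid, int n_in, int n_hid,
                              const double *w_out, int n_out,
                              const double *x, const double *y_hid,
                              const double *e1_out, double mu)
{
    for (int j = 0; j < n_hid; j++) {
        double back = 0.0;
        for (int r = 0; r < n_out; r++)
            back += e1_out[r] * w_out[r * n_hid + j];   /* sum_r e1_r w_rj */
        double e1_hid = dsigmoid(y_hid[j]) * back;      /* Eq. (7) */
        for (int i = 0; i < n_in; i++)
            w_hid[j * n_in + i] += mu * e1_hid * x[i];  /* Eq. (9) */
    }
}
```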
Modified BP
For each input pattern, the linear and nonlinear outputs of the $j$th neuron in the output layer $s$ of the network are calculated respectively as follows:

$u_j^s = \sum_{i=1}^{n^{(s-1)}} w_{ji}^s\, y_i^{(s-1)}$    (11)

$y_j^s = f(u_j^s) = \dfrac{1}{1 + e^{-u_j^s}}$    (12)
where $n^{(s-1)}$ represents the number of neurons in the $(s-1)$th layer. The MBP approach minimizes a modified form of the criterion $E_p$ used in the standard BP algorithm. The criterion $E_p$ is the sum of the linear and nonlinear quadratic errors of the output neurons for the current pattern $p$.
$E_p = \frac{1}{2} \sum_{j=1}^{n^s} \left(e_{1j}^s\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n^s} \left(e_{2j}^s\right)^2$    (13)

where the nonlinear error signal is

$e_{1j}^s = (y_j^s - d_j^s)$    (14)

and the linear error signal is

$e_{2j}^s = (ly_j^s - u_j^s)$    (15)

Here

$ly_j^s = f^{-1}(d_j^s)$    (16)
where $y_j^s$ and $d_j^s$ are respectively the current and desired outputs of the $j$th unit in the $s$th layer, $p$ in (13) denotes the $p$th pattern and $\lambda$ is the weighting coefficient. In the output layer the linear and nonlinear errors are known [1]. So the weight update rule [1] for the output layer is
$\Delta w_{ji}^s = -\mu \dfrac{\partial E_p}{\partial w_{ji}^s}$    (17)
where µ is the fixed learning rate selected by trial and error. Substituting (13) in (17), the weight
update rule becomes,
$\Delta w_{ji}^s = \mu\, e_{1j}^s \dfrac{\partial y_j^s}{\partial w_{ji}^s} + \mu\lambda\, e_{2j}^s \dfrac{\partial u_j^s}{\partial w_{ji}^s}$

$\Delta w_{ji}^s = \mu\, e_{1j}^s \dfrac{\partial y_j^s}{\partial u_j^s} \dfrac{\partial u_j^s}{\partial w_{ji}^s} + \mu\lambda\, e_{2j}^s\, y_i^{(s-1)}$

$\Delta w_{ji}^s = \mu\, e_{1j}^s\, f'(u_j^s)\, y_i^{(s-1)} + \mu\lambda\, e_{2j}^s\, y_i^{(s-1)}$    (18)
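As a hedged illustration of Eqs. (14)-(18) (again not the authors' code), the C sketch below computes the nonlinear and linear errors of the output layer and applies the MBP update. The names are assumptions of the sketch, the desired output d is assumed to lie strictly in (0,1) so that the inverse sigmoid of Eq. (16) is defined, and the error signs follow the same descent convention as the earlier SBP sketch.

```c
/* MBP output-layer sketch (illustrative assumptions, not the authors' code). */
#include <math.h>

static double inv_sigmoid(double d)   /* f^{-1}(d) = ln(d / (1 - d)), Eq. (16) */
{
    return log(d / (1.0 - d));
}

/* Computes e1, e2 (Eqs. (14)-(15)) and applies the update of Eq. (18)
 * to the output weights w (row-major, w[j * n_in + i]). */
static void mbp_update_output(double *w, int n_in, int n_out,
                              const double *y_prev,      /* hidden outputs   */
                              const double *u, const double *y,
                              const double *d,           /* desired outputs  */
                              double mu, double lambda,
                              double *e1, double *e2)
{
    for (int j = 0; j < n_out; j++) {
        e1[j] = d[j] - y[j];                 /* nonlinear error, Eq. (14) */
        e2[j] = inv_sigmoid(d[j]) - u[j];    /* linear error, Eqs. (15)-(16) */
        double fprime = y[j] * (1.0 - y[j]); /* f'(u_j) */
        for (int i = 0; i < n_in; i++)       /* Eq. (18) */
            w[j * n_in + i] += mu * e1[j] * fprime * y_prev[i]
                             + mu * lambda * e2[j] * y_prev[i];
    }
}
```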
In the hidden layer, the linear and nonlinear errors are unknown and must be calculated [1]. The
estimated nonlinear and linear error [1] of the hidden layer (s-1) are respectively as follows:
$e_{1j}^{(s-1)} = f'\!\left(u_j^{(s-1)}\right) \sum_{r=1}^{n^s} e_{1r}^s\, w_{rj}^s$    (19)

$e_{2j}^{(s-1)} = f'\!\left(u_j^{(s-1)}\right) \sum_{r=1}^{n^s} e_{2r}^s\, w_{rj}^s$    (20)
The weight update rule of the hidden layer is
$\Delta w_{ji}^{(s-1)} = -\mu \dfrac{\partial E_p}{\partial w_{ji}^{(s-1)}}$    (21)

$\Delta w_{ji}^{(s-1)} = \mu\, e_{1j}^{(s-1)}\, f'\!\left(u_j^{(s-1)}\right) y_i^{(s-2)} + \mu\lambda\, e_{2j}^{(s-1)}\, y_i^{(s-2)}$    (22)
Now the weights of both hidden and output layers are updated using

$w_{ji}(t) = w_{ji}(t-1) + \Delta w_{ji}$    (23)

where $t$ represents the iteration. In order to increase the convergence speed and to make the learning rate $\mu$ adaptive, we propose a new technique based on the differential linear and nonlinear errors of the output layer and the hidden layer.
Adaptive Modified BP
In the proposed technique, first the linear and nonlinear errors of the $j$th neuron in the output layer $s$ are calculated using (14), (15) and (16). Then all the linear and nonlinear errors of the neurons are multiplied by the derivative of the corresponding neuron's activation function and added separately as shown below:

$\delta_{o1} = \sum_{j=1}^{n^s} e_{1j}^s\, f'(u_j^s)$    (24)

$\delta_{o2} = \sum_{j=1}^{n^s} e_{2j}^s\, f'(u_j^s)$    (25)

Then $\delta_{o1}$ and $\delta_{o2}$ are added to get the total error

$\delta_o = \delta_{o1} + \delta_{o2}$    (26)
Now the total error is divided by the total number of output neurons, giving $\delta_a$:

$\delta_a = \dfrac{\delta_o}{n^s}$    (27)

and the learning rate $\mu_{out}$ of the output layer $s$ is computed as follows:

$\mu_{out} = f'(\delta_a)$    (28)

where $f$ is the sigmoidal activation function given by

$f(\delta_a) = \dfrac{1}{1 + e^{-\delta_a}}$    (29)

with the property

$f'(\delta_a) = f(\delta_a)\left(1 - f(\delta_a)\right)$    (30)
Then the change of weights is calculated using

$\Delta w_{ji}^s = \mu_{out}\, e_{1j}^s\, f'(u_j^s)\, y_i^{(s-1)} + \mu_{out}\lambda\, e_{2j}^s\, y_i^{(s-1)}$    (31)
Similarly for the hidden layer (s-1) the same procedure is applied to calculate adaptive learning
rate $\mu_{hid}$. First the nonlinear errors $e_{1j}^{(s-1)}$ and linear errors $e_{2j}^{(s-1)}$ of all hidden neurons are calculated using (19) and (20). Then the summed nonlinear and linear errors $\delta_{h1}$ and $\delta_{h2}$ respectively are
$\delta_{h1} = \sum_{i=1}^{n^{(s-1)}} \sum_{j=1}^{n^s} e_{1i}^{(s-1)}\, w_{ji}^s$    (32)

$\delta_{h2} = \sum_{i=1}^{n^{(s-1)}} \sum_{j=1}^{n^s} e_{2i}^{(s-1)}\, w_{ji}^s$    (33)
and then both $\delta_{h1}$ and $\delta_{h2}$ are added to get the total error $\delta_h$ as below:

$\delta_h = \delta_{h1} + \delta_{h2}$    (34)

Now the total error is divided by the total number of hidden neurons, giving $\delta_b$:

$\delta_b = \dfrac{\delta_h}{n^{(s-1)}}$    (35)

and then $\mu_{hid}$ is computed as follows:

$\mu_{hid} = f'(\delta_b)$    (36)

where $f$ is the sigmoidal activation function. Then the change of weights is calculated using the following equation:

$\Delta w_{ji}^{(s-1)} = \mu_{hid}\, e_{1j}^{(s-1)}\, f'\!\left(u_j^{(s-1)}\right) y_i^{(s-2)} + \mu_{hid}\lambda\, e_{2j}^{(s-1)}\, y_i^{(s-2)}$    (37)
Now the weights of both hidden and output layer are updated using (23).
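To make the computation of the two adaptive learning rates concrete, the following C sketch shows one way Eqs. (24)-(30) and (32)-(36) could be implemented. The argument names (output-layer errors e1/e2, hidden-layer errors e1h/e2h of Eqs. (19)-(20), hidden-to-output weights w_out) are illustrative assumptions and are not taken from the paper.

```c
/* Adaptive learning rates of the proposed method (illustrative sketch,
 * not the authors' implementation). */
#include <math.h>

static double sigmoid_prime_of(double x)   /* f'(x) = f(x)(1 - f(x)), Eqs. (29)-(30) */
{
    double fx = 1.0 / (1.0 + exp(-x));
    return fx * (1.0 - fx);
}

/* Eqs. (24)-(28): adaptive learning rate of the output layer */
static double mu_out(const double *e1, const double *e2,
                     const double *y, int n_out)
{
    double d1 = 0.0, d2 = 0.0;
    for (int j = 0; j < n_out; j++) {
        double fprime = y[j] * (1.0 - y[j]);   /* f'(u_j) */
        d1 += e1[j] * fprime;                  /* Eq. (24) */
        d2 += e2[j] * fprime;                  /* Eq. (25) */
    }
    double delta_a = (d1 + d2) / n_out;        /* Eqs. (26)-(27) */
    return sigmoid_prime_of(delta_a);          /* Eq. (28) */
}

/* Eqs. (32)-(36): adaptive learning rate of the hidden layer */
static double mu_hid(const double *e1h, const double *e2h, int n_hid,
                     const double *w_out, int n_out)
{
    double d1 = 0.0, d2 = 0.0;
    for (int i = 0; i < n_hid; i++)
        for (int j = 0; j < n_out; j++) {
            d1 += e1h[i] * w_out[j * n_hid + i];   /* Eq. (32) */
            d2 += e2h[i] * w_out[j * n_hid + i];   /* Eq. (33) */
        }
    double delta_b = (d1 + d2) / n_hid;            /* Eqs. (34)-(35) */
    return sigmoid_prime_of(delta_b);              /* Eq. (36) */
}
```

In sequential mode these two functions would be called once per pattern, immediately before the weight changes of Eqs. (31) and (37) are applied.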
3. Algorithm
1. Define network structure and assign initial weights randomly.
2. Select a pattern to be processed in the network.
3. For each node in the hidden layer, compute
a. Net value using Eq (11).
b. Output value using Eq (12).
4. For the output layer, compute
a. Net value using Eq (11) and output value using Eq (12).
b. Nonlinear and linear errors using Eq (14), Eq (15) and Eq (16).
c. Adaptive learning rate $\mu_{out}$ using Eq (24) to Eq (30).
d. Change of weight using Eq (31).
5. For the hidden layer, compute
a. Nonlinear error using Eq (19).
b. Linear error using Eq (20).
c. Adaptive learning rate $\mu_{hid}$ using Eq (32) to Eq (36).
d. Change of weight using Eq (37).
6. Update weights of output and hidden layer using Eq (23).
7. Repeat the steps 2 to 6 for all the patterns.
8. Evaluate network error with new weights.
9. Stop training if termination condition is reached. Otherwise repeat the steps 3
to 9.
4. Simulation Results and discussions
The performance of the proposed algorithm is verified by simulating benchmark problems such as XOR, 3-bit parity, a nonlinear function approximation problem and the Iris data set. All the problems are simulated in the C language on a Pentium IV running at 2.40 GHz. The convergence property of the proposed algorithm is compared with MBP [1], backpropagation with momentum (BPM) [9] and the backpropagation algorithm [10]. One presentation of all the patterns of the problem to the network during training is called an epoch. The mean squared error (MSE) of the network is calculated by dividing the sum of the squared linear errors in each epoch by twice the number of patterns. The network structure, parameter values and termination condition are kept constant for all the algorithms to allow a fair comparison. Network weights are randomly and uniformly generated from the range [-5, +5]. The weighting coefficient λ is assigned the value 3.7. The convergence of the proposed algorithm is shown by the learning curves.
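One consistent reading of the MSE definition above, in the notation of Section 2 and with $P$ denoting the number of training patterns, is $\mathrm{MSE} = \frac{1}{2P}\sum_{p=1}^{P}\sum_{j=1}^{n^s}\left(e_{2j}^{s}\right)^2$, where $e_{2j}^{s}$ is the linear error of output neuron $j$ for pattern $p$; this is an interpretation of the text, not a formula stated explicitly in the paper.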
XOR
The network structure considered in this problem has 3 input neurons including bias, 4 hidden
neurons and one output neuron. The termination condition fixed for convergence is MSE 0.001.
The results obtained are tabulated in Table 1.
Table 1: Comparison table for the XOR problem

Algorithm   Parameters        Epochs   Training MSE   Time (msecs)
BP          µ=1.15            754      0.000998       176
BPM         µ=1.15, α=0.01    710      0.001          151
MBP         µ=0.25, λ=0.01    501      0.001          115
Proposed    λ=3.7             237      0.000987       49
It has been observed that the BP algorithm takes 176 msecs and 754 epochs to reach the minimum error. The proposed algorithm converges faster even though the learning rate is not fixed in the beginning. Since the learning rate is adapted based on the errors of the output and hidden layers, it takes a minimum time of 49 msecs and a minimum of 237 epochs for convergence. The learning
curve obtained for the proposed algorithm is shown in Figure 2. The adaptive learning rates obtained based on the errors of the output layer and the hidden layer are shown in Figure 3 and Figure 4 respectively.
Figure 2. Learning curve based on MSE and Epochs of XOR problem for the proposed
algorithm
Figure 3. Adaptive learning rate ($\mu_{out}$) of the output layer against training iterations.
Figure 4. Adaptive learning rate ($\mu_{hidden}$) of the hidden layer against training iterations.
3-bit parity
We used a 4-9-1 ANN, including one bias node in the input layer, to simulate the 3-bit parity problem. The results obtained are tabulated in Table 2.
Table 2. Comparison table for the 3-bit parity problem

Algorithm   Parameters        Epochs   Training MSE   Time (msecs)
BP          µ=1.15            1570     0.000999       379
BPM         µ=1.15, α=0.01    1450     0.000998       364
MBP         µ=0.25, λ=0.01    520      0.000995       126
Proposed    λ=3.7             298      0.000997       77
From the table it is observed that the proposed algorithm converges quickly, within 77 msecs in 298 epochs, whereas the algorithms BP, BPM and MBP require 1570, 1450 and 520 epochs for convergence respectively and 379 msecs, 364 msecs and 126 msecs to reach the termination condition MSE 0.001. All the algorithms except the proposed one require additional time to fix the learning rate. The performance of the proposed algorithm is shown in Figure 5.
Figure 5. Learning curve based on MSE and Epochs of 3-bit parity problem for the
proposed Algorithm
Nonlinear function approximation problem
This problem defines a nonlinear function approximation with 8 input values $x_i$. The three output quantities $y_i$ are defined by the following equations:
$y_1 = (x_1 x_2 + x_3 x_4 + x_5 x_6 + x_7 x_8)/4$

$y_2 = (x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8)/8$

$y_3 = (1 - y_1)^{1/2}$
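As an illustration (not the authors' generator), the C sketch below shows how such a training set could be produced from the equations above, drawing the eight inputs uniformly from (0,1); the array names and sizes are assumptions of the sketch.

```c
/* Pattern generation for the nonlinear function approximation problem
 * (illustrative sketch; names and layout are assumptions). */
#include <stdlib.h>
#include <math.h>

#define N_PATTERNS 500

static double x[N_PATTERNS][8];
static double y[N_PATTERNS][3];

static void generate_patterns(void)
{
    for (int p = 0; p < N_PATTERNS; p++) {
        double *xp = x[p];
        for (int i = 0; i < 8; i++)   /* uniform values strictly inside (0,1) */
            xp[i] = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        y[p][0] = (xp[0]*xp[1] + xp[2]*xp[3] + xp[4]*xp[5] + xp[6]*xp[7]) / 4.0;
        y[p][1] = (xp[0] + xp[1] + xp[2] + xp[3] + xp[4] + xp[5] + xp[6] + xp[7]) / 8.0;
        y[p][2] = sqrt(1.0 - y[p][0]);   /* y3 = (1 - y1)^(1/2) */
    }
}
```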
500 input patterns with $x_i \in (0,1)$ are randomly generated and the corresponding $y_i$ are calculated using the above equations. All the algorithms taken for comparison use a network structure with 9 neurons in the input layer including bias, 5 neurons in the hidden layer and 3 neurons in the output layer. All the algorithms, including the proposed one, use a termination condition of MSE 0.004. The results obtained are tabulated in Table 3. It shows that the algorithms BP and BPM converge to MSE 0.004 in 590 and 389 epochs, within 612 msecs and 487 msecs respectively, whereas MBP converges to the termination condition in 75 epochs within 89 msecs. The proposed algorithm converges quickly, in 25 epochs within 51 msecs.
Table 3. Comparison table for the nonlinear function approximation problem.

Algorithm   Parameters        Epochs   Training MSE   Testing MSE   Time (msecs)
BP          µ=1.15            590      0.003990       0.004285      612
BPM         µ=1.15, α=0.01    389      0.003995       0.004125      487
MBP         µ=0.25, λ=0.01    75       0.003574       0.003913      89
Proposed    λ=3.7             25       0.003796       0.003835      51
The learning curve of the proposed algorithm is shown in Figure 6. Another set of 500 patterns is generated for testing. The testing MSE obtained for the proposed algorithm is 0.003835 and for MBP is 0.003913.
Figure 6. Learning curve based on MSE and Epochs of Non linear function approximation
problem for the proposed algorithm
Iris data set
The Iris data set [5] is one of the best known databases in the pattern recognition literature. The data set contains three classes, each with 50 instances, so 150 patterns are used in total. Among them, 75 patterns are used for training and the remaining for testing. All the values are normalized by dividing them by 10. The network structure considered is 5-10-1, including one bias node in the input layer. Table 4 shows the results obtained for all the algorithms taken for comparison.
Table 4. Comparison table for the Iris data set problem.

Algorithm   Parameters        Epochs   Training MSE   Testing MSE   Time (msecs)
BP          µ=1.15            491      0.0003         0.008172      193
BPM         µ=1.15, α=0.01    414      0.0003         0.008155      165
MBP         µ=0.25, λ=0.01    368      0.0003         0.006869      143
Proposed    λ=3.7             95       0.00029        0.006654      33
The proposed algorithm and MBP take a minimum of 95 and 368 epochs and a minimum time of 33 msecs and 143 msecs respectively, whereas BP and BP with momentum require 491 and 414 epochs and 193 msecs and 165 msecs respectively to reach the termination condition MSE 0.0003. The testing MSE obtained for the proposed algorithm is also the lowest. The learning curve of MSE against epochs for the proposed algorithm is shown in Figure 7.
Figure 7. Learning curve based on MSE and epochs of Iris data set problem for the
proposed algorithm.
5. Conclusion
An efficient technique for adapting the learning rate in the modified backpropagation algorithm for training FNNs in sequential mode is proposed. Here, the learning rate is adapted based on the differential linear and nonlinear errors of the output and hidden layers. A separate adaptive learning rate is used for each of the hidden and output layers in each iteration. The time required to fix the learning rate by trial and error is saved. The proposed algorithm improves the convergence speed in terms of time and epochs, which is shown by simulating four different problems. The main advantage of the proposed algorithm is that no effort is needed to tune the learning parameter to obtain optimal convergence. The proposed algorithm is easy to implement, and the learning rates for both the hidden and output layers are easy to compute; these modify the values of the weights and increase the convergence speed. The learning curves show that convergence is achieved on all the tested problems.
References
[1] Abid S, Fnaiech F, Najim M, (2001), “A fast feedforward training algorithm using a modified form of
the standard backpropagation algorithm,” IEEE Trans. Neural Networks 12 424-430.
[2] Ahmad M, Salam F.M, (1992) “Supervised learning using Cauchy energy function,” Proc. 2nd Int. Conf. Fuzzy Logic and Neural Networks, Iizuka, Japan, 721-724.
[3] Behera L, Kumar S, Patnaik A, (2006) “On adaptive learning rate that guarantees convergence in
feedforward networks,” IEEE Trans. Neural Networks 17 1116-1125.
[4] Bengio S, Bengio Y, Cloutier J, (1994) “Use of genetic programming for the search of a new learning
rule for neural networks,” Proc.IEEE World Congr. Computational Intelligence and Evolutionary,
324-327
[5] Fisher R.A, (1936) “The use of multiple measurements in taxonomic problems,” Annals of Eugenics 7 179-188.
[6] Jacobs R.A, (1988) “Increased rates of convergence through learning rate adaptation,” Neural Networks 1 295-307.
[7] Jeong S.Y, Lee S.Y, (2000) “Adaptive learning algorithms to incorporate additional functional constraints into neural networks,” Neurocomputing 35 73-90.
[8] Kathirvalavakumar T, Subavathi S.J, (2009) “Neighborhood based modified backpropagation
algorithm using adaptive learning parameters for training feedforward neural networks,”
Neurocomputing 72 3915-3921.
[9] Rojas R, (1996) “Neural networks: a systematic introduction,” Berlin: Springer-Verlag; 424-430.
[10] Rumelhart D.E, Hinton G.E, Williams R.J, (1986) “Learning internal representations by error propagation,” in Parallel distributed processing: explorations in the microstructure of cognition, Vol. 1, Cambridge (MA): MIT Press; 318-362.
[11] Iranmanesh S, Mahdavi M.A, (2009) “Differential adaptive learning rate method for back propagation neural networks,” World Academy of Science, Engineering and Technology 50 285-288.
[12] Sarkar D, (1995) “Methods to speed up error back propagation learning algorithm,” ACM Comput.
Surv., 27 519-544
[13] Sha D, Bajic V.B, (1999) “Adaptive on-line ANN learning algorithm and application to identification
of non-linear systems,” Informatica 23 521-529
[14] Van Ooyen A, Nienhuis B, (1992) “Improving the convergence of the backpropagation algorithm,” Neural Networks 5 465-471.
[15] Xie S, Zhang C, (2006) “Variable learning rate LMS based linear adaptive inverse control,” Journal
of information and computing science 1 139-148
[16] Yu C.C, Liu B.D, (2002) “A backpropagation algorithm with adaptive learning rate and momentum
coefficient,” Proc.Int.Joint Conf. Neural networks(IJCNN'02), 2 1218-1223
[17] Yu X.H, Chen G.A, Cheng S.X, (1993) “Acceleration of backpropagation learning using optimized learning rate and momentum,” Electron. Lett. 29(14) 1288-1289.
[18] Yahya H. Zweiri, (2006) “Optimization of a three term backpropagation algorithm used for neural network learning,” International Journal of Computational Intelligence 3 322-327.
[19] Zhihong Man, Hong Ren Wu, Sophie Liu, Xinghuo Yu, (2006) “A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks,” IEEE Transactions on Neural Networks 17 1580-1591.
Authors
T.Kathirvalavakumar received M.Sc. degree in Mathematics from Madurai Kamaraj
University in 1986, Post Graduate Diploma in Computer Applications from
Bharathidasan University in 1987, M.Phil. degree in Computer Science from Bharathiar
University in 1994 and the Ph.D. degree in Computer Science from University of
Madras in 2004. Since 1987 he has been working as a Lecturer, currently Associate
Professor in Computer Science at V.H.N.Senthikumara Nadar College, Virudhunagar,
Tamilnadu, India. His research interests include Neural Networks and Applications, Pattern recognition and
Data Mining.
S.Jeyaseeli Subavathi received the MCA degree from Madurai Kamaraj University, in
1998 and M.Phil degree in Computer Science, from Mother Teresa Women's University
in 2004. From January 2000 to April 2007 she worked as Lecturer in Computer
Applications at SFR College, India. Since July 2007 she has been working as a Lecturer
in Information Technology, Sri Kaliswari College, Sivakasi, Tamilnadu, India. At
present she is a doctoral candidate in the Department of Computer Science at Madurai
Kamaraj University, India. Her areas of interest include Neural Networks, and Data Structures and
algorithms.