SETTING ARTIFICIAL NEURAL NETWORKS PARAMETERS
NEED FOR SETTING
PARAMETER VALUES
1. LOCAL MINIMA
w1 – Global minimum of Erms
w2, w3 – Local minima
[Figure: Erms plotted against the weight, with the minimum Erms at the global minimum w1 and local minima at w2 and w3]
NEED FOR SETTING
PARAMETER VALUES
2. LEARNING RATE
 Small learning rate – slow and lengthy learning.
 Large learning rate –
 Output may saturate,
 or may oscillate about the desired value,
 and training may still take too long to converge.
3. Learning improves and network training converges more readily if inputs and outputs are numeric (statistical) quantities.
TYPES OF TRAINING
Supervised Training
•Supplies the neural network with inputs and the desired outputs
• Response of the network to the inputs is measured
•The weights are modified to reduce the difference
between the actual and desired outputs
Unsupervised Training
•Only supplies inputs
•The neural network adjusts its own weights so that
similar inputs cause similar outputs
•The network identifies the patterns and differences in
the inputs without any external assistance
I. INITIALISATION OF WEIGHTS
 Large initial weights will drive the output of layer 1 into saturation.
 The network will then require a longer training time to emerge out of saturation.
 Weights are therefore chosen as small random values,
either between -1 and 1
or between -0.5 and 0.5.
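As a quick illustration (not part of the slides), a minimal NumPy sketch of this small random initialisation; the function name and the layer sizes are hypothetical:

import numpy as np

def init_small_uniform(n_in, n_hidden, limit=0.5, seed=0):
    """Initialise weights uniformly in [-limit, +limit] (e.g. 0.5 or 1.0)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(n_hidden, n_in))

# Example: 4 inputs, 3 hidden nodes, weights in [-0.5, 0.5]
V = init_small_uniform(4, 3, limit=0.5)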
INITIALISATION OF WEIGHTS
 PROBLEM WITH THIS CHOICE:
 If some of the input parameters are very large, they will dominate the output.
 e.g. x = [ 10 2 0.2 1 ]
 SOLUTION:
 Initialise the weights inversely proportional to the inputs.
 The output will then not depend on any individual parameter, but on the total input as a whole.
RULE FOR INITIALISATION OF WEIGHTS
 Weight between input and 1st layer:
vij = (1/2P) ∑p=1..P ( 1 / |xj(p)| )
 P is the total number of input patterns.
 Weight between 1st layer and output layer:
wij = (1/2P) ∑p=1..P ( 1 / f( ∑j vij xj(p) ) )
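A minimal NumPy sketch of this data-dependent rule, assuming the training patterns are the rows of a matrix X, a sigmoid activation f, and that the same value is used for every node of a layer (my reading of the slide's indices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_inverse_proportional(X, n_hidden, n_out, eps=1e-8):
    """Data-dependent initialisation: weights inversely proportional to the inputs.

    X: array of shape (P, n_in) holding the P training patterns.
    Returns (V, W): input->hidden and hidden->output weight matrices.
    """
    P, n_in = X.shape
    # v_ij = (1/2P) * sum_p 1/|x_j^(p)|  (same value repeated for every hidden node i)
    v_row = (1.0 / (2 * P)) * np.sum(1.0 / (np.abs(X) + eps), axis=0)
    V = np.tile(v_row, (n_hidden, 1))
    # w_ij = (1/2P) * sum_p 1/f(sum_j v_ij x_j^(p))
    hidden_net = X @ V.T
    w_row = (1.0 / (2 * P)) * np.sum(1.0 / (sigmoid(hidden_net) + eps), axis=0)
    W = np.tile(w_row, (n_out, 1))
    return V, W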
II. FREQUENCY OF WEIGHT UPDATES
 Per pattern training: weights change after every input pattern is applied.
 The input set is repeated if the NN is not yet trained.
 Per epoch training: an epoch is one pass through the process of presenting the network with the whole input set and updating the network's weights.
 Many epochs are required to train the neural network.
 The weight changes suggested by every input are accumulated into a single change applied at the end of each epoch, i.e. after the whole set of patterns.
 No weight change at the end of each individual input.
 Also called BATCH MODE training.
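As an illustration only (not from the slides), a minimal sketch of the two update schedules for a single linear node trained with the delta rule; X holds the patterns as rows and d the desired outputs:

import numpy as np

def train_per_pattern(w, X, d, eta):
    """Per-pattern (online) training: update after every input pattern."""
    for x, t in zip(X, d):
        y = w @ x
        w = w + eta * (t - y) * x          # immediate weight change
    return w

def train_per_epoch(w, X, d, eta):
    """Per-epoch (batch) training: accumulate changes, apply once per epoch."""
    dw = np.zeros_like(w)
    for x, t in zip(X, d):
        y = w @ x
        dw += eta * (t - y) * x            # accumulate, no change yet
    return w + dw                          # single change at the end of the epoch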
FREQUENCY OF WEIGHT UPDATES
 Advantages / Disadvantages
 Batch mode training is not possible for on-line training.
 For large applications with long training times, parallel processing may reduce the time needed in batch mode training.
 Per pattern training is more expensive, as the weights change more often.
 Per pattern training is suitable for small NNs and small data sets.
III. LEARNING RATE
 FOR PERCEPTRON TRAINING ALGORITHM
 Too small η – very slow learning.
 Too large η – the output may saturate in one direction.
 η = 0 --- no weight change.
 η = 1 --- common choice.
PROBLEM WITH η = 1
 If η = 1, ∆w = ± x.
 New output = (w + ∆w)Tx
 = wTx ± xTx
 Here, if |wTx| > xTx, the output keeps the same sign and grows in one direction only.
 We need |∆wTx| > |wTx|, with ∆w = ± ηx, i.e.
 η |xTx| > |wTx|
 η > |wTx| / |xTx|
 η is normally between 0 and 1.
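To make the bound concrete, a small sketch (my own illustration, not from the slides) computing the smallest η able to flip the sign of the output for a given pattern:

import numpy as np

def min_eta_to_flip(w, x):
    """Smallest learning rate for which the update ±eta*x can flip sign(w.x).

    Follows eta > |w.x| / |x.x| from the derivation above.
    """
    return abs(w @ x) / (x @ x)

w = np.array([2.0, -1.0])
x = np.array([1.0, 0.5])
print(min_eta_to_flip(w, x))   # eta must exceed this value to change the output sign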
III. LEARNING RATE
 FOR BACK PROPAGATION ALGORITHM
 Use a large η in early iterations and steadily decrease it as the NN converges.
 Increase η at every iteration that improves performance by a significant amount, and vice versa.
 Steadily double η until the error value worsens.
 If the second derivative of E, ∇2E, is constant and low, η can be large.
 If the second derivative of E, ∇2E, is large, η should be small.
 The second-derivative approach requires more computation.
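A hedged sketch of the "increase on improvement, back off otherwise" heuristic; the factors 1.1 and 0.5 are illustrative choices, not from the slides:

def adapt_learning_rate(eta, prev_error, curr_error, up=1.1, down=0.5):
    """Increase eta when the error improved, cut it back when the error worsened."""
    if curr_error < prev_error:
        return eta * up      # training improved: be a little more aggressive
    return eta * down        # training worsened: back off

eta = 0.5
errors = [1.0, 0.8, 0.85, 0.6]          # hypothetical per-epoch error values
for prev, curr in zip(errors, errors[1:]):
    eta = adapt_learning_rate(eta, prev, curr)
    print(round(eta, 3))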
MOMENTUM
•Training is done to reduce the error E.
•Training may stop at a local minimum instead of the global minimum.
MOMENTUM
 This can be prevented if the weight changes depend on the average gradient of the error, rather than the gradient at a single point.
 Averaging δE/δw over a small neighbourhood leads the network in the general direction of MSE decrease without getting stuck at local minima.
 But this may become computationally complex.
MOMENTUM
 Shortcut method:
 The weight change at the ith iteration of the back propagation algorithm also depends on the immediately preceding weight change.
 This has an averaging effect.
 It diminishes drastic fluctuations in weight changes over consecutive iterations.
 Achieved by adding a momentum term to the weight update rule.
MOMENTUM
 ∆wkj(t+1) = η δk xj + α ∆wkj(t)
 ∆wkj(t) is the weight change applied at time t.
 α is a constant, α ≤ 1.
 Disadvantage:
 The past training trend can strongly bias the current training.
 α depends on the application.
 α = 0 – no effect of past weight changes.
 α close to 1 – past weight changes dominate the current one.
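An illustrative sketch of this momentum update (η, α and the gradient terms are arbitrary example values):

import numpy as np

def momentum_update(w, grad_term, prev_dw, eta=0.1, alpha=0.9):
    """Apply dw(t+1) = eta*grad_term + alpha*dw(t); return new weights and dw."""
    dw = eta * grad_term + alpha * prev_dw
    return w + dw, dw

w = np.zeros(3)
prev_dw = np.zeros(3)
for grad_term in [np.array([1.0, -0.5, 0.2])] * 3:   # hypothetical delta_k * x_j terms
    w, prev_dw = momentum_update(w, grad_term, prev_dw)
print(w)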
What constitutes a “good” training set?
 Samples must represent the general population.
 Samples must contain members of each class.
 Samples in each class must contain a wide range of variations or noise effects.
GENERALIZABILITY
 Overtraining occurs more in large NNs trained with few input samples.
 Inputs are repeated during training until the error reduces.
 This leads to the network memorising the input samples.
 Such a trained NN may behave correctly on the training data but fail on unknown data.
 This is also called over training.
GENERALIZABILITY- SOLUTION
 The set of all known samples is broken into two disjoint (independent) sets:
 Training set - a group of samples used to train the neural network.
 Testing set - a group of samples used to test the performance of the neural network
◦ Used to estimate the error rate.
 Training continues as long as the error on the test data gradually reduces.
 Training terminates as soon as the error on the test data increases.
GENERALIZABILITY
[Figure: error E versus training time – the error on training data keeps decreasing, while the error on test data starts to increase at the point where overtraining begins]
•Performance over test data is monitored over
several iterations, not just one iteration.
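A minimal early-stopping sketch of this monitoring (my own illustration; the training and evaluation routines are passed in as callables, and the patience window of 5 epochs is an arbitrary choice):

def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=1000, patience=5):
    """Stop once the test-set error has not improved for `patience` consecutive epochs.

    train_one_epoch(): runs one epoch on the training data (weights change here only).
    evaluate(): returns the current error on the test data (weights do NOT change here).
    """
    best_test_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        test_error = evaluate()
        if test_error < best_test_error:
            best_test_error = test_error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1      # monitor over several iterations, not one
        if epochs_without_improvement >= patience:
            break
    return best_test_error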
GENERALIZABILITY
 Weights will NOT change on test data.
 Overtraining can be avoided by using a small number of parameters (hidden nodes and weights).
 If the training set is small, multiple sets can be created by adding small, randomly generated noise or displacements:
 If X = { x1, x2, x3 ….. xn } then
 X’ = { x1+β1, x2+β2, x3+β3 … xn+βn }
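A small sketch of this jitter-based augmentation (the Gaussian noise scale of 0.01 and the number of copies are illustrative choices):

import numpy as np

def jitter_augment(X, n_copies=5, noise_scale=0.01, seed=0):
    """Create extra training sets X' = X + beta, with small random beta."""
    rng = np.random.default_rng(seed)
    return [X + rng.normal(0.0, noise_scale, size=X.shape) for _ in range(n_copies)]

X = np.array([[10.0, 2.0, 0.2, 1.0]])
augmented = jitter_augment(X)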
NO. OF HIDDEN LAYERS AND NODES
 Mostly obtained by trial and error.
 Too few nodes – the NW may not be efficient enough to learn the problem.
 Too many nodes –
 Computation is tedious and expensive.
 The NW may memorise the inputs and perform poorly on test data.
 A NW is called well trained if it performs well on data not used for training.
 Hence the NN should be capable of generalising from the inputs, rather than memorising them.
NO. OF HIDDEN LAYERS AND NODES
 Methods:
 Adaptive algorithm-
◦ Choose a large number of nodes and train.
◦ Gradually discard nodes one by one during training.
◦ Keep discarding until performance falls below an acceptable level.
◦ The NN has to be retrained at each change in the number of nodes.
◦ Or vice versa:
◦ Choose a small number of nodes and increase the number of nodes till performance is satisfactory (a sketch of this variant follows below).
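A hedged sketch of the "grow until satisfactory" variant; since the slides do not fix a particular network implementation, the build/train and evaluate steps are passed in as callables:

def grow_hidden_nodes(build_and_train, evaluate, target_error, max_nodes=50):
    """Increase the number of hidden nodes until the test error is acceptable.

    build_and_train(n): builds and trains a network with n hidden nodes.
    evaluate(net): returns the error of the trained network on test data.
    """
    for n in range(1, max_nodes + 1):
        net = build_and_train(n)          # NN must be retrained at each change in nodes
        if evaluate(net) <= target_error:
            return net, n                 # smallest network with satisfactory performance
    return net, max_nodes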
Let’s see how NN size advances:
 Linear classification: a single node realises the decision line L1, with ax1+bx2+c > 0 on one side and ax1+bx2+c < 0 on the other.
[Figure: line L1 separating the two classes in the (x1, x2) plane, implemented by a single-node network]
Let’s see how NN size advances:
 Two class problem - Nonlinear: two decision lines L1 and L2 are needed; a node L11 in the next layer combines their outputs.
[Figure: lines L1 and L2 bounding the class region, combined by node L11]
Let’s see how NN size advances:
 Two class problem - Nonlinear: four decision lines L1–L4 are needed; a node P combines their outputs.
[Figure: lines L1–L4 bounding the class region, combined by node P]
Let’s see how NN size advances:
 Two class problem - Nonlinear: first-layer nodes (L11) form sub-regions P1–P4, which a further node L22 combines into the final two-class decision.
[Figure: sub-regions P1–P4 combined by node L22]
NUMBER OF INPUT SAMPLES
 As a rule of thumb: use 5 to 10 times as many samples as the number of weights to be trained.
 Baum and Haussler suggest:
◦ P > |w| / (1 - a)
◦ P is the number of samples,
◦ |w| is the number of weights to be trained,
◦ a is the expected accuracy on the test set.
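As an illustrative example (numbers chosen arbitrarily): a network with |w| = 200 weights and a desired test-set accuracy of a = 0.9 would need P > 200 / (1 - 0.9) = 2000 training samples, which also agrees with the 10 × |w| end of the thumb rule.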
Non-numeric inputs
 Non-numeric inputs like colours have no inherent order.
 They cannot be depicted on an axis, e.g. red-blue-green-yellow.
 Colour would become position sensitive, resulting in erroneous training.
 Hence assign a binary vector with one component corresponding to each colour, e.g.
 Green – 0 0 1 0 Red – 1 0 0 0
 Blue – 0 1 0 0 Yellow – 0 0 0 1
 But the input dimension increases drastically.
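A minimal sketch of this one-of-n (one-hot) encoding; the colour list and its ordering are just illustrative:

COLOURS = ["red", "blue", "green", "yellow"]       # illustrative ordering, as on the slide

def encode_colour(colour):
    """Return a binary vector with a 1 in the component for this colour."""
    vec = [0] * len(COLOURS)
    vec[COLOURS.index(colour)] = 1
    return vec

print(encode_colour("green"))   # [0, 0, 1, 0]
print(encode_colour("yellow"))  # [0, 0, 0, 1]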
Termination criteria
 “Halt when the goal is achieved.”
 Perceptron training of linearly separable patterns –
◦ Halt on correct classification of all samples.
◦ Termination is assured if η is sufficiently small.
◦ The program may run indefinitely if η is not appropriate.
◦ A different choice of η may yield a different classification.
 Back propagation algorithm using the delta rule –
◦ Termination can never be achieved with the above criterion, as the output can never reach exactly +1 or -1.
◦ We have to fix Emin, the minimum acceptable error. Training terminates when the error goes below Emin.
Termination criteria
 Perceptron training of linearly non-separable patterns –
◦ The above criterion would allow the procedure to run indefinitely.
◦ Compare the amount of progress made in the recent past.
◦ If the number of misclassifications has not changed over a large number of steps, the samples are probably not linearly separable.
◦ A limit on the minimum % of correct classifications can be fixed for termination.
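A hedged sketch combining these stopping rules (the window size and the example values are illustrative, not from the slides):

def should_stop(error, e_min, misclassified_history, window=100):
    """Return True if a termination criterion from the slides is met.

    error, e_min:          back propagation – stop once the error falls below Emin.
    misclassified_history: perceptron – stop if the misclassification count has not
                           changed over the last `window` iterations (samples are
                           probably not linearly separable).
    """
    if error < e_min:
        return True
    recent = misclassified_history[-window:]
    if len(recent) == window and len(set(recent)) == 1:
        return True
    return False

# Illustrative use: error still above Emin, but misclassification count stuck at 7
print(should_stop(0.3, 0.01, [7] * 150, window=100))   # True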