Dr. Kamal Gulati
Artificial Neural
Networks for data mining
Data Mining: Classification and
Prediction
• 1. Classification with decision trees
• 2. Artificial Neural Networks
1. CLASSIFICATION WITH DECISION
TREES
• Classification is the process of learning a model
that describes different classes of data. The
classes are predetermined.
• Example: In a banking application, customers
who apply for a credit card may be classified as a
“good risk”, a “fair risk” or a “poor risk”. Because the
classes are known in advance, this type of activity is
also called supervised learning.
• Once the model is built, then it can be used to
classify new data.
• The first step, of learning the model, is accomplished by using a
training set of data that has already been classified. Each record in the
training data contains an attribute, called the class label, that indicates
which class the record belongs to.
• The model that is produced is usually in the form of a decision tree or a
set of rules.
• Some of the important issues with regard to the model and the
algorithm that produces the model include:
– the model’s ability to predict the correct class of the new data,
– the computational cost associated with the algorithm
– the scalability of the algorithm.
• Let us examine the approach where the model is in the form of a decision
tree.
• A decision tree is simply a graphical representation of the description
of each class or in other words, a representation of the classification
rules.
• Example: Suppose that we have a database of
customers on the AllElectronics mailing list. The
database describes attributes of the customers, such as
their name, age, income, occupation, and credit rating.
The customers can be classified as to whether or not
they have purchased a computer at AllElectronics.
• Suppose that new customers are added to the
database and that you would like to notify these
customers of an upcoming computer sale. To send out
promotional literature to every new customer in the
database can be quite costly. A more cost-efficient
method would be to target only those new customers
who are likely to purchase a new computer. A
classification model can be constructed and used for
this purpose.
• Figure 2 shows a decision tree for the concept
buys_computer, indicating whether or not a customer
at AllElectronics is likely to purchase a computer.
Each internal node
represents a test on an
attribute. Each leaf node
represents a class.
A decision tree for the concept buys_computer, indicating whether or not a customer
at AllElectronics is likely to purchase a computer.
Training data tuples from the AllElectronics customer
database
age      income   student   credit_rating   Class
<=30     high     no        fair            No
<=30     high     no        excellent       No
31…40    high     no        fair            Yes
>40      medium   no        fair            Yes
>40      low      yes       fair            Yes
>40      low      yes       excellent       No
31…40    low      yes       excellent       Yes
<=30     medium   no        fair            No
<=30     low      yes       fair            Yes
>40      medium   yes       fair            Yes
<=30     medium   yes       excellent       Yes
31…40    medium   no        excellent       Yes
31…40    high     yes       fair            Yes
>40      medium   no        excellent       No
age?
Branch age <= 30:
income   student   credit_rating   class
high     no        fair            no
high     no        excellent       no
medium   no        fair            no
low      yes       fair            yes
medium   yes       excellent       yes
Branch age 31…40:
income   student   credit_rating   class
high     no        fair            yes
low      yes       excellent       yes
medium   no        excellent       yes
high     yes       fair            yes
Branch age > 40:
income   student   credit_rating   class
medium   no        fair            yes
low      yes       fair            yes
low      yes       excellent       no
medium   yes       fair            yes
medium   no        excellent       no
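The split on age can be reproduced with a short sketch. The code below is an illustration only: the names TRAINING_DATA, ATTRIBUTES and class_counts_by are hypothetical, and the age group 31…40 is written as "31..40" to keep the strings ASCII. It groups the 14 training tuples by one attribute and counts the class labels in each group.

```python
from collections import Counter, defaultdict

# The 14 training tuples from the slide:
# (age, income, student, credit_rating, buys_computer)
TRAINING_DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

ATTRIBUTES = ("age", "income", "student", "credit_rating")


def class_counts_by(attribute, rows=TRAINING_DATA):
    """Group rows by one attribute and count the class labels in each group."""
    column = ATTRIBUTES.index(attribute)
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[column]][row[-1]] += 1  # the last field is the class label
    return {value: dict(counter) for value, counter in counts.items()}


if __name__ == "__main__":
    for value, counts in class_counts_by("age").items():
        print(value, counts)
```

Every tuple in the 31..40 group has class yes, which is why that branch can end in a leaf, while the other two branches still mix both classes and need a further test.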
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand.
Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
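The five rules translate directly into code. This is a minimal sketch: the function name buys_computer and the ASCII spelling "31..40" for the middle age group are choices made for the illustration. Income is not an argument because no path in the tree tests it.

```python
def buys_computer(age, student, credit_rating):
    """Apply the five IF-THEN rules extracted from the decision tree."""
    if age == "<=30" and student == "no":
        return "no"
    if age == "<=30" and student == "yes":
        return "yes"
    if age == "31..40":
        return "yes"
    if age == ">40" and credit_rating == "excellent":
        return "no"
    if age == ">40" and credit_rating == "fair":
        return "yes"
    # No rule covers this combination of attribute values.
    raise ValueError(f"no rule matches: {age!r}, {student!r}, {credit_rating!r}")


if __name__ == "__main__":
    # A new customer over 40 with a fair credit rating.
    print(buys_computer(">40", "no", "fair"))
```

Each branch corresponds to one root-to-leaf path, so exactly one rule fires for any customer whose attribute values appear in the tree.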
1. NEURAL NETWORK REPRESENTATION
• An ANN is composed of processing elements called neurons or perceptrons,
organized in different ways to form the network’s structure.
Processing Elements
• An ANN consists of perceptrons. Each of the perceptrons receives
inputs, processes inputs and delivers a single output.
The input can be raw input
data or the output of
other perceptrons. The
output can be the final
result (e.g. 1 means yes, 0
means no) or it can be
inputs to other
perceptrons.
The network
• Each ANN is composed of a collection of perceptrons grouped
in layers. A typical structure is shown in Fig.2.
Note the three layers:
input, intermediate
(called the hidden layer)
and output.
Several hidden layers can
be placed between the
input and output layers.
Figure 2
Appropriate Problems for Neural Network
• ANN learning is well-suited to problems in which the training data
corresponds to noisy, complex sensor data. It is also applicable to
problems for which more symbolic representations are used.
• The backpropagation (BP) algorithm is the most commonly used ANN
learning technique. It is appropriate for problems with the
following characteristics:
– Input is high-dimensional discrete or real-valued (e.g. raw sensor input)
– Output is discrete or real valued
– Output is a vector of values
– Possibly noisy data
– Long training times accepted
– Fast evaluation of the learned function required.
– Not important for humans to understand the weights
• Examples:
– Speech phoneme recognition
– Image classification
– Financial prediction
NEURAL NETWORK APPLICATION
DEVELOPMENT
The development process for an ANN application has eight steps.
• Step 1: (Data collection) The data to be used for the training and
testing of ANN are collected. Important considerations
are that the particular problem is amenable to ANN solution and that
adequate data exist and can be obtained.
• Step 2: (Training and testing data separation) Training data must be
identified, and a plan must be made for testing the performance of
ANN. The available data are divided into training and testing data sets.
For a moderately sized data set, 80% of the data are randomly selected
for training, 10% for testing, and 10% for secondary testing.
• Step 3: (Network architecture) A network architecture and a learning
method are selected. Important considerations are the exact number
of nodes and the number of layers.
• Step 4: (Parameter tuning and weight initialization) There are
parameters for tuning ANN to the desired learning
performance level. Part of this step is initialization of the
network weights and parameters, followed by modification of
the parameters as training performance feedback is received.
– Often, the initial values are important in determining the effectiveness
and length of training.
• Step 5: (Data transformation) The application data are transformed
into the type and format required by the ANN.
• Step 6: (Training) Training is conducted iteratively by
presenting input and known output data to the ANN. The ANN
computes the outputs and adjusts the weights until the
computed outputs are within an acceptable tolerance of the
known outputs for the input cases.
• Step 7: (Testing) Once the training has been completed, it is
necessary to test the network.
– The testing examines the performance of ANN using the derived
weights by measuring the ability of the network to classify the
testing data correctly.
– Black-box testing (comparing test results to historical results) is the
primary approach for verifying that inputs produce the appropriate
outputs.
• Step 8: (Implementation) A stable set of weights has now been
obtained.
– Now ANN can reproduce the desired output given inputs like those
in the training set.
– The ANN is ready to use as a stand-alone system or as part of
another software system where new input data will be presented
to it and its output will be a recommended decision.
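Steps 2 and 7 can be sketched in a few lines. The helper names split_data and accuracy are hypothetical, and the 80/10/10 fractions are the ones quoted in Step 2 for a moderately sized data set. The sketch shuffles the records, cuts them into training, testing and secondary-testing sets, and measures the fraction of testing records that a classifier labels correctly.

```python
import random


def split_data(records, train_frac=0.8, test_frac=0.1, seed=0):
    """Randomly split records into training, testing and secondary-testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    secondary = shuffled[n_train + n_test:]  # whatever remains
    return train, test, secondary


def accuracy(predict, labeled_records):
    """Fraction of (inputs, label) pairs for which predict(inputs) equals label."""
    if not labeled_records:
        return 0.0
    correct = sum(1 for inputs, label in labeled_records if predict(inputs) == label)
    return correct / len(labeled_records)


if __name__ == "__main__":
    train, test, secondary = split_data(range(100))
    print(len(train), len(test), len(secondary))
```

Holding the secondary set back until the very end gives a performance estimate on data that influenced neither the weights nor the tuning decisions of Step 4.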
BENEFITS AND LIMITATIONS OF NEURAL NETWORKS
6.1 Benefits of ANNs
• Usefulness for pattern recognition, classification, generalization,
abstraction and interpretation of incomplete and noisy inputs (e.g.
handwriting recognition, image recognition, voice and speech
recognition, weather forecasting).
• Providing some human characteristics to problem solving that are
difficult to simulate using the logical, analytical techniques of expert
systems and standard software technologies. (e.g. financial
applications).
• Ability to solve new kinds of problems. ANNs are particularly effective
at solving problems whose solutions are difficult to define. This
opened up a new range of decision support applications formerly
either difficult or impossible to computerize.
[Artificial] Neural Networks
• A class of powerful, general-purpose tools readily applied to:
– Prediction
– Classification
– Clustering
• Biological Neural Net (human brain) is the most powerful – we
can generalize from experience
• Computers are best at following pre-determined instructions
• Computerized Neural Nets attempt to bridge the gap
– Predicting time-series in financial world
– Diagnosing medical conditions
– Identifying clusters of valuable customers
– Fraud detection
– Etc…
Neural Networks
• When applied in well-defined domains, their ability
to generalize and learn from data “mimics” a
human’s ability to learn from experience.
• Very useful in Data Mining…better results are the
hope
• Drawback – training a neural network results in
internal weights distributed throughout the network
making it difficult to understand why a solution is
valid
Neural Networks
What is a Neural Network?
Similarity with biological network
The fundamental processing element of a neural network
is a neuron, which:
1. Receives inputs from other sources
2. Combines them in some way
3. Performs a generally nonlinear operation on the result
4. Outputs the final result
•Biologically motivated approach to
machine learning
Neural Network History
• 1930s thru 1970s
• 1980s:
– Back propagation – better way of training a neural net
– Computing power became available
– Researchers became more comfortable with n-nets
– Relevant operational data more accessible
– Useful applications (expert systems) emerged
• Check out Fair Isaac (www.fairisaac.com) which has a
division here in San Diego (formerly HNC)
Neural Network
• Neural Network learns by adjusting the weights so as
to be able to correctly classify the training data and
hence, after testing phase, to classify unknown data.
• Neural Network needs a long time for training.
• Neural Network has a high tolerance to noisy and
incomplete data
Neural Network Classifier
• Input: Classification data
It contains classification attribute
• Data is divided, as in any classification problem.
[Training data and Testing data]
• All data must be normalized.
(i.e. all attribute values in the database are rescaled to lie in
the interval [0,1] or [-1,1])
Neural Network can work with data in the range of (0,1) or (-1,1)
• Two basic normalization techniques
[1] Max-Min normalization
[2] Decimal Scaling normalization
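Both techniques can be sketched as follows. The function names are hypothetical, and the formulas are the usual textbook definitions rather than anything stated on the slide: max-min normalization maps the smallest value to new_min and the largest to new_max, and decimal scaling divides by the smallest power of ten that brings every absolute value below 1.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def decimal_scaling_normalize(values):
    """Divide by 10**j, with j the smallest integer making every |value| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))
    print(min_max_normalize([10, 20, 30], new_min=-1.0, new_max=1.0))
    print(decimal_scaling_normalize([-986, 917]))
```

Max-min normalization needs the minimum and maximum of the training data to be stored so that new records are rescaled the same way. Decimal scaling only needs the exponent j.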
Real Estate Appraiser
Loan Prospector – HNC/Fair Isaac
• A Neural Network (Expert System) is like a black box that knows how
to process inputs to create a useful output.
• The calculation(s) are quite complex and difficult to understand
Neural Net Limitations
• Neural Nets are good for prediction and estimation
when:
– Inputs are well understood
– Output is well understood
– Experience is available for examples to use to “train” the
neural net application (expert system)
• Neural Nets are only as good as the training set used
to generate them. The resulting model is static and must
be updated with more recent examples and
retrained to stay relevant
Neural Network Training
• Training is the process of setting the best weights on the
edges connecting all the units in the network
• The goal is to use the training set to calculate weights where
the output of the network is as close to the desired output as
possible for as many of the examples in the training set as
possible
• Back propagation has been used since the 1980s to adjust the
weights (other methods are now available):
– Calculates the error by taking the difference between the calculated
result and the actual result
– The error is fed back through the network and the weights are
adjusted to minimize the error
Introduction
• Data Mining Definitions:
– Building compact and understandable models
incorporating the relationships between the
description of a situation and a result concerning
the situation.
– Extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful)
information or patterns from data in large
databases.
Kinds of Data Mining Problems
• Classification / Segmentation
• Forecasting/Prediction (how much)
• Association rule extraction (market basket
analysis)
• Sequence detection
Data Mining Techniques:
• Neural Networks
• Decision Trees
• Multivariate Adaptive Regression Splines
(MARS)
• Rule Induction
• Nearest Neighbor Method and discriminant
analysis
• Genetic Algorithms
• Boosting
Neural Networks
• What are they?
– Based on early research aimed at representing the
way the human brain works
– Neural networks are composed of many
processing units called neurons
• Types (Supervised versus Unsupervised)
• Training
Neural Networks are great, but..
• Problem 1: The black box model!
– Solution 1: Do we really need to know?
– Solution 2: Rule Extraction techniques
• Problem 2: Long training times
– Solution 1: Get a faster PC with lots of RAM
– Solution 2: Use faster algorithms, for example
Quickprop
• Problem 3: Back propagation
– Solution: Evolutionary Neural Networks!
Rule Extraction Techniques
• Representation Methods
• Extraction Strategy
• Network Requirement
Neural Network Concepts
• Neural networks (NN): a brain metaphor for
information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses for ANN for
– pattern recognition, forecasting, prediction, and
classification
• Many application areas
– finance, marketing, manufacturing, operations,
information systems, and so on
Biological Neural Networks
[Figure labels: soma, dendrites, axon, synapse]
• Two interconnected brain cells (neurons)
Biology Analogy
Processing Information in ANN
[Figure: inputs x1, x2, …, xn, each with a weight w1, w2, …, wn, feed a neuron (or PE) whose output Y is passed on as Y1, Y2, …, Yn]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
• A single neuron (processing element – PE) with
inputs and outputs
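The summation and transfer function of a single processing element can be sketched as below. The sigmoid is only one possible choice of transfer function and is an assumption here, since the slide does not name one. The names sigmoid and neuron_output are hypothetical.

```python
import math


def sigmoid(s):
    """One common nonlinear transfer function; it maps any real S into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, transfer=sigmoid):
    """Compute Y = f(S) where S is the weighted sum of the inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(x * w for x, w in zip(inputs, weights))  # summation function
    return transfer(s)                               # transfer function


if __name__ == "__main__":
    print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```

Passing a different callable as transfer, for instance a step function, changes the neuron's behavior without touching the summation.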
Elements of ANN
• Processing element (PE)
• Network architecture
– Hidden layers
– Parallel processing
• Network information processing
– Inputs
– Outputs
– Connection weights
– Summation function
Neural Network Architectures
Recurrent Neural Networks
Learning in ANN
• A process by which a neural network learns the
underlying relationship between input and outputs,
or just among the inputs
• Supervised learning
– For prediction type problems
– E.g., backpropagation
• Unsupervised learning
– For clustering type problems
– Self-organizing
– E.g., adaptive resonance theory
A Taxonomy of ANN Learning
Algorithms
[Figure: taxonomy of learning algorithms, divided by input type (discrete/binary vs. continuous) and by learning mode (supervised vs. unsupervised). Algorithms named in the figure:]
· Delta rule
· Gradient descent
· Competitive learning
· Neocognitron
· Perceptron
· Simple Hopfield
· Outerproduct AM
· Hamming Net
· ART-1
· Carpenter / Grossberg
· ART-3
· SOFM (or SOM)
· Other clustering algorithms
[Figure: taxonomy of architectures, divided into supervised (recurrent, feedforward) and unsupervised (estimator, extractor). Architectures named in the figure:]
· Hopfield
· SOFM (or SOM)
· Nonlinear vs. linear
· Backpropagation
· ML perceptron
· Boltzmann
· ART-1
· ART-2
A Supervised Learning Process
[Flowchart: the ANN model computes the output; if the desired output is achieved, learning stops; if not, the weights are adjusted and the output is computed again]
Three-step process:
1. Compute temporary outputs
2. Compare outputs with desired targets
3. Adjust the weights and repeat the process
How a Network Learns
• Example: single neuron that learns the
inclusive OR operation
* See your book for step-by-step progression of the learning process
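For readers without the book at hand, the progression can be sketched as follows. All numeric settings (initial weights 0.1 and 0.3, learning rate 0.2, a fixed threshold of 0.5) and all names are assumptions chosen for the illustration. Momentum is left out to keep the sketch short.

```python
# The four input patterns of inclusive OR with their desired outputs.
OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, inputs, threshold=0.5):
    """Step transfer function: output 1 when the weighted sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0


def train_or_neuron(weights=(0.1, 0.3), learning_rate=0.2, threshold=0.5, max_epochs=100):
    """Adjust the two weights until every OR pattern is classified correctly."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in OR_PATTERNS:
            delta = target - predict(weights, inputs, threshold)  # desired minus actual
            if delta != 0:
                errors += 1
                for i, x in enumerate(inputs):
                    weights[i] += learning_rate * delta * x
        if errors == 0:  # a full pass with no mistakes: stop learning
            return weights, epoch
    return weights, max_epochs


if __name__ == "__main__":
    final_weights, epochs_used = train_or_neuron()
    print(final_weights, epochs_used)
```

A larger learning rate takes bigger steps per mistake and usually needs fewer passes, at the risk of overshooting on harder problems.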
Learning parameters:
– Learning rate
– Momentum
Backpropagation Learning
• Backpropagation of Error for a Single Neuron
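[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn feed a neuron (or PE) with output Yi; the error is fed back to the weights]
Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
Error: $a(Z_i - Y_i)$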
Backpropagation Learning
• The learning algorithm procedure:
1. Initialize weights with random values and set other
network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward
through the layers)
4. Compute the error (difference between the actual and
desired output)
5. Change the weights by working backward through the
hidden layers
6. Repeat steps 2-5 until weights stabilize
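The six steps can be sketched for a small network with one hidden layer. Everything specific here is an assumption made for the illustration: the sigmoid transfer function, two hidden neurons, a learning rate of 0.5, the OR patterns as training data, and a fixed number of epochs in place of a test for stable weights. All names are hypothetical.

```python
import math
import random

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(net, x):
    """Step 3: work forward through the layers; return hidden and output activations."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output


def mean_squared_error(net, data):
    return sum((z - forward(net, x)[1]) ** 2 for x, z in data) / len(data)


def train(data, n_hidden=2, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: initialize weights (and biases) with small random values.
    net = {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": rng.uniform(-0.5, 0.5),
    }
    initial_error = mean_squared_error(net, data)
    for _ in range(epochs):                      # Step 6: repeat steps 2-5
        for x, z in data:                        # Step 2: read inputs and desired output
            hidden, y = forward(net, x)          # Step 3: compute the actual output
            error = z - y                        # Step 4: desired minus actual
            # Step 5: work backward; y * (1 - y) is the sigmoid derivative.
            delta_out = error * y * (1.0 - y)
            delta_hidden = [delta_out * net["w_out"][j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                net["w_out"][j] += learning_rate * delta_out * hidden[j]
                net["b_hidden"][j] += learning_rate * delta_hidden[j]
                for i in range(n_in):
                    net["w_hidden"][j][i] += learning_rate * delta_hidden[j] * x[i]
            net["b_out"] += learning_rate * delta_out
    return net, initial_error, mean_squared_error(net, data)


if __name__ == "__main__":
    net, before, after = train(OR_DATA)
    print(f"mean squared error before: {before:.4f}, after: {after:.4f}")
```

The hidden-layer deltas are computed before the output weights are changed, so each backward pass uses the same weights that produced the forward pass.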
Development Process of an ANN
Neural Network Architectures
• Architecture of a neural network is driven by the
task it is intended to address
– Classification, regression, clustering, general
optimization, association, ….
• Most popular architecture: Feedforward, multi-layered
perceptron with backpropagation learning
algorithm
– Used for both classification and regression type
problems
Other Popular ANN Paradigms
Self Organizing Maps (SOM)
• Applications of SOM
– Customer segmentation
– Bibliographic classification
– Image-browsing systems
– Medical diagnosis
– Interpretation of seismic activity
– Speech recognition
– Data compression
– Environmental modeling, many more …
Applications Types of ANN
• Classification
– Feedforward networks (MLP), radial basis function, and
probabilistic NN
• Regression
– Feedforward networks (MLP), radial basis function
• Clustering
– Adaptive Resonance Theory (ART) and SOM
• Association
– Hopfield networks
• Provide examples for each type?
Advantages of ANN
• Able to deal with (identify/model) highly
nonlinear relationships
• Not bound by restrictive normality and/or
independence assumptions
• Can handle variety of problem types
• Usually provides better results (prediction and/or
clustering) compared to its statistical
counterparts
• Handles both numerical and categorical variables
(transformation needed!)
Disadvantages of ANN
• They are deemed to be black-box solutions, lacking
explainability
• It is hard to find optimal values for large number of
network parameters
– Optimal design is still an art: requires expertise and
extensive experimentation
• It is hard to handle large number of variables
(especially the rich nominal attributes)
• Training may take a long time for large datasets;
which may require case sampling
ANN Software
• Standalone ANN software tool
– NeuroSolutions
– BrainMaker
– NeuralWare
– NeuroShell, … for more (see pcai.com) …
• Part of a data mining software suite
– PASW (formerly SPSS Clementine)
– SAS Enterprise Miner
– Statistica Data Miner, … many more …
Applications-I
• Handwritten Digit Recognition
• Face recognition
• Time series prediction
• Process identification
• Process control
• Optical character recognition
Applications-II
• Forecasting/Market Prediction: finance and banking
• Manufacturing: quality control, fault diagnosis
• Medicine: analysis of electrocardiogram data, RNA & DNA
sequencing, drug development without animal testing
• Control: process, robotics

More Related Content

PDF
Classification in Data Mining
PPTX
Data mining technique (decision tree)
PPTX
Data Mining
PPT
Chapter 8. Classification Basic Concepts.ppt
PPTX
Introduction to-data-mining chapter 1
PPT
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
PPTX
3 Data Mining Tasks
PPTX
Web Mining & Text Mining
Classification in Data Mining
Data mining technique (decision tree)
Data Mining
Chapter 8. Classification Basic Concepts.ppt
Introduction to-data-mining chapter 1
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
3 Data Mining Tasks
Web Mining & Text Mining

What's hot (20)

PPT
2.3 bayesian classification
PPTX
Semantic net in AI
PPTX
Icon based visualization techniques
PPT
PPT
5.3 mining sequential patterns
PDF
I. Alpha-Beta Pruning in ai
PPTX
Fragmentation and types of fragmentation in Distributed Database
PDF
Data Mining: Association Rules Basics
PPT
Mining Frequent Patterns, Association and Correlations
PPTX
Forward and Backward chaining in AI
PPTX
Decision tree induction \ Decision Tree Algorithm with Example| Data science
PPTX
Association rule mining.pptx
PPTX
Presentation on supervised learning
PPTX
04 Classification in Data Mining
PPTX
Data mining primitives
PPTX
Ooad unit – 1 introduction
PPTX
Decision Trees
PPTX
Data mining techniques unit III
PPT
Data preprocessing
PPTX
Module 4 part_1
2.3 bayesian classification
Semantic net in AI
Icon based visualization techniques
5.3 mining sequential patterns
I. Alpha-Beta Pruning in ai
Fragmentation and types of fragmentation in Distributed Database
Data Mining: Association Rules Basics
Mining Frequent Patterns, Association and Correlations
Forward and Backward chaining in AI
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Association rule mining.pptx
Presentation on supervised learning
04 Classification in Data Mining
Data mining primitives
Ooad unit – 1 introduction
Decision Trees
Data mining techniques unit III
Data preprocessing
Module 4 part_1
Ad

Similar to Artificial Neural Networks for Data Mining (20)

PDF
A Seminar Report On NEURAL NETWORK
PPTX
Artificial Neural Network
PDF
Artificial Neural Networks: Applications In Management
PPTX
Artifical Neural Network
PPTX
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 14
PDF
IRJET- The Essentials of Neural Networks and their Applications
PPTX
Artificial Neural Network
PPTX
02 Fundamental Concepts of ANN
PPT
AI-CH5 (ANN) - Artificial Neural Network
PPTX
Artificial Neural Networks for NIU
PPTX
Introduction to Artificial Neural Networks
DOCX
Artificial Intelligence.docx
PPTX
Machine Learning AND Deep Learning for OpenPOWER
PPT
LearningAG.ppt
PPT
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
PDF
10 Things Every PHP Developer Should Know About Machine Learning
PPT
Neural-Networks.ppt
PPTX
Presentationnnnn
PPTX
machine learning in the age of big data: new approaches and business applicat...
PPT
Machine Learning
A Seminar Report On NEURAL NETWORK
Artificial Neural Network
Artificial Neural Networks: Applications In Management
Artifical Neural Network
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 14
IRJET- The Essentials of Neural Networks and their Applications
Artificial Neural Network
02 Fundamental Concepts of ANN
AI-CH5 (ANN) - Artificial Neural Network
Artificial Neural Networks for NIU
Introduction to Artificial Neural Networks
Artificial Intelligence.docx
Machine Learning AND Deep Learning for OpenPOWER
LearningAG.ppt
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
10 Things Every PHP Developer Should Know About Machine Learning
Neural-Networks.ppt
Presentationnnnn
machine learning in the age of big data: new approaches and business applicat...
Machine Learning
Ad

More from Amity University | FMS - DU | IMT | Stratford University | KKMI International Institute | AIMA | DTU (20)

PPTX
Concept of Governance - Management of Operational Risk for IT Officers/Execut...
PPTX
Models of SDLC (Software Development Life Cycle / Program Development Life Cy...
PPTX
CLOUD SECURITY IN INSURANCE INDUSTRY WITH RESPECT TO INDIAN MARKET
Concept of Governance - Management of Operational Risk for IT Officers/Execut...
Models of SDLC (Software Development Life Cycle / Program Development Life Cy...
CLOUD SECURITY IN INSURANCE INDUSTRY WITH RESPECT TO INDIAN MARKET

Recently uploaded (20)

PDF
Sports Quiz easy sports quiz sports quiz
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
GDM (1) (1).pptx small presentation for students
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Computing-Curriculum for Schools in Ghana
PDF
Complications of Minimal Access Surgery at WLH
PPTX
Cell Structure & Organelles in detailed.
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Institutional Correction lecture only . . .
PDF
RMMM.pdf make it easy to upload and study
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Pre independence Education in Inndia.pdf
Sports Quiz easy sports quiz sports quiz
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPH.pptx obstetrics and gynecology in nursing
Final Presentation General Medicine 03-08-2024.pptx
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
GDM (1) (1).pptx small presentation for students
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Computing-Curriculum for Schools in Ghana
Complications of Minimal Access Surgery at WLH
Cell Structure & Organelles in detailed.
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
Supply Chain Operations Speaking Notes -ICLT Program
Institutional Correction lecture only . . .
RMMM.pdf make it easy to upload and study
STATICS OF THE RIGID BODIES Hibbelers.pdf
Microbial disease of the cardiovascular and lymphatic systems
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
O5-L3 Freight Transport Ops (International) V1.pdf
Pre independence Education in Inndia.pdf

Artificial Neural Networks for Data Mining

  • 1. Dr. Kamal Gulati Artificial Neural Networks for data mining
  • 2. Data Mining: Classification and Prediction • 1. Classification with decision trees • 2. Artificial Neural Networks
  • 3. 1. CLASSIFICATION WITH DECISION TREES • Classification is the process of learning a model that describes different classes of data. The classes are predetermined. • Example: In a banking application, customers who apply for a credit card may be classify as a “good risk”, a “fair risk” or a “poor risk”. Hence, this type of activity is also called supervised learning. • Once the model is built, then it can be used to classify new data.
  • 4. • The first step, of learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. • The model that is produced is usually in the form of a decision tree or a set of rules. • Some of the important issues with regard to the model and the algorithm that produces the model include: – the model’s ability to predict the correct class of the new data, – the computational cost associated with the algorithm – the scalability of the algorithm. • Let examine the approach where the model is in the form of a decision tree. • A decision tree is simply a graphical representation of the description of each class or in other words, a representation of the classification rules.
  • 5. • Example : Suppose that we have a database of customers on the AllEletronics mailing list. The database describes attributes of the customers, such as their name, age, income, occupation, and credit rating. The customers can be classified as to whether or not they have purchased a computer at AllElectronics. • Suppose that new customers are added to the database and that you would like to notify these customers of an upcoming computer sale. To send out promotional literature to every new customers in the database can be quite costly. A more cost-efficient method would be to target only those new customers who are likely to purchase a new computer. A classification model can be constructed and used for this purpose. • The figure 2 shows a decision tree for the concept buys_computer, indicating whether or not a customer at AllElectronics is likely to purchase a computer.
  • 6. Each internal node represents a test on an attribute. Each leaf node represents a class. A decision tree for the concept buys_computer, indicating whether or not a customer at AllElectronics is likely to purchase a computer.
  • 7. Training data tuples from the AllElectronics customer database age income student credit_rating <=30 high no fair <=30 high no excellent 31…40 high no fair >40 medium no fair >40 low yes fair >40 low yes excellent 31…40 low yes excellent <=30 medium no fair <=30 low yes fair >40 medium yes fair <=30 medium yes excellent 31…40 medium no excellent 31…40 high yes fair >40 medium no excellent Class No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No
  • 8. 8 age? <= 30 >40 31…40 income student credit_rating class high no fair no high no excellent no medium no fair no low yes fair yes medium yes excellent yes income student credit_rating class high no fair yes low yes excellent yes medium no excellent yes high yes fair yes income student credit_rating class medium no fair yes low yes fair yes low yes excellent no medium yes fair yes medium no excellent no
  • 9. 9 Extracting Classification Rules from Trees • Represent the knowledge in the form of IF-THEN rules • One rule is created for each path from the root to a leaf • Each attribute-value pair along a path forms a conjunction • The leaf node holds the class prediction • Rules are easier for humans to understand. Example IF age = “<=30” AND student = “no” THEN buys_computer = “no” IF age = “<=30” AND student = “yes” THEN buys_computer = “yes” IF age = “31…40” THEN buys_computer = “yes” IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no” IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
  • 10. 10 1. NEURAL NETWORK REPRESENTATION • An ANN is composed of processing elements called or perceptrons, organized in different ways to form the network’s structure. Processing Elements • An ANN consists of perceptrons. Each of the perceptrons receives inputs, processes inputs and delivers a single output. The input can be raw input data or the output of other perceptrons. The output can be the final result (e.g. 1 means yes, 0 means no) or it can be inputs to other perceptrons.
  • 11. 11 The network • Each ANN is composed of a collection of perceptrons grouped in layers. A typical structure is shown in Fig.2. Note the three layers: input, intermediate (called the hidden layer) and output. Several hidden layers can be placed between the input and output layers. Figure 2
  • 12. 12 Appropriate Problems for Neural Network • ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data. It is also applicable to problems for which more symbolic representations are used. • The backpropagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the characteristics: – Input is high-dimensional discrete or real-valued (e.g. raw sensor input) – Output is discrete or real valued – Output is a vector of values – Possibly noisy data – Long training times accepted – Fast evaluation of the learned function required. – Not important for humans to understand the weights • Examples: – Speech phoneme recognition – Image classification – Financial prediction
  • 13. 13 NEURAL NETWORK APPLICATION DEVELOPMENT The development process for an ANN application has eight steps. • Step 1: (Data collection) The data to be used for the training and testing of ANN are collected. Important considerations are that the particular problem is amenable to ANN solution and that adequate data exist and can be obtained. • Step 2: (Training and testing data separation) Trainning data must be identified, and a plan must be made for testing the performance of ANN. The available data are divided into training and testing data sets. For a moderately sized data set, 80% of the data are randomly selected for training, 10% for testing, and 10% secondary testing. • Step 3: (Network architecture) A network architecture and a learning method are selected. Important considerations are the exact number of nodes and the number of layers.
  • 14. • Step 4: (Parameter tuning and weight initialization) There are parameters for tuning the ANN to the desired learning performance level. Part of this step is initialization of the network weights and parameters, followed by modification of the parameters as training performance feedback is received. – Often, the initial values are important in determining the effectiveness and length of training. • Step 5: (Data transformation) The application data are transformed into the type and format required by the ANN. • Step 6: (Training) Training is conducted iteratively by presenting input and known output data to the ANN. The ANN computes the outputs and adjusts the weights until the computed outputs are within an acceptable tolerance of the known outputs for the input cases.
  • 15. • Step 7: (Testing) Once the training has been completed, it is necessary to test the network. – The testing examines the performance of the ANN using the derived weights by measuring the ability of the network to classify the testing data correctly. – Black-box testing (comparing test results to historical results) is the primary approach for verifying that inputs produce the appropriate outputs. • Step 8: (Implementation) A stable set of weights has now been obtained. – The ANN can now reproduce the desired output given inputs like those in the training set. – The ANN is ready to use as a stand-alone system or as part of another software system where new input data will be presented to it and its output will be a recommended decision.
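The black-box testing in Step 7 amounts to comparing the network's outputs with the known historical outputs and counting the matches. A minimal sketch follows; `toy_predict` is a hypothetical stand-in for a trained network, and the four test cases are invented for illustration.

```python
def accuracy(predict, test_cases):
    """Fraction of test cases for which predict(inputs) equals the known output."""
    correct = sum(1 for inputs, expected in test_cases if predict(inputs) == expected)
    return correct / len(test_cases)

# Hypothetical stand-in for a trained network: outputs 1 if any input is set.
def toy_predict(inputs):
    return 1 if sum(inputs) >= 1 else 0

# Invented historical results: (inputs, known output) pairs.
historical = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(accuracy(toy_predict, historical))
```

Only inputs and outputs are inspected; the weights inside the network play no part in the test.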
  • 16. BENEFITS AND LIMITATIONS OF NEURAL NETWORKS 6.1 Benefits of ANNs • Usefulness for pattern recognition, classification, generalization, abstraction and interpretation of incomplete and noisy inputs (e.g. handwriting recognition, image recognition, voice and speech recognition, weather forecasting). • Providing some human characteristics to problem solving that are difficult to simulate using the logical, analytical techniques of expert systems and standard software technologies (e.g. financial applications). • Ability to solve new kinds of problems. ANNs are particularly effective at solving problems whose solutions are difficult to define. This has opened up a new range of decision support applications that were formerly either difficult or impossible to computerize.
  • 17. [Artificial] Neural Networks • A class of powerful, general-purpose tools readily applied to: – Prediction – Classification – Clustering • Biological Neural Net (human brain) is the most powerful – we can generalize from experience • Computers are best at following pre-determined instructions • Computerized Neural Nets attempt to bridge the gap – Predicting time-series in financial world – Diagnosing medical conditions – Identifying clusters of valuable customers – Fraud detection – Etc…
  • 18. Neural Networks • When applied in well-defined domains, their ability to generalize and learn from data “mimics” a human’s ability to learn from experience. • Very useful in Data Mining…better results are the hope • Drawback – training a neural network results in internal weights distributed throughout the network making it difficult to understand why a solution is valid
  • 19. Neural Networks What is a Neural Network? Similarity with biological network The fundamental processing element of a neural network is a neuron, which: 1. Receives inputs from other sources 2. Combines them in some way 3. Performs a generally nonlinear operation on the result 4. Outputs the final result • Biologically motivated approach to machine learning
  • 20. Neural Network History • 1930s thru 1970s • 1980s: – Back propagation – better way of training a neural net – Computing power became available – Researchers became more comfortable with n-nets – Relevant operational data more accessible – Useful applications (expert systems) emerged • Check out Fair Isaac (www.fairisaac.com) which has a division here in San Diego (formerly HNC)
  • 21. Neural Network • A neural network learns by adjusting the weights so as to correctly classify the training data and hence, after the testing phase, to classify unknown data. • A neural network needs a long time for training. • A neural network has a high tolerance to noisy and incomplete data.
  • 22. Neural Network Classifier • Input: Classification data, which contains a classification attribute • Data is divided, as in any classification problem, into training data and testing data • All data must be normalized (i.e. all attribute values in the database are rescaled to lie in the interval [0,1] or [-1,1]), because a neural network works with data in the range (0,1) or (-1,1) • Two basic normalization techniques: [1] Max-Min normalization [2] Decimal Scaling normalization
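The two normalization techniques can be sketched as follows, using their usual textbook definitions. Max-min normalization maps the smallest value to the lower bound and the largest to the upper bound; decimal scaling divides by the smallest power of 10 that brings every absolute value below 1. The function names and the sample `incomes` values are assumptions made for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def decimal_scaling_normalize(values):
    """Divide by the smallest power of 10 that makes every |value| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

incomes = [12000, 35000, 58000, 98000]   # invented sample attribute values
scaled = min_max_normalize(incomes)
decimal = decimal_scaling_normalize(incomes)
print(scaled)
print(decimal)
```

Passing `new_min=-1.0` to `min_max_normalize` gives the [-1,1] variant mentioned above.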
  • 24. Loan Prospector – HNC/Fair Isaac • A Neural Network (Expert System) is like a black box that knows how to process inputs to create a useful output. • The calculation(s) are quite complex and difficult to understand
  • 25. Neural Net Limitations • Neural Nets are good for prediction and estimation when: – Inputs are well understood – Output is well understood – Experience is available for examples to use to “train” the neural net application (expert system) • Neural Nets are only as good as the training set used to generate them. The resulting model is static and must be updated with more recent examples and retraining for it to stay relevant
  • 26. Neural Network Training • Training is the process of setting the best weights on the edges connecting all the units in the network • The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible • Back propagation has been used since the 1980s to adjust the weights (other methods are now available): – Calculates the error by taking the difference between the calculated result and the actual result – The error is fed back through the network and the weights are adjusted to minimize the error
  • 27. Introduction • Data Mining Definitions: – Building compact and understandable models incorporating the relationships between the description of a situation and a result concerning the situation. – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
  • 28. Kinds of Data Mining Problems • Classification / Segmentation • Forecasting/Prediction (how much) • Association rule extraction (market basket analysis) • Sequence detection
  • 29. Data Mining Techniques: • Neural Networks • Decision Trees • Multivariate Adaptive Regression Splines (MARS) • Rule Induction • Nearest Neighbor Method and discriminant analysis • Genetic Algorithms • Boosting
  • 30. Neural Networks • What are they? – Based on early research aimed at representing the way the human brain works – Neural networks are composed of many processing units called neurons • Types (Supervised versus Unsupervised) • Training
  • 31. Neural Networks are great, but.. • Problem 1: The black box model! – Solution 1: Do we really need to know? – Solution 2: Rule extraction techniques • Problem 2: Long training times – Solution 1: Get a faster PC with lots of RAM – Solution 2: Use faster algorithms, for example Quickprop • Problem 3: Back propagation – Solution: Evolutionary Neural Networks!
  • 32. Rule Extraction Techniques • Representation Methods • Extraction Strategy • Network Requirement
  • 33. Neural Network Concepts • Neural networks (NN): a brain metaphor for information processing • Neural computing • Artificial neural network (ANN) • Many uses for ANN – pattern recognition, forecasting, prediction, and classification • Many application areas – finance, marketing, manufacturing, operations, information systems, and so on
  • 34. Biological Neural Networks • Two interconnected brain cells (neurons) [Figure: two neurons, each labeled with soma, axon, synapse, and dendrites]
  • 36. Processing Information in ANN • A single neuron (processing element, PE) with inputs and outputs [Figure: inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ feed a neuron (or PE), which produces the output $Y$ (outputs $Y_1, Y_2, \ldots, Y_n$)] • Summation: $S = \sum_{i=1}^{n} X_i W_i$ • Transfer function: $f(S)$
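The processing inside a single neuron (a weighted summation followed by a transfer function) can be sketched as below. The slide does not name a specific transfer function, so the sigmoid used here is an assumption, as are the example input and weight values.

```python
import math

def summation(inputs, weights):
    """S = sum of X_i * W_i over all inputs."""
    return sum(x * w for x, w in zip(inputs, weights))

def sigmoid(s):
    """One common nonlinear transfer function (assumed; the slide does not name one)."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, transfer=sigmoid):
    """Y = f(S): apply the transfer function to the weighted sum."""
    return transfer(summation(inputs, weights))

x = [3.0, 1.0, 2.0]   # invented inputs X_1..X_3
w = [0.2, 0.4, 0.1]   # invented weights W_1..W_3
s = summation(x, w)   # 3*0.2 + 1*0.4 + 2*0.1
y = neuron_output(x, w)
print(s, y)
```

Any other transfer function can be passed through the `transfer` parameter, for example a step function for yes/no outputs.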
  • 37. Elements of ANN • Processing element (PE) • Network architecture – Hidden layers – Parallel processing • Network information processing – Inputs – Outputs – Connection weights – Summation function
  • 40. Learning in ANN • A process by which a neural network learns the underlying relationship between input and outputs, or just among the inputs • Supervised learning – For prediction type problems – E.g., backpropagation • Unsupervised learning – For clustering type problems – Self-organizing – E.g., adaptive resonance theory
  • 41. A Taxonomy of ANN Learning Algorithms [Figure: taxonomy tree. Learning algorithms are divided by input type (discrete/binary vs. continuous) and by learning mode (supervised vs. unsupervised); architectures are divided into supervised and unsupervised, with the labels recurrent, feedforward, estimator, and extractor. Entries named in the figure: delta rule, gradient descent, competitive learning, Neocognitron, perceptron, simple Hopfield, outer-product AM, Hamming net, ART-1, ART-2, ART-3, Carpenter/Grossberg, SOFM (or SOM), other clustering algorithms, Hopfield, nonlinear vs. linear, backpropagation, ML perceptron, Boltzmann]
  • 42. A Supervised Learning Process • Three-step process: 1. Compute temporary outputs 2. Compare outputs with desired targets 3. Adjust the weights and repeat the process [Figure: flowchart. ANN model → compute output → is desired output achieved? If no, adjust weights and compute the output again; if yes, stop learning]
  • 43. How a Network Learns • Example: single neuron that learns the inclusive OR operation * See your book for step-by-step progression of the learning process • Learning parameters: – Learning rate – Momentum
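A single neuron learning the inclusive OR operation can be sketched as below. This is an illustrative sketch, not the book's step-by-step progression: the step transfer function with threshold 0.5, the zero initial weights, and the learning rate of 0.3 are all assumptions, and momentum is left out for brevity.

```python
def step(s, threshold=0.5):
    """Step transfer function: fire (1) only when the sum exceeds the threshold."""
    return 1 if s > threshold else 0

def predict(inputs, weights):
    return step(sum(x * w for x, w in zip(inputs, weights)))

def train_single_neuron(cases, learning_rate=0.3, max_epochs=100):
    weights = [0.0, 0.0]                      # assumed starting weights
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in cases:
            delta = desired - predict(inputs, weights)   # compare with target
            if delta != 0:
                errors += 1
                # Adjust each weight in proportion to its input and the error.
                weights = [w + learning_rate * delta * x
                           for w, x in zip(weights, inputs)]
        if errors == 0:                       # desired output achieved: stop
            break
    return weights

or_cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_single_neuron(or_cases)
predictions = [predict(inputs, weights) for inputs, _ in or_cases]
print(weights, predictions)
```

The loop follows the compute, compare, adjust cycle of the supervised learning process; a smaller learning rate would reach the same result in more passes.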
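  • 44. Backpropagation Learning • Backpropagation of Error for a Single Neuron [Figure: inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ feed a neuron (or PE) with output $Y_i$; the error is fed back to the weights] • Summation: $S = \sum_{i=1}^{n} X_i W_i$ • Transfer function: $Y = f(S)$ • Error: $a(Z_i - Y_i)$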
  • 45. Backpropagation Learning • The learning algorithm procedure: 1. Initialize weights with random values and set other network parameters 2. Read in the inputs and the desired outputs 3. Compute the actual output (by working forward through the layers) 4. Compute the error (difference between the actual and desired output) 5. Change the weights by working backward through the hidden layers 6. Repeat steps 2-5 until weights stabilize
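The six steps above can be sketched for a small network with one hidden layer. This is a minimal sketch under stated assumptions, not a definitive implementation: the sigmoid transfer function, three hidden units, a learning rate of 0.5, the XOR training cases, and a fixed number of passes (in place of a check that the weights have stabilized) are all choices made for the example.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_hidden, w_out):
    """Step 3: work forward. Each weight row ends with a bias weight (input 1.0)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def total_error(cases, w_hidden, w_out):
    """Sum of squared differences between desired and actual outputs."""
    return sum((z - forward(x, w_hidden, w_out)[1]) ** 2 for x, z in cases)

def train(cases, n_hidden=3, learning_rate=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(cases[0][0])
    # Step 1: initialize weights with random values.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = total_error(cases, w_hidden, w_out)
    for _ in range(epochs):                 # Step 6, with a fixed pass count (assumption)
        for x, z in cases:                  # Step 2: read inputs and desired output
            hidden, y = forward(x, w_hidden, w_out)
            error = z - y                   # Step 4: compute the error
            # Step 5: work backward, output layer first, then the hidden layer.
            delta_out = error * y * (1 - y)
            delta_hidden = [delta_out * w_out[j] * h * (1 - h)
                            for j, h in enumerate(hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] += learning_rate * delta_out * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] += learning_rate * delta_hidden[j] * v
    return w_hidden, w_out, initial_error

xor_cases = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out, initial_error = train(xor_cases)
final_error = total_error(xor_cases, w_hidden, w_out)
print(initial_error, final_error)
```

The hidden-layer deltas are computed from the output weights before those weights are changed, which is what "working backward through the hidden layers" requires.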
  • 47. Neural Network Architectures • The architecture of a neural network is driven by the task it is intended to address – Classification, regression, clustering, general optimization, association, … • Most popular architecture: feedforward, multi-layered perceptron with backpropagation learning algorithm – Used for both classification and regression type problems
  • 48. Other Popular ANN Paradigms Self Organizing Maps (SOM) • Applications of SOM – Customer segmentation – Bibliographic classification – Image-browsing systems – Medical diagnosis – Interpretation of seismic activity – Speech recognition – Data compression – Environmental modeling, many more …
  • 49. Application Types of ANN • Classification – Feedforward networks (MLP), radial basis function, and probabilistic NN • Regression – Feedforward networks (MLP), radial basis function • Clustering – Adaptive Resonance Theory (ART) and SOM • Association – Hopfield networks • Provide examples for each type?
  • 50. Advantages of ANN • Able to deal with (identify/model) highly nonlinear relationships • Not bound by restrictive normality and/or independence assumptions • Can handle a variety of problem types • Usually provides better results (prediction and/or clustering) compared to its statistical counterparts • Handles both numerical and categorical variables (transformation needed!)
  • 51. Disadvantages of ANN • They are deemed to be black-box solutions, lacking explainability • It is hard to find optimal values for the large number of network parameters – Optimal design is still an art: it requires expertise and extensive experimentation • It is hard to handle a large number of variables (especially the rich nominal attributes) • Training may take a long time for large datasets, which may require case sampling
  • 52. ANN Software • Standalone ANN software tools – NeuroSolutions – BrainMaker – NeuralWare – NeuroShell, … for more (see pcai.com) … • Part of a data mining software suite – PASW (formerly SPSS Clementine) – SAS Enterprise Miner – Statistica Data Miner, … many more …
  • 53. Applications-I • Handwritten Digit Recognition • Face recognition • Time series prediction • Process identification • Process control • Optical character recognition
  • 54. Applications-II • Forecasting/Market Prediction: finance and banking • Manufacturing: quality control, fault diagnosis • Medicine: analysis of electrocardiogram data, RNA & DNA sequencing, drug development without animal testing • Control: process, robotics