INTRODUCTION TO
Advanced Machine Learning
By
R. Sanghavi
Asst Professor
CSE(DS)
MALLA REDDY ENGINEERING COLLEGE (Autonomous)
Module 1: Learning
Syllabus
Learning – Types of Machine Learning – Supervised
Learning – Unsupervised Learning- semi supervised
learning - The Brain and the Neuron – Design a Learning
System – Perspectives and Issues in Machine Learning –
Concept Learning Task – Concept Learning as Search –
Finding a Maximally Specific Hypothesis – Version
Spaces and the Candidate Elimination Algorithm
Course Outcomes:
Upon completion of the course, the students will be able to: Distinguish between supervised, unsupervised and semi-supervised learning
1. Apply the apt machine learning strategy for any given problem
2. Suggest supervised, unsupervised or semi-supervised learning algorithms for any given problem
3. Design systems that use the appropriate trees in probability models of machine learning
4. Modify existing machine learning algorithms to improve classification efficiency
5. Design systems that use the appropriate graph models of machine learning
Supervised Learning
 Supervised learning is an ML method in which a model learns from a
labeled dataset containing input-output pairs. Each input in the
dataset has a corresponding correct output (the label), and the
model's task is to learn the relationship between the inputs and
outputs. This enables the model to make predictions on new, unseen
data by applying the learned mapping.
Categories of Supervised Learning
 Regression: When the output variable is real-valued, like "price" or "temperature", several popular regression algorithms come into play, such as Simple Linear Regression, Multivariate Regression, Decision Trees, and Lasso Regression.
 Classification: When the output variable is a category, like distinguishing between 'spam' and 'not spam' in email filtering, widely used classification algorithms include Random Forest, Decision Tree, Logistic Regression, and Support Vector Machine. (A short sketch of both categories follows below.)
Advantages of Supervised Learning
 Effectiveness: Supervised learning can predict outcomes based on
past data.
 Simplicity: It's relatively easy to understand and implement.
 Performance Evaluation: It is easy to measure the performance of a
supervised learning model since the ground truth (labels) is known.
 Applications: Can be used in various fields like finance, healthcare,
marketing, etc.
 Feature Importance: It allows an understanding of which features are
most important in making predictions.
Unsupervised Learning
 Unsupervised Learning is a type of ML that uses input data without
labeled responses to uncover hidden structures from the data itself.
Unlike supervised learning, where the training data includes both
input vectors and corresponding target labels, unsupervised learning
algorithms try to learn patterns and relationships directly from the
input data.
Categories of Unsupervised Learning
 Clustering: Grouping similar instances into clusters (e.g., k-means, hierarchical clustering). Some popular clustering algorithms are the K-Means Clustering algorithm, the Mean-shift algorithm, and the DBSCAN algorithm.
 Association: Discovering rules that capture interesting relationships between variables in large databases (e.g., market basket analysis). Some popular association algorithms are the Apriori algorithm, Eclat, and the FP-growth algorithm.
 Dimensionality Reduction: Reducing the number of random variables under consideration (e.g., PCA, t-SNE, Independent Component Analysis), which helps to simplify the data without losing important information (see the sketch after this list).
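A small scikit-learn sketch of clustering and dimensionality reduction on unlabeled data; the toy points and the cluster count are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.array([[1.0, 1.1], [0.9, 1.0],      # inputs only, no labels
              [5.0, 5.2], [5.1, 4.9],
              [9.0, 0.2], [8.8, 0.1]])

# Clustering: group similar points together
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)

# Dimensionality reduction: project the 2-D points onto 1 principal component
X_1d = PCA(n_components=1).fit_transform(X)
print(X_1d.ravel())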
Semi-supervised learning
 Semi-supervised learning is an ML approach that trains models using
a combination of a small amount of labeled data and a large amount
of unlabeled data. This method lies between supervised learning
(where all data is labeled) and unsupervised learning (where no data is
labeled). The main goal of semi-supervised learning is to leverage the
large pool of unlabeled data to understand the underlying structure
of the data better and improve learning accuracy with the limited
labeled data.
 Example of Semi-Supervised Learning
 A classic example of semi-supervised learning is classifying web
pages. Consider a scenario where you have a small number of web
pages manually categorized into topics like sports, news, technology,
etc., and a much larger set of uncategorized pages. Semi-supervised
learning algorithms can use the labeled pages to learn about features
indicative of each category and apply this knowledge to categorize the
unlabeled pages.
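A minimal scikit-learn sketch of the same idea via self-training: a few labeled points plus many unlabeled ones (marked with -1). The one-dimensional toy data and the choice of base classifier are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[0.0], [0.1], [0.2], [0.3], [0.9], [0.95], [1.0], [1.1]])
y = np.array([0, 0, -1, -1, 1, 1, -1, -1])   # -1 marks unlabeled examples

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)                              # labeled rows guide pseudo-labeling of the rest
print(model.predict([[0.25], [0.85]]))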
Advantages of Semi-Supervised Learning
 Efficiency: Reduces the need for labeled data, which is often expensive
and time-consuming.
 Improved accuracy: Combining labeled and unlabeled data can often
improve learning accuracy.
 Utilizes unlabeled data: Effectively uses the abundance of available
unlabeled data.
 Versatility: Useful in scenarios where obtaining a fully labeled dataset
is impractical.
 Better generalization: This can help by learning the underlying data
distribution more effectively.
The Brain and the Neuron
 In machine learning, neural
networks are models that
mimic the structure and
function of the human
brain to help computers
process data and solve
problems.
Working of a Neural Network
 Neural networks are complex systems that mimic some features of the functioning of the human brain.
 A neural network is composed of an input layer, one or more hidden layers, and an output layer, each made up of interconnected artificial neurons.
 The two stages of the basic process are called forward propagation and backpropagation.
Forward Propagation
• Input Layer: Each feature in the input layer is represented by a node of the network, which receives the input data.
• Weights and Connections: The weight of each connection between neurons indicates how strong the connection is. These weights are adjusted throughout training.
• Hidden Layers: Each hidden-layer neuron processes its inputs by multiplying them by weights, summing them, and passing the result through an activation function. This introduces non-linearity, enabling the network to recognize intricate patterns (see the sketch after this list).
• Output: The final result is produced by repeating the process until the output layer is reached.
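A minimal NumPy sketch of one forward pass matching the steps above (weighted sum, then a non-linear activation); the layer sizes and random weights are illustrative assumptions:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)                   # activation: introduces non-linearity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # input layer: 3 features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # weights, input -> hidden (4 neurons)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # weights, hidden -> output

h = relu(W1 @ x + b1)                           # hidden layer: weighted sum + activation
y = sigmoid(W2 @ h + b2)                        # output layer: final result
print(y)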
Design a Learning System in Machine Learning
 According to Arthur Samuel, “Machine Learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed.”
 In simple words, when we feed the training data to a machine learning algorithm, the algorithm builds a mathematical model, and with the help of that model the machine makes predictions and takes decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains and the more efficient the results it produces.
Design a Learning System in Machine Learning
 Example: In a driverless car, the training data fed to the algorithm covers how to drive the car on highways and on busy and narrow streets, with factors like speed limits, parking, stopping at signals, etc.
 After that, a logical and mathematical model is created on that basis, and the car then works according to the logical model.
 Also, the more data that is fed, the more efficient the output that is produced.
Design a Learning System in Machine Learning
 According to Tom Mitchell, “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
 Example: In Spam E-Mail detection,
• Task, T: To classify mails into Spam or Not Spam.
• Performance measure, P: Total percent of mails correctly classified as “Spam” or “Not Spam” (computed in the sketch below).
• Experience, E: Set of Mails with label “Spam”
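A tiny sketch of the performance measure P above, i.e. the percentage of mails classified correctly; the labels are made up purely for illustration:

y_true = ['spam', 'not spam', 'spam', 'not spam', 'spam']      # actual labels
y_pred = ['spam', 'not spam', 'not spam', 'not spam', 'spam']  # model output

P = 100 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"P = {P:.0f}% of mails correctly classified")           # P = 80%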
Design a Learning System in Machine Learning
 The steps for designing a learning system are:
1. Choosing the training experience
2. Choosing the target function
3. Choosing a representation for the target function
4. Choosing a function approximation algorithm
5. The final design
Step 1) Choosing the Training Experience
 The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that this choice has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Step 2- Choosing target function
 The next important step is choosing the target function. Based on the knowledge fed to the algorithm, the system chooses a function such as NextMove that describes what type of legal move should be taken. For example, while playing chess against an opponent, when the opponent moves, the machine learning algorithm decides, among the possible legal moves, which move to take in order to succeed.
Step 3- Choosing Representation for Target function
 Once the algorithm knows all the possible legal moves, the next step is to choose a representation for the target function, e.g., linear equations, a hierarchical graph representation, a tabular form, etc. Using this representation, the NextMove function selects, out of the available moves, the one that gives the highest success rate. For example, if the chess machine has 4 possible moves, it will choose the optimized move that leads it to success.
Step 4- Choosing Function Approximation Algorithm
 An optimized move cannot be chosen from the training data alone. The training data has to pass through a set of examples; from these examples the learner approximates the target function, and the machine receives feedback on the moves it chose. For example, when training data for playing chess is fed to the algorithm, at first the algorithm does not know whether a move will fail or succeed; from each failure or success it refines its estimate of which step should be chosen next and what its success rate is.
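One common way to realize this step is an LMS-style weight update, as in Mitchell's classic checkers design; this is a sketch of that general idea, not a method spelled out on the slides, and the feature values and learning rate are illustrative assumptions:

def lms_update(weights, features, v_train, lr=0.1):
    """Nudge the weights of a linear evaluation function toward v_train."""
    v_hat = sum(w * f for w, f in zip(weights, features))   # current estimate
    error = v_train - v_hat                                  # feedback signal
    return [w + lr * error * f for w, f in zip(weights, features)]

# Illustrative board features (e.g., piece counts) and a training value
# estimated from the outcome of a later position
weights  = [0.5, 0.5, 0.5]
features = [3, 1, 0]
weights  = lms_update(weights, features, v_train=10.0)
print(weights)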
Step 5- Final Design
 The final design is created at last, after the system has gone through a number of examples, failures and successes, and correct and incorrect decisions, and has learned what the next step should be. Example: Deep Blue, an intelligent ML-based computer, won a chess match against the chess expert Garry Kasparov and became the first computer to defeat a reigning world chess champion.
Perspectives and Issues in Machine Learning
 Perspective in Machine Learning:
• A crucial aspect of machine learning is considering different viewpoints
and approaches. These perspectives influence model development,
training, and application.
• Diverse viewpoints lead to improved model accuracy, fairness, and
generalization.
• Human bias and ethical considerations play a significant role in
shaping machine learning outcomes.
• Interdisciplinary collaboration enhances the field by incorporating insights from various domains.
Issues in Machine Learning
• Bias and Fairness: Models can inherit biases from training data, leading to
unfair predictions. Addressing bias is essential for ethical and equitable ML.
• Data Quality: Garbage in, garbage out! High-quality data is crucial for robust
models.
• Interpretability: Understanding why a model makes certain predictions is
challenging.
• Scalability: Handling large datasets and distributed learning remains a challenge.
• Generalization: Ensuring models generalize well to unseen examples.
• Security and Privacy: Protecting sensitive information in ML systems.
• Model Selection: Choosing the right algorithm and hyperparameters.
Overfitting: Models may perform well on training data but poorly on unseen data.
Methods to reduce overfitting:
 Increase training data in a dataset.
 Reduce model complexity by simplifying the model by selecting one with fewer
parameters
 Ridge Regularization and Lasso Regularization (see the sketch after this list)
 Early stopping during the training phase
 Reduce the noise
 Reduce the number of attributes in training data.
 Constraining the model.
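A short scikit-learn sketch of the ridge and lasso remedies from the list above: both penalties shrink the model weights, and lasso can drive some to exactly zero. The synthetic data is an illustrative assumption:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))                       # few samples, many features
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=30)  # only the first feature matters

ols   = LinearRegression().fit(X, y)                # no regularization
ridge = Ridge(alpha=1.0).fit(X, y)                  # L2 penalty on weights
lasso = Lasso(alpha=0.1).fit(X, y)                  # L1 penalty, sparser weights

for name, m in [("ols", ols), ("ridge", ridge), ("lasso", lasso)]:
    print(name, np.abs(m.coef_).sum())              # total weight magnitude shrinks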
Methods to reduce Underfitting:
 Increase model complexity
 Remove noise from the data
 Train on more and better features
 Reduce the constraints
 Increase the number of epochs to get better results.
Concept Learning Task
 Concept learning in machine learning refers to the
process of inferring a general rule or concept from
specific examples.
 It's a fundamental aspect of supervised learning, where
the goal is to classify unseen instances based on a set of
training data.
 The training data includes labeled examples, with each
example consisting of an input (features) and the correct
output (label).
Key Aspects of Concept Learning:
1. Hypothesis Space (H): The set of all possible concepts (or rules) that the learning algorithm could potentially choose to explain the training data. A concept is a function that maps inputs to outputs (e.g., binary classification, where the concept assigns a positive or negative label to each input).
2. Target Concept (C): The actual, unknown function or rule that perfectly explains the training data. The goal of concept learning is to find a hypothesis in the hypothesis space that closely matches this target concept.
Concept Learning as Search
 Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of the concept learning search is to find the hypothesis that best fits the training examples.
Key Aspects of Concept Learning:
 Generalization vs. Specialization: The learning algorithm explores the
hypothesis space by generalizing from the specific examples it has seen.
The challenge is to find a hypothesis that is specific enough to explain
the training data but general enough to classify unseen examples
correctly.
 Inductive Bias: This refers to the assumptions the learning algorithm
makes to choose between different hypotheses when multiple hypotheses
are consistent with the training data. These biases are necessary to
generalize from limited data.
Examples of Concept Learning Algorithms:
 Version Space Learning: The version space is the set of all hypotheses that
are consistent with the training data. As the algorithm receives more examples,
it refines the version space by eliminating hypotheses that do not explain the
data.
 Decision Trees: A decision tree is a form of concept learning where the
internal nodes represent tests on features, and the leaves represent the output
labels (concepts). The tree generalizes a rule by recursively partitioning the
data based on feature values.
 Inductive Logic Programming (ILP): This is a type of concept learning that uses logic-based representations to induce hypotheses. ILP combines machine learning with symbolic reasoning to learn concepts expressed as logical rules.
Applications of Concept Learning:
 Spam Detection: Learning the concept of "spam" from labeled
email data to classify future emails.
 Medical Diagnosis: Learning to identify a disease (the
concept) from patient data to predict future diagnoses.
 Image Classification: Learning to recognize objects or scenes
by identifying visual patterns from labeled images.
Find-S Algorithm
 It is a basic concept learning algorithm in machine learning.
 It finds the most specific hypothesis that fits all the positive examples.
 Note that the algorithm considers only the positive training examples.
 The Find-S algorithm starts with the most specific hypothesis and generalizes it each time it fails to cover an observed positive training example.
 Hence, the Find-S algorithm moves from the most specific hypothesis towards the most general hypothesis.
Important Representation:
 ? indicates that any value is acceptable for the attribute.
 A single specified value (e.g., Cold) indicates that exactly that value is required for the attribute.
 ϕ indicates that no value is acceptable.
 The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
 The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
Steps Involved in Find-S:
Start with the most specific hypothesis.
 h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
 Take the next example; if it is negative, no changes occur to the hypothesis.
 If the example is positive and we find that our current hypothesis is too specific, then we update it to a more general condition.
 Keep repeating the above steps until all the training examples have been processed.
 After we have processed all the training examples, we will have the final hypothesis, which can be used to classify new examples.
Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint aᵢ in h
      If the constraint aᵢ is satisfied by x
         Then do nothing
      Else replace aᵢ in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Example problem
Consider the following data set containing records of which particular seeds are poisonous. First, we set the hypothesis to the most specific hypothesis. Since each example has four attributes, our hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1 :
 The data in example 1 is { GREEN, HARD, NO, WRINKLED }, a positive example. We see that our initial hypothesis is too specific, so we generalize it for this example. Hence, the hypothesis becomes:
 h = { GREEN, HARD, NO, WRINKLED }
Consider example 2 :
 Here we see that this example has a negative outcome. Hence we neglect this example and
our hypothesis remains the same.
 h = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
 Here we see that this example has a negative outcome. Hence we neglect this example and
our hypothesis remains the same.
 h = { GREEN, HARD, NO, WRINKLED }
Consider example 4 :
The data in example 4 is { ORANGE, HARD, NO, WRINKLED }, a positive example. We compare every attribute with the current hypothesis, and wherever there is a mismatch we replace that attribute with the general case ( ” ? ” ). After doing this, the hypothesis becomes:
 h = { ?, HARD, NO, WRINKLED }
Consider example 5 : The data in example 5 is { GREEN, SOFT, YES, SMOOTH }, a positive example. We compare every attribute with the current hypothesis, and wherever there is a mismatch we replace that attribute with the general case ( ” ? ” ). After doing this, the hypothesis becomes:
 h = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis take the general value, examples 6 and 7 would result in the same hypothesis with all general attributes.
 h = { ?, ?, ?, ? }
Hence, for the given data the final hypothesis is:
 Final Hypothesis: h = { ?, ?, ?, ? }
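A minimal Python sketch of Find-S run on the positive rows spelled out above (examples 1, 4 and 5); negative examples are simply skipped, exactly as the algorithm prescribes. The attribute tuples are copied from the worked example:

def find_s(positives, n_attrs):
    h = ['ϕ'] * n_attrs                      # start with the most specific hypothesis
    for x in positives:
        for i in range(n_attrs):
            if h[i] == 'ϕ':
                h[i] = x[i]                  # first positive: copy the attribute value
            elif h[i] != x[i]:
                h[i] = '?'                   # mismatch: generalize to "any value"
    return h

positives = [
    ('GREEN',  'HARD', 'NO',  'WRINKLED'),   # example 1
    ('ORANGE', 'HARD', 'NO',  'WRINKLED'),   # example 4
    ('GREEN',  'SOFT', 'YES', 'SMOOTH'),     # example 5
]
print(find_s(positives, 4))                  # ['?', '?', '?', '?']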
Candidate Elimination Algorithm
 FIND-S outputs a hypothesis from H that is consistent with the training examples, but this is just one of many hypotheses from H that might fit the training data equally well.
 The key idea in the Candidate-Elimination algorithm is to output a description of the set of all hypotheses consistent with the training examples.
 The Candidate-Elimination algorithm computes the description of this set without explicitly enumerating all of its members. This is accomplished by using the more-general-than partial ordering and maintaining a compact representation of the set of consistent hypotheses.
 You can consider this an extended form of the Find-S algorithm.
 It considers both positive and negative examples.
 Positive examples are used as in the Find-S algorithm, to generalize the specific hypothesis.
 Negative examples are used to specialize the general hypothesis.
Algorithm
 Step1: Load Data set
 Step2: Initialize General Hypothesis and Specific Hypothesis.
 Step3: For each training example
 Step4: If example is positive example
 if attribute_value == hypothesis_value:
 Do nothing
 else:
 replace attribute value with '?' (Basically generalizing it)
 Step5: If example is Negative example
 Make the general hypothesis more specific.
 Finally, remove from G any hypotheses that are more specific than S.
Terms Used:
 Concept learning: Concept learning is basically the learning task of the machine (learning from the training data).
 General Hypothesis: Does not constrain the features; any value is acceptable for every attribute.
 G = {‘?’, ‘?’, ‘?’, ‘?’ …}: one ‘?’ per attribute.
 Specific Hypothesis: Specifies the features the machine must learn (one specific value per attribute).
 S = {ϕ, ϕ, ϕ …}: one ϕ per attribute.
Version Space
It is intermediate between the general hypothesis and the specific hypothesis: rather than a single hypothesis, it holds the set of all possible hypotheses based on the training data set.
(or)
It is the set of all hypotheses consistent with the training examples. CEA refines this space to find the best fit.
Worked Example
 Initially : G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
 [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
 S = [Null, Null, Null, Null, Null, Null]

 For instance 1 : <'sunny','warm','normal','strong','warm ','same'> and positive output.
 G1 = G
 S1 = ['sunny','warm','normal','strong','warm ','same']

 For instance 2 : <'sunny','warm','high','strong','warm ','same'> and positive output.
 G2 = G
 S2 = ['sunny','warm',?,'strong','warm ','same']
 For instance 3 : <'rainy','cold','high','strong','warm ','change'> and negative output.
G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]
S3 = S2
 For instance 4 : <'sunny','warm','high','strong','cool','change'> and positive output.
G4 = G3
S4 = ['sunny','warm',?,'strong', ?, ?]
Finally, by combining G4 and S4, the algorithm produces the output version space:
G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S = ['sunny','warm',?,'strong', ?, ?]
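A compact Python sketch of the candidate elimination loop run on the four instances traced above. The specialization step is simplified to use only the values in the current specific boundary S (sufficient for this conjunctive example), and the tidy-up step that removes members of G more specific than other members is omitted:

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def generalize_S(s, x):
    """Minimally generalize the specific hypothesis s to cover positive x."""
    return tuple(xv if sv == 'ϕ' else (sv if sv == xv else '?')
                 for sv, xv in zip(s, x))

def specialize_G(g, s, x):
    """Minimal specializations of g that exclude negative x and agree with s."""
    out = []
    for i, gv in enumerate(g):
        if gv == '?' and s[i] not in ('?', x[i]):
            h = list(g)
            h[i] = s[i]
            out.append(tuple(h))
    return out

def candidate_elimination(examples, n_attrs):
    S = tuple(['ϕ'] * n_attrs)                    # most specific boundary
    G = [tuple(['?'] * n_attrs)]                  # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]    # drop general hypotheses that miss x
            S = generalize_S(S, x)
        else:
            G = [h for g in G
                 for h in (specialize_G(g, S, x) if covers(g, x) else [g])]
    return S, G

data = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'),   True),
    (('sunny', 'warm', 'high',   'strong', 'warm', 'same'),   True),
    (('rainy', 'cold', 'high',   'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high',   'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(data, 6)
print('S =', S)   # ('sunny', 'warm', '?', 'strong', '?', '?')
print('G =', G)   # [('sunny', ?, ...), (?, 'warm', ...)]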
Advantages of CEA over Find-S:
 Improved accuracy: CEA considers both positive and negative examples to generate
the hypothesis, which can result in higher accuracy when dealing with noisy or
incomplete data.
 Flexibility: CEA can handle more complex classification tasks, such as those with
multiple classes or non-linear decision boundaries.
 More efficient: CEA reduces the number of hypotheses by generating a set of
general hypotheses and then eliminating them one by one. This can result in faster
processing and improved efficiency.
 Better handling of continuous attributes: CEA can handle continuous attributes by
creating boundaries for each attribute, which makes it more suitable for a wider
range of datasets.
Disadvantages of CEA in comparison with Find-S
 More complex: CEA is a more complex algorithm than Find-S, which may make it
more difficult for beginners or those without a strong background in machine
learning to use and understand.
 Higher memory requirements: CEA requires more memory to store the set of
hypotheses and boundaries, which may make it less suitable for memory-
constrained environments.
 Slower processing for large datasets: CEA may become slower for larger datasets
due to the increased number of hypotheses generated.
 Higher potential for overfitting: The increased complexity of CEA may make it more
prone to overfitting on the training data, especially if the dataset is small or has a
high degree of noise.
Reference
 https://www.simplilearn.com/tutorials/machine-learning-tutorial/types-of-machine-learning
 https://www.geeksforgeeks.org/design-a-learning-system-in-machine-learning/