Getting started with
Machine Learning
Gaurav Bhalotia
Machine learning
● Processing data, understanding data, reacting intelligently to data
● Predicting the future based on the past
(Figure: traditional CS vs. machine learning.)
ML vs AI?
AI: a computer program that does something smart. It can be a pile of if-then statements or a complex statistical model. Not all AI counts as machine learning: symbolic logic (rules engines, expert systems, knowledge graphs) and trie-based autosuggest are examples of AI that are not ML.
Machine Learning: computer programs (algorithms) that parse data, learn from it, and then make a determination or prediction about something in the world. Teaching computers to learn to perform tasks from past experiences (data).
Deep Learning: a technique for implementing machine learning; attempts to model the mind rather than the world.
What is learning?
1. Identify the real-world problem we are solving (the question)
2. Find the relevant data; gather enough that it captures all the nuances
3. Figure out a mathematical formulation (the model)
4. Do the actual maths/calculations (training)
Machine Learning has come of age
Multiple real-world deployments:
● Speech recognition (Siri, Alexa, …)
● Image recognition (self-driving cars, FB tagging, …)
● Spam detection
● Recommendation engines (Amazon, Netflix, …)
● Language translation

Four enablers:
1. Data
2. Modelling capability
3. Compute power
4. Programming frameworks
What can Machine Learning/AI do?
● Anything humans can do in < 1s (Andrew Ng @ Baidu)
● Predicting the next event in a sequence
● We are very, very far away from AI becoming sentient
Structuring this a bit
Input
Gather data: identify a data context that represents the problem space; ensure sufficient coverage of the various nuances.
Feature engineering: if using a shallow learner, identify specific features of the input that may help disambiguate the correct answer.
Data preparation: split the data into training, validation and testing sets; assign human labels to the training data if fitting is to be done against known outputs.

Model
Learning type: identify the class of algorithms to be used (supervised, unsupervised, etc.)
Algorithm selection: within the chosen class, pick the algorithm that may work best for the problem at hand, e.g. for classification: neural net vs. SVM vs. naive Bayes.
Parameter identification: figure out the parameter values (or the set of values to try) for your algorithm during the learning process, e.g. the number of clusters for k-means.

Learning
Train: fit the input data to the model we are working with; gradient descent is a well-known tool for this.
Validate: while the model is being trained on the training data, update it only if it continues to do well on an independent validation set.
Test: check how well the model generalizes the learned behaviour by applying it to an independent data set it has not seen before.

An end-to-end sketch of this Input / Model / Learning loop follows.
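To make the three columns concrete, here is a minimal sketch. It assumes scikit-learn, its bundled digits dataset, and naive Bayes as the selected algorithm; none of these choices come from the deck.

```python
# End-to-end Input / Model / Learning sketch, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)                  # Input: gather labelled data

# Input: split into train / validation / test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = GaussianNB()                                 # Model: algorithm selection
model.fit(X_train, y_train)                          # Learning: train
print("validation accuracy:", model.score(X_val, y_val))   # Learning: validate
print("test accuracy:", model.score(X_test, y_test))        # Learning: test
```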
ML Algorithm Types (sample)
Supervised Learning
Supervised learning is the machine-learning task of inferring a function f from labeled training data.
● We have data with labels already attached (e.g. we know which messages are spam and which are not)
● We learn a pattern that fits the data
● We apply this pattern to new data to make predictions
Goal: from the database (the learning sample), find a function f of the inputs that best approximates the output.
Discrete/symbolic output ⇒ classification problem
Continuous/numerical output ⇒ regression problem
(A sketch of both follows.)
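A minimal sketch of the two problem types, assuming scikit-learn; the toy numbers are made up for illustration.

```python
# Classification (discrete output) vs. regression (continuous output).
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: labels are discrete classes (e.g. spam = 1, not spam = 0)
clf = LogisticRegression().fit([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])
print(clf.predict([[3.5]]))        # -> a class label, e.g. [1]

# Regression: the target is a continuous value (e.g. a house price)
reg = LinearRegression().fit([[1.0], [2.0], [3.0], [4.0]], [100.0, 200.0, 300.0, 400.0])
print(reg.predict([[3.5]]))        # -> a number, e.g. [350.]
```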
Unsupervised Learning
In machine learning, unsupervised learning is the problem of trying to find hidden structure in unlabeled data.
● We have some data (often a lot) that we cannot make much sense of
● We run different algorithms to see whether a pattern emerges
Classification
Predicting class labels (two or more) from input data
E.g.:
Spam detection
Cancer identification
Regression
Predicting values based on input signals
Predicting house prices
Predicting stock prices
Predicting LTV (lifetime value)
Clustering or Grouping
Grouping data based on certain attributes (a distance function)
Placing cell towers
Locating patrol vans or emergency clinics
Customer segmentation
(A k-means sketch follows.)
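The k-means sketch, assuming scikit-learn; the random 2-D points stand in for, say, customer or household locations, and k is the parameter we identified earlier.

```python
# Clustering with k-means: group points by distance, read off the centres.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.random((200, 2))              # hypothetical 2-D locations
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)             # candidate tower / clinic placements
print(kmeans.labels_[:10])                 # cluster assignment per point
```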
Association Analysis
Similarity between entities through explicit/implicit associations
Supermarket shelf planning
Recommendation engines (Netflix, Amazon)
Dimensionality Reduction
Topic modelling (news), e.g. LDA
Improving performance of ML algorithms
Sequence Analysis (HMM/RNN)
Identify sequential patterns in data
Language modelling (speech recognition, language translation)
Activity modelling (Fitbit)
ML Algorithms: Cheat-sheet
Input and Data Preparation
Feature Engineering
● Examples: bag of words, phonemes
● Using domain knowledge of the data to create features that make machine learning algorithms work
○ both quantity and quality matter
● "Applied machine learning" is basically feature engineering (Andrew Ng); this is what differentiates an expert from a novice
○ it is being replaced by model-capability engineering with the advent of DL
● The data-preparation stage involves scaling, centering and transforming the data to get good features
○ features should be comparable and combinable (a scaling sketch follows)
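The scaling sketch, assuming scikit-learn's StandardScaler; the income/age feature matrix is hypothetical.

```python
# Scale and centre features so they are comparable and combinable.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: [income, age]
X = np.array([[50_000, 25], [82_000, 40], [31_000, 31]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # zero mean, unit variance per column
print(X_scaled)
```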
Data Preparation and Training
Training set: used to adjust the parameters of the model.
Validation set: used to minimize overfitting. If accuracy over the training set increases but accuracy over the validation set stays the same or decreases, we are overfitting and should stop training.
Testing set: used only for testing the final solution, to confirm the model's performance.
Picking Training/Test data
● Usually split the input 60|20|20 or 70|20|10
● The test set should be similar to the training set; don't train and test on entirely different data
● All splits should be representative of the data the model will see in the future (a split sketch follows)
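One way to get a 60|20|20 split, sketched with scikit-learn (an assumption; any splitter works): split twice, carving out 40% and then halving the remainder. stratify keeps label proportions similar across the splits, so the held-out data stays representative.

```python
# 60|20|20 train / validation / test split on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((1000, 5))                        # hypothetical inputs
y = rng.integers(0, 2, size=1000)                # hypothetical labels

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
print(len(X_train), len(X_val), len(X_test))     # 600 200 200
```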
Learning
Model Fitting
1. Start with seed values for the model parameters
2. Identify a loss function that measures deviation from the input data
3. Fit the model to the input data by optimising the loss function (e.g. with gradient descent)
4. Iterate until we converge to a minimum
(A numpy sketch follows.)
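A minimal numpy sketch of this loop, fitting a line y = w*x + b to four made-up points by minimising squared loss:

```python
# Model fitting via gradient descent on a tiny linear model.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])       # roughly y = 2x + 1

w, b = 0.0, 0.0                           # 1. seed values for the parameters
lr = 0.01
for _ in range(5000):                     # 4. iterate until we converge
    err = w * x + b - y                   # 2. deviation from the input data
    w -= lr * np.mean(2 * err * x)        # 3. step against the loss gradient
    b -= lr * np.mean(2 * err)
print(w, b)                               # ≈ 1.94, ≈ 1.15
```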
Nuances and Pitfalls
● A validation set is important
● Monitoring and measuring the performance of model fitting over epochs is important
● The sweet spot is the point just before the error on the validation set starts to increase (an early-stopping sketch follows)
● Use regularisation techniques (dropout, sampling)
The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain.
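The early-stopping sketch in plain numpy, reusing the linear fit from above; the synthetic train/validation data and the patience value are assumptions.

```python
# Early stopping: keep the parameters from the epoch where validation
# error was lowest, and stop once it keeps rising.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = 2 * x + 1 + rng.normal(0, 0.5, 200)
x_tr, y_tr, x_val, y_val = x[:150], y[:150], x[150:], y[150:]

w, b, lr = 0.0, 0.0, 0.01
best = (float("inf"), w, b)
bad, patience = 0, 20
for epoch in range(10_000):
    err = w * x_tr + b - y_tr
    w -= lr * np.mean(2 * err * x_tr)
    b -= lr * np.mean(2 * err)
    val = np.mean((w * x_val + b - y_val) ** 2)   # error on held-out data
    if val < best[0]:
        best, bad = (val, w, b), 0                # the "sweet spot" so far
    else:
        bad += 1
        if bad >= patience:                       # validation error rising
            break
w, b = best[1], best[2]
print(w, b)
```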
ML frameworks
Deep Learning
Deep learning was inspired by how the human brain works.
It is a massive simplification, but we keep finding more and more inspirations in it.
The brain is far too complex and diverse to model directly, but it is still fair to call these networks biologically inspired.
Arranged in layers, networks of these neurons are effective at learning from pre-labelled samples (fully supervised).
(Figure: performance vs. amount of data; the turning point in ML.)
Deep learning has given us the ability to reach 99% accuracy on so many problems; a game changer.
Why is deep learning taking off now?
1. Scale of data
2. Scale of computation
At small data sizes, feature engineering wins.
How does a neural network work?
The simplest unit is a perceptron.
Networks of perceptrons can model any general function.
An activation function is added to give them non-linearity.
They are arranged in layers to model more complex functions.
(A numpy sketch of a single perceptron follows.)
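A single perceptron sketched in numpy; the weights and bias are illustrative values (they happen to match the worked backpropagation example used later).

```python
# One perceptron: weighted sum of inputs plus bias, squashed by an activation.
import numpy as np

def perceptron(x, w, b):
    z = np.dot(w, x) + b                  # weighted sum of inputs
    return 1.0 / (1.0 + np.exp(-z))       # sigmoid activation adds non-linearity

print(perceptron(np.array([0.05, 0.10]),
                 np.array([0.15, 0.20]), 0.35))   # ≈ 0.593
```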
Activation Functions
● Introduced to add non-linearity, so that a neural network can model any generalised function
● Modelled after the biological neuron
● ReLU (rectified linear unit) is a later discovery and has better training performance (sketches below)
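Two of the common activations sketched in numpy:

```python
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))   # classic, biologically inspired
def relu(z):    return np.maximum(0.0, z)          # later discovery; cheap, trains well

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.119 0.5 0.881]
print(relu(z))      # [0. 0. 2.]
```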
Network Design
● Hidden layers do implicit feature engineering
● Softmax is used in the output layer for classification problems (sketch below)
● The number and size of the layers is part of capability engineering
● All the weights and biases (parameters) provide degrees of freedom that can fit complex functions
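A numpy sketch of softmax, the usual output-layer choice for classification:

```python
# Softmax turns raw scores into probabilities that sum to 1
# (shifted by the max for numerical stability).
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ≈ [0.659 0.242 0.099]
```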
How does learning happen?
1. As before, start with seed values for the weights and biases
2. Compute the loss w.r.t. the desired output
3. Find the gradient of each parameter w.r.t. the total loss
4. Adjust each parameter proportionally to its gradient as the learning step
5. Iterate
Training: Optimisation of loss
Generalised backpropagation through the layers of a neural network was a breakthrough (Geoff Hinton): it allowed us to train arbitrary multi-layered networks.
Another breakthrough was using massive amounts of data to train computer-vision and speech-recognition models (Andrew Ng, Google): it allowed us to achieve accuracies above 99%.
Backpropagation
Our goal with backpropagation is to update each of the weights in the network so that they bring the actual output closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.
We will work through a single training example: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99. We start with random weights and propagate the inputs forward.
Forward Pass
Find what the neural network currently predicts given the weights and biases above and inputs of 0.05 and 0.10, by feeding the inputs forward through the network:
1) figure out the total net input to each hidden-layer neuron,
2) squash the total net input with an activation function (here, the logistic function),
3) repeat the process for the output-layer neurons,
4) finally, calculate the error of the output against the expected target.
(A numpy sketch follows.)
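A numpy sketch of this forward pass. The inputs and targets come from the slide; the starting weights and biases (W1, b1, W2, b2) are the illustrative values commonly used with this worked example, assumed here since the deck's figure is not reproduced.

```python
# Forward pass through a 2-2-2 network with logistic activations.
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])                      # inputs (from the slide)
target = np.array([0.01, 0.99])                 # desired outputs (from the slide)

W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = 0.35   # input -> hidden
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = 0.60   # hidden -> output

h = sigmoid(W1 @ x + b1)                        # 1-2) net input, then squash
out = sigmoid(W2 @ h + b2)                      # 3) repeat for the output layer
error = 0.5 * np.sum((target - out) ** 2)       # 4) total squared error
print(out, error)                               # ≈ [0.751 0.773], error ≈ 0.298
```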
Backward Pass : Output Layer
1. Find how much a change in each weight affects the total error (the partial derivative, or gradient, of the error w.r.t. that weight)
2. To decrease the error, subtract this value (multiplied by the learning rate, here 0.5) from the current weight
3. Apply the actual updates in the network only after we also have the new weights leading into the hidden-layer neurons
Update each of the weights in the network so that they bring the actual output closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.
Backward Pass : Hidden Layer
1. Find how much a change in each weight affects the total error
2. Subtract this value (multiplied by the learning rate) from the current weight
The process is similar to the output layer, but slightly different: the output of each hidden-layer neuron contributes to the output (and therefore the error) of multiple output neurons. (A numpy sketch of both passes follows.)
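A numpy sketch of the backward pass for both layers, continuing the forward-pass sketch above (it reuses x, target, h, out, W1, W2 from that block); the learning rate 0.5 is from the slide.

```python
# Backward pass: gradients w.r.t. the total error, then a gradient step.
lr = 0.5

# Output layer: delta = dE/dout * dout/dnet (sigmoid derivative = out*(1-out))
delta_out = (out - target) * out * (1 - out)
grad_W2 = np.outer(delta_out, h)                # dE/dW2

# Hidden layer: each hidden neuron feeds every output neuron, so its
# delta sums the error contributions flowing back through W2
delta_h = (W2.T @ delta_out) * h * (1 - h)
grad_W1 = np.outer(delta_h, x)                  # dE/dW1

W2 -= lr * grad_W2                              # apply the updates last,
W1 -= lr * grad_W1                              # after all gradients are known
print(W2)                                       # e.g. w5: 0.40 -> ≈ 0.3589
```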
Finally
● The learning rate is how quickly a network abandons old beliefs for new ones.
● A high learning rate can lead to over-corrections and can in fact increase the loss instead of decreasing it.
● Usually, one can start with a large learning rate and gradually decrease it as training progresses (a decay sketch follows).
● Find a learning rate that is low enough that the network converges to something useful, but high enough that you don't have to spend years training it.
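A sketch of one simple schedule, inverse-time decay; the constants are assumptions for illustration.

```python
# Start with a large learning rate and decay it as training progresses.
def learning_rate(step, base_lr=0.5, decay=0.001):
    return base_lr / (1.0 + decay * step)   # inverse-time decay

for step in (0, 1000, 10_000):
    print(step, learning_rate(step))        # 0.5 -> 0.25 -> ≈ 0.045
```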
(Figure: training loss over ~10k iterations.)
Debugging Neural Networks
1. Get more training data
2. Play around with regularization techniques
3. Modify network design/capability
4. Train longer
5. Use different types of neurons/networks
Specialised Neural Networks
Convolutional networks: used to capture shapes. Used for computer vision.
Specialised Neural Networks
Recurrent neural networks: capture sequential information. Used for language modelling. (Minimal sketches of both follow.)
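Minimal sketches of both architectures, assuming TensorFlow/Keras (the deck names no framework); the layer sizes and input shapes are illustrative.

```python
# Two specialised network shapes, sketched with Keras.
from tensorflow import keras
from tensorflow.keras import layers

# Convolutional network: filters capture local shapes in images
cnn = keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# Recurrent network: the hidden state carries sequential context
rnn = keras.Sequential([
    layers.SimpleRNN(32, input_shape=(None, 50)),   # variable-length sequences
    layers.Dense(10, activation="softmax"),
])
```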
Learning Takeaway
Feature engineering is implicit.
Capability engineering is explicit.
Humans have long done something similar: identifying the right animal for a task (dogs, parrots).
Practical Wisdom
● Machine learning only works if the problem is actually solvable with the data that you have.
● Large amounts of data capture learning that can beat a committee of humans; make sure you supply all the nuances of the data (shape, color, size).
● Learning works best when the training-data capture context matches the serving context. Use data-transformation techniques to improve performance (e.g. noise augmentation).
Getting started with Machine Learning
Thank You!