1
Yossi Cohen
Machine Learning with Scikit-learn
2
INTRO TO ML PROGRAMMING
3
ML Programming
1. Get data
   – Get labels for supervised learning
2. Create a classifier
3. Train the classifier
4. Predict test data
5. Evaluate predictor accuracy
*Configure and improve by repeating steps 2-5
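The steps above can be sketched end to end with scikit-learn. This is only an illustration, not the deck's lab code: the Iris data set, `KNeighborsClassifier`, `n_neighbors=5`, the 80/20 split and `random_state=0` are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Get data and labels (Iris is an assumed example data set)
iris = load_iris()
data, labels = iris.data, iris.target

# Partition into train and test sets (assumed 80/20 split)
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=0
)

# 2. Create a classifier (k-nearest neighbors is an assumed choice)
clf = KNeighborsClassifier(n_neighbors=5)

# 3. Train the classifier
clf.fit(train_data, train_labels)

# 4. Predict test data
predicted_labels = clf.predict(test_data)

# 5. Evaluate predictor accuracy
accuracy = accuracy_score(test_labels, predicted_labels)
print(f"Test accuracy: {accuracy:.2f}")
```

Changing `n_neighbors` and re-running from step 2 is one round of the "configure and improve" loop.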
4
The ML Process
[Process diagram. Stage labels: Filter, Outliers, Regression, Classify, Validate, Configure, Model, Partition]
5
Get Data & Labels
• Sources
  – Open data sources
  – Collect on your own
• Verify data validity and correctness
• Wrangle data
  – Make it readable by a computer
  – Filter it
• Remove outliers
The pandas Python library can assist with preprocessing and data manipulation before ML:
http://pandas.pydata.org/
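As a rough sketch of what this wrangling might look like in pandas: the column names, the sample values (including the deliberately bad rows) and the 0-120 age range used as an outlier rule are all hypothetical, invented for this example.

```python
import pandas as pd

# Hypothetical raw survey data; column names and values are made up for illustration
raw = pd.DataFrame(
    {
        "age": [3, 7, 76, 11, 22, 37, 56, 2, 410, None],
        "loves_nutella": ["T", "T", "F", "T", "F", "F", "F", "T", "T", "F"],
    }
)

# Verify validity: drop rows with missing values
clean = raw.dropna()

# Remove outliers: keep only plausible ages (the 0-120 range is an assumed rule)
clean = clean[clean["age"].between(0, 120)]

# Make it readable by a computer: map the T/F strings to booleans
clean = clean.assign(loves_nutella=clean["loves_nutella"].map({"T": True, "F": False}))

print(clean)
```

A fixed range is the simplest possible outlier rule. Real data usually needs a rule chosen after looking at its distribution.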
6
Pre-Processing
• Change formatting
• Remove redundant data
• Filter data (take partial data)
• Remove outliers
• Label
• Split for testing (test/train ratios such as 10/90 or 20/80)
7
Data Partitioning
• Data and labels
  – {[data], [labels]}
  – {[3, 7, 76, 11, 22, 37, 56, 2], [T, T, F, T, F, F, F, T]}
  – Data: [Age], Labels: [Do you love Nutella?]
• Partitioning will create
  – {[train data], [train labels], [test data], [test labels]}
  – We usually split the data at a train:test ratio of 9:1
  – There is a tradeoff: more test data makes the evaluation more reliable, but leaves less data for training the classifier
• We will look at a partitioning function later
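One partitioning function that could be used here is scikit-learn's `train_test_split`. The sketch below applies it to the toy age / Nutella data from this slide. The 75/25 split and `random_state=0` are assumptions: with only 8 samples, a 9:1 split would leave a single test sample.

```python
from sklearn.model_selection import train_test_split

# Toy data from the slide: ages and "Do you love Nutella?" labels
ages = [[3], [7], [76], [11], [22], [37], [56], [2]]  # one feature per sample
labels = [True, True, False, True, False, False, False, True]

# Assumed 75/25 split; random_state fixes the shuffle so runs are repeatable
train_data, test_data, train_labels, test_labels = train_test_split(
    ages, labels, test_size=0.25, random_state=0
)

print("train:", train_data, train_labels)
print("test: ", test_data, test_labels)
```

Raising `test_size` gives a more trustworthy evaluation but fewer training samples, which is the tradeoff described above.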
8
Learn (The "Smart Part")
• Classification
  – If the output is discrete, limited to a finite number of classes (groups)
• Regression
  – If the output is continuous
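The difference can be seen by fitting both kinds of model to the same inputs. This is a small sketch with assumed estimators (`DecisionTreeClassifier`, `LinearRegression`) and made-up height values; only the ages and the Nutella labels come from the earlier slide.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Toy inputs: one feature (age) per sample
ages = [[3], [7], [76], [11], [22], [37], [56], [2]]

# Classification: the target is one of a finite set of classes
loves_nutella = [True, True, False, True, False, False, False, True]
classifier = DecisionTreeClassifier(random_state=0).fit(ages, loves_nutella)
print(classifier.predict([[9]]))  # one of the known classes

# Regression: the target is a continuous number (made-up heights in cm)
heights_cm = [95.0, 120.0, 168.0, 142.0, 175.0, 178.0, 172.0, 86.0]
regressor = LinearRegression().fit(ages, heights_cm)
print(regressor.predict([[9]]))  # a real-valued estimate
```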
9
Learn Programming
10
Create Classifier
• For most SUPERVISED LEARNING algorithms this would be
  – C = ClassifyAlg(Params)
• It's up to us (ML guys) to set the best params
• How?
  1. We could develop a hunch for it
  2. Perform an exhaustive search
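For the exhaustive-search option, scikit-learn provides `GridSearchCV`, which tries every combination in a parameter grid. The sketch below assumes the Iris data, a k-nearest-neighbors classifier, and a deliberately small grid. None of these choices come from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Candidate parameter values to try (an assumed, deliberately small grid)
param_grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}

# Try every combination, scoring each with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(iris.data, iris.target)

print("Best params:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

The cost grows with the product of the grid sizes (here 4 x 2 = 8 combinations, each trained 5 times), so the "hunch" is still useful for keeping the grid small.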
11
Train the classifier
We assigned
C = ClassifyAlg(Params)
This is a general algorithm with some initial values and configuration.
In this stage we train it using:
C.fit(Data, Labels)
12
Predict
• Once we have a trained classifier C
• Predicted_Labels = C.predict(Data)
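A concrete version of the train and predict calls might look like the sketch below. `SVC` with default parameters is only an assumed stand-in for `ClassifyAlg`, and the two new samples are made-up measurements. The names `C` and `Predicted_Labels` mirror the slides rather than Python naming conventions.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()

# Create and train; SVC with default parameters is an assumed stand-in for ClassifyAlg
C = SVC()
C.fit(iris.data, iris.target)

# predict expects a 2D array: one row per sample, one column per feature
new_samples = [[5.0, 3.5, 1.4, 0.2], [6.5, 3.0, 5.5, 2.0]]  # made-up measurements
Predicted_Labels = C.predict(new_samples)

# Map numeric labels back to species names
print([iris.target_names[label] for label in Predicted_Labels])
```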
13
Predictor Evaluation
• We are not done yet
• We need to evaluate the predictor's accuracy against other predictors and against the system requirements
• We will learn several methods for this
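Two evaluation methods commonly used with scikit-learn are a confusion matrix on held-out data and k-fold cross-validation. The slides do not say which methods the course covers, so this sketch is only one possibility. The decision tree, the 80/20 split and `cv=5` are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
train_data, test_data, train_labels, test_labels = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(train_data, train_labels)

# Confusion matrix: rows are true classes, columns are predicted classes
matrix = confusion_matrix(test_labels, clf.predict(test_data))
print(matrix)

# Cross-validation: 5 train/test rounds over the full data set
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), iris.data, iris.target, cv=5
)
print("Mean accuracy:", scores.mean())
```

The confusion matrix shows which classes get mixed up, while the cross-validated mean gives a single number for comparing predictors.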
14
ENVIRONMENT
15
The Environment
• There are many existing environments and tools we could use
  – Matlab with Machine Learning Toolbox
  – Apache Mahout
  – Python with Scikit-learn
• Additional tools
  – Hadoop / Map-Reduce to accelerate and parallelize large data set processing
  – Amazon ML tools
  – NVIDIA tools
16
Scikit-learn
• Installation instructions:
  http://scikit-learn.org/stable/install.html#install-official-release
• Depends on two other libraries
  – numpy and scipy
• Easiest way to install on Windows:
  – Install WinPython
    http://sourceforge.net/projects/winpython/files/WinPython_2.7/2.7.9.4/
  – Let's install this together
• For Linux / Mac computers, just install the 3 libs separately using pip
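After installing (for example with `pip install numpy scipy scikit-learn`), a quick way to check the setup is to import the three libraries and print their versions. This is just a sanity check, not part of the official instructions.

```python
import numpy
import scipy
import sklearn

# Print the installed versions to confirm the three libraries import cleanly
for module in (numpy, scipy, sklearn):
    print(module.__name__, module.__version__)
```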
17
THE DATA
18
Data sets
• There are many data sets to work on
• One of them is the Iris data set, which is classified into three groups. It has an interesting story you could google later
• We'll work on the Iris data
19
Lab A – Plot the Iris data
• Plot sepal length vs sepal width with labels ONLY
• How? Google the Iris data and the scikit-learn environment
• Try to understand the second part of the program with the PCA
20
Iris Data
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
from sklearn import datasets

# Load the Iris data set bundled with scikit-learn
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target

# Axis limits with a 0.5 margin around the data
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
21
Plot Iris Data
plt.figure(2, figsize=(8, 6))
plt.clf()
plt.scatter(X[:, 0], X[:, 1],
c=Y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
22
Add PCA for better classification
from sklearn.decomposition import PCA  # PCA lives in sklearn.decomposition

fig = plt.figure(1, figsize=(8, 6))
# Create the 3D axes through the figure so they are attached to it
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
# Reduce the four Iris features to three principal components
X_reduced = PCA(n_components=3).fit_transform(iris.data)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=Y,
           cmap=plt.cm.Paired)
ax.set_title("First three PCA directions")
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])
plt.show()
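To help with understanding the PCA part of the program, one thing worth inspecting is how much of the data's variance each principal component captures. The sketch below is an addition to the lab, not part of the original slides. It uses the `explained_variance_ratio_` attribute of a fitted `PCA` object.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Project the four Iris features onto the first three principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(iris.data)

# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)
print("Shape after reduction:", X_reduced.shape)
```

On Iris the first component carries most of the variance, which is why the classes already separate well along the first axis of the 3D plot.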
23
Iris Data Classified
24
25
Thank you!
More about me:
Yossi Cohen
yossicohen19@gmail.com
+972-545-313092
• Video compression and computer vision enthusiast & lecturer
• Surfer
