Introduction to Machine
Learning with Scikit-Learn
Types of Algorithms by Output
Training data is used to fit a model, which is then used to predict outputs for new, incoming inputs:
Type of Output → Algorithm Category
Output is one or more discrete classes → Classification (supervised)
Output is continuous → Regression (supervised)
Output is membership in a similar group → Clustering (unsupervised)
Output is the distribution of inputs → Density Estimation
Output is simplified from higher dimensions → Dimensionality Reduction
Classification
Given labeled input data (with two or more labels), fit a
function that can determine the label for any input.
Regression
Given input data with a continuous target value, fit a
function that can predict that value for new inputs.
Clustering
Given unlabeled data, find groups of associated data points
(clusters) based on their similarity or distance from one another.
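A minimal sketch contrasting the three task types in scikit-learn; the dataset and estimator choices here are illustrative, not prescribed by the slides:

from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
clf = KNeighborsClassifier().fit(X_iris, y_iris)   # classification: discrete labels
print(clf.predict(X_iris[:3]))

X_diab, y_diab = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_diab, y_diab)       # regression: continuous target
print(reg.predict(X_diab[:3]))

km = KMeans(n_clusters=3, n_init=10).fit(X_iris)   # clustering: no labels used
print(km.labels_[:3])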
Hadley Wickham (2015)
“Model” is an overloaded term.
•Model family describes, at the broadest possible level, the
connection between the variables of interest.
•Model form specifies exactly how the variables of interest
are connected within the framework of the model family.
•A fitted model is a concrete instance of the
model form where all parameters have been
estimated from data, and the model can be
used to generate predictions.
http://had.co.nz/stat645/model-vis.pdf
Dimensions and Features
In order to do machine learning you need a data set containing
instances (examples) that are composed of features, which in
turn make up dimensions.
Instance: a single data point or example composed of fields
Feature: a quantity describing an instance
Dimension: one or more attributes that describe a property
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data    # X.shape == (n_samples, n_features)
y = digits.target # y.shape == (n_samples,)
Feature Space
Feature space refers to the n dimensions in which your variables live (not
including the target variable or class). The term is used often in the ML
literature because in ML (usually) all variables are features, and feature
extraction is the art of creating a space with useful decision boundaries.
Target
1. Y ≡ Thickness of car tires after some testing period
Variables
1. X1 ≡ distance travelled in test
2. X2 ≡ time duration of test
3. X3 ≡ amount of chemical C in tires
The feature space is R3, or more accurately the positive octant of R3, as all
the X variables can only be positive quantities.
http://stats.stackexchange.com/questions/46425/what-is-feature-space
Mappings
Domain knowledge about tires might suggest that the speed the vehicle was
moving at is important, hence we generate another variable, X4 (this is the
feature extraction part):
X4 = X1/X2 ≡ the speed of the vehicle during testing (distance divided by time).
This extends our old feature space into a new one, the positive part of R4.
A mapping is a function, ϕ, from R3 to R4:
ϕ(x1, x2, x3) = (x1, x2, x3, x1/x2)
http://stats.stackexchange.com/questions/46425/what-is-feature-space
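As a concrete sketch of the mapping ϕ, assuming the three measured variables are the columns of a NumPy array (the instance values below are made up):

import numpy as np

def phi(X):
    """Map (x1, x2, x3) to (x1, x2, x3, x1/x2), appending the derived
    speed feature (distance divided by time) to each instance."""
    return np.column_stack([X, X[:, 0] / X[:, 1]])

# Two hypothetical instances: distance travelled, duration, chemical C.
X = np.array([[100.0, 2.0, 0.5],
              [250.0, 5.0, 0.3]])
X_new = phi(X)   # shape (2, 4): the mapped feature space is part of R4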
Your Task
Given a data set of N instances, create a model that is
fit from (built on) the data by extracting features and
dimensions. Then use that model to predict outcomes
(a pipeline sketch follows the steps below):
1. Data Wrangling (normalization, standardization, imputing)
2. Feature Analysis/Extraction
3. Model Selection/Building
4. Model Evaluation
5. Operationalize Model
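One way to express steps 1-4 is a scikit-learn Pipeline; the specific scaler, selector, and model below are illustrative stand-ins for each step:

from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
model = Pipeline([
    ("wrangle", StandardScaler()),                  # 1. normalization
    ("features", SelectKBest(f_classif, k=32)),     # 2. feature analysis
    ("model", LogisticRegression(max_iter=1000)),   # 3. model selection
])
scores = cross_val_score(model, X, y, cv=5)         # 4. model evaluation
print(scores.mean())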
A Tour of Machine Learning
Algorithms
Models: Instance Methods
Compare instances in the data set with a similarity
measure to find the best matches.
- Suffers from the curse of dimensionality.
- Focus is on feature representation and
similarity metrics between instances.
● k-Nearest Neighbors (kNN)
● Self-Organizing Maps (SOM)
● Learning Vector Quantization (LVQ)
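A minimal kNN sketch; scikit-learn does not ship SOM or LVQ, so k-Nearest Neighbors stands in for the instance-method family here, and k and the metric are illustrative:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Predictions are a similarity vote over the k closest training instances.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))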
Self-Organizing Maps
Models: Regression
Model the relationship of independent variables X
to a dependent variable y by iteratively
minimizing the error made in predictions.
● Ordinary Least Squares
● Logistic Regression
● Stepwise Regression
● Multivariate Adaptive Regression Splines (MARS)
● Locally Estimated Scatterplot Smoothing (LOESS)
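An ordinary least squares sketch; MARS and LOESS are not in scikit-learn's core and would come from third-party packages, so OLS stands in for the family:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
ols = LinearRegression().fit(X, y)   # minimizes squared prediction error
print(mean_squared_error(y, ols.predict(X)))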
Logistic Regression
Multivariate Adaptive Regression Splines (MARS)
Locally Estimated Scatterplot Smoothing (LOESS)
• Combines multiple regression models in a k-nearest-neighbor-based meta-model
• Fits a low-degree polynomial to a subset of the data close to the current point
• Requires fairly large, densely sampled data sets in order to produce good models
Models: Regularization Methods
Extend another method (usually regression) by
penalizing model complexity to minimize overfitting.
- simple, popular, powerful
- better at generalization
● Ridge Regression
● LASSO (Least Absolute Shrinkage & Selection Operator)
● Elastic Net
Models: Regularization Methods
LASSO
• Penalizes the total absolute weight of the parameters (L1 penalty)
• Can be interpreted as a Laplace prior distribution on the parameters
• Ridge regression: quadratic (L2) penalty, a Gaussian prior
• Elastic Net combines both penalties
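A sketch comparing the three penalties on the same data; the alpha and l1_ratio values are illustrative, not tuned:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
for model in (Ridge(alpha=1.0),                      # L2: quadratic penalty
              Lasso(alpha=0.1),                      # L1: drives some weights to zero
              ElasticNet(alpha=0.1, l1_ratio=0.5)):  # mix of L1 and L2
    model.fit(X, y)
    n_zero = (model.coef_ == 0).sum()
    print(type(model).__name__, model.score(X, y), n_zero)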
Models: Decision Trees
Model of decisions based on data attributes.
Predictions are made by following forks in a
tree structure until a decision is made. Used for
classification & regression.
● Classification and Regression Tree (CART)
● Decision Stump
● Random Forest
● Multivariate Adaptive Regression Splines (MARS)
● Gradient Boosting Machines (GBM)
Models: Decision Trees
http://www.saedsayad.com/decision_tree.htm
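A CART sketch via scikit-learn's DecisionTreeClassifier; max_depth is an illustrative choice to keep the printed tree readable:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))   # the forks that predictions follow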
Models: Bayesian
Explicitly apply Bayes’ Theorem for
classification and regression tasks, usually by
fitting a probability function constructed via the
chain rule and a naive conditional-independence
simplification.
● Naive Bayes
● Averaged One-Dependence Estimators (AODE)
● Bayesian Belief Network (BBN)
Naive Bayes
- Used for text retrieval since the 1960s
- Assumes independence of feature values (given the class)
- Applies Bayes’ theorem
Probability distribution:
P(y | x1, …, xn) ∝ P(y) · ∏i P(xi | y)
- Predicts the class with maximum posterior probability (MAP)
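A Gaussian Naive Bayes sketch (one of several variants in scikit-learn); predict_proba exposes the posterior from which the MAP class is chosen:

from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
nb = GaussianNB().fit(X, y)
posterior = nb.predict_proba(X[:1])   # P(y | x) for each class
print(posterior.argmax())             # MAP prediction
print(nb.predict(X[:1]))              # same class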
Models: Kernel Methods
Map input data into a higher-dimensional vector
space where the problem is easier to model.
Named after the “kernel trick”, which computes
the inner product of the images of pairs of data
points without constructing the mapping explicitly.
● Support Vector Machines (SVM)
● Radial Basis Function (RBF)
● Linear Discriminant Analysis (LDA)
SVM
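A minimal SVC usage sketch with the RBF kernel; the gamma and C values below are illustrative:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel computes inner products in the implicit high-dimensional
# space without ever constructing it (the "kernel trick").
svc = SVC(kernel="rbf", gamma=0.001, C=10.0).fit(X_train, y_train)
print(svc.score(X_test, y_test))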
Models: Clustering Methods
Organize data into groups whose members
share maximum similarity (usually defined by a
distance metric). Two main approaches:
centroid-based and hierarchical clustering.
● k-Means
● Affinity Propagation
● OPTICS (Ordering Points To Identify the Clustering Structure)
● Agglomerative Clustering
K-means clustering
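A k-Means sketch; k=10 matches the ten digit classes, an assumption the algorithm itself cannot verify:

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)   # (10, 64): one centroid per cluster
print(km.labels_[:10])             # cluster membership, not class labels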
Models: Artificial Neural Networks
Inspired by biological neural networks, ANNs are
nonlinear function approximators that estimate
functions with a large number of inputs.
- Systems of interconnected neurons that activate
- Deep learning extends simple networks with many stacked layers
● Restricted Boltzmann Machine (RBM)
● Convolutional Neural Networks (CNN)
● Recurrent Neural Networks (RNN)
● Word2Vec models
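scikit-learn covers only simple feed-forward networks (and RBMs); CNNs, RNNs, and Word2Vec need dedicated frameworks. A sketch with MLPClassifier, with illustrative layer size and iteration count:

from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)   # networks train better on scaled inputs

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X, y)   # nonlinear function approximation via backpropagation
print(mlp.score(X, y))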
Models: Ensembles
Models composed of multiple weaker models that
are trained (independently or sequentially) and whose
outputs are combined to make an overall prediction.
● Boosting
● Bootstrapped Aggregation (Bagging)
● AdaBoost
● Stacked Generalization (blending)
● Gradient Boosting Machines (GBM)
● Random Forest
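Bagging versus boosting in one sketch; the n_estimators values are illustrative:

from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0)   # bagged trees
ada = AdaBoostClassifier(n_estimators=100, random_state=0)      # sequential boosting
for model in (rf, ada):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())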
AdaBoost
Models: Other
The list above is not comprehensive; other
algorithm and model classes include:
● Conditional Random Fields (CRF)
● Markov Models (HMMs)
● Dimensionality Reduction (PCA, PLS)
● Rule Learning (Apriori, Brill)
● More ...
What is Scikit-Learn?
Extensions to SciPy (Scientific Python) are
called SciKits. SciKit-Learn provides machine
learning algorithms.
● Algorithms for supervised & unsupervised learning
● Built on SciPy and NumPy
● Standard Python API interface
● Sits on top of C libraries: LAPACK, LibSVM, and Cython
● Open Source: BSD License (included in many Linux distributions)
Probably the best general ML framework out there.
Primary Features
- Generalized Linear Models
- SVMs, kNN, Bayes, Decision Trees, Ensembles
- Clustering and Density algorithms
- Cross Validation
- Grid Search
- Pipelining
- Model Evaluations
- Dataset Transformations
- Dataset Loading
A Guide to Scikit-Learn
Scikit-Learn API
Object-oriented interface centered around the
concept of an Estimator:
“An estimator is any object that learns from data; it may
be a classification, regression or clustering algorithm or
a transformer that extracts/filters useful features from
raw data.”
- Scikit-Learn Tutorial
class Estimator(object):

    def fit(self, X, y=None):
        """Fits estimator to data."""
        # set state of ``self``
        return self

    def predict(self, X):
        """Predict response for ``X``."""
        # compute predictions ``pred``
        return pred
The Scikit-Learn Estimator API
Estimators
- fit(X, y) sets the state of the estimator.
- X is usually a 2D numpy array of shape
(n_samples, n_features).
- y is a 1D array of shape (n_samples,).
- predict(X) returns the class or value.
- predict_proba() returns a 2D array of
shape (n_samples, n_classes).
from sklearn import svm
estimator = svm.SVC(gamma=0.001)
estimator.fit(X, y)
estimator.predict(x)
Basic methodology
Wrapping fit and predict
We’ve already discussed a broad workflow; the
following is a development workflow:
Raw Data → Load & Transform Data → Feature Extraction → Feature Evaluation → Build Model → Evaluate Model
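The same development workflow as a minimal end-to-end sketch on the digits data; the estimator and its parameters are illustrative:

from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                         # raw data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)                      # load & transform
model = SVC(gamma=0.001)                                    # build model
model.fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))            # evaluate model
print(classification_report(y_test, y_pred))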
Task 6
- Select a dataset (wines / student performance)
- Apply various learning algorithms to the problem
- Provide the best possible prediction w.r.t. RMSE
Best bets to start with:
- Student performance: random forest / boosted trees
- Wines: MARS / LASSO / bagged linear models
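A hedged starting sketch for the wines variant; the CSV file name, separator, and target column are assumptions about the local copy of the dataset:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical local copy of the wine quality data.
df = pd.read_csv("winequality-red.csv", sep=";")
X = df.drop(columns=["quality"]).values
y = df["quality"].values

rf = RandomForestRegressor(n_estimators=200, random_state=0)
mse = -cross_val_score(rf, X, y, cv=10, scoring="neg_mean_squared_error")
print("RMSE:", np.sqrt(mse.mean()))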
Semester Project – Variant 2
• Drug-target interaction prediction
– Prediction of interactions between drug compounds and proteins
– DTInet dataset
• https://github.com/luoyunan/DTINet
– Known interactions (binary)
– Structural similarity of both drugs and targets, in [0, 1]
– Mappings to diseases and side effects
– Evaluation:
• The goal is ranking prediction (ordering objects from best to worst)
• 10-fold cross-validation (10% of the interactions are hidden at random; your task is to
rank them as well as possible; the source code will be part of the assignment)
• Area Under ROC Curve, Area Under Precision-Recall Curve
– The submitted solution must beat the baseline: BLM
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735674/ (source code will be
available)
– Typical algorithms:
• Factorization of the interaction matrix (possibly enriched with external data)
• Nearest neighbors and graph algorithms
• Local models predicting edges from properties of the interacting drug and target
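A minimal sketch of the two evaluation metrics, assuming y_true holds the hidden interaction labels and y_score your model's ranking scores (the toy values below are illustrative):

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy stand-ins: hidden interaction labels and the model's ranking scores.
y_true = np.array([1, 0, 0, 1, 0, 1])
y_score = np.array([0.9, 0.4, 0.2, 0.7, 0.5, 0.6])

print("AUROC:", roc_auc_score(y_true, y_score))            # ranking quality
print("AUPR:", average_precision_score(y_true, y_score))   # precision-recall area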