Automated Machine Learning in Python
PyCon JP 2019
CyberAgent AI Lab
Masashi SHIBATA
c-bata c_bata_
[Figure: machine learning pipeline: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
1. AutoML
2. Automated Hyperparameter Optimization
3. Automated Feature Engineering
4. Automated Algorithm (Model) Selection
[Figure: machine learning pipeline with a Feature Construction stage: Data Cleaning → Feature Preprocessing, Feature Selection, Feature Construction → Model Selection → Parameter Optimization → Model Validation]
Topic 1
AutoML


Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019
https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions

• Automated Hyperparameter Optimization: Hyperopt, Optuna, SMAC3, scikit-optimize, …
• HPO + Automated Feature Engineering: featuretools, tsfresh, boruta, …
• Automated Algorithm (Model) Selection: Auto-sklearn, TPOT, H2O, auto_ml, MLBox, …
Topic 2
Grid Search / Random Search
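The two baseline strategies named on this slide can be sketched in a few lines of plain Python. The toy `objective`, its optimum, and the search ranges below are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import itertools
import random


def objective(lr, depth):
    """Toy validation loss with its minimum at lr=0.1, depth=6 (hypothetical)."""
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(lrs, depths):
    """Evaluate every combination on a fixed grid and return the best one."""
    candidates = itertools.product(lrs, depths)
    return min(candidates, key=lambda p: objective(*p))


def random_search(n_trials, seed=0):
    """Sample each hyperparameter independently and return the best sample."""
    rng = random.Random(seed)
    samples = [
        (10 ** rng.uniform(-3, 0), rng.randint(2, 10))  # log-uniform lr, integer depth
        for _ in range(n_trials)
    ]
    return min(samples, key=lambda p: objective(*p))


best_grid = grid_search(lrs=[0.001, 0.01, 0.1, 1.0], depths=[2, 4, 6, 8, 10])
best_random = random_search(n_trials=20)
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Both searches here spend 20 evaluations. Grid search only ever tries 4 distinct learning rates, while random search tries 20, which is the usual argument for random search when only a few hyperparameters matter.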
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.
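The Bayesian optimization loop covered by the tutorial cited above can be sketched as follows: fit a Gaussian process to the points evaluated so far, then evaluate next wherever an acquisition function is largest. This is a minimal sketch, not the tutorial's own code. The toy objective, the RBF length scale, the candidate grid, and the choice of an upper confidence bound acquisition with `kappa=2.0` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)        # K^{-1} k(x_obs, x_new)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)   # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))


def objective(x):
    """Toy 'expensive' function to maximize (hypothetical)."""
    return -(x - 0.7) ** 2


def bayesian_optimization(n_iter=15, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)           # candidate points
    x_obs = rng.uniform(0.0, 1.0, size=3)       # initial random design
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ucb = mean + kappa * std                # upper confidence bound acquisition
        x_next = grid[np.argmax(ucb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]


best_x, best_y = bayesian_optimization()
print(f"best x = {best_x:.3f}, f(x) = {best_y:.4f}")
```

A larger `kappa` favours exploring uncertain regions, and a smaller one favours exploiting the current best estimate.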




Jamieson, K. G. and Talwalkar, A. S.: Non-stochastic Best Arm Identification and Hyperparameter Optimization, in AISTATS (2016).
[Figure: Successive Halving with nine trials (trial #1 to trial #9) and budgets of 10, 30, and 90 epochs]
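The schedule in the figure (nine trials, budgets of 10, 30, and 90 epochs) matches Successive Halving with a reduction factor of 3: train every trial for a small budget, keep the best third, and triple the budget. The sketch below simulates that schedule. The learning curve `loss_after` and the randomly drawn per-trial quality are hypothetical stand-ins for real training.

```python
import random


def final_quality(rng):
    """Draw a hypothetical asymptotic loss for a randomly sampled configuration."""
    return rng.uniform(0.05, 0.5)


def loss_after(quality, epochs):
    """Toy learning curve: the loss decays toward `quality` as epochs grow."""
    return quality + 1.0 / epochs


def successive_halving(n_trials=9, min_epochs=10, eta=3, seed=0):
    rng = random.Random(seed)
    # trial id -> asymptotic loss of that configuration
    trials = {i + 1: final_quality(rng) for i in range(n_trials)}
    survivors = list(trials)
    epochs = min_epochs
    history = []
    while True:
        scores = {t: loss_after(trials[t], epochs) for t in survivors}
        history.append((epochs, sorted(survivors)))
        if len(survivors) == 1:
            return survivors[0], history
        # keep the best 1/eta of the trials and give them eta times more budget
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=scores.get)[:keep]
        epochs *= eta


winner, history = successive_halving()
for epochs, alive in history:
    print(f"{epochs:>3} epochs: trials {alive}")
print("winner: trial", winner)
```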
Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.


Optuna
TPE, Asynchronous Successive Halving, Median Stopping Rule
Define-by-Run
https://github.com/pfnet/optuna
[Figure: Asynchronous Successive Halving. Each trial's intermediate value is recorded at rung 0, and the best trials are promoted to rung 1, rung 2, and rung 3]
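The promotion rule behind the rung figure can be sketched as below, loosely following the description in Li et al. (2018). A free worker first looks for a trial in the top 1/eta of some rung that has not been promoted yet. Only if none exists does it start a new trial at rung 0. The jobs are processed one at a time for simplicity, and the loss model, `ETA`, and the number of jobs are hypothetical.

```python
import random

ETA = 3          # reduction factor
MAX_RUNG = 3     # rungs 0..3, as in the figure
rungs = [dict() for _ in range(MAX_RUNG + 1)]     # rung -> {trial id: loss}
promoted = [set() for _ in range(MAX_RUNG + 1)]   # trials already promoted from each rung
rng = random.Random(0)
quality = {}                                      # trial id -> hypothetical asymptotic loss


def evaluate(trial, rung):
    """Toy loss: improves with the rung's budget (hypothetical learning curve)."""
    return quality[trial] + 1.0 / (ETA ** rung)


def get_job():
    """Return (trial, rung) for a free worker, preferring promotions."""
    for rung in reversed(range(MAX_RUNG)):
        results = rungs[rung]
        top_k = sorted(results, key=results.get)[: len(results) // ETA]
        promotable = [t for t in top_k if t not in promoted[rung]]
        if promotable:
            promoted[rung].add(promotable[0])
            return promotable[0], rung + 1
    trial = len(quality)                          # otherwise start a new trial at rung 0
    quality[trial] = rng.uniform(0.05, 0.8)
    return trial, 0


for _ in range(40):                               # 40 jobs handled one after another
    trial, rung = get_job()
    rungs[rung][trial] = evaluate(trial, rung)

for rung, results in enumerate(rungs):
    print(f"rung {rung}: {len(results)} trials")
```

Unlike synchronous Successive Halving, no worker waits for a rung to fill up before promoting, which is what makes the scheme suit many parallel workers.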


10 worker 









scikit-optimize


Topic 3
AutoML feature preprocessing (TPOT, Auto-sklearn)

1. Feature Preprocessing Operators: StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler, RandomizedPCA, Binarizer, and PolynomialFeatures.
2. Feature Selection Operators: VarianceThreshold, SelectKBest, SelectPercentile, SelectFwe, and Recursive Feature Elimination (RFE).

M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015.
R. S. Olson and J. H. Moore. TPOT: A tree-based pipeline optimization tool for automating machine learning. In Workshop on Automatic Machine Learning, 2016.
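The operators listed above are ordinary scikit-learn transformers, so one candidate pipeline of the kind an AutoML tool searches over can be written by hand. The sketch below is illustrative only: the particular combination of steps, `k=5`, and the logistic regression classifier are assumptions, not a pipeline produced by TPOT or Auto-sklearn.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),               # feature preprocessing
    ("poly", PolynomialFeatures(degree=2)),    # feature preprocessing
    ("variance", VarianceThreshold()),         # feature selection: drop constant columns
    ("kbest", SelectKBest(f_classif, k=5)),    # feature selection: keep 5 best by F-test
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, X, y, cv=3)
print("mean accuracy:", scores.mean())
```

An AutoML system treats both the choice of steps and their parameters (for example `degree` or `k`) as part of the search space.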
• featuretools: Deep feature synthesis
• tsfresh
• Wrapper methods, Filter methods, Embedded methods
• scikit-learn, Boruta








tsfresh: Time Series FeatuRE extraction based on Scalable Hypothesis tests
63 time series characterization methods, computing 794 features in total
https://github.com/blue-yonder/tsfresh
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
Filter method
Wrapper method: sklearn.feature_selection.RFE (Recursive Feature Elimination), Boruta (boruta_py)
Embedded method: scikit-learn feature_importances_
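The three families of feature selection can be compared side by side with scikit-learn. This is a small sketch on the iris dataset. The estimators, `k=2`, and `n_features_to_select=2` are arbitrary choices for illustration, and Boruta is left out to avoid an extra dependency.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with a univariate statistic, no model involved.
filter_mask = SelectKBest(f_classif, k=2).fit(X, y).get_support()

# Wrapper method: repeatedly fit a model and drop the weakest feature (RFE).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
wrapper_mask = rfe.support_

# Embedded method: importances come out of training the model itself.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
embedded_scores = forest.feature_importances_

print("filter  :", filter_mask)
print("wrapper :", wrapper_mask)
print("embedded:", embedded_scores.round(3))
```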
Topic 4
[Figure: scikit-learn "Choosing the right estimator" flowchart]
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
• AutoML 2 

• ML 

• ML 

• 

AutoML as a CASH Problem
Combined Algorithm Selection and Hyperparameter optimization
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter.

Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015
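As a sketch of the formulation in the cited paper (notation reconstructed here, so treat the symbols as assumptions): given a set of algorithms \mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\}, where algorithm A^{(j)} has hyperparameter space \Lambda^{(j)}, and K cross-validation splits of the data into D_{\text{train}}^{(i)} and D_{\text{valid}}^{(i)}, CASH asks for the algorithm and hyperparameter setting that jointly minimize the average validation loss \mathcal{L}:

```latex
A^{\star}, \lambda_{\star} \in
  \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}}
  \frac{1}{K} \sum_{i=1}^{K}
  \mathcal{L}\bigl(A^{(j)}_{\lambda},\, D_{\text{train}}^{(i)},\, D_{\text{valid}}^{(i)}\bigr)
```

The choice of algorithm acts as one more categorical hyperparameter, and the remaining hyperparameters are conditional on it.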
Using Optuna for CASH Problems
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm


def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    # Algorithm selection: the classifier itself is a categorical hyperparameter.
    classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if classifier_name == 'SVC':
        # Hyperparameters are suggested only for the selected algorithm.
        # suggest_loguniform is the Optuna v0.16.0 API used by the cited example.
        svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
    else:
        rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10)

    # Mean 3-fold cross-validation accuracy is the value to maximize.
    score = sklearn.model_selection.cross_val_score(
        classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py
Optuna for CASH Problem
Algorithm Selection: trial.suggest_categorical('classifier', ['SVC', 'RandomForest']) chooses the classifier.
Hyperparameter optimization: trial.suggest_loguniform('svc_c', 1e-10, 1e10) and trial.suggest_loguniform('rf_max_depth', 2, 32) tune the hyperparameters of the chosen classifier.
• auto-sklearn
• TPOT
• h2o-3
• auto_ml (unmaintained)
• MLBox
Adithya Balaji and Alexander Allen. Benchmarking Automatic Machine Learning Frameworks. https://arxiv.org/pdf/1808.06492v1.pdf
AutoML
SMAC3
ChaLearn AutoML challenge 2 track
Auto-sklearn
import sklearn.metrics
import autosklearn.classification
from sklearn.model_selection import train_test_split

# Split the data into training and test sets (arguments elided on the slide).
X_train, X_test, y_train, y_test = train_test_split(…)
# Search over preprocessing steps, classifiers, and hyperparameters.
automl = autosklearn.classification.AutoSklearnClassifier(…)
automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
# Show the models in the final ensemble.
print(automl.show_models())
predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
https://github.com/automl/auto-sklearn
TPOT: Tree-based Pipeline Optimization Tool for Automating Data Science
https://github.com/EpistasisLab/tpot
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(…)
# Evolve pipelines for at most two minutes, printing progress.
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
# Write the best pipeline found as a standalone Python script.
tpot.export('tpot_iris_pipeline.py')
Auto-sklearn
• 20 preprocessors
• 16 feature selectors
• 1-hot encoding, missing value imputation, balancing, scaling
• 17 classifiers
• pre-defined hyperparameter spaces

TPOT
• 20 preprocessors
• 12 classifiers
• pre-defined hyperparameter spaces
• Pipeline search space: flexible, combining tree-shaped pipelines
Automated Neural Architecture Search


THANK YOU

