GBM & Random Forest in H2O
Mark Landry

Presentation Outline
•  Algorithm Background
   o  Decision Trees
   o  Random Forest
   o  Gradient Boosted Machines (GBM)
•  H2O Implementations
   o  Code examples
   o  Description of parameters and general usage

Decision Trees: Concept
•  Separate the data according to a series of questions
   o  Age > 9.5?
•  The questions are found automatically to optimize separation of the data points by the "target"
Example decision tree: Predicting survival of Titanic passengers
Source: wikimedia CART tree Titanic survivors
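
To make "the questions are found automatically" concrete, here is a minimal sketch of how one question could be chosen for one numeric feature. It assumes Gini impurity as the measure of separation (the slide does not name one), and the ages and labels are invented toy data, not the Titanic data set. The function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value > threshold?' question."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the impurity of the unsplit data
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data (invented for illustration): age and a 0/1 target.
ages = [2, 4, 8, 9, 10, 25, 40, 60]
survived = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_split(ages, survived)
print(threshold, impurity)  # 9.5 0.0
```

A full tree would repeat this search over every feature and then recurse into each side of the chosen split.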
Decision Trees: Practical Use
Strengths
•  Non-linear
•  Robust to correlated features
•  Robust to feature distributions
•  Robust to missing values
•  Simple to comprehend
•  Fast to train
•  Fast to score
Weaknesses
•  Poor accuracy
•  Cannot project (extrapolate beyond the range of the training data)
•  Inefficiently fits linear relationships
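
The "cannot project" weakness can be illustrated with a one-split regression tree. This is a sketch under simple assumptions: a single feature, squared error as the split criterion, and invented data following y = 2x. Because every leaf predicts a constant, any input beyond the training range receives the same value as the largest training input.

```python
def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2.0, lm, rm)
    return best[1:]


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return right_mean if x > threshold else left_mean


# Invented linear data: y = 2x for x = 0..9.
xs = list(range(10))
ys = [2.0 * x for x in xs]
stump = fit_stump(xs, ys)

print(predict(stump, 9))    # 14.0
print(predict(stump, 100))  # 14.0, although the linear trend would give 200
```

The same example hints at why trees fit linear relationships inefficiently: approximating a straight line takes many splits, each contributing one flat step.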
Improved Decision Trees: Ensembles
Random Forest
•  Bootstrap aggregation (bagging)
•  Fit many trees against different samples of the data and average together
GBM
•  Boosting
•  Fits consecutive trees where each solves for the net error of the prior trees
Random Forest
Conceptual
•  Combine multiple decision trees, each fit to a random sample of the original data
•  Randomly samples
   o  Rows
   o  Columns
•  Reduce variance, with minimal increase in bias
Practical
•  Strengths
   o  Easy to use
      •  Few parameters
      •  Well-established default values for parameters
   o  Robust
   o  Competitive accuracy on most data sets
•  Weaknesses
   o  Slow to score
   o  Lack of transparency
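
The row and column sampling described above can be sketched in a few lines. This is a simplified illustration, not H2O's implementation: each "tree" is a one-split stump, columns are sampled once per tree rather than at every split, and all names (`fit_forest`, `n_trees`, `n_cols`) and the data are hypothetical.

```python
import random


def fit_stump(rows, targets, columns):
    """Best single split over the allowed columns, chosen by squared error."""
    best = None
    for c in columns:
        values = sorted(set(r[c] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, targets) if r[c] <= t]
            right = [y for r, y in zip(rows, targets) if r[c] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, c, t, lm, rm)
    if best is None:  # no usable split in this sample: predict the sample mean
        m = sum(targets) / len(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    c, t, left_mean, right_mean = stump
    return right_mean if row[c] > t else left_mean


def fit_forest(rows, targets, n_trees=25, n_cols=1, seed=0):
    rng = random.Random(seed)
    n, width = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        cols = rng.sample(range(width), n_cols)     # random subset of columns
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_targets, cols))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Invented data: column 0 drives the target, column 1 is mostly noise.
rows = [(x, x % 3) for x in range(20)]
targets = [float(x) for x in range(20)]
forest = fit_forest(rows, targets)
low = predict_forest(forest, (1, 1))
high = predict_forest(forest, (19, 1))
print(low, high)
```

Averaging many trees built on different samples is what reduces variance; each individual stump here is a poor model on its own.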
Gradient Boosted Machines (GBM)
Conceptual
•  Boosting: ensemble of weak learners*
•  Fits consecutive trees where each solves for the net loss of the prior trees
•  Results of new trees are applied partially to the entire solution
Practical
•  Strengths
   o  Often best possible model
   o  Robust
   o  Directly optimizes cost function
•  Weaknesses
   o  Overfits
      •  Need to find proper stopping point
   o  Sensitive to noise and extreme values
   o  Several hyper-parameters
   o  Lack of transparency
* The notion of "weak" is being challenged in practice
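
The boosting loop can be sketched as follows, assuming squared-error loss (so the "net loss of the prior trees" reduces to the residuals), one-split stumps as the weak learners, and a single feature. The names `fit_gbm`, `n_trees`, and `learn_rate` are hypothetical and the data is invented; this is not H2O's implementation.

```python
def fit_stump(xs, residuals):
    """One-split regression tree on a single feature; returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2.0, lm, rm)
    if best is None:  # fewer than two distinct x values: predict the mean
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return right_mean if x > threshold else left_mean


def fit_gbm(xs, ys, n_trees=50, learn_rate=0.1):
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Apply each new tree only partially (shrinkage).
        preds = [p + learn_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return (base, learn_rate, stumps)


def predict_gbm(model, x):
    base, learn_rate, stumps = model
    return base + learn_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict_gbm(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Invented data: y = x^2 for x = 0..9.
xs = list(range(10))
ys = [float(x * x) for x in xs]
model = fit_gbm(xs, ys, n_trees=50, learn_rate=0.1)
baseline = (sum(ys) / len(ys), 0.1, [])  # the mean-only model, with no trees
mse_before = mean_squared_error(baseline, xs, ys)
mse_after = mean_squared_error(model, xs, ys)
print(mse_before, mse_after)
```

Training error keeps falling as trees are added, which is why the stopping point (here `n_trees`) is normally chosen against held-out data rather than the training set.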
Trees in H2O
•  Individual tree fitting is performed in parallel
•  Shared histograms calculate cut-points
•  Greedy search of histogram bins, optimizing squared error
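
A rough sketch of histogram-based split finding follows. It assumes equal-width bins and assumes that "shared" means histograms built on separate chunks of data can be summed into one; both are illustrative guesses rather than a description of H2O's internals, and all function names are hypothetical.

```python
def build_histogram(xs, ys, n_bins, lo, hi):
    """Per-bin [count, sum_y, sum_y_squared] for feature values in [lo, hi]."""
    width = (hi - lo) / n_bins
    hist = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        hist[b][0] += 1
        hist[b][1] += y
        hist[b][2] += y * y
    return hist


def merge_histograms(a, b):
    """Histograms from different data chunks combine by element-wise addition."""
    return [[ca + cb, sa + sb, qa + qb] for (ca, sa, qa), (cb, sb, qb) in zip(a, b)]


def sse(count, total, total_sq):
    """Squared error around the mean, computed from the three running sums."""
    return total_sq - total * total / count if count else 0.0


def best_bin_split(hist, lo, hi):
    """Greedy scan over bin edges; returns (cut_point, squared_error)."""
    n_bins = len(hist)
    width = (hi - lo) / n_bins
    tot = [sum(col) for col in zip(*hist)]
    left = [0, 0.0, 0.0]
    best = (None, sse(*tot))  # start from the error of the unsplit data
    for b in range(n_bins - 1):
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(tot, left)]
        if left[0] == 0 or right[0] == 0:
            continue
        err = sse(*left) + sse(*right)
        if err < best[1]:
            best = (lo + (b + 1) * width, err)
    return best


# Invented data, split into two chunks as if processed in parallel.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 10, 10, 10, 10]
part_a = build_histogram(xs[:4], ys[:4], 5, 0.0, 10.0)
part_b = build_histogram(xs[4:], ys[4:], 5, 0.0, 10.0)
merged = merge_histograms(part_a, part_b)
print(best_bin_split(merged, 0.0, 10.0))  # (6.0, 0.0)
```

Only the bin edges are candidate cut-points, so the search cost depends on the number of bins rather than the number of rows.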
  
Explore Further through Examples
•  I have H2O installed
•  I have R installed
•  I have the H2O World data sets

H2O World - GBM and Random Forest in H2O- Mark Landry
