Machine Learning for Web
Data
Hilary Mason
Web Directions USA 2010
Machine Learning for Web Data
= new capacities
(superpowers)
Machine learning is a way of thinking about data.
http://www.meetup.com/NYC-Tech-Talks/calendar/12939544/?from=list&offset=0
http://bit.ly/9N7VB1
6
wicked hard problem
10s of millions of URLs/day
100s of millions of events/day
1000s of millions of
@hmason
[archive photo]
ELIZA
ML Today
Algorithms +
On-demand computing +
Ubiquitous data
Algorithms
New frames for modeling the world with data.
[moar data and new kinds of data]
Examples
[spam filters]
[Netflix movie recommendations]
Language Identification
Face Identification
Machine Learning
Supervised Learning
Vs
Unsupervised Learning
Clustering
[figure: example cluster labels: immunity; ultrasound; medical imaging; medical devices; thermoelectric devices; fault-tolerant circuits; low power devices]
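Clustering groups items without any labels. The slides do not name an algorithm, so the following is only a minimal sketch using k-means as an assumed stand-in; the 2-D points and the starting centroids are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

No labels are supplied anywhere: the two groups come out of the distances alone, which is the point of unsupervised learning.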
Entity disambiguation
This is important.
ME
UGLY HAG
Entity disambiguation
This is important.
Company disambiguation is a very common problem: are “Microsoft”, “Microsoft Corporation”, and “MS” the same company?
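One simple starting point for the company question is to normalize names before comparing them. This is a rough sketch, not a method from the talk: the suffix list and the alias table are hypothetical, and a real system would need context to decide what an abbreviation like "MS" refers to.

```python
import re

# Hypothetical lookup tables, for illustration only.
LEGAL_SUFFIXES = {"corporation", "corp", "inc", "incorporated", "co", "ltd", "llc"}
ALIASES = {"ms": "microsoft", "msft": "microsoft"}


def canonical_company(name):
    """Map a raw company mention to a normalized key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Drop trailing legal suffixes such as "Corporation" or "Inc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    # Resolve known abbreviations to their full name.
    return ALIASES.get(key, key)


mentions = ["Microsoft", "Microsoft Corporation", "MS"]
print({m: canonical_company(m) for m in mentions})
```

Under these assumed tables all three mentions map to the same key. The alias table is the weak spot, because "MS" could just as well mean something else in another document.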
Classification
[diagram] Text → Feature Extractor → Trained Classifier → Cats | Dogs | Fire
[diagram] Training Data → Feature Extractor
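The pipeline above (text, feature extractor, trained classifier, label) can be sketched in a few lines. The slides do not say which classifier is used; this sketch assumes a bag-of-words feature extractor and a naive Bayes classifier with add-one smoothing, and the training sentences are invented.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extractor: a lowercase bag of words."""
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(extract_features(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in extract_features(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data for the three labels on the slide.
training_data = [
    ("the kitten purred on the sofa", "cats"),
    ("my cat chased a mouse", "cats"),
    ("the puppy fetched the ball", "dogs"),
    ("our dog barked at the mailman", "dogs"),
    ("smoke and flames filled the building", "fire"),
    ("the fire alarm went off", "fire"),
]
model = train(training_data)
print(classify(model, "the dog fetched a ball"))
```

The same `extract_features` function is used at training time and at prediction time, which matches the two Feature Extractor boxes in the diagram.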
<math>
Probability
P(A) is the probability that A is true.
Axioms of Probability
0 ≤ P(A) ≤ 1
P(True) = 1
P(False) = 0
P(A or B) = P(A) + P(B) – P(A and B)
[Venn diagram: circles P(A) and P(B), overlapping in P(A and B)]
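The addition rule can be checked by counting outcomes. This is a small illustration, assuming a fair six-sided die; the events A and B are chosen arbitrarily.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # one roll of a fair die
A = {n for n in outcomes if n % 2 == 0}    # the roll is even
B = {n for n in outcomes if n > 3}         # the roll is greater than 3


def prob(event):
    """P(event) as the fraction of equally likely outcomes in the event."""
    return Fraction(len(event), len(outcomes))


lhs = prob(A | B)                          # P(A or B)
rhs = prob(A) + prob(B) - prob(A & B)      # P(A) + P(B) - P(A and B)
print(lhs, rhs)
```

Subtracting P(A and B) removes the outcomes that would otherwise be counted twice, once in A and once in B.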
Bayes' Law
Example
There are 10,000 people.
1% have a rare disease.
Example
• Population of 10,000
• 1% have rare disease
• There’s a test that is 99% effective.
– 99% of sick patients test positive
– 99% of healthy patients test negative
Given a positive test result, what is the probability that the patient is sick?
Disease Diagnosis
Of the 100 sick patients, 99 test positive; of the 9,900 healthy patients, 99 (1%) also test positive.
Given a positive test, there is a 50% probability that the patient is sick.
Bayesian Disease
Know the probability of testing positive when healthy, and of testing negative when sick.
Use Bayes' theorem to invert the probabilities.
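The disease example can be worked through with Bayes' theorem directly. This sketch uses the numbers from the slides (1% prevalence, a test that is right 99% of the time for both sick and healthy patients); the function and parameter names are chosen here for illustration.

```python
def posterior_sick_given_positive(prevalence, sensitivity, specificity):
    """P(sick | positive) from Bayes' theorem.

    sensitivity = P(positive | sick), specificity = P(negative | healthy).
    """
    p_pos_given_sick = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # Total probability of a positive test, over sick and healthy patients.
    p_positive = (p_pos_given_sick * prevalence
                  + p_pos_given_healthy * (1.0 - prevalence))
    return p_pos_given_sick * prevalence / p_positive


posterior = posterior_sick_given_positive(
    prevalence=0.01, sensitivity=0.99, specificity=0.99
)
print(round(posterior, 4))

# The same answer by counting people, as on the slide.
population = 10_000
sick = population // 100              # 1% of 10,000
true_positives = sick * 99 // 100     # sick patients who test positive
false_positives = (population - sick) // 100  # 1% of healthy test positive
print(true_positives, false_positives)
```

The test is right 99% of the time, yet a positive result only means a 50% chance of being sick, because healthy people outnumber sick people 99 to 1.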
</math>
Obtain
Scrub
Explore
Model
iNterpret
1. Obtain Data
“pointing and clicking does not scale!”
http://www.delicious.com/pskomoroch/dataset
lynx -dump http://www.nytimes.com
Lynx: http://bit.ly/a6Pumm
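For cases where the text dump needs to happen inside a script, something similar to `lynx -dump` can be approximated with the Python standard library. This is a rough sketch rather than a replacement for Lynx: it only strips tags and skips script and style contents, and the names `TextDumper`, `dump_text`, and `fetch_text` are made up here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextDumper(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def dump_text(html):
    """Return the text content of an HTML string, one fragment per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_text(url):
    """Download a page and dump its text (needs network; not called here)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return dump_text(html)


sample = ("<html><head><title>News</title><style>p{color:red}</style></head>"
          "<body><h1>Headline</h1><p>First <b>story</b>.</p></body></html>")
print(dump_text(sample))
```

Once this is a function, it can be run over a list of URLs in a loop, which is the practical meaning of "pointing and clicking does not scale".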
2. Scrub
3. Explore
http://vis.stanford.edu/protovis/
4. Model
Google Prediction API
http://code.google.com/apis/predict/
4. Model
Python
• NLTK - http://www.nltk.org/
• Scikits Learn - http://scikit-learn.sourceforge.net/
4. Model
http://www.alchemyapi.com/
5. Interpret
Andrew Vande Moere – Visual Poetry 06
http://www.dataists.com
One Final Example
Twitter is full of noise.
Sports – down
Math – UP!
Narcissism - down
Code!
Filtering & Relevance Ordering
http://github.com/hmason/tc
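The idea of turning some topics down and others up can be illustrated with a toy scorer. This is not the code from the linked repository: it is a keyword-weight sketch, and the keyword sets, weights, and sample tweets are all hypothetical.

```python
# Hypothetical topic keywords; a real system would learn these from data.
TOPIC_KEYWORDS = {
    "sports": {"game", "score", "playoffs", "touchdown"},
    "math": {"theorem", "proof", "probability", "matrix"},
    "narcissism": {"me", "my", "myself", "i"},
    "code": {"python", "github", "bug", "commit"},
}
# Negative weights push a topic down, positive weights push it up.
TOPIC_WEIGHTS = {"sports": -1.0, "math": 2.0, "narcissism": -1.0, "code": 2.0}


def relevance(tweet):
    """Score a tweet by the weighted count of topic keywords it contains."""
    words = set(tweet.lower().split())
    return sum(
        weight * len(words & TOPIC_KEYWORDS[topic])
        for topic, weight in TOPIC_WEIGHTS.items()
    )


def rank(tweets, threshold=0.0):
    """Filter out tweets at or below the threshold, then order by score."""
    scored = [(relevance(t), t) for t in tweets]
    kept = [pair for pair in scored if pair[0] > threshold]
    return [t for _, t in sorted(kept, key=lambda pair: pair[0], reverse=True)]


tweets = [
    "what a game the playoffs were",
    "a neat proof of the theorem",
    "look at my new haircut",
    "new python commit fixes the matrix bug",
]
print(rank(tweets))
```

Filtering and relevance ordering are separate steps here: the threshold decides what is shown at all, and the score decides the order of what remains.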
What’s next?
Soon:
Natural Language Generation
Rich media classification
Contextual everything
Algorithms-As-A-Service
infer links in data
Filtering
Relevance
h@bit.ly @hmason
Thank you!

Editor's Notes

  • #16: Sad puppy.
  • #25: The Netflix Prize was $1 million for a 10% increase in accuracy. Just 10%!!
  • #37: P(A) is the fraction of possible universes in which A is true.