Data Stream Mining
George Tzinos
Introduction
▪ Large volumes of streaming data are generated every day.
▪ Efficient knowledge discovery from such data streams is an
emerging, active research area in data mining with broad
applications.
▪ Data streams typically arrive continuously, at high speed, in huge
volumes, and with a changing data distribution.
▪ These properties raise new issues that need to be considered.
▪ Data mining techniques that require multiple scans of the entire
data set cannot be applied directly to stream data, which
usually allows only one scan and demands a fast response time.
Applications
▪ Network traffic
▪ Sensor data
▪ Call center records
Requirements
1. Process one example at a time, and inspect it at most once
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any point
Traditional Techniques vs Stream
                   Traditional   Stream
No. of passes      Multiple      Single
Processing time    Unlimited     Restricted
Memory usage       Unlimited     Restricted
Type of result     Accurate      Approximate
Basic Techniques
▪ Sampling
▪ Load shedding
▪ Sketching
▪ Synopsis data structures
▪ Aggregation
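As one possible illustration of the sampling technique listed above, the sketch below uses reservoir sampling (Algorithm R) to keep a uniform random sample of fixed size from a stream of unknown length. The slides do not say which sampling scheme is meant, so this choice, the function name `reservoir_sample`, and its parameters are assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items in one pass over the stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Memory stays at k items no matter how long the stream is.
sample = reservoir_sample(range(10_000), k=5, rng=random.Random(0))
```

The sample can then be passed to a conventional algorithm, trading exactness for bounded memory.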
Forgetting mechanisms
▪ A stream learner should be able to react to a changing concept by
forgetting outdated data while learning new class descriptions.
▪ The key question is how to select the range of data to remember.
Utilization of time and space
▪ Sliding Window
▪ Algorithm Output Granularity (AOG)
Windowing techniques - 1
▪ The most popular approach to dealing with time-changing data
involves the use of sliding windows.
▪ Windows provide a way of limiting the number of examples
introduced to the learner.
▪ This eliminates data points that come from an old concept.
Windowing techniques - 3 (Fixed Window)
▪ Each example updates the window, and the classifier is later
updated from that window.
▪ In the simplest approach, sliding windows are of fixed size and
include only the most recent examples from the data stream.
▪ With each new data point, the oldest example that no longer fits in
the window is discarded.
▪ With fixed-size windows, the user is caught in a tradeoff:
▫ A small window lets the classifier react quickly to changes,
but it may lose accuracy in periods of stability.
▫ A large window increases accuracy in periods of stability,
but fails to adapt to rapidly changing concepts.
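The fixed-window tradeoff described above can be sketched with a bounded deque. This is a minimal illustration, assuming a toy majority-label "classifier"; the class name `SlidingWindowMajority` and the window sizes are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowMajority:
    """Toy learner (hypothetical): predicts the majority label in a fixed-size window."""

    def __init__(self, window_size):
        # deque(maxlen=...) drops the oldest example automatically when full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, y):
        self.window.append(y)

    def predict(self):
        if not self.window:
            return None
        return Counter(self.window).most_common(1)[0][0]


# The concept changes from "a" to "b" after 20 examples.
stream = ["a"] * 20 + ["b"] * 10
small, large = SlidingWindowMajority(5), SlidingWindowMajority(30)
for label in stream:
    small.learn_one(label)
    large.learn_one(label)
```

After the change, the small window holds only "b" labels and has already adapted, while the large window still holds 20 old "a" labels and keeps predicting the outdated concept.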
Windowing techniques - 4
▪ Weights:
▫ A simple way of making the forgetting process more
dynamic is to give the window a decay function
that assigns a weight to each example.
▫ Older examples receive smaller weights and are treated
as less important by the base classifier.
▫ (See: Maintaining time-decaying stream aggregates)
▪ FISH
▪ ADWIN (adaptive windowing)
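The weighting idea above can be sketched with exponentially decayed class counts. The slides do not specify a decay function, so the exponential form, the class name `DecayedClassCounts`, and the `decay` parameter are assumptions made for illustration. This is not an implementation of FISH or ADWIN.

```python
class DecayedClassCounts:
    """Toy learner (hypothetical): class weights fade by a constant factor per example."""

    def __init__(self, decay=0.8):
        self.decay = decay  # assumed exponential decay factor, 0 < decay < 1
        self.weights = {}

    def learn_one(self, y):
        # Every previously seen example loses importance...
        for label in self.weights:
            self.weights[label] *= self.decay
        # ...and the newest example enters with full weight 1.
        self.weights[y] = self.weights.get(y, 0.0) + 1.0

    def predict(self):
        if not self.weights:
            return None
        return max(self.weights, key=self.weights.get)


model = DecayedClassCounts(decay=0.8)
for label in ["a"] * 20 + ["b"] * 5:
    model.learn_one(label)
```

With plain counts, "a" would still win 20 to 5. With decay, the five recent "b" examples outweigh the twenty older "a" examples, and no window boundary has to be chosen.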
Classification in Data Streams
▪ Classification, learning a model in order to assign labels to new,
unlabeled data points, is a well-studied supervised machine
learning task.
▪ Methods include naive Bayes, k-nearest neighbors, classification
trees, support vector machines, rule-based classifiers and many
more (Hastie et al. 2001).
▪ However, as with clustering, these algorithms need access to the
complete training data several times and thus are not suitable for
data streams with constantly arriving new training data and
concept drift.
Classification in Data Streams - 2
▪ Wang et al. proposed a general framework for mining
concept-drifting data streams.
▪ Domingos et al. proposed VFDT (Very Fast Decision Tree).
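VFDT decides when to split a leaf by using the Hoeffding bound: after n observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the observed mean with probability 1 - delta. The sketch below computes that bound and the basic split test. The function names and example numbers are hypothetical, and tie-breaking and other details of the full algorithm are left out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Basic split test: the best attribute must beat the runner-up by more than epsilon."""
    return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative numbers only: the bound shrinks as more examples reach the leaf.
eps_100 = hoeffding_bound(value_range=1.0, delta=1e-7, n=100)
eps_10000 = hoeffding_bound(value_range=1.0, delta=1e-7, n=10_000)
```

Because epsilon shrinks as n grows, a leaf can commit to a split after seeing only as many examples as needed, without storing or rescanning them.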
Tools for Data Streams
▪ scikit-learn (out-of-core learning)
▪ MOA (Massive Online Analysis)
References
▪ [1] Geoff Hulten et al., Mining Time-Changing Data Streams
▪ [2] Qin Zhang et al., Towards Mining Trapezoidal Data Streams
▪ [3] Neha Gupta, Indrjeet Rajput, Stream Data Mining: A Survey, International Journal of Engineering
Research and Applications
▪ [4] Johns Hopkins, Data Stream Mining: A Review of Learning Methods and Frameworks
▪ [5] Jiawei Han et al., Data Mining: Concepts and Techniques
▪ [6] Albert Bifet et al., Data Stream Mining: A Practical Approach
▪ [7] Oded Maimon, Lior Rokach, Data Mining and Knowledge Discovery Handbook
▪ [8] Dariusz Brzeziński, Mining Data Streams with Concept Drift
THANKS!
Any questions?