Storage + Compute
Data Processing
1. Where is data stored?
2. Where does the compute run?
Storage
● In Memory
● On Disk
○ File System
■ Local file system - xfs, zfs, etc.
■ Distributed File System
● HDFS - Hadoop Distributed File System
● S3 (an object store, often used in place of a distributed file system)
● Ceph, etc.
H-Distributed-FS
Motivation:
● Parallel Processing of Data
○ When data is distributed, it can be processed in parallel *
● Computation goes to the data, not data to the computation
* Some problems are not a fit for this.
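A toy sketch of "computation goes to the data": each task is assigned to the node that already stores the block it needs, so no block has to travel over the network. The block names, node names, and the `schedule_tasks` helper are hypothetical illustrations, not HDFS APIs; a real cluster also has to deal with replicas and node load.

```python
# Hypothetical block placement: which node stores which block of a file.
block_locations = {
    "block-0": "node-a",
    "block-1": "node-b",
    "block-2": "node-a",
}

def schedule_tasks(block_locations):
    """Assign each block's task to the node that already holds the block."""
    plan = {}
    for block, node in block_locations.items():
        # The task runs where the data lives, so the block is never copied.
        plan.setdefault(node, []).append(block)
    return plan

plan = schedule_tasks(block_locations)
print(plan)  # {'node-a': ['block-0', 'block-2'], 'node-b': ['block-1']}
```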
Word Count Problem
● Single file
○ O(n) time complexity
‘n’ = size of the file
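A minimal sketch of the single-file case, assuming words are separated by whitespace and case is ignored (both are assumptions, the slide does not define a "word"). The whole input is read once, which is where the O(n) comes from.

```python
from collections import Counter

def count_words(text: str) -> Counter:
    """Count each whitespace-separated word in a single pass over the text."""
    counts = Counter()
    for line in text.splitlines():
        counts.update(line.lower().split())
    return counts

# Stand-in for the contents of one file.
sample = "the quick brown fox\njumps over the lazy dog\nthe end"
single_counts = count_words(sample)
print(single_counts["the"])  # 3
```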
Word Count Problem
● Distributed data
○ O(m) time complexity when the files are processed in parallel
‘m’ = size of largest file
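A rough worked comparison of the two cases. It assumes one worker per file and ignores the cost of scheduling and of combining the partial results; the file sizes are made up for illustration.

```python
def sequential_cost(file_sizes):
    """One worker reads every file in turn: cost grows with the total size n."""
    return sum(file_sizes)

def parallel_cost(file_sizes):
    """One worker per file, all running at once: the largest file (m) dominates."""
    return max(file_sizes)

# Hypothetical sizes (in arbitrary units) of three files holding the data.
sizes = [400, 250, 350]
print(sequential_cost(sizes))  # 1000
print(parallel_cost(sizes))    # 400
```

The more evenly the data is split, the smaller 'm' is relative to 'n', which is why balanced file or block sizes matter.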
Parallel Compute
● Parallel Computation on the data
● Computation goes to the data, not data to the computation. Compute works over the data local to it.
● A final aggregation happens.
This compute paradigm is called the “MapReduce” framework.
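The steps above can be sketched for word count in plain Python. This is an illustration of the idea, not Hadoop's API: the names `map_phase` and `reduce_phase` are hypothetical, and local threads stand in for separate machines that each hold one chunk of the data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_phase(chunk: str) -> Counter:
    """Map: count words in the chunk that is local to this worker."""
    return Counter(chunk.lower().split())

def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: merge two partial counts into one."""
    return left + right

# Each string stands in for the data stored on one node.
chunks = ["the cat sat", "the dog sat", "the end"]

# Run the map step on every chunk concurrently.
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    partial_counts = list(pool.map(map_phase, chunks))

# Final aggregation of the per-chunk results.
total_counts = reduce(reduce_phase, partial_counts, Counter())
print(total_counts["the"], total_counts["sat"])  # 3 2
```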
Storage + Compute
HDFS + MapReduce = Hadoop
Some Notes
● Hadoop Version 1 - has the MapReduce (MR) framework
● Hadoop Version 2 - adds YARN (resource scheduler)
● Apache Spark
○ Alternative to the MapReduce compute framework
○ Can use Apache Spark with data in HDFS
○ Can run on YARN with data in S3! (we just use YARN features without using MR and HDFS features) :)
HDFS + MapReduce = Hadoop
By Vishnu Rao
mash213.wordpress.com
linkedin.com/in/213vishnu

simple introduction to hadoop