www.edureka.co/big-data-and-hadoop EDUREKA HADOOP CERTIFICATION TRAINING
Agenda for today’s Session
1. What is Hadoop MapReduce?
2. MapReduce in a Nutshell
3. Advantages of MapReduce
4. Hadoop MapReduce Approach with an Example
5. Hadoop MapReduce/YARN Components
6. YARN With MapReduce
7. YARN Application Workflow
8. MapReduce Program with Hands On
Hadoop Components
2 main Hadoop components: Storage and Processing
MapReduce: Data Processing Using Programming
[Diagram: Big Data goes in, Result comes out]
• Hadoop MapReduce is the processing component of Apache Hadoop
• It processes data in parallel in a distributed environment
MapReduce in a Nutshell
MapReduce
Features: Large Scale Distributed Model, Parallel Programming, A Program Model
Used in: Classification, Analytics, Recommendation, Index and Search
Function: Map, Reduce
Design Pattern:
» Classification (Eg: Top N records)
» Analytics (Eg: Join, Selection)
» Recommendation (Eg: Sort)
» Summarization (Eg: Inverted Index)
Implemented: Google, Apache Hadoop
For: HDFS, Pig, Hive, HBase
2 Biggest Advantages of MapReduce
Advantage 1: Parallel Processing
• Data is processed in parallel
• Processing becomes faster
[Diagram: a Master node and Slaves A to E, with Data]
Advantage 2: Data Locality - Processing to Storage
• Moving data to the processing is very costly
• In MapReduce, we move the processing to the data
[Diagram: a Master node and Slaves A to E, with Data]
Traditional vs MapReduce Way
Election Votes Counting
Election Votes Casting
• Votes are stored at different booths
• The Result Centre has the details of all the booths
[Diagram: Booths A to E holding Data, and the Result Centre]
Election Votes Counting – Traditional Way
Counting – Traditional Approach
• Votes are moved to the Result Centre for counting
• Moving all the votes to the Centre is costly
• The Result Centre is over-burdened
• Counting takes time
[Diagram: Data moving from Booths A to E to the Result Centre]
Hadoop MapReduce To the Rescue!
Hadoop MapReduce Doesn’t Follow This Approach
[Diagram: Data moving from Booths A to E to the Result Centre]
Election Votes Counting – MapReduce Way
Counting – MapReduce Approach
• Votes are counted at the individual booths
• Booth-wise results are sent back to the Result Centre
• The final result is declared easily and quickly this way
[Diagram: Votes counted at Booths A to E, with results sent to the Result Centre]
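The booth-wise counting can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop code: the booth names follow the slide, while the ballots, the candidate names and the functions `count_at_booth` and `declare_result` are made up for the example.

```python
from collections import Counter

# Hypothetical ballots stored at each booth (made-up data).
BOOTHS = {
    "Booth A": ["Asha", "Ravi", "Asha"],
    "Booth B": ["Ravi", "Ravi"],
    "Booth C": ["Asha", "Meena", "Asha"],
    "Booth D": ["Meena"],
    "Booth E": ["Ravi", "Asha"],
}


def count_at_booth(ballots):
    """Count votes where they are stored (the work done at each booth)."""
    return Counter(ballots)


def declare_result(booth_counts):
    """Combine the small booth-wise tallies at the Result Centre."""
    total = Counter()
    for counts in booth_counts:
        total.update(counts)
    return total


# Only the per-booth tallies travel to the centre, never the raw ballots.
booth_counts = [count_at_booth(ballots) for ballots in BOOTHS.values()]
result = declare_result(booth_counts)
print(result.most_common())
```

The centre receives five small tallies instead of every ballot, which is the same saving the slide describes.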
MapReduce In Detail
MapReduce Way
Anatomy of a MapReduce Program
MapReduce
Map: (K1, V1) → List (K2, V2)
Reduce: (K2, list (V2)) → List (K3, V3)
Each K is a key and each V is a value.
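The two signatures can be made concrete with a small in-memory sketch. It is not the Hadoop API: `run_map_reduce`, the temperature mapper and reducer, and the sample records are all hypothetical, chosen only to show how (K1, V1) pairs become (K2, V2) pairs, get grouped by K2, and are reduced to (K3, V3) pairs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")


def run_map_reduce(
    records: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], List[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], List[Tuple[K3, V3]]],
) -> List[Tuple[K3, V3]]:
    # Map: every (K1, V1) record yields a list of (K2, V2) pairs.
    grouped: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)  # group values that share a key
    # Reduce: every (K2, list(V2)) yields a list of (K3, V3) pairs.
    output: List[Tuple[K3, V3]] = []
    for k2 in sorted(grouped):
        output.extend(reducer(k2, grouped[k2]))
    return output


def temperature_mapper(offset: int, line: str) -> List[Tuple[str, int]]:
    """(K1=line offset, V1=text line) -> [(K2=year, V2=temperature)]."""
    year, temperature = line.split(",")
    return [(year, int(temperature))]


def max_reducer(year: str, temperatures: List[int]) -> List[Tuple[str, int]]:
    """(K2=year, list of temperatures) -> [(K3=year, V3=maximum)]."""
    return [(year, max(temperatures))]


# Made-up input records: (line offset, "year,temperature").
records = [(0, "1949,11"), (8, "1950,22"), (16, "1949,7"), (24, "1950,-1")]
output = run_map_reduce(records, temperature_mapper, max_reducer)
print(output)
```

Here K1 is a line offset, K2 and K3 are both the year, and the value types change from text to a list of integers to a single integer.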
Let us take an example to understand the MapReduce way
MapReduce Way – Word Count Process
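The slide's figure is not part of this transcript, so the sketch below is an assumption about the usual word count flow rather than a copy of it. The stage names (splitting, mapping, shuffling, reducing), the input text and the function names are all chosen for illustration, and everything runs in one Python process instead of on a cluster.

```python
from collections import defaultdict

# Made-up input; each string stands for one input split.
splits = ["deer bear river", "car car river", "deer car bear"]


def map_split(line):
    """Mapping: emit (word, 1) for every word in one split."""
    return [(word, 1) for word in line.split()]


def shuffle(mapped):
    """Shuffling: gather the 1s emitted for the same word, sorted by word."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)
    return dict(sorted(groups.items()))


def reduce_word(word, ones):
    """Reducing: add up the 1s collected for one word."""
    return (word, sum(ones))


mapped = [map_split(split) for split in splits]
grouped = shuffle(mapped)
counts = [reduce_word(word, ones) for word, ones in grouped.items()]

print("mapped: ", mapped)
print("grouped:", grouped)
print("counts: ", counts)
```

Printing the intermediate values makes it easy to see that each mapper works on its own split and that a reducer only ever sees the values for one word.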
Executing a MapReduce Program
MapReduce Using YARN
YARN – Moving beyond MapReduce
BATCH (MapReduce)
INTERACTIVE (Tez)
ONLINE (HBase)
STREAMING (Storm, S4, …)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
OTHER (Search, Weave..)
Hadoop 2.x Daemons
Hadoop 2.x MapReduce YARN Components
• ApplicationMaster
» One per application
» Short life
» Coordinates and manages MapReduce jobs
» Negotiates with the Resource Manager to schedule tasks
» The tasks are started by NodeManager(s)
• Job History Server
» Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
• Client
» Submits a MapReduce job
• Resource Manager
» Cluster-level resource manager
» Long life, high-quality hardware
• Node Manager
» One per Data Node
» Monitors resources on the Data Node
• Container
» Created by the NM when requested
» Allocates a certain amount of resources (memory, CPU etc.) on a slave node
YARN Application Workflow in MapReduce
YARN Workflow
[Diagram: the Resource Manager, made up of the Scheduler and the Applications Manager (AsM), above twelve Node Managers that host App Master 1 with Containers 1.1 and 1.2, and App Master 2 with Containers 2.1, 2.2 and 2.3]
Application Workflow
(RM = Resource Manager, NM = Node Manager, AM = ApplicationMaster)
Execution Sequence:
1. Client submits an application
2. RM allocates a container to start the AM
3. AM registers with the RM
4. AM requests containers from the RM
5. AM notifies the NM to launch containers
6. Application code is executed in the container
7. Client contacts the RM/AM to monitor the application’s status
8. AM unregisters with the RM
[Diagram: steps 1 to 8 drawn between Client, RM, NM and AM]
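The eight steps can be traced with a toy model. This is a sketch for following the sequence, not the YARN API: the `ResourceManager` class, the `run_application` function, the container bookkeeping and the rule that containers are handed back when the AM unregisters are all assumptions made for the example.

```python
class ResourceManager:
    """Toy RM that only tracks free containers and registered AMs."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.registered = set()

    def allocate(self, wanted):
        """Grant as many of the wanted containers as are free."""
        granted = min(wanted, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count):
        self.free_containers += count


def run_application(rm, app_id, tasks):
    """Walk one application through the eight steps and return a trace."""
    trace = [f"1. client submits {app_id}"]
    if rm.allocate(1) != 1:
        raise RuntimeError("no container available to start the AM")
    trace.append("2. RM allocates a container to start the AM")
    rm.registered.add(app_id)
    trace.append("3. AM registers with the RM")
    granted = rm.allocate(len(tasks))
    trace.append(f"4. AM requests {len(tasks)} containers, RM grants {granted}")
    trace.append(f"5. AM notifies the NM to launch {granted} containers")
    # Assumption: tasks beyond the granted containers are simply skipped here.
    results = [task() for task in tasks[:granted]]
    trace.append(f"6. application code ran in containers: {results}")
    trace.append(f"7. client contacts the RM/AM for the status of {app_id}")
    rm.registered.discard(app_id)
    rm.release(granted + 1)  # task containers plus the AM's own container
    trace.append("8. AM unregisters with the RM")
    return trace


rm = ResourceManager(free_containers=4)
trace = run_application(rm, "wordcount", [lambda: "map done", lambda: "reduce done"])
for line in trace:
    print(line)
```

A real cluster runs these steps asynchronously across machines; the single function here only keeps the ordering visible.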
Learning Resources
• Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
• MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
• MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce
Thank You …
Questions/Queries/Feedback

MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka