Confidential, Copyright © Quanticate
Introduction to Map-Reduce
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
The Apache logo is a trademark of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
What is Map-Reduce?
Map-Reduce architecture
Advantages of Map-Reduce
Frameworks available for writing Map-Reduce programs
WordCount – Map-Reduce Program explained
How to compile a Map-Reduce program using Eclipse?
How to deploy a Map-Reduce program?
How to run a Map-Reduce program?
Q & A
Who Am I?
7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
Author of Apache Cassandra Cookbook (in progress)
C# Corner MVP
Frequent blogger
What is Map-Reduce?
• Often referred to as a "Map-R" program
• MapReduce = Map() + Reduce()
• MapReduce is a programming model for processing large datasets in parallel across a distributed cluster (divide and conquer).
What is Map-Reduce?
• Map:
– Receives an input key/value pair
– Outputs zero or more intermediate key/value pairs
• Reduce:
– Receives an intermediate key together with all the values emitted for that key
– Outputs zero or more final key/value pairs
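The Map/Reduce contract above can be illustrated without a cluster. The sketch below is a single-process Java simulation, not Hadoop code: the class `MiniMapReduce` and its methods are hypothetical, and the explicit `shuffle` step stands in for the grouping by key that the framework performs between the two phases. Word count is used as the example computation (Java 9+ for `List.of` and `Map.entry`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Single-process simulation of the Map -> shuffle -> Reduce data flow (no Hadoop).
final class MiniMapReduce {

    // Map: one input record (a line of text) -> zero or more intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle: group every intermediate value under its key (TreeMap keeps keys sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: one key plus all of its values -> a single output value (here, the sum).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Drives the three phases over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "the cat ran")));
    }
}
```

`main` prints `{cat=2, ran=1, sat=1, the=2}`. Because each call to `map` sees only its own line, the map calls are independent of one another, which is what allows a cluster to run them in parallel.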
[Diagram: Input Data → Map, Map, Map → Reduce, Reduce]
Map-Reduce Architecture overview
[Diagram: user → Job tracker on the master node → Task trackers on Slave node 1, Slave node 2, … Slave node N (the workers)]
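The master/worker layout in the diagram can be mirrored by a toy program. In the sketch below (an analogy only, not Hadoop code), the calling thread plays the Job tracker on the master node and a fixed-size `ExecutorService` plays the Task trackers on the slave nodes; the class `MasterWorkerSketch` and its parameters are hypothetical. Intermediate keys are routed to reduce tasks with the same `(hashCode & Integer.MAX_VALUE) % numReducers` rule that Hadoop's default `HashPartitioner` uses. Unlike Hadoop, where each map task partitions its own output and the reduce tasks fetch it, the master does the routing here to keep the code short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy analogy of the master/worker layout: the calling thread is the "master" and a
// fixed thread pool is the set of "workers". Not Hadoop's JobTracker/TaskTracker code.
final class MasterWorkerSketch {

    // Assigns a key to one of numReducers partitions (hash-modulo, as in HashPartitioner).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static Map<String, Integer> wordCount(List<String> splits, int workers, int numReducers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Map phase: the master submits one map task per input split.
            List<Future<List<Map.Entry<String, Integer>>>> mapTasks = new ArrayList<>();
            for (String split : splits) {
                mapTasks.add(pool.submit(() -> {
                    List<Map.Entry<String, Integer>> out = new ArrayList<>();
                    StringTokenizer tokenizer = new StringTokenizer(split);
                    while (tokenizer.hasMoreTokens()) {
                        out.add(Map.entry(tokenizer.nextToken(), 1));
                    }
                    return out;
                }));
            }

            // Shuffle: route every intermediate pair to the partition that owns its key.
            List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
            for (int i = 0; i < numReducers; i++) {
                partitions.add(new ArrayList<>());
            }
            for (Future<List<Map.Entry<String, Integer>>> task : mapTasks) {
                for (Map.Entry<String, Integer> pair : task.get()) {
                    partitions.get(partition(pair.getKey(), numReducers)).add(pair);
                }
            }

            // Reduce phase: one reduce task per partition sums the values for each key.
            List<Future<Map<String, Integer>>> reduceTasks = new ArrayList<>();
            for (List<Map.Entry<String, Integer>> part : partitions) {
                reduceTasks.add(pool.submit(() -> {
                    Map<String, Integer> sums = new HashMap<>();
                    for (Map.Entry<String, Integer> pair : part) {
                        sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
                    }
                    return sums;
                }));
            }

            // Each key lives in exactly one partition, so the outputs can simply be merged.
            Map<String, Integer> result = new TreeMap<>();
            for (Future<Map<String, Integer>> task : reduceTasks) {
                result.putAll(task.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map or reduce task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> splits = List.of("the cat sat", "on the mat", "the end");
        System.out.println(wordCount(splits, 2, 3));
    }
}
```

With the sample input, `main` prints `{cat=1, end=1, mat=1, on=1, sat=1, the=3}`; the result does not depend on how many worker threads or reduce partitions are chosen, only the degree of parallelism does.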
Advantages of Map-Reduce
Map-Reduce scales well to large-data workloads such as:
• Distributed pattern-based searching
• Distributed sorting
• Web access log statistics
• Machine learning
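To make the web access log use case concrete, the sketch below computes one typical statistic, the number of requests per HTTP status code, as a map step (parse each line into its status code) followed by grouping by key and counting. It runs with Java streams in a single JVM rather than on Hadoop, and it assumes the Apache Common Log Format; the class `AccessLogStats` and its regular expression are illustrative, not taken from the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Web access log statistics as map (parse) + group-by-key + reduce (count),
// using Java streams in one JVM rather than Hadoop.
final class AccessLogStats {

    // Assumes Apache Common Log Format; group 1 captures the three-digit HTTP status code.
    private static final Pattern CLF = Pattern.compile(
            "^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"[^\"]*\" (\\d{3}) \\S+");

    // Map step: a log line -> its status code, or empty if the line does not match.
    static Optional<String> statusOf(String line) {
        Matcher m = CLF.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Group by status code (the shuffle) and count each group (the reduce).
    static Map<String, Long> countByStatus(List<String> lines) {
        return lines.stream()
                .map(AccessLogStats::statusOf)
                .flatMap(Optional::stream)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326",
                "127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] \"GET /missing HTTP/1.0\" 404 -",
                "10.0.0.7 - frank [10/Oct/2000:13:56:02 -0700] \"GET /index.html HTTP/1.0\" 200 2326");
        System.out.println(countByStatus(log));
    }
}
```

`main` prints `{200=2, 404=1}`. Lines that do not match the assumed format are skipped rather than failing the whole computation, a common choice when processing large, messy log files.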
Frameworks available for writing Map-Reduce programs
Courtesy & ©: http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html
• Java: Cascading, Crunch
• Clojure: Cascalog
• Scala: Scrunch, Scalding, Scoobi
• R: RHadoop
• Microsoft: .NET (C# / VB.NET)
• Special (high-level): Apache Hive, Apache Pig
• Ruby: Wukong, Cascading JRuby
• Python: mrjob, Dumbo, Hadoopy, Pydoop, Luigi
WordCount – Map-Reduce Program
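// Imports shared by the WordCount example (classic org.apache.hadoop.mapred API); Map, Reduce and main are members of public class WordCount
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Mapper: emits (word, 1) for every whitespace-separated token of an input line
public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // key = byte offset of the line in the file (unused), value = the line itself
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}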
WordCount – Map-Reduce Program
// Reducer (also registered as the combiner): sums the counts emitted for each word
public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
WordCount – Map-Reduce Program
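// Driver: configures the job and submits it to the cluster
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // Key/value types of the job's final output
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);   // local pre-aggregation of each map task's output
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    // args[0] = input path, args[1] = output directory (must not exist yet)
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);   // submits the job and waits for it to finish
}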
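The driver registers `Reduce` both as the reducer and, through `conf.setCombinerClass(Reduce.class)`, as the combiner. The plain-Java sketch below (no Hadoop classes; `CombinerDemo` and its methods are hypothetical) shows why that is safe for WordCount and what it buys: summing each split's partial counts before the shuffle sends fewer intermediate pairs but produces the same totals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration (no Hadoop) of reusing the reduce logic as a combiner:
// summing per-split partial sums yields the same totals as summing every individual 1.
final class CombinerDemo {

    // Map output for one input split: a (word, 1) pair for every token.
    static List<Map.Entry<String, Integer>> mapSplit(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(split);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // The WordCount reduce logic (sum per key); usable as combiner and as reducer.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Intermediate pairs that would be sent to the reducers, optionally combined per split.
    static List<Map.Entry<String, Integer>> shuffled(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = mapSplit(split);
            if (useCombiner) {
                out.addAll(sumByKey(mapOutput).entrySet());
            } else {
                out.addAll(mapOutput);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("to be or not to be", "to do or not to do");
        for (boolean useCombiner : new boolean[] {false, true}) {
            List<Map.Entry<String, Integer>> pairs = shuffled(splits, useCombiner);
            System.out.println("combiner=" + useCombiner + ": " + pairs.size()
                    + " pairs shuffled, totals " + sumByKey(pairs));
        }
    }
}
```

For the two sample splits, 12 pairs are shuffled without the combiner and 8 with it, and both runs end with `{be=2, do=2, not=2, or=2, to=4}`. Hadoop treats the combiner as an optimization that may run zero, one or several times, so reusing the reducer this way only works when the reduce function is associative and commutative, as summation is.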
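How to compile a Map-Reduce program using Eclipse?
• Reference the Hadoop JAR files from your local disk in the project's build path
• Or use Maven to manage the Hadoop dependencies, which is simpler
• Eclipse → Project → Build Project
• Confirm that the Eclipse console shows no errors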
How to deploy a Map-Reduce program?
How to run a Map-Reduce program?
Summary
• What is Map-Reduce?
• Map-Reduce architecture
• Advantages of Map-Reduce
• Frameworks available for writing Map-Reduce programs
• WordCount – Map-Reduce program explained
• Compiling the WordCount Map-Reduce program using Eclipse
• Deploying a Map-Reduce program
• Running a Map-Reduce program
Q & A
References
http://en.wikipedia.org/wiki/MapReduce
http://hortonworks.com
http://hadoop.apache.org
Coding-Freaks.Net: www.codingfreaks.net
Quanticate OPDev Twitter: https://twitter.com/quanticateopdev
Twitter: www.Twitter.com/muralidharand