QUICK AND DIRTY PARALLEL PROCESSING ON THE CLOUD Daniel Sikar
EC2 S3
 
Tools AWS Command line tools
Elastic MapReduce Ruby library
Hadoop
s3cmd
Hadoop MapReduce Job Tracker + Task Tracker + Slaves HDFS – Distributed file system
Hadoop MapReduce usage: data crunching in general (clicks, statistics, etc.)
Hadoop Project Mgmt Committee
MapReduce ?
MapReduce Key Pairs <key,value>
MapReduce
HTTP Logs Log file A: (...) FreeTouchScreenNokia5230 (...) (...) GetRidofAllSpeedCameras(...) (...) USManWinsLottery (...) (...) BNPToLaunchElectionManifesto (...) Log file B: (...) FreeTouchScreenNokia5230 (...) (...) BodyLanguageTellsAll (...)
MapReduce <FreeTouchScreenNokia5230, 1> + <FreeTouchScreenNokia5230, 1> = <FreeTouchScreenNokia5230, 2>
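The counting step on this slide can be sketched in a few lines of Python. This is only an in-memory illustration of the idea, not how Hadoop itself is implemented: the function names `map_phase` and `reduce_phase` are made up for this sketch, and the two lists stand in for the headline tokens of Log file A and Log file B.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a <key, 1> pair for every whitespace-separated token in the lines."""
    for line in lines:
        for token in line.split():
            yield token, 1


def reduce_phase(pairs):
    """Sum the values of all pairs that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical stand-ins for the two log files shown on the HTTP Logs slide.
log_a = [
    "FreeTouchScreenNokia5230",
    "GetRidofAllSpeedCameras",
    "USManWinsLottery",
    "BNPToLaunchElectionManifesto",
]
log_b = ["FreeTouchScreenNokia5230", "BodyLanguageTellsAll"]

counts = reduce_phase(map_phase(log_a + log_b))
print(counts["FreeTouchScreenNokia5230"])  # 2
```

The key that appears in both logs ends up with a value of 2, every other key with 1, which matches the <key, value> addition on the slide.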
Hadoop Streaming Running MapReduce jobs with executables and scripts $ <list> | mapper | reducer
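A rough local sketch of the `mapper | reducer` pipeline, assuming the usual Hadoop Streaming convention of tab-separated key/value lines that are sorted by key between the two stages. The function names `mapper`, `reducer` and `run_pipeline` are hypothetical; in a real streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output.

```python
from itertools import groupby


def mapper(lines):
    """Turn each input line into 'word<TAB>1' records, one per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive records that share a key.

    Assumes the records arrive sorted by key, as they would after
    the sort step between the map and reduce stages.
    """
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_pipeline(lines):
    """Local stand-in for: <list> | mapper | sort | reducer"""
    return list(reducer(sorted(mapper(lines))))


result = run_pipeline(["b a", "a"])
print(result)  # ['a\t2', 'b\t1']
```

Testing the two stages locally on a small sample like this is a cheap way to catch mistakes before paying for a cluster run.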
Real life example of Hadoop Streaming usage
Wikipedia Page Access Logs
Wine Grape Varieties
Wikipedia WGV Page Access Stats
Business Decisions
Launching a virtual Hadoop Cluster $ elastic-mapreduce --create --name "Wiki log crunch" --alive --num-instances 20 --instance-type c1.medium Created job flow <job flow id> $ ec2din (...)
 
Hadoop Standalone Operation
Pseudo-Distributed Operation
Fully-Distributed Operation
NameNode
JobTracker

Daniel Sikar: Hadoop MapReduce - 06/09/2010

Editor's Notes

  • #21: So, without further ado, let's get this show on the road and run a job concurrently on a few virtual machines.