Cloud and Big Data
Sebastien Goasguen,
January 29th
@sebgoa
A view on Big Data
http://www.economist.com/node/15557443?story_id=15557443
SKA
Cloud and Big Data trends
How did we get there?
A natural evolution
New Distributed systems for:
Large scale datasets
• From scientific instruments
• From Web apps logs

Complex datasets
• Not necessarily large.

Object stores
• S3 clones
BigData and map-reduce
• BigData is often associated with HDFS, while
Map-Reduce is the programming model used to
parallelize data processing.
• BigData ≠ Map-Reduce ≠ HDFS
• Map-Reduce is a way to express
embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS,
e.g. Basho map-reduce on Riak CS.
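A minimal sketch of the point made in the bullets above: map-reduce is a way of structuring work, and it does not need HDFS or Hadoop. The example counts words over in-memory strings using only the Python standard library. The function names (`map_phase`, `reduce_phase`, `word_count`) and the sample log lines are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(chunk):
    """Map: count the words in one independent chunk of text."""
    return Counter(chunk.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # Every chunk is processed on its own, with no shared state.
    # That independence is what makes the job embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    # Fold the partial results together into a single answer.
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    # Hypothetical web app log fragments, held in memory (no HDFS involved).
    logs = ["GET /index GET /about", "GET /index POST /login"]
    print(word_count(logs))
```

The thread pool is only a stand-in for a cluster of workers. The same map and reduce functions could be handed to a process pool or a distributed framework without changing their logic.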
A really quick view on Clouds
Open Source IaaS
Today
BigData at
peak
History
2003 – Google File System
2005 – Hadoop
2006 – Hadoop enters ASF incubator (Feb)
2006 – S3 launched
2007 – Paper on Amazon Dynamo
2009 – EMR launched
2013 – CloudStack becomes an ASF TLP (March)
2013 – Spark enters ASF incubator; Mesos graduates to TLP
The Apache Software Foundation
Apache Software Foundation
35 projects in incubation:
• 12 Hadoop related
• ~30% Big Data related
• Spark

117 top level projects:
• ~16 cloud or bigdata +10%
• Deltacloud, Libcloud, Whirr, jclouds
• Hadoop, CouchDB, Cassandra, Mesos
• Bigtop, Accumulo, Lucene, UIMA
• CloudStack
Hadoop Ecosystem

+ Upcoming next-generation Big Data
systems
Big Data and Cloud (Stack)s
Clouds and BigData
• Object store + compute IaaS to build an EC2 + S3
clone
• BigData solutions as storage backends for the
image catalogue and large-scale instance
storage.

• BigData solutions as workloads on
CloudStack-based clouds.
EC2, S3 clone
• An open source IaaS with an EC2
wrapper, e.g. OpenNebula
• Deploy an S3-compatible object store
separately, e.g. Riak CS
• Two independent distributed systems
deployed

Cloud = EC2 + S3
Big Data
as IaaS backend
“Big Data” solutions can be used as secondary
storage.
Example
• Open source IaaS + EC2 wrapper, e.g.
CloudStack
• Deploy an S3-compatible object store, e.g.
Riak CS, Ceph or GlusterFS
• Use S3 as image store
• Your EC2 service is a customer of your
S3 service
• Logstash + Elasticsearch for
logs/monitoring
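A toy sketch of the layering described in the example above, where the compute service is simply a client of the object store that holds its images. Everything here is hypothetical and in-memory: `ObjectStore`, `InMemoryObjectStore` and `ComputeService` are illustrative names, not CloudStack, Riak CS or Ceph APIs, and no network calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """The small slice of an S3-like interface the compute side needs."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    def get(self, bucket: str, key: str) -> bytes: ...


class InMemoryObjectStore:
    """Stand-in for an S3-compatible store (hypothetical, no network)."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._buckets.setdefault(bucket, {})[key] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


@dataclass
class ComputeService:
    """Stand-in for the EC2-like side: it only talks to the store's API."""

    store: ObjectStore
    image_bucket: str = "images"
    instances: Dict[str, bytes] = field(default_factory=dict)

    def register_image(self, name: str, data: bytes) -> None:
        # Images live in the object store, not in the compute service.
        self.store.put(self.image_bucket, name, data)

    def launch(self, instance_id: str, image_name: str) -> int:
        # Launching fetches the image like any other customer would.
        image = self.store.get(self.image_bucket, image_name)
        self.instances[instance_id] = image
        return len(image)


if __name__ == "__main__":
    compute = ComputeService(store=InMemoryObjectStore())
    compute.register_image("ubuntu", b"fake-image-bytes")
    print(compute.launch("vm-1", "ubuntu"))
```

Because the compute side depends only on `put` and `get`, the backing store could in principle be swapped for any S3-compatible system without touching the compute code, which is the design choice the slide points at.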
Even use Bare Metal
Big Data as a Workload to the Cloud
Mesos, Spark are EC2 native

o ec2_deploy.py
o ec2_deploy.sh
o…
Tools
“PaaS”
Dev Pipeline
Conclusions
• Big Data is “catching up”
• Tackle the big three head on:
• BigData, Cloud and DevOps
• Add a big data backend to your cloud
from the start
• Provide Big Data services on your cloud
Still
behind !
Final Thoughts

Who manages my data transfers?
Event
ApacheCON + CloudStack Collaboration
Conference
Denver April 7-11th.

Get Involved with Apache
CloudStack
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net: 6667 #cloudstack #cloudstack-dev
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.

Editor's Notes

  • #4: Walmart: 1M customer transactions every hour, database of 2.5 PB in 2010. http://www.economist.com/node/15557443?story_id=15557443
  • #5: Square Kilometer Array: 10-500 TB per second … 1 exabyte per day. Facebook, June 2012: 100 PB Hadoop cluster, ½ PB per day = 180 PB per year -> ~350 PB now? CERN: ~20 PB in EOS.
  • #6: 250k cables. War and Peace: 450k words; 260M words in Cablegate = 500x War and Peace.
  • #7: 200 million pages, 4 TB
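
The back-of-the-envelope figures in these notes can be rechecked with a few lines of arithmetic. This is a rough sketch that uses only the numbers quoted above. The helper names are illustrative, and decimal units (1 EB = 1,000,000 TB) are an assumption.

```python
def pb_per_year(pb_per_day, days=365):
    """Yearly growth in PB for a given daily ingest rate."""
    return pb_per_day * days


def days_to_reach(start_pb, target_pb, pb_per_day):
    """Days of steady growth needed to go from start_pb to target_pb."""
    return (target_pb - start_pb) / pb_per_day


def size_ratio(words_a, words_b):
    """How many times larger corpus A is than corpus B."""
    return words_a / words_b


if __name__ == "__main__":
    # Facebook: 0.5 PB per day is 182.5 PB per year (rounded to 180 above).
    print(pb_per_year(0.5))
    # Growing from 100 PB to ~350 PB at that rate takes 500 days.
    print(days_to_reach(100, 350, 0.5))
    # Cablegate vs War and Peace: about 578x (rounded to 500x above).
    print(round(size_ratio(260_000_000, 450_000)))
    # Walmart: 1M transactions per hour is roughly 278 per second.
    print(round(1_000_000 / 3600))
    # SKA: 1 EB per day is about 11.6 TB per second (assuming 1 EB = 1e6 TB),
    # which sits at the low end of the quoted 10-500 TB per second range.
    print(round(1_000_000 / 86_400, 1))
```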