Welcome to big data
Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologies
• Opportunities
• Major Players
• Questions
• Conclusion
What is Big data?

Data

Big Data
Some BIG facts
• 90% of the data in the world today has been created in the
last two years alone
• IDC forecast: the digital universe of data will double every two years,
reaching 40,000 exabytes (40 trillion GB) by 2020
• The Large Hadron Collider near Geneva, Switzerland, will
produce about 15 petabytes of data per year.
• Ancestry.com, the genealogy site, stores around 2.5
petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and
is growing at a rate of 20 terabytes per month.
Some BIG facts – What happens every day?
• The New York Stock Exchange generates about one
terabyte of new trade data
• Zynga processes 1 petabyte of content
• 30 billion pieces of content are added to Facebook
• 2 billion videos are watched on YouTube
• 2.5 quintillion bytes of data are created
Some BIG facts – What happens every minute?

Courtesy: http://practicalanalytics.files.wordpress.com
Big data – Objective

Effectively store, manage, and analyze all of this
data to derive meaningful information from it
Big data – Sources
Big data – 3 V’s of Big data

Courtesy: bigdatablog.emc.com
Big data – 3 + 1 V’s of Big data

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Volumes are measured in:
• Terabytes
• Petabytes
• Exabytes
• Zettabytes

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Name                Value
1 Gigabyte (GB)     1,073,741,824 bytes
1 Terabyte (TB)     1,024 GB
1 Petabyte (PB)     1,048,576 GB
1 Exabyte (EB)      1,073,741,824 GB
1 Zettabyte (ZB)    1,099,511,627,776 GB
1 Yottabyte (YB)    1,125,899,906,842,624 GB

Courtesy: http://www.datasciencecentral.com/
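These are binary (1024-based) units. As a quick illustration, here is a small Python sketch (not from the original deck) that formats a raw byte count using the units in the table above:

```python
# Minimal sketch (not part of the original deck): convert a raw byte count
# into binary (1024-based) units, matching the table above (1 GB = 2**30 bytes).
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:,.2f} {unit}"
        value /= 1024
    return f"{num_bytes} bytes"  # unreachable, kept for completeness

print(human_readable(1_073_741_824))          # 1.00 GB
print(human_readable(1_125_899_906_842_624))  # 1.00 PB
```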
Big data - Velocity

• Live Stream
• Real time
• Batch

Courtesy: http://www.datasciencecentral.com/
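To make the distinction above concrete, the following hypothetical Python sketch computes the same per-user totals two ways: once as a batch over a complete data set, and once in a streaming style where records are handled one at a time as they arrive. The event structure is invented purely for illustration.

```python
# Hypothetical sketch: the same aggregation done in batch vs. streaming style.
from collections import defaultdict
from typing import Dict

events = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
]

def batch_totals(all_events) -> Dict[str, int]:
    """Batch: the full data set is available before processing starts."""
    totals: Dict[str, int] = defaultdict(int)
    for event in all_events:
        totals[event["user"]] += event["amount"]
    return dict(totals)

class StreamingTotals:
    """Streaming / real time: records arrive one at a time and state is
    updated incrementally, so results are available at any moment."""
    def __init__(self):
        self.totals: Dict[str, int] = defaultdict(int)

    def on_event(self, event) -> None:
        self.totals[event["user"]] += event["amount"]

print(batch_totals(events))          # {'a': 17, 'b': 5}
stream = StreamingTotals()
for e in events:                     # in practice these would arrive live
    stream.on_event(e)
print(dict(stream.totals))           # {'a': 17, 'b': 5}
```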
Big data - Variety

• Structured (tables)
• Unstructured (tweets, SMS messages)
• Semi-structured (log files, RFID)

Courtesy: http://www.datasciencecentral.com/
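As a hypothetical illustration of these three shapes of data, the snippet below handles a structured CSV row, a semi-structured web-server log line, and an unstructured tweet; the sample values are invented.

```python
# Hypothetical examples of the three data shapes listed above.
import csv, io, re

# Structured: fixed columns, ready for a table.
csv_row = "1001,2014-05-02,249.99"
order_id, order_date, amount = next(csv.reader(io.StringIO(csv_row)))

# Semi-structured: has some structure (fields) but no rigid schema;
# a regular expression extracts the parts we need.
log_line = '127.0.0.1 - - [02/May/2014:10:15:32] "GET /index.html" 200'
match = re.search(r'"(?P<method>\w+) (?P<path>\S+)" (?P<status>\d+)', log_line)
request = match.groupdict() if match else {}

# Unstructured: free text; structure must be inferred (here, a naive
# hashtag extraction stands in for real text analytics).
tweet = "Loving the #bigdata talk today! #hadoop"
hashtags = re.findall(r"#(\w+)", tweet)

print(order_id, order_date, amount)  # 1001 2014-05-02 249.99
print(request)                       # {'method': 'GET', 'path': '/index.html', 'status': '200'}
print(hashtags)                      # ['bigdata', 'hadoop']
```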
Big data - Veracity

• This aspect of data is often
overlooked
• It is now considered as
important as the other three V's of Big Data
• The effort required to clean up data is
often underestimated
• Poor data quality costs the U.S.
economy around $3.1 trillion a
year

Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS
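The speaker notes at the end of this deck suggest assigning a veracity score to data sets before basing decisions on them. The sketch below is one hypothetical way to do that with simple completeness and validity checks; the field names, ranges, and scoring rule are illustrative assumptions, not part of the deck.

```python
# Hypothetical sketch: a crude "veracity score" for a batch of records,
# based on completeness (no missing fields) and validity (values in range).
REQUIRED_FIELDS = ("user_id", "age", "email")   # illustrative schema

def record_ok(record: dict) -> bool:
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return False                      # incomplete record
    if not (0 < record["age"] < 120):
        return False                      # implausible value
    return "@" in record["email"]         # naive validity check

def veracity_score(records) -> float:
    """Fraction of records that pass the quality checks (0.0 to 1.0)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(record_ok(r) for r in records) / len(records)

sample = [
    {"user_id": 1, "age": 34, "email": "a@example.com"},
    {"user_id": 2, "age": 250, "email": "b@example.com"},   # bad age
    {"user_id": 3, "age": 28, "email": ""},                 # missing email
]
print(veracity_score(sample))   # 0.33...
```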
Big data – Technologies
Technologies & solution providers:
• Storage (Microsoft SQL Server, Apache Hadoop, MongoDB)
• Processing (MapReduce, Impala; see the sketch below)
• Analytics (SAS, R, Business Intelligence tools)
• Integration (Flume, Sqoop)
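To give a feel for the processing layer, here is a minimal word-count example in the MapReduce style, written as Hadoop Streaming mapper and reducer scripts in Python. It is a sketch for illustration, not code from the deck.

```python
# mapper.py -- Hadoop Streaming mapper: emit "word<TAB>1" for every word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word.lower()}\t1")
```

```python
# reducer.py -- Hadoop Streaming reducer: input arrives sorted by key, so we
# can sum consecutive counts for the same word and emit "word<TAB>total".
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Locally the pipeline can be simulated with "cat input.txt | python mapper.py | sort | python reducer.py"; on a cluster the same two scripts would typically be submitted through the hadoop-streaming jar (the exact command and paths depend on the installation).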
Big data - Opportunities
• Storage
• Processing
• Analytics
• Integration
• Solution
Big data – Major Players
Big data – Questions?
Big data – Thank you !!!


Editor's Notes

  • #12–#17: Data veracity (uncertain or imprecise data) is often overlooked, yet it may be as important as the three V's of Big Data: volume, velocity and variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain, precise data, achieved by spending unreasonably large amounts of human effort on data preparation, ETL/ELT and master data management. The big data revolution forces us to rethink that architecture so it can accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise information; social media data, for example, is inherently uncertain. Given the variety and velocity of big data, an organization can no longer commit the time and resources that traditional ETL/ELT and data preparation demand to make the data certain and precise before analysis. Tools exist to help automate data preparation and cleansing, but they are still in the pre-industrial age. As a result, organizations must now analyze structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies case by case, but it must be factored in; it may be prudent to assign a data veracity score and ranking to specific data sets to avoid making decisions based on analysis of uncertain, imprecise data.