Big Data Processing With Scala and Spark
Slide 1 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions www.edureka.co/apache-spark-scala-training
Objectives of this Session 
What is Big Data? 
What is Spark? 
Why Spark? 
Spark Ecosystem 
A note about Scala 
Why Scala? 
Hello Spark! 
For Queries during the session and class recording: 
Post on Twitter @edurekaIN: #askEdureka 
Post on Facebook /edurekaIN 
Big Data
• Lots of data (terabytes or petabytes)
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
What is Spark?
• Apache Spark is a general-purpose, in-memory cluster computing system
• Provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs
• Provides higher-level tools such as Spark SQL for structured data processing, MLlib for machine learning, and more
Why Spark?
Deployment
• The Spark framework can be deployed through Apache Mesos, Apache Hadoop via YARN, or Spark's own cluster manager.
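The three deployment options correspond to the master URL a Spark application is given at startup. A minimal sketch in plain Scala, with no Spark dependency: the `ClusterManager` type, the `masterUrl` helper and the host names are hypothetical and only illustrate the URL schemes described in Spark's documentation (7077 and 5050 are the usual default ports for the standalone manager and Mesos).

```scala
// Hypothetical model of the deployment options named on the slide,
// plus local mode for development. None of these types are part of Spark's API.
sealed trait ClusterManager
case object Local extends ClusterManager
final case class Standalone(host: String, port: Int = 7077) extends ClusterManager
final case class Mesos(host: String, port: Int = 5050) extends ClusterManager
case object Yarn extends ClusterManager

object Deployment {
  // Builds the string that would be supplied as the Spark master URL,
  // for example through `spark-submit --master <url>`.
  def masterUrl(cm: ClusterManager): String = cm match {
    case Local                  => "local[*]"             // run locally on all cores
    case Standalone(host, port) => s"spark://$host:$port" // Spark's own cluster manager
    case Mesos(host, port)      => s"mesos://$host:$port" // Apache Mesos
    case Yarn                   => "yarn"                 // cluster found via the Hadoop configuration
  }
}
```

Usage: `Deployment.masterUrl(Standalone("master-host"))` yields `spark://master-host:7077`. Spark 1.x releases used `yarn-client` and `yarn-cluster` where later releases use `yarn`.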
Why Spark?
Polyglot
• The Spark framework is polyglot: it can be programmed in several languages (currently Scala, Java and Python are supported).
Why Spark?
• A fully Apache Hive-compatible data warehousing system (Shark) that can run up to 100x faster than Hive.
• Up to 100x faster for certain applications.
Why Spark?
• Provides powerful caching and disk persistence capabilities
• Interactive Data Analysis
• Faster Batch
• Iterative Algorithms
• Real-Time Stream Processing
• Faster Decision-Making
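The caching and disk persistence point can be illustrated with a short sketch. It assumes the Spark core library is on the classpath and runs in local mode; the `CachingSketch` object and its `sumAndCount` helper are hypothetical names. The RDD is persisted once and then reused by two actions, which is the pattern that interactive and iterative workloads benefit from.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  // Returns (sum of squares, element count); both actions use the same persisted RDD.
  def sumAndCount(sc: SparkContext, upTo: Int): (Long, Long) = {
    val squares: RDD[Long] = sc.parallelize(1 to upTo).map(n => n.toLong * n)
    // Keep partitions in memory and spill to disk when they do not fit.
    squares.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      val sum = squares.reduce(_ + _) // first action computes and stores the partitions
      val count = squares.count()     // second action reads the stored partitions
      (sum, count)
    } finally {
      squares.unpersist()             // release the cached partitions
    }
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made so the sketch runs without a cluster.
    val sc = new SparkContext(new SparkConf().setAppName("CachingSketch").setMaster("local[*]"))
    try println(sumAndCount(sc, 1000))
    finally sc.stop()
  }
}
```

`StorageLevel.MEMORY_ONLY` (what `cache()` uses for RDDs) would be the choice when the data is known to fit in memory.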
Spark Community is Super Active! 
Spark Ecosystem
• Spark Core Engine
• Shark (SQL)
• Spark Streaming (Streaming)
• MLlib (Machine learning)
• GraphX (Graph computation)
• SparkR (R on Spark)
• BlinkDB (Approximate SQL)
Legend: Alpha/Pre-alpha
Spark Ecosystem (Contd.)
• Spark Core Engine
• BlinkDB (Approximate SQL): an approximate query engine that runs over the core Spark engine.
• Shark (SQL): used for structured data. Can run unmodified Hive queries on existing Hadoop deployments.
• Spark Streaming (Streaming): enables analytical and interactive apps for live streaming data.
• GraphX (Graph computation): graph computation engine, similar to Giraph.
• SparkR (R on Spark): a package for the R language that lets R users leverage Spark from the R shell.
• MLlib (Machine learning): machine learning library built on top of Spark, with support for many machine learning algorithms at speeds up to 100 times faster than MapReduce.
Legend: Alpha/Pre-alpha
A Note on Scala
• Scala is a general-purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way
• Scala supports both object-oriented programming and functional programming
• Scala is woven into the fabric of present and future Big Data frameworks such as Scalding, Spark and Akka
» All Spark examples in class will be in Scala
» Scala will be covered before Spark as part of the course!
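A small sketch of how the object-oriented and functional styles combine in Scala. The `Shape` hierarchy and the `ShapesDemo` helpers are hypothetical illustrations, not course material; the code uses only the Scala standard library.

```scala
// Object-oriented side: a small, closed class hierarchy.
sealed trait Shape { def area: Double }
final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}
final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapesDemo {
  // Functional side: higher-order functions over an immutable list.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  // Option makes the "empty list" case explicit in the type.
  def largest(shapes: List[Shape]): Option[Shape] =
    shapes.reduceOption((a, b) => if (a.area >= b.area) a else b)

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Rectangle(1.0, 1.0), Circle(1.0))
    println(totalArea(shapes))
    println(largest(shapes))
  }
}
```

The compiler checks that `largest` returns an `Option[Shape]`, so callers cannot forget the empty case; this is one sense in which the concise style remains type-safe.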
Why Scala?
• Scala is a pure object-oriented language. Conceptually, every value is an object and every operation is a method call. The language supports advanced component architectures through classes and traits
• Scala is also a functional language. It supports functions, immutable data structures, and a preference for immutability over mutation
• Seamlessly integrated with Java
• Used heavily in current and emerging Big Data and web development frameworks such as Spark, Akka, Scalding and Play
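The points about method calls, traits, immutability and Java integration can each be shown in a few lines. This is a minimal sketch using only the Scala and Java standard libraries; `Greeter`, `Framework` and `WhyScalaDemo` are hypothetical names chosen for illustration.

```scala
import java.util.{ArrayList => JArrayList}

// A trait is a reusable unit of behaviour that a class can mix in.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name!"
}

// The class picks up `greet` by extending the trait.
final class Framework(val name: String) extends Greeter

object WhyScalaDemo {
  // Every operation is a method call: `1 + 2` is sugar for `(1).+(2)`.
  val viaOperator: Int = 1 + 2
  val viaMethod: Int = (1).+(2)

  // Immutable data: appending returns a new list and leaves the original untouched.
  val frameworks: List[String] = List("Spark", "Akka")
  val extended: List[String] = frameworks :+ "Scalding"

  // Java interop: a java.util collection is created and filled directly from Scala.
  def toJavaList(items: List[String]): JArrayList[String] = {
    val out = new JArrayList[String]()
    items.foreach(item => out.add(item))
    out
  }

  def main(args: Array[String]): Unit = {
    println(new Framework("Spark").greet)
    println(viaOperator == viaMethod)
    println(frameworks.size)
    println(toJavaList(extended).size())
  }
}
```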
Hello Spark! 
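The slide carries no code of its own. As a rough idea of what a first Spark program in Scala might look like, here is a word-count sketch. It assumes Spark 1.3 or later on the classpath (earlier releases also need `import org.apache.spark.SparkContext._` for `reduceByKey`) and local mode; `HelloSpark` and `wordCounts` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object HelloSpark {
  // Splits each line into lowercase words and counts the occurrences of each word.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("HelloSpark").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("Hello Spark", "Hello Scala"))
      wordCounts(lines).collect().foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop()
    }
  }
}
```

Replacing `sc.parallelize(...)` with `sc.textFile(path)` would run the same pipeline over a file; the path would be supplied by the reader.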
Questions? 
Buy the Spark course at www.edureka.co