Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spark Tutorial |Simplilearn
What’s in it for you?
1. What is Spark?
2. Components of Spark
   Spark Core
   Spark SQL
   Spark Streaming
   Spark MLlib
   GraphX
3. Apache Spark Architecture
4. Running a Spark Application
What is Apache Spark?
Apache Spark is an open-source cluster computing framework, and a top-level Apache project, used for real-time processing and analysis of large amounts of data
Fast processing: Spark processes data faster since it saves time on read and write operations by keeping intermediate data in memory where possible
Real-time streaming: Spark allows real-time streaming and processing of data
In-memory computation: Spark has a DAG execution engine that supports in-memory computation
Fault tolerant: Spark is fault tolerant through RDDs, which are designed to handle the failure of any worker node in the cluster
Apache Spark Components
Spark Core
Spark SQL
Spark Streaming
MLlib
GraphX
Spark Core
Spark Core is the core engine for large-scale parallel and distributed data processing
It performs the following:
Memory management and fault recovery
Scheduling, distributing and monitoring jobs on a cluster
Interacting with storage systems
Spark RDD
Resilient Distributed Datasets (RDDs) are the building blocks of any Spark application
Workflow: Create RDD -> Transformations (each yields a new RDD) -> Actions -> Results
Transformations are operations (such as map, filter, join, union) that are performed on an RDD and yield a new RDD containing the result
Actions are operations (such as reduce, first, count) that return a value after running a computation on an RDD
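The difference between transformations and actions can be sketched in a few lines of Scala. This is a minimal illustration rather than code from the deck: the object name RddBasics, the helper sumOfEvenSquares, and the local[*] master are assumptions made so the sketch runs without a cluster.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object RddBasics {
  // Builds an RDD, applies two transformations, and finishes with an action
  def sumOfEvenSquares(sc: SparkContext, upTo: Int): Int = {
    val numbers = sc.parallelize(1 to upTo)  // create an RDD from a local range
    val evens = numbers.filter(_ % 2 == 0)   // transformation: lazy, yields a new RDD
    val squares = evens.map(n => n * n)      // transformation: lazy, yields a new RDD
    squares.reduce(_ + _)                    // action: runs the job and returns a value
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master stands in for a real cluster
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-basics")
      .getOrCreate()
    println(sumOfEvenSquares(spark.sparkContext, 10))
    spark.stop()
  }
}
```

Nothing is computed until reduce is called; filter and map only record how the new RDDs are derived from their parents.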
Spark SQL
Spark SQL is Apache Spark’s module for working with structured data
Spark SQL features
Integrated: You can integrate Spark SQL with Spark programs and query structured data inside Spark programs
High Compatibility: You can run unmodified Hive queries on existing warehouses in Spark SQL. Spark SQL offers compatibility with existing Hive data, queries and UDFs
Scalability: Spark SQL takes advantage of the RDD model to support large jobs and mid-query fault tolerance. Moreover, it uses the same engine for both interactive and long queries
Standard Connectivity: You can connect to Spark SQL through JDBC or ODBC, both of which have become industry norms for connecting business intelligence tools
Spark SQL
Spark SQL architecture (layers as listed on the slide):
DataFrame DSL, Spark SQL and HQL
DataFrame API
Data Source API
CSV, JSON, JDBC
Spark SQL
Spark SQL has three main layers
Language API: Spark SQL is compatible with, and supported by, languages like Python, HiveQL, Scala, and Java
SchemaRDD: As Spark SQL works on schemas, tables, and records, you can use a SchemaRDD or DataFrame as a temporary table (SchemaRDD was renamed DataFrame in Spark 1.3)
Data Sources: Spark SQL can read from different data sources, such as JSON documents, Hive tables, and Cassandra databases
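Using a DataFrame as a temporary table can be sketched as follows. This is an illustrative example under stated assumptions: the object name TempViewQuery, the people view, and the sample rows are hypothetical, and a local master is used so it runs without a cluster.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempViewQuery {
  // Registers a DataFrame as a temporary view and queries it with SQL
  def adults(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical sample data; a JSON file, Hive table, or JDBC source could supply it instead
    val people = Seq(("Ann", 34), ("Bob", 17), ("Cid", 52)).toDF("name", "age")
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18 ORDER BY name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("temp-view")
      .getOrCreate()
    adults(spark).show()
    spark.stop()
  }
}
```

The view only lives for the duration of the SparkSession; the same query could also be written with the DataFrame DSL (filter, select, orderBy).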
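Spark SQL
Spark allows you to define custom SQL functions called User Defined Functions (UDFs)
The following UDF removes all the whitespace and lowercases all the characters in a string
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumes a local SparkSession; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("udf-example").getOrCreate()
import spark.implicits._

// Lowercase the string and strip every whitespace character
def lowerRemoveAllWhiteSpaces(s: String): String = {
  s.toLowerCase().replaceAll("\\s", "")
}

val lowerRemoveAllWhiteSpacesUDF = udf[String, String](lowerRemoveAllWhiteSpaces)

// toDF replaces the slide's createDF call, which is not part of core Spark
val sourceDF = Seq(" WELCOME ", " SpaRk SqL ").toDF("text")

sourceDF.select(
  lowerRemoveAllWhiteSpacesUDF(col("text")).as("clean_text")
).show()
Output
clean_text
welcome
sparksql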
Spark Streaming
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams
Data can be ingested from many sources, and the processed data can be pushed out to different filesystems
Diagram: Streaming data sources and static data sources -> Spark Streaming -> Data storage
Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches
Diagram: Input data stream -> Spark Streaming -> Batches of input data -> Spark Engine -> Batches of processed data
Spark Streaming
Here is an example of a basic operation, flatMap, applied to each RDD in a DStream to extract individual words from lines of text in an input data stream
Lines DStream: lines from time 0 to 1 | lines from time 1 to 2 | lines from time 2 to 3 | lines from time 3 to 4
flatMap operation
Words DStream: words from time 0 to 1 | words from time 1 to 2 | words from time 2 to 3 | words from time 3 to 4
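The lines-to-words flatMap can be sketched with the DStream API. This is a minimal sketch, not the deck's own code: the object name LinesToWords, the helper splitWords, the 1-second batch interval, and the text server on localhost:9999 are all assumptions for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LinesToWords {
  // Splits one line into its non-empty, space-separated words
  def splitWords(line: String): Seq[String] =
    line.split(" ").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // local[2]: one thread receives the data, the other processes it
    val conf = new SparkConf().setMaster("local[2]").setAppName("lines-to-words")
    val ssc = new StreamingContext(conf, Seconds(1)) // assumed 1-second batches

    // Assumption: a text server is listening on localhost:9999 (for example `nc -lk 9999`)
    val lines = ssc.socketTextStream("localhost", 9999)

    // flatMap runs on every batch (RDD) of the lines DStream and yields the words DStream
    val words = lines.flatMap(line => splitWords(line))
    words.print()

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the stream is stopped
  }
}
```

Each batch interval produces one RDD of lines and, after flatMap, one RDD of words, which matches the time slots in the diagram.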
Spark MLlib
MLlib is Spark’s machine learning library. Its goal is to make practical machine learning scalable and easy
At a high level, it provides the following:
ML Algorithms: classification, regression, clustering, and collaborative filtering
Featurization: feature extraction, transformation, dimensionality reduction, and selection
Pipelines: tools for constructing, evaluating, and tuning ML pipelines
Utilities: linear algebra, statistics, data handling
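How featurization, an ML algorithm, and a pipeline fit together can be sketched as below. The toy training rows, the object name TextPipeline, and the parameter values (1000 features, 10 iterations, 0.001 regularization) are assumptions chosen only for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object TextPipeline {
  def train(spark: SparkSession): PipelineModel = {
    // Hypothetical toy training set: (id, text, label)
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    // Featurization: split text into words, then hash the words into a feature vector
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")

    // ML algorithm: a classifier that reads the "features" and "label" columns
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

    // Pipeline: chain the stages and fit them as one unit
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("text-pipeline")
      .getOrCreate()
    val model = train(spark)
    val test = spark.createDataFrame(Seq((4L, "spark i j k"), (5L, "l m n"))).toDF("id", "text")
    model.transform(test).select("text", "prediction").show()
    spark.stop()
  }
}
```

The fitted PipelineModel applies the same tokenizing and hashing steps to new data before predicting, so training and scoring cannot drift apart.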
GraphX
GraphX is a component in Spark for graphs and graph-parallel computation
GraphX is used to model relations between objects. A graph has vertices (objects) and edges (relationships).
Example: Mathew and Justin are vertices; the edge between them is the relationship "Friends"
GraphX provides a uniform tool for ETL, exploratory data analysis, and interactive graph computations
Following are the applications of GraphX:
PageRank
Fraud detection
Geographic information systems
Disaster management
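The Mathew and Justin example can be sketched as a two-vertex property graph. The vertex IDs 1 and 2 and the object name FriendsGraph are assumptions; note that GraphX edges are directed, so the friendship is stored here as a single edge from Mathew to Justin.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object FriendsGraph {
  // Builds the graph: vertex attributes are names, the edge attribute is the relationship
  def build(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq((1L, "Mathew"), (2L, "Justin"))) // hypothetical IDs
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "Friends")))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("friends-graph")
      .getOrCreate()
    val graph = build(spark.sparkContext)
    // A triplet joins an edge with the attributes of its source and destination vertices
    graph.triplets
      .map(t => s"${t.srcAttr} and ${t.dstAttr} are ${t.attr}")
      .collect()
      .foreach(println)
    spark.stop()
  }
}
```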
Spark Architecture
Spark Architecture is based on 2 important abstractions
Resilient Distributed Dataset (RDD): RDDs are the fundamental units of data in Apache Spark; they are split into partitions that can be processed on different nodes of a cluster
Directed Acyclic Graph (DAG): The DAG is the scheduling layer of the Spark architecture; it implements stage-oriented scheduling and eliminates the Hadoop MapReduce multistage execution model
Example DAG: Stage 1 (parallelize -> filter -> map) -> Stage 2 (reduceByKey -> map)
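The two-stage example (parallelize, filter, map, then reduceByKey, map) can be reproduced and inspected with toDebugString, which prints an RDD's lineage. The sample words and the object name StagesSketch are assumptions for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object StagesSketch {
  def wordCounts(sc: SparkContext): RDD[(String, Int)] =
    sc.parallelize(Seq("spark", "", "rdd", "spark", "dag")) // parallelize
      .filter(_.nonEmpty)                                   // filter
      .map(word => (word, 1))                               // map
      .reduceByKey(_ + _)                                   // shuffle: a new stage starts here
      .map { case (word, n) => (word.toUpperCase, n) }      // map

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("stages-sketch")
      .getOrCreate()
    val counts = wordCounts(spark.sparkContext)
    println(counts.toDebugString)     // lineage; indentation marks the shuffle boundary
    counts.collect().foreach(println) // action: triggers both stages
    spark.stop()
  }
}
```

Operations that need no data movement (filter, map) are pipelined into one stage; reduceByKey has to bring equal keys together, so the scheduler cuts the DAG there.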
Spark Architecture
Apache Spark uses a master-slave architecture that consists of a driver, which runs on a master node, and multiple executors, which run across the worker nodes in the cluster
Diagram: Master Node (Driver Program, SparkContext) <-> Cluster Manager <-> Worker Nodes (each with an Executor holding a Cache and Tasks)
• The Master Node has a Driver Program
• The Spark code behaves as a driver program and creates a SparkContext, which is a gateway to all the Spark functionalities
• Spark applications run as independent sets of processes on a cluster
• The driver program and SparkContext take care of the job execution within the cluster
• A job is split into multiple tasks that are distributed over the worker nodes
• When an RDD is created in the SparkContext, it can be distributed across various nodes
• Worker nodes are slaves that execute different tasks
• The Executor is responsible for the execution of these tasks
• The Cluster Manager allocates executors on the worker nodes; the executors run the tasks sent by the driver and return the results to the SparkContext
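The driver side of this picture can be sketched as a program that creates a SparkContext and spreads an RDD over partitions. The object name DriverSketch and the helper partitionSizes are hypothetical, and local[4] (four local threads) is an assumption that stands in for a real cluster manager URL.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSketch {
  // Distributes 1 to 100 over the given number of partitions and
  // reports how many elements each partition received
  def partitionSizes(sc: SparkContext, numPartitions: Int): Array[Int] =
    sc.parallelize(1 to 100, numPartitions)
      .mapPartitions(elements => Iterator(elements.size)) // one task per partition
      .collect()                                          // results return to the driver

  def main(args: Array[String]): Unit = {
    // The master URL tells the driver which cluster manager to contact
    val conf = new SparkConf().setMaster("local[4]").setAppName("driver-sketch")
    val sc = new SparkContext(conf) // gateway to Spark functionality
    println(partitionSizes(sc, 4).mkString(", "))
    sc.stop()
  }
}
```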
Running a Spark Application
How does a Spark application run on a cluster?
Diagram: Application (Driver Program with SparkSession) <-> Resource Manager/Cluster Manager <-> Worker Node (Executor with Cache and Tasks; each Task works on one Partition of the Data on Disk)
Spark applications run as independent processes, coordinated by the SparkSession object in the driver program
The resource or cluster manager allocates executors on the worker nodes, and the driver assigns tasks to them, one task per partition
• A task applies its unit of work to the dataset in its partition and outputs a new partition dataset
• Because iterative algorithms apply operations repeatedly to data, they benefit from caching datasets across iterations
Results are sent back to the driver application or can be saved to disk
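Caching across iterations can be sketched as follows. The dataset, the object name CacheSketch, and the per-round filter are assumptions made only to show the pattern of reusing one cached RDD in several jobs.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object CacheSketch {
  // Runs one count per round over the same cached RDD
  def iterate(sc: SparkContext, rounds: Int): Seq[Long] = {
    // cache() keeps the computed partitions in executor memory after the first job
    val data = sc.parallelize(1 to 1000).map(_ * 2).cache()
    (1 to rounds).map { round =>
      // Each count() is an action; its result is sent back to the driver
      data.filter(_ % (round + 1) == 0).count()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-sketch")
      .getOrCreate()
    println(iterate(spark.sparkContext, 3).mkString(", "))
    spark.stop()
  }
}
```

Without cache(), every round would recompute the map step from the source data; with it, later rounds read the partitions already held by the executors.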
Simplilearn
 
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Simplilearn
 
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Simplilearn
 
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Simplilearn
 
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
Simplilearn
 
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
Simplilearn
 
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
Simplilearn
 
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Simplilearn
 
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
Simplilearn
 
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Simplilearn
 
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Simplilearn
 
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Simplilearn
 
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
Simplilearn
 
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Simplilearn
 
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Simplilearn
 
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 

Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spark Tutorial |Simplilearn

  • 2. What’s in it for you?
      1. What is Spark?
      2. Components of Spark: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and GraphX
      3. Apache Spark Architecture
      4. Running a Spark Application
  • 3-7. What is Apache Spark? Apache Spark is an open-source cluster computing framework, a top-level Apache project, used for real-time processing and analysis of large amounts of data.
      • Fast processing: Spark processes data quickly because it saves time on read and write operations.
      • Real-time streaming: Spark supports real-time streaming and processing of data.
      • In-memory computation: Spark has a DAG execution engine that supports in-memory computation.
      • Fault tolerant: Spark is fault tolerant through RDDs, which are designed to handle the failure of any worker node in the cluster.
  • 10-13. Apache Spark Components: Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX
  • 14-15. Spark Core: Spark Core is the core engine for large-scale parallel and distributed data processing. It performs the following:
      • Memory management and fault recovery
      • Scheduling, distributing, and monitoring jobs on a cluster
      • Interacting with storage systems
  • 16. Spark RDD: Resilient Distributed Datasets (RDDs) are the building blocks of any Spark application. Flow: Create RDD → Transformations → RDD → Actions → Results.
      • Transformations are operations (such as map, filter, join, union) performed on an RDD that yield a new RDD containing the result.
      • Actions are operations (such as reduce, first, count) that return a value after running a computation on an RDD.
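The split between transformations and actions can be sketched with a toy model. This is plain Python, not Spark's API: `ToyRDD` and its methods are hypothetical names that only mimic the idea that transformations record work and return a new dataset, while actions run the recorded work and return a value.

```python
from functools import reduce


class ToyRDD:
    """Toy stand-in for an RDD: records a lineage of transformations lazily."""

    def __init__(self, data, lineage=()):
        self._data = list(data)
        self._lineage = tuple(lineage)  # pending transformations, not yet run

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn):
        return ToyRDD(self._data, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._data, self._lineage + (("filter", pred),))

    # Actions: run the recorded lineage and return a plain value.
    def _compute(self):
        items = iter(self._data)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        return self._compute()

    def count(self):
        return len(self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


numbers = ToyRDD(range(1, 6))
# Two transformations: nothing is computed here, only the lineage grows.
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
# An action: the lineage is executed and a value comes back.
total = evens_squared.reduce(lambda a, b: a + b)
```

Keeping the lineage instead of the computed data is also what lets a lost partition be recomputed from its source, which is the idea behind the fault tolerance mentioned earlier.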
  • 17-21. Spark SQL: Spark SQL is Apache Spark’s module for working with structured data. Spark SQL features:
      • Integrated: You can integrate Spark SQL with Spark programs and query structured data inside them.
      • High compatibility: You can run unmodified Hive queries on existing warehouses in Spark SQL. It offers full compatibility with existing Hive data, queries, and UDFs.
      • Scalability: Spark SQL builds on the RDD model, so it supports large jobs and mid-query fault tolerance. It also uses the same engine for both interactive and long queries.
      • Standard connectivity: You can connect to Spark SQL through JDBC or ODBC, both of which are industry norms for business intelligence tool connectivity.
  • 22. Spark SQL architecture (diagram), layers as listed on the slide: DataFrame DSL, Spark SQL and HQL; DataFrame API; Data Source API; CSV, JSON, JDBC.
  • 23. Spark SQL has three main layers:
      • Language API: Spark SQL is supported from languages such as Python, HiveQL, Scala, and Java.
      • SchemaRDD: Because Spark SQL works on schemas, tables, and records, you can use a SchemaRDD or DataFrame as a temporary table.
      • Data Sources: Spark SQL can read from varied data sources, such as JSON documents, Hive tables, and Cassandra databases.
  • 24. Spark SQL: Spark allows you to define custom SQL functions called User Defined Functions (UDFs). The following UDF removes all the whitespace and lowercases all the characters in a string:

      import org.apache.spark.sql.functions.{col, udf}
      // Assumes a SparkSession named `spark`, as in spark-shell
      import spark.implicits._

      // Lowercase the string and remove every whitespace character
      def lowerRemoveAllWhiteSpaces(s: String): String =
        s.toLowerCase().replaceAll("\\s", "")

      val lowerRemoveAllWhiteSpacesUDF = udf[String, String](lowerRemoveAllWhiteSpaces)

      // Seq(...).toDF stands in for the slide's createDF helper, which is not part of core Spark
      val sourceDF = Seq(" WELCOME ", " SpaRk SqL ").toDF("text")

      sourceDF.select(
        lowerRemoveAllWhiteSpacesUDF(col("text")).as("clean_text")
      ).show()

    Output (clean_text column): welcome, sparksql
  • 25-29. Spark Streaming: Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources, and the processed data can be pushed out to different filesystems. (Diagram: streaming data sources and static data sources feed Spark Streaming, which writes to data storage.)
  • 30. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches. (Diagram: input data stream → Spark Streaming → batches of input data → Spark engine → batches of processed data.)
  • 31. Spark Streaming example: a basic operation that extracts individual words from lines of text in an input data stream. A flatMap operation is applied to each batch of the lines DStream (lines from time 0 to 1, 1 to 2, 2 to 3, and 3 to 4) to produce the matching batch of the words DStream (words from time 0 to 1, 1 to 2, 2 to 3, and 3 to 4).
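The lines-to-words step can be sketched without Spark. The snippet below is a plain-Python illustration, assuming each batch interval is represented as a list of lines; `flat_map`, `lines_dstream`, and `words_dstream` are hypothetical names, not Spark Streaming calls.

```python
from itertools import chain


def flat_map(fn, batch):
    """Apply fn to every record and flatten the resulting lists into one."""
    return list(chain.from_iterable(fn(record) for record in batch))


# Each inner list stands for the lines that arrived during one batch interval.
lines_dstream = [
    ["hello spark", "hello streaming"],  # batch for time 0 to 1
    ["spark streaming example"],         # batch for time 1 to 2
    [],                                  # an interval with no input
]

# The same flatMap is applied to every batch, yielding one batch of words each.
words_dstream = [flat_map(str.split, batch) for batch in lines_dstream]
```

Each output batch lines up with exactly one input batch, which is why the stream of results is itself delivered in batches.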
  • 32-36. Spark MLlib: MLlib is Spark’s machine learning library. Its goal is to make practical machine learning scalable and easy. At a high level, it provides the following:
      • ML algorithms: classification, regression, clustering, and collaborative filtering
      • Featurization: feature extraction, transformation, dimensionality reduction, and selection
      • Pipelines: tools for constructing, evaluating, and tuning ML pipelines
      • Utilities: linear algebra, statistics, and data handling
  • 37. GraphX: GraphX is a component in Spark for graphs and graph-parallel computation. It is used to model relations between objects. A graph has vertices (objects) and edges (relationships). For example, the vertices Mathew and Justin are connected by an edge with the relationship Friends.
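The vertex and edge model can be sketched in a few lines. This is plain Python rather than the GraphX API: the `Edge` class and `neighbors` helper are hypothetical, and the friendship edge is treated as undirected for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    relationship: str


# Vertices are objects keyed by an id; edges carry the relationship between them.
vertices = {1: "Mathew", 2: "Justin"}
edges = [Edge(src=1, dst=2, relationship="Friends")]


def neighbors(vertex_id, edges):
    """Return ids connected to vertex_id, treating each edge as undirected."""
    linked = set()
    for e in edges:
        if e.src == vertex_id:
            linked.add(e.dst)
        elif e.dst == vertex_id:
            linked.add(e.src)
    return linked


friends_of_mathew = [vertices[v] for v in sorted(neighbors(1, edges))]
```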
  • 38. GraphX provides a uniform tool for ETL, exploratory data analysis, and interactive graph computations.
  • 39. Applications of GraphX: PageRank, fraud detection, geographic information systems, and disaster management.
  • 41-43. Spark Architecture: The Spark architecture is based on two important abstractions:
      • Resilient Distributed Dataset (RDD): RDDs are the fundamental units of data in Apache Spark. They are split into partitions that can be processed on different nodes of a cluster.
      • Directed Acyclic Graph (DAG): The DAG is the scheduling layer of the Spark architecture. It implements stage-oriented scheduling and eliminates the Hadoop MapReduce multistage execution model. In the slide's example, Stage 1 runs Parallelize, Filter, and Map, and Stage 2 runs reduceByKey and Map.
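Stage-oriented scheduling can be illustrated with a simplified sketch. It assumes a linear chain of operations and cuts a new stage before every operation that needs a shuffle; `WIDE_OPS` and `split_into_stages` are hypothetical names, and the real scheduler works on a graph of RDD dependencies rather than a list of strings.

```python
# Operations that need a shuffle (data exchanged between partitions).
# This set is an illustrative assumption, not an exhaustive list.
WIDE_OPS = {"reduceByKey", "groupByKey", "join"}


def split_into_stages(ops):
    """Group a linear chain of operations into stages, cutting before each wide op."""
    stages = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # a shuffle boundary starts a new stage
        stages[-1].append(op)
    return stages


job = ["parallelize", "filter", "map", "reduceByKey", "map"]
stages = split_into_stages(job)
```

For this chain the sketch yields the same two stages as the slide: parallelize, filter, and map in the first, reduceByKey and map in the second.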
  • 44. Spark Architecture Master Node Driver Program SparkContext • Master Node has a Driver Program • The Spark code behaves as a driver program and creates a SparkContext which is a gateway to all the Spark functionalities Apache Spark uses a master-slave architecture that consists of a driver, that runs on a master node, and multiple executors which run across the worker nodes in the cluster
  • 45. Spark Architecture Cluster Manager • Spark applications run as independent sets of processes on a cluster • The driver program & Spark context takes care of the job execution within the cluster Master Node Driver Program SparkContext
  • 46. Spark Architecture: A job is split into multiple tasks that are distributed over the worker nodes. When an RDD is created in the SparkContext, it can be distributed across various nodes. Worker nodes are slaves that execute different tasks. Diagram: the driver program on the master node, the cluster manager, and two worker nodes, each running an executor with a cache and tasks.
  • 47. Spark Architecture: The executor is responsible for the execution of these tasks. Worker nodes execute the tasks on executors allocated by the cluster manager and return the results to the SparkContext. Diagram: the driver program on the master node, the cluster manager, and two worker nodes, each running an executor with a cache and tasks.
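The flow described above (a job split into tasks, tasks run by executors, results returned to the driver) can be sketched with Python's standard thread pool standing in for the executors. This is an analogy rather than Spark code: `task`, `run_job`, and the partition and executor counts are hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def task(partition):
    """Unit of work applied to one partition; returns a partial result."""
    return sum(x * x for x in partition)


def run_job(data, num_partitions=4, num_executors=2):
    """Driver-side logic: split the job into tasks and combine their results."""
    # One task per partition.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    # The pool plays the role of the executors on the worker nodes.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_results = list(executors.map(task, partitions))
    # Partial results come back to the driver, which combines them.
    return sum(partial_results)


total = run_job(list(range(10)))
print(total)
```

The final answer does not depend on how many partitions or executors are used; only the degree of parallelism changes.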
  • 50. How a Spark application runs on a cluster: Spark applications run as independent processes, coordinated by the SparkSession object in the driver program.
  • 51. How a Spark application runs on a cluster: The resource manager (cluster manager) allocates resources on the workers, and tasks are assigned to them, one task per partition.
  • 52. How a Spark application runs on a cluster: A task applies its unit of work to the dataset in its partition and outputs a new partition dataset. Because iterative algorithms apply operations repeatedly to data, they benefit from caching datasets across iterations. Diagram: a worker node running an executor with tasks and a cache, where each task works on a partition of data read from disk.
  • 53. How a Spark application runs on a cluster: Results are sent back to the driver application or can be saved to disk.
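The benefit of caching for iterative algorithms can be illustrated with a small plain-Python sketch that counts how often a partition has to be loaded. It is not Spark code: `run_iterations`, `load_partition`, and the synthetic partition contents are hypothetical, and a dictionary stands in for the executor cache.

```python
def run_iterations(num_iterations, num_partitions, use_cache):
    """Repeat the same computation over all partitions, counting loads."""
    loads = 0
    cache = {}

    def load_partition(partition_id):
        # Stand-in for reading a partition from disk.
        nonlocal loads
        loads += 1
        return [partition_id * 10 + i for i in range(5)]

    total = 0
    for _ in range(num_iterations):
        for partition_id in range(num_partitions):
            if use_cache and partition_id in cache:
                data = cache[partition_id]          # reuse the cached partition
            else:
                data = load_partition(partition_id)
                if use_cache:
                    cache[partition_id] = data
            total += sum(data)                      # the task's unit of work
    return total, loads


uncached_total, uncached_loads = run_iterations(3, 2, use_cache=False)
cached_total, cached_loads = run_iterations(3, 2, use_cache=True)
print(uncached_loads, cached_loads)
```

Both runs return the same total to the caller, but the cached run loads each partition once instead of once per iteration.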