SlideShare a Scribd company logo
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn
What is Spark SQL?
Spark SQL Features
Spark SQL Architecture
Spark SQL – DataFrame API
Spark SQL – Data Source
API
Spark SQL – Catalyst
Optimizer
Running SQL Queries
Spark SQL Demo
What’s in it for you?
SQL
What is Spark SQL?
SQL
Spark SQL is Apache Spark’s module for working with structured and
semi-structured data
Click here to watch the video
SQL
Spark SQL is Apache Spark’s module for working with structured and
semi-structured data
It originated to overcome the
limitations of Apache Hive
What is Spark SQL?
SQL
Spark SQL is Apache Spark’s module for working with structured and
semi-structured data
It originated to overcome the
limitations of Apache Hive
Hive lags in performance as it uses MapReduce
jobs for executing ad-hoc queries
Hive does not allow you to resume a job
processing if it fails in the middle
Limitations
What is Spark SQL?
SQL
Spark performs better than Hive in most scenarios
Source: https://engineering.fb.com/
Hive ~ Spark
Spark SQL
Features
SQL
Integrated
High
Compatibility
You can integrate Spark SQL
and query structured data
inside Spark programs
You can run unmodified Hive queries
on existing warehouses in Spark
SQL. With existing Hive data, queries
and UDFs, Spark SQL offers full
compatibility
Below are some essential features of Spark SQL that makes it a compelling
framework for data processing and analyzing
Spark SQL Features
Spark
SQL
Spark
programs
SQLQueries
SQL
Scalability
Standard
Connectivity
Spark SQL leverages RDD model as
it supports large jobs and mid-
query fault tolerance. For interactive
and long queries, it uses the same
engine
You can easily connect Spark
SQL with JDBC or ODBC. For
connectivity for business
intelligence tools, both turned as
industry norms
Spark SQL Features
SQL
SQL
RDD
Below are some essential features of Spark SQL that makes it a compelling
framework for data processing and analyzing
Spark SQL
Architecture
SQL
DataFrame DSLDataframe DSL
DataFrame API
Data Source API
CSV JSON JDBC
DataFrame DSLSpark SQL and HQL
Spark SQL Architecture
Spark SQL has three main layers
Spark SQL is Apache Spark’s module for working with structured data
Language API SchemaRDD Data Sources
Spark is very compatible as it
supports languages like Python,
HiveQL, Scala, and Java
As Spark SQL works on schema,
tables, and records, you can use
SchemaRDD or DataFrame as a
temporary table
SQL
Spark SQL supports multiple
data sources like JSON,
Cassandra database, Hive
tables
Spark SQL Architecture
Spark SQL –
DataFrame API
A DataFrame is a domain-specific language (DSL) for working
with structured and semi-structured data, i.e., datasets with a schema
Spark SQL – Data Frame API
DataFrame API in Spark was
designed taking inspiration from
DataFrame in R programming and
Pandas in Python
Spark SQL – Data Frame API
A DataFrame is a domain-specific language (DSL) for working
with structured and semi-structured data, i.e., datasets with a schema
Has can process the data in the size of Kilobytes to Petabytes
on a single node cluster
Can be easily integrated with all Big Data tools and frameworks
via Spark-Core
Provides API for Python, Java, Scala, and R Programming
DataFrame features
Spark SQL – Data Frame API
DataFrame API in Spark was
designed taking inspiration from
DataFrame in R programming and
Pandas in Python
A DataFrame is a domain-specific language (DSL) for working
with structured and semi-structured data, i.e., datasets with a schema
Spark SQL –
Data Source API
Spark SQL supports operating on a variety of data sources through the
DataFrame interface
Spark SQL – Data Source API
Spark SQL supports operating on a variety of data sources through the
DataFrame interface
It supports different files such as
CSV, Hive, Avro, JSON, Parquet
Spark SQL – Data Source API
It supports different files such as
CSV, Hive, Avro, JSON, Parquet
It is lazily evaluated like Apache
Spark Transformations and can
be accessed through SQL
Context and Hive Context
ContextSQL
Spark SQL – Data Source API
Spark SQL supports operating on a variety of data sources through the
DataFrame interface
It can be easily integrated with all
Big Data tools and frameworks
via Spark-Core
ContextSQL
Spark SQL – Data Source API
It supports different files such as
CSV, Hive, Avro, JSON, Parquet
It is lazily evaluated like Apache
Spark Transformations and can
be accessed through SQL
Context and Hive Context
Spark SQL supports operating on a variety of data sources through the
DataFrame interface
Spark SQL –
Catalyst
Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
Spark SQL – Catalyst Optimizer
It works in 4 phases:
1 Analyzing a logical plan to
resolve references
2 Logical plan optimization
3 Physical planning 4
Code generation to compile parts of
the query to Java bytecode
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Catalog
Analysis
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Optimized
Logical plan
Catalog
Analysis
Logical
Optimization
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Optimized
Logical plan
Physical
plans
Catalog
Analysis
Logical
Optimization
Physical
Planning
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Optimized
Logical plan
Physical
plans
Cost Model
Catalog
Analysis
Logical
Optimization
Physical
Planning
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Optimized
Logical plan
Physical
plans
Cost Model
Selected
Physical
Plan
Catalog
Analysis
Logical
Optimization
Physical
Planning
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
SQL
Query
SQL
Query
Unresolved
Logical plan
Logical plan
Optimized
Logical plan
Physical
plans
Cost Model
Selected
Physical
Plan
RDDs
Catalog
Analysis
Logical
Optimization
Physical
Planning
Code Generation
Spark SQL – Catalyst Optimizer
Catalyst optimizer leverages advanced programming language features
(such as Scala’s pattern matching and quasi quotes) in a novel way to build
an extensible query optimizer
Spark
SQLContext
Spark SQLContext
SQLContext is a class used for initializing the functionalities of Spark SQL
SparkContext class object (sc) is required for initializing SQLContext class object
The following command initializes
SparkContext through spark-shell
$ spark-shell
Spark SQLContext
SQLContext is a class used for initializing the functionalities of Spark SQL
The following command creates a SQLContext
scala> val sqlcontext = new
org.apache.sql.SQLContext(sc)
Spark SQLContext
SparkContext class object (sc) is required for initializing SQLContext class object
SQLContext is a class used for initializing the functionalities of Spark SQL
The following command initializes
SparkContext through spark-shell
$ spark-shell
SparkSession
It is the entry point to any functionality in Spark. To create a basic
SparkSession, use SparkSession.builder()
Source: https://spark.apache.org/
SparkSession
Applications can create DataFrames with the help of an existing RDD using a
Hive table, or from Spark data sources
The following creates a DataFrame based on the content of a JSON file:
https://spark.apache.org/Source:
Creating DataFrames
DataFrame
Operations
Structured data can be manipulated using domain-specific language provided
by DataFrames
https://spark.apache.org/Source:
DataFrame Operations
Below are some examples of structured data processing:
https://spark.apache.org/Source:
DataFrame Operations
Structured data can be manipulated using domain-specific language provided
by DataFrames
Below are some examples of structured data processing:
https://spark.apache.org/Source:
DataFrame Operations
Structured data can be manipulated using domain-specific language provided
by DataFrames
Below are some examples of structured data processing:
Running SQL
Queries
The sql function on a SparkSession allows applications to run SQL queries
programmatically and returns the result in the form of a DataFrame
https://spark.apache.org/Source:
Running SQL Queries
Demo on Spark
SQL
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn

More Related Content

What's hot (20)

Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
Russell Jurney
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
Taras Matyashovsky
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
Databricks
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
Gokhan Atil
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
Aakashdata
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
Jaemun Jung
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
BRIJESH KUMAR
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
Databricks
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
BizTalk360
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
VMware Tanzu
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Intro to Azure Data Factory v1
Intro to Azure Data Factory v1Intro to Azure Data Factory v1
Intro to Azure Data Factory v1
Eric Bragas
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
Russell Jurney
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
Taras Matyashovsky
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
Databricks
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
Gokhan Atil
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
Aakashdata
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
Jaemun Jung
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
BRIJESH KUMAR
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
Databricks
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
BizTalk360
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
VMware Tanzu
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks
 
Intro to Azure Data Factory v1
Intro to Azure Data Factory v1Intro to Azure Data Factory v1
Intro to Azure Data Factory v1
Eric Bragas
 

Similar to Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn (20)

Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
datamantra
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
Syed Hadoop
 
Spark sql
Spark sqlSpark sql
Spark sql
Zahra Eskandari
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
aftab alam
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
phanleson
 
Spark sql under the hood - Data KRK meetup
Spark sql under the hood  - Data KRK meetupSpark sql under the hood  - Data KRK meetup
Spark sql under the hood - Data KRK meetup
Mikołaj Kromka
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
Michael Zhang
 
SparkPaper
SparkPaperSparkPaper
SparkPaper
Suraj Thapaliya
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
Databricks
 
Introduction to Spark SQL, query types and UDF
Introduction to Spark SQL, query types and UDFIntroduction to Spark SQL, query types and UDF
Introduction to Spark SQL, query types and UDF
sundharakumarkb2
 
A Step to programming with Apache Spark
A Step to programming with Apache SparkA Step to programming with Apache Spark
A Step to programming with Apache Spark
Knoldus Inc.
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Dharmjit Singh
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
clairvoyantllc
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
Takuya UESHIN
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
datamantra
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
datamantra
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
Syed Hadoop
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
aftab alam
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Anyscale
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Learning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQLLearning spark ch09 - Spark SQL
Learning spark ch09 - Spark SQL
phanleson
 
Spark sql under the hood - Data KRK meetup
Spark sql under the hood  - Data KRK meetupSpark sql under the hood  - Data KRK meetup
Spark sql under the hood - Data KRK meetup
Mikołaj Kromka
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
Databricks
 
Introduction to Spark SQL, query types and UDF
Introduction to Spark SQL, query types and UDFIntroduction to Spark SQL, query types and UDF
Introduction to Spark SQL, query types and UDF
sundharakumarkb2
 
A Step to programming with Apache Spark
A Step to programming with Apache SparkA Step to programming with Apache Spark
A Step to programming with Apache Spark
Knoldus Inc.
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
clairvoyantllc
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
Takuya UESHIN
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
datamantra
 
Ad

More from Simplilearn (20)

Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Simplilearn
 
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Simplilearn
 
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Simplilearn
 
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
Simplilearn
 
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
Simplilearn
 
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
Simplilearn
 
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Simplilearn
 
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
Simplilearn
 
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Simplilearn
 
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Simplilearn
 
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Simplilearn
 
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
Simplilearn
 
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Simplilearn
 
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Simplilearn
 
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Top 50 Scrum Master Interview Questions | Scrum Master Interview Questions & ...
Simplilearn
 
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Bagging Vs Boosting In Machine Learning | Ensemble Learning In Machine Learni...
Simplilearn
 
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Simplilearn
 
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
Simplilearn
 
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
Simplilearn
 
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
Simplilearn
 
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Simplilearn
 
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
Simplilearn
 
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Simplilearn
 
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Simplilearn
 
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Simplilearn
 
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
Simplilearn
 
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Simplilearn
 
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Simplilearn
 
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Ad

Recently uploaded (20)

Optimization technique in pharmaceutical product development.pptx
Optimization technique in pharmaceutical product development.pptxOptimization technique in pharmaceutical product development.pptx
Optimization technique in pharmaceutical product development.pptx
UrmiPrajapati3
 
june 10 2025 ppt for madden on art science is over.pptx
june 10 2025 ppt for madden on art science is over.pptxjune 10 2025 ppt for madden on art science is over.pptx
june 10 2025 ppt for madden on art science is over.pptx
roger malina
 
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
parmarjuli1412
 
Unit- 4 Biostatistics & Research Methodology.pdf
Unit- 4 Biostatistics & Research Methodology.pdfUnit- 4 Biostatistics & Research Methodology.pdf
Unit- 4 Biostatistics & Research Methodology.pdf
KRUTIKA CHANNE
 
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
BUSINESS QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 9 SEPTEMBER 2024
Quiz Club of PSG College of Arts & Science
 
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptxSEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
PoojaSen20
 
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptxDiptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Arshad Shaikh
 
Unit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptxUnit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptx
bobby205207
 
What is FIle and explanation of text files.pptx
What is FIle and explanation of text files.pptxWhat is FIle and explanation of text files.pptx
What is FIle and explanation of text files.pptx
Ramakrishna Reddy Bijjam
 
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition OecdEnergy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
razelitouali
 
Pests of Rice: Damage, Identification, Life history, and Management.pptx
Pests of Rice: Damage, Identification, Life history, and Management.pptxPests of Rice: Damage, Identification, Life history, and Management.pptx
Pests of Rice: Damage, Identification, Life history, and Management.pptx
Arshad Shaikh
 
How to Create a Rainbow Man Effect in Odoo 18
How to Create a Rainbow Man Effect in Odoo 18How to Create a Rainbow Man Effect in Odoo 18
How to Create a Rainbow Man Effect in Odoo 18
Celine George
 
Exploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle SchoolExploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle School
Marie
 
LDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDMMIA Free Reiki Yoga S9 Grad Level Intuition IILDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDM & Mia eStudios
 
EUPHORIA GENERAL QUIZ FINALS | QUIZ CLUB OF PSGCAS | 21 MARCH 2025
EUPHORIA GENERAL QUIZ FINALS | QUIZ CLUB OF PSGCAS | 21 MARCH 2025EUPHORIA GENERAL QUIZ FINALS | QUIZ CLUB OF PSGCAS | 21 MARCH 2025
EUPHORIA GENERAL QUIZ FINALS | QUIZ CLUB OF PSGCAS | 21 MARCH 2025
Quiz Club of PSG College of Arts & Science
 
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
TV Shows and web-series quiz | QUIZ CLUB OF PSGCAS | 13TH MARCH 2025
Quiz Club of PSG College of Arts & Science
 
Capitol Doctoral Presentation -June 2025.pptx
Capitol Doctoral Presentation -June 2025.pptxCapitol Doctoral Presentation -June 2025.pptx
Capitol Doctoral Presentation -June 2025.pptx
CapitolTechU
 
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKANMATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
aditya23173
 
How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18
Celine George
 
Hemiptera & Neuroptera: Insect Diversity.pptx
Hemiptera & Neuroptera: Insect Diversity.pptxHemiptera & Neuroptera: Insect Diversity.pptx
Hemiptera & Neuroptera: Insect Diversity.pptx
Arshad Shaikh
 
Optimization technique in pharmaceutical product development.pptx
Optimization technique in pharmaceutical product development.pptxOptimization technique in pharmaceutical product development.pptx
Optimization technique in pharmaceutical product development.pptx
UrmiPrajapati3
 
june 10 2025 ppt for madden on art science is over.pptx
june 10 2025 ppt for madden on art science is over.pptxjune 10 2025 ppt for madden on art science is over.pptx
june 10 2025 ppt for madden on art science is over.pptx
roger malina
 
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
THERAPEUTIC COMMUNICATION included definition, characteristics, nurse patient...
parmarjuli1412
 
Unit- 4 Biostatistics & Research Methodology.pdf
Unit- 4 Biostatistics & Research Methodology.pdfUnit- 4 Biostatistics & Research Methodology.pdf
Unit- 4 Biostatistics & Research Methodology.pdf
KRUTIKA CHANNE
 
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptxSEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
SEXUALITY , UNWANTED PREGANCY AND SEXUAL ASSAULT .pptx
PoojaSen20
 
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptxDiptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Diptera: The Two-Winged Wonders, The Fly Squad: Order Diptera.pptx
Arshad Shaikh
 
Unit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptxUnit 3 Poster Sketches with annotations.pptx
Unit 3 Poster Sketches with annotations.pptx
bobby205207
 
What is FIle and explanation of text files.pptx
What is FIle and explanation of text files.pptxWhat is FIle and explanation of text files.pptx
What is FIle and explanation of text files.pptx
Ramakrishna Reddy Bijjam
 
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition OecdEnergy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
Energy Balances Of Oecd Countries 2011 Iea Statistics 1st Edition Oecd
razelitouali
 
Pests of Rice: Damage, Identification, Life history, and Management.pptx
Pests of Rice: Damage, Identification, Life history, and Management.pptxPests of Rice: Damage, Identification, Life history, and Management.pptx
Pests of Rice: Damage, Identification, Life history, and Management.pptx
Arshad Shaikh
 
How to Create a Rainbow Man Effect in Odoo 18
How to Create a Rainbow Man Effect in Odoo 18How to Create a Rainbow Man Effect in Odoo 18
How to Create a Rainbow Man Effect in Odoo 18
Celine George
 
Exploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle SchoolExploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle School
Marie
 
LDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDMMIA Free Reiki Yoga S9 Grad Level Intuition IILDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDMMIA Free Reiki Yoga S9 Grad Level Intuition II
LDM & Mia eStudios
 
Capitol Doctoral Presentation -June 2025.pptx
Capitol Doctoral Presentation -June 2025.pptxCapitol Doctoral Presentation -June 2025.pptx
Capitol Doctoral Presentation -June 2025.pptx
CapitolTechU
 
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKANMATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
MATERI PPT TOPIK 1 LANDASAN FILOSOFIS PENDIDIKAN
aditya23173
 
How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18How to Manage Upselling of Subscriptions in Odoo 18
How to Manage Upselling of Subscriptions in Odoo 18
Celine George
 
Hemiptera & Neuroptera: Insect Diversity.pptx
Hemiptera & Neuroptera: Insect Diversity.pptxHemiptera & Neuroptera: Insect Diversity.pptx
Hemiptera & Neuroptera: Insect Diversity.pptx
Arshad Shaikh
 

Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn

  • 2. What is Spark SQL? Spark SQL Features Spark SQL Architecture Spark SQL – DataFrame API Spark SQL – Data Source API Spark SQL – Catalyst Optimizer Running SQL Queries Spark SQL Demo What’s in it for you? SQL
  • 3. What is Spark SQL? SQL Spark SQL is Apache Spark’s module for working with structured and semi-structured data
  • 4. Click here to watch the video
  • 5. SQL Spark SQL is Apache Spark’s module for working with structured and semi-structured data It originated to overcome the limitations of Apache Hive What is Spark SQL?
  • 6. SQL Spark SQL is Apache Spark’s module for working with structured and semi-structured data It originated to overcome the limitations of Apache Hive Hive lags in performance as it uses MapReduce jobs for executing ad-hoc queries Hive does not allow you to resume a job processing if it fails in the middle Limitations What is Spark SQL?
  • 7. SQL Spark performs better than Hive in most scenarios Source: https://engineering.fb.com/ Hive ~ Spark
  • 9. SQL Integrated High Compatibility You can integrate Spark SQL and query structured data inside Spark programs You can run unmodified Hive queries on existing warehouses in Spark SQL. With existing Hive data, queries and UDFs, Spark SQL offers full compatibility Below are some essential features of Spark SQL that makes it a compelling framework for data processing and analyzing Spark SQL Features Spark SQL Spark programs SQLQueries
  • 10. SQL Scalability Standard Connectivity Spark SQL leverages RDD model as it supports large jobs and mid- query fault tolerance. For interactive and long queries, it uses the same engine You can easily connect Spark SQL with JDBC or ODBC. For connectivity for business intelligence tools, both turned as industry norms Spark SQL Features SQL SQL RDD Below are some essential features of Spark SQL that makes it a compelling framework for data processing and analyzing
  • 12. SQL DataFrame DSLDataframe DSL DataFrame API Data Source API CSV JSON JDBC DataFrame DSLSpark SQL and HQL Spark SQL Architecture
  • 13. Spark SQL has three main layers Spark SQL is Apache Spark’s module for working with structured data Language API SchemaRDD Data Sources Spark is very compatible as it supports languages like Python, HiveQL, Scala, and Java As Spark SQL works on schema, tables, and records, you can use SchemaRDD or DataFrame as a temporary table SQL Spark SQL supports multiple data sources like JSON, Cassandra database, Hive tables Spark SQL Architecture
  • 15. A DataFrame is a domain-specific language (DSL) for working with structured and semi-structured data, i.e., datasets with a schema Spark SQL – Data Frame API
  • 16. DataFrame API in Spark was designed taking inspiration from DataFrame in R programming and Pandas in Python Spark SQL – Data Frame API A DataFrame is a domain-specific language (DSL) for working with structured and semi-structured data, i.e., datasets with a schema
  • 17. Has can process the data in the size of Kilobytes to Petabytes on a single node cluster Can be easily integrated with all Big Data tools and frameworks via Spark-Core Provides API for Python, Java, Scala, and R Programming DataFrame features Spark SQL – Data Frame API DataFrame API in Spark was designed taking inspiration from DataFrame in R programming and Pandas in Python A DataFrame is a domain-specific language (DSL) for working with structured and semi-structured data, i.e., datasets with a schema
  • 18. Spark SQL – Data Source API
  • 19. Spark SQL supports operating on a variety of data sources through the DataFrame interface Spark SQL – Data Source API
  • 20. Spark SQL supports operating on a variety of data sources through the DataFrame interface It supports different files such as CSV, Hive, Avro, JSON, Parquet Spark SQL – Data Source API
  • 21. It supports different files such as CSV, Hive, Avro, JSON, Parquet It is lazily evaluated like Apache Spark Transformations and can be accessed through SQL Context and Hive Context ContextSQL Spark SQL – Data Source API Spark SQL supports operating on a variety of data sources through the DataFrame interface
  • 22. It can be easily integrated with all Big Data tools and frameworks via Spark-Core ContextSQL Spark SQL – Data Source API It supports different files such as CSV, Hive, Avro, JSON, Parquet It is lazily evaluated like Apache Spark Transformations and can be accessed through SQL Context and Hive Context Spark SQL supports operating on a variety of data sources through the DataFrame interface
  • 24. Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer Spark SQL – Catalyst Optimizer
  • 25. It works in 4 phases: 1 Analyzing a logical plan to resolve references 2 Logical plan optimization 3 Physical planning 4 Code generation to compile parts of the query to Java bytecode Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 26. SQL Query SQL Query Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 27. SQL Query SQL Query Unresolved Logical plan Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 28. SQL Query SQL Query Unresolved Logical plan Logical plan Catalog Analysis Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 29. SQL Query SQL Query Unresolved Logical plan Logical plan Optimized Logical plan Catalog Analysis Logical Optimization Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 30. SQL Query SQL Query Unresolved Logical plan Logical plan Optimized Logical plan Physical plans Catalog Analysis Logical Optimization Physical Planning Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 31. SQL Query SQL Query Unresolved Logical plan Logical plan Optimized Logical plan Physical plans Cost Model Catalog Analysis Logical Optimization Physical Planning Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 32. SQL Query SQL Query Unresolved Logical plan Logical plan Optimized Logical plan Physical plans Cost Model Selected Physical Plan Catalog Analysis Logical Optimization Physical Planning Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 33. SQL Query SQL Query Unresolved Logical plan Logical plan Optimized Logical plan Physical plans Cost Model Selected Physical Plan RDDs Catalog Analysis Logical Optimization Physical Planning Code Generation Spark SQL – Catalyst Optimizer Catalyst optimizer leverages advanced programming language features (such as Scala’s pattern matching and quasi quotes) in a novel way to build an extensible query optimizer
  • 35. Spark SQLContext SQLContext is a class used for initializing the functionalities of Spark SQL
  • 36. SparkContext class object (sc) is required for initializing SQLContext class object The following command initializes SparkContext through spark-shell $ spark-shell Spark SQLContext SQLContext is a class used for initializing the functionalities of Spark SQL
  • 37. The following command creates a SQLContext scala> val sqlcontext = new org.apache.sql.SQLContext(sc) Spark SQLContext SparkContext class object (sc) is required for initializing SQLContext class object SQLContext is a class used for initializing the functionalities of Spark SQL The following command initializes SparkContext through spark-shell $ spark-shell
  • 39. It is the entry point to any functionality in Spark. To create a basic SparkSession, use SparkSession.builder() Source: https://spark.apache.org/ SparkSession
  • 40. Applications can create DataFrames with the help of an existing RDD using a Hive table, or from Spark data sources The following creates a DataFrame based on the content of a JSON file: https://spark.apache.org/Source: Creating DataFrames
  • 42. Structured data can be manipulated using domain-specific language provided by DataFrames https://spark.apache.org/Source: DataFrame Operations Below are some examples of structured data processing:
  • 43. https://spark.apache.org/Source: DataFrame Operations Structured data can be manipulated using domain-specific language provided by DataFrames Below are some examples of structured data processing:
  • 44. https://spark.apache.org/Source: DataFrame Operations Structured data can be manipulated using domain-specific language provided by DataFrames Below are some examples of structured data processing:
  • 46. The sql function on a SparkSession allows applications to run SQL queries programmatically and returns the result in the form of a DataFrame https://spark.apache.org/Source: Running SQL Queries