Presented By:
Kuldeepak Gupta
Software Consultant
Getting Started with Spark Scala
Lack of etiquette and manners is a huge turn-off.
KnolX Etiquette
Punctuality
Respect KnolX session timings: please do not join a session more than 5 minutes after its start time.
Feedback
Please submit constructive feedback for every session; it is very helpful for the presenter.
Mute
Please keep your window muted.
Avoid Disturbance
Do not leave your window unmuted after asking a question.
Agenda
01 Introduction to Apache Spark: What, When & Why
02 Spark Architecture: Master-slave architecture
03 Use-Cases for Spark: Situations where Spark is helpful
04 Spark Ecosystem: Components & APIs in the Spark ecosystem
05 Demonstration: Spark Scala in Action
Introduction to Spark
What is Spark
● A general-purpose distributed data processing engine.
● One of the most popular distributed processing frameworks for big data.
● A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
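To make "single-node machines or clusters" concrete, here is a minimal sketch of a Spark application in Scala. It assumes Spark 3.x with the `spark-sql` artifact on the classpath; the names `HelloSpark` and `doubledSum` are hypothetical. The `local[*]` master runs everything on one machine; on a cluster, only the master URL would change.

```scala
import org.apache.spark.sql.SparkSession

object HelloSpark {
  // Distributes 1..100 over 4 partitions, doubles each value in parallel tasks,
  // and combines the partial sums back on the driver.
  def doubledSum(spark: SparkSession): Long = {
    val numbers = spark.sparkContext.parallelize(1 to 100, numSlices = 4)
    numbers.map(n => (n * 2).toLong).reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" = single-node mode using all available CPU cores.
    val spark = SparkSession.builder()
      .appName("HelloSpark")
      .master("local[*]")
      .getOrCreate()

    println(s"Sum of doubled values: ${doubledSum(spark)}") // 10100
    spark.stop()
  }
}
```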
Why Spark
● Supports multiple languages (Java, Python, Scala, R) and integrates with other popular products.
● Performs much less reading from and writing to disk than disk-based engines such as Hadoop MapReduce, because intermediate results can be kept in memory.
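The reduced disk I/O comes largely from keeping reused data in memory. A minimal sketch, assuming Spark 3.x; `CachingSketch` and `evenStats` are hypothetical names. `persist` asks Spark to keep the filtered rows in executor memory after the first action so that the second action does not recompute them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  // Runs two actions over the same filtered dataset; the persisted rows from the
  // first action are reused by the second instead of being recomputed.
  def evenStats(spark: SparkSession): (Long, Long) = {
    val evens = spark.range(0, 1000000).filter("id % 2 = 0")
    evens.persist(StorageLevel.MEMORY_ONLY)

    val count = evens.count()                             // materialises the cache
    val largest = evens.agg(max("id")).first().getLong(0) // reads cached rows

    evens.unpersist()
    (count, largest)
  }
}
```

Here the source is cheap to recompute, so the gain is small; caching pays off when the reused data comes from expensive file reads or shuffles.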
When Spark
● When large datasets must be processed in parallel across a cluster of machines.
● When working with distributed storage (S3, XD, HDFS) and NoSQL databases (HBase, Cassandra, MongoDB).
● For machine learning and fog computing workloads.
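Spark uses the same read/write API whatever the storage system; largely only the path's URI scheme changes (for example `hdfs://...` or `s3a://...`, the latter assuming the `hadoop-aws` connector is on the classpath). A minimal sketch using a local directory, assuming Spark 3.x; `StorageSketch`, `roundTrip`, and the sample data are hypothetical. NoSQL stores such as Cassandra or MongoDB are reached through separate connector libraries, not shown here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StorageSketch {
  // Writes a small DataFrame as Parquet to `path`, then reads it back.
  // `path` could be a local directory or an hdfs:// / s3a:// URI.
  def roundTrip(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._
    val orders = Seq(("o-1", 120.0), ("o-2", 75.5)).toDF("orderId", "amount")
    orders.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }
}
```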
Spark Architecture
Master-Slave Architecture
A well-defined layered architecture in which components and layers are loosely coupled.
Cluster Manager
● Responsible for maintaining the cluster of machines that will run your Spark Application.
● Has its own “Driver” and “Worker” abstractions.
Spark Driver
● Controls the execution of a Spark Application.
● Maintains all state of the Spark cluster (the state and tasks of the executors).
● Interfaces with the cluster manager to obtain resources and launch executors.
Spark Executor
● A process that performs the tasks assigned by the Spark driver.
● Takes the tasks assigned by the driver, runs them, and reports back their state and results.
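The split between driver and executors is visible in ordinary Spark code. A minimal sketch, assuming Spark 3.x; `DriverExecutorSketch` and `sumOfSquares` are hypothetical names, and the cluster master URLs in the comment are illustrative. In `local[2]` mode the driver and executor threads share one JVM, but the division of work is the same as on a cluster.

```scala
import org.apache.spark.sql.SparkSession

object DriverExecutorSketch {
  // The method body runs in the driver; only the closure passed to map is
  // shipped to executors, where it runs as one task per partition.
  def sumOfSquares(spark: SparkSession, n: Int, partitions: Int): Double = {
    val sc = spark.sparkContext
    // Transformations are only recorded by the driver; nothing executes yet.
    val squares = sc.parallelize(1 to n, partitions).map(x => x.toDouble * x)
    // The action makes the driver schedule tasks on executors, which run them
    // and report their results back to the driver.
    squares.sum()
  }

  def main(args: Array[String]): Unit = {
    // The master URL tells the driver which cluster manager to talk to.
    // On a real cluster this might be "yarn" or "spark://host:7077".
    val spark = SparkSession.builder()
      .appName("DriverExecutorSketch")
      .master("local[2]")
      .getOrCreate()
    println(sumOfSquares(spark, n = 10, partitions = 2)) // 385.0
    spark.stop()
  }
}
```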
Use-Cases for Apache Spark
Ideal situations to use Spark
Batch and Streaming
Supports both batch and real-time processing.
Big Data in the Cloud
Spark is easy to set up with data lake technologies.
Finance Industry
Financial firms analyse the text of their own regulatory filings and reports.
E-Commerce Sector
Giants such as eBay and Alibaba use Spark.
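"Both batch and real-time processing" means the same DataFrame logic can run over static and streaming inputs. A minimal sketch, assuming Spark 3.x Structured Streaming; `BatchAndStreamingSketch`, its methods, and the input directory are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types.{StringType, StructType}

object BatchAndStreamingSketch {
  // Business logic written once against the DataFrame API.
  def countByCategory(events: DataFrame): DataFrame =
    events.groupBy("category").count()

  // Batch: apply the logic to a static DataFrame.
  def batchCounts(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val events = Seq("books", "games", "books").toDF("category")
    countByCategory(events)
  }

  // Streaming: apply the same logic to JSON files arriving in `inputDir`,
  // printing the updated counts to the console on every trigger.
  def startStreamingCounts(spark: SparkSession, inputDir: String): StreamingQuery = {
    val schema = new StructType().add("category", StringType)
    val stream = spark.readStream.schema(schema).json(inputDir)
    countByCategory(stream).writeStream
      .outputMode("complete") // re-emit the full aggregate each trigger
      .format("console")
      .start()
  }
}
```

Calling `awaitTermination()` on the returned `StreamingQuery` keeps the streaming job running until it is stopped.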
Spark Ecosystem
Components in the Spark Ecosystem
01 Spark Core
02 Spark SQL
03 Spark Streaming
04 Spark MLlib
05 Spark GraphX
06 SparkR
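Most getting-started code touches Spark Core (via `SparkSession`) and Spark SQL. A minimal sketch, assuming Spark 3.x, showing one aggregation expressed through the DataFrame API and through SQL on a temporary view; `SparkSqlSketch`, `averages`, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col}

object SparkSqlSketch {
  // Returns the same average-price-per-category result computed two ways.
  def averages(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val sales = Seq(("books", 12.0), ("books", 8.0), ("games", 30.0)).toDF("category", "price")

    // 1) DataFrame API
    val viaApi = sales.groupBy("category").agg(avg(col("price")).as("avg_price"))

    // 2) SQL over a temporary view registered on the same data
    sales.createOrReplaceTempView("sales")
    val viaSql = spark.sql(
      "SELECT category, AVG(price) AS avg_price FROM sales GROUP BY category")

    (viaApi, viaSql)
  }
}
```

Both forms go through the same Spark SQL optimizer, so the choice between them is mostly a matter of style.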
Demonstration
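The demo itself is not captured in the slides. As a stand-in, here is a minimal sketch of the classic first Spark Scala program, word count, using the typed Dataset API. It assumes Spark 3.x; `WordCountSketch` and `wordCounts` are hypothetical names, not the presenter's code.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object WordCountSketch {
  // Splits lines into lowercase words and counts occurrences of each word.
  def wordCounts(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    import spark.implicits._
    val ds: Dataset[String] = lines.toDS()
    ds.flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)          // drop empty tokens from leading whitespace
      .groupByKey(word => word)
      .count()                     // Dataset[(String, Long)]
      .collect()
      .toMap
  }
}
```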
Thank You!
