Spark as a Service with Azure Databricks
Apache Spark Core APIs
RDDs, DataFrame, Datasets
Spark SQL
GraphX (graph)
Structured Streaming
MLlib (machine learning)
Spark: The Definitive Guide
Source: http://spark.apache.org/
Structured Streaming | Advanced Analytics | Libraries & Ecosystem
Structured APIs: Datasets, DataFrame, SQL
Low-Level APIs: RDDs, Distributed Variables
[Figure: transformations turn an RDD into another RDD; actions turn an RDD into a value]
Transformations    Actions
select             show
distinct           count
groupBy            collect
sum                save
orderBy            first
filter
limit
summarize
… and much more
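The split between transformations and actions can be sketched in plain Python. This is a toy illustration only, not the Spark API: the `LazyFrame` class, the `read_ratings` source, and the sample rows are all hypothetical. It shows that chaining transformations only builds a recipe, and that rows are read only when an action runs.

```python
from typing import Callable, Dict, Iterable, List

# Invented sample rows, used only for this illustration.
RATINGS = [
    {"user": 1, "movie": "A", "rating": 4.0},
    {"user": 1, "movie": "B", "rating": 2.5},
    {"user": 2, "movie": "A", "rating": 5.0},
    {"user": 3, "movie": "C", "rating": 3.5},
]

# Counts how many rows the source has handed out so far.
stats: Dict[str, int] = {"rows_read": 0}


def read_ratings() -> Iterable[dict]:
    """Hypothetical data source that records every row it yields."""
    for row in RATINGS:
        stats["rows_read"] += 1
        yield row


class LazyFrame:
    """Toy stand-in for a distributed dataset (not a Spark class)."""

    def __init__(self, source: Callable[[], Iterable[dict]]):
        # Keep a recipe for producing rows, not the rows themselves.
        self._source = source

    # Transformations: return a new LazyFrame and read nothing.
    def filter(self, predicate: Callable[[dict], bool]) -> "LazyFrame":
        return LazyFrame(lambda: (r for r in self._source() if predicate(r)))

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(lambda: ({c: r[c] for c in columns} for r in self._source()))

    # Actions: run the recipe and return a plain Python value.
    def collect(self) -> List[dict]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())

    def first(self) -> dict:
        return next(iter(self._source()))


# Building the plan reads no rows.
plan = LazyFrame(read_ratings).filter(lambda r: r["rating"] >= 3.5).select("movie", "rating")
rows_read_after_plan = stats["rows_read"]

# The action triggers the actual work.
result = plan.collect()
rows_read_after_collect = stats["rows_read"]
```

In this sketch every action re-runs the recipe from the source, which is why a second action reads the rows again.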
Driver: Spark Session, User code
Cluster Manager
Executor, Executor, Executor
Distributed Data Structure (made up of partitions)
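The idea of a dataset split into partitions can be sketched in plain Python. This is an assumption-laden illustration, not how Spark assigns partitions: the `split_into_partitions` helper, the modulo rule, and the sample user IDs are hypothetical. Each partition is processed on its own, as an executor would do, and the partial results are then merged.

```python
from collections import Counter
from typing import List


def split_into_partitions(keys: List[int], num_partitions: int) -> List[List[int]]:
    """Assign each integer key to a partition by key modulo partition count."""
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[key % num_partitions].append(key)
    return partitions


# Invented user IDs, one per rating event.
user_ids = [1, 2, 3, 4, 5, 6, 7, 1, 2, 1]

partitions = split_into_partitions(user_ids, 3)

# Each partition is counted independently (the per-executor step).
partial_counts = [Counter(partition) for partition in partitions]

# The partial results are merged into one answer (the combine step).
ratings_per_user = sum(partial_counts, Counter())
```

Because the same key always lands in the same partition here, each per-partition count is already complete for its keys; the merge step only has to union them.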
Managed Apache Spark platform optimized for Azure
Microsoft Azure
Optimized Databricks Runtime Engine
DATABRICKS I/O
SERVERLESS
Collaborative Workspace
Cloud storage
Data warehouses
Hadoop storage
IoT / streaming data
REST APIs
Machine learning models
BI tools
Data exports
Data warehouses
AZURE DATABRICKS
Enhance Productivity
Deploy Production Jobs & Workflows
APACHE SPARK
MULTI-STAGE PIPELINES
DATA ENGINEER
JOB SCHEDULER
NOTIFICATION & LOGS
DATA SCIENTIST
BUSINESS ANALYST
Build on secure & trusted cloud
Scale without limits
Cosmos DB
Kafka on HDInsight
Event Hubs
Power BI
SQL DW
Data Factory
ORCHESTRATION
Storage (Azure) Azure Data Lake
STORAGE
INGEST
VISUALIZE
SECURE: Azure Active Directory
AZURE DATABRICKS
DBFS
Storage blob
CLI
https://movielens.org/
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872
https://github.com/devlace/azure-databricks-recommendation-system
Official Apache Spark website
Azure Databricks Documentation
[Book] Spark: The Definitive Guide