Building a Versatile Analytics Pipeline
On Top Of Apache Spark
Misha Chernetsov, Grammarly
Spark Summit 2017
June 6, 2017
Data Team Lead @ Grammarly
Building Analytics Pipelines (5 years)
Coding on JVM (12 years), Scala + Spark (3 years)
About Me: Misha Chernetsov
@chernetsov
Tool that helps us better understand:
● Who are our users?
● How do they interact with the product?
● How do they get in, engage, pay, and how long do they stay?
Analytics @ Consumer Product Company
We want our decisions to be
data-driven
Everyone: product managers, marketing, engineers, support...
Analytics @ Consumer Product Company
data
analytics
report
Analytics @ Consumer Product Company
Calendar Day
Number of
unique active
users by day
Example Report 1 – Daily Active Users
dummy data!
Example Report 2 – Comparison of Cohort Retention Over Time
dummy data!
Ads
Email
Social
Number of
users who
bought a
subscription.
Split by traffic
source type
(where user
came from)
Calendar Day
Example Report 3 – Payer Conversions By Traffic Source
dummy data!
● Landing page visit
○ URL with UTM tags
○ Referrer
● Subscription purchased
○ Is first in subscription
Example: Data
Everything is an Event
Example: Data
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
…
}
{
"eventName": "subscribe",
"period": "12 months",
…
}
Enrich and/or Join
Example: Data
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
…
}
{
"eventName": "subscribe",
"period": "12 months",
…
}
Slice by Plot
capture → enrich → index → query
Analytics @ Consumer Product Company
capture → enrich → index → query
Use 3rd Party?
1. Integrated Event Analytics
2. UI over your DB
Reports are not tailored to your needs; limited capability.
Pre-aggregation / enriching
is still on you.
Hard to achieve accuracy and trust.
capture → enrich → index → query
Build Step 1: Capture
● Always up, resilient
● Spikes / back pressure
● Buffer for delayed processing
Kafka
Capture
REST
{
"eventName": "page-visit",
"url": "...?utm_medium=paid",
…
}
Long-term
Storage
Kafka stream
Save To Long-Term Storage
Cassandra
micro-batch
Kafka
Save To Long-Term Storage
val rdd = KafkaUtils.createRDD[K, V](...)
rdd.saveToCassandra("raw")
capture → enrich → index → query
Build Step 2: Enrich
Enrichment 1: User Attribution
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
"fingerprint": "abc",
…
}
{
"eventName": "subscribe",
"userId": 123,
"fingerprint": "abc",
…
}
Enrichment 1: User Attribution
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
"fingerprint": "abc",
"attributedUserId": 123,
…
}
{
"eventName": "subscribe",
"userId": 123,
"fingerprint": "abc",
…
}
Non-authenticated:
userId = null
Authenticated:
userId = 123
fingerprint = abc
(All events from a
given browser)
Enrichment 1: User Attribution
Non-authenticated:
userId = null
attributedUserId = 123
Authenticated:
userId = 123
fingerprint = abc
(All events from a
given browser)
Enrichment 1: User Attribution
Authenticated:
userId = 756
Enrichment 1: User Attribution
fingerprint = abc
(All events from a
given browser)
Authenticated:
userId = 123
Heuristics to
attribute those
Authenticated:
userId = 756
Enrichment 1: User Attribution
rdd.mapPartitions { it =>
  val iterator = it.buffered           // BufferedIterator gives us head (peek)
  val buffer = new ArrayBuffer[Event]()
  while (iterator.hasNext && iterator.head.userId.isEmpty)
    buffer.append(iterator.next())     // buffer the leading anonymous events
  // userId of the first authenticated event in this partition, if any
  val userId = if (iterator.hasNext) iterator.head.userId else None
  buffer.iterator.map(_.setAttributedUserId(userId)) ++ iterator
}
Enrichment 1: User Attribution
Can grow big and
OOM your worker for
outliers who use
Grammarly without
ever registering
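The per-partition buffering logic above can be sketched on plain Scala collections, independent of Spark. The `Event` case class and its fields here are hypothetical stand-ins for the real event type:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event shape, for illustration only.
case class Event(name: String,
                 userId: Option[Int],
                 attributedUserId: Option[Int] = None)

// Buffer the leading anonymous events, peek at the first authenticated
// one, and stamp the buffered events with its userId.
def attribute(partition: Iterator[Event]): Iterator[Event] = {
  val it = partition.buffered          // BufferedIterator gives us head (peek)
  val anonymous = new ArrayBuffer[Event]()
  while (it.hasNext && it.head.userId.isEmpty)
    anonymous += it.next()
  val userId = if (it.hasNext) it.head.userId else None
  anonymous.iterator.map(_.copy(attributedUserId = userId)) ++ it
}

val out = attribute(Iterator(
  Event("page-visit", None),
  Event("page-visit", None),
  Event("subscribe", Some(123)))).toList
```

The anonymous events come out stamped with the first authenticated userId; the `anonymous` buffer is exactly the structure that can grow without bound for never-registering outliers.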
By default we
should operate
in User Memory
(small fraction).
Spark & Memory
User Memory
100% - spark.memory.fraction = 25%
Spark Memory
spark.memory.fraction = 75%
Let’s get into
Spark Memory
and use its
safety features.
rdd.mapPartitions { it =>
  val iterator = it.buffered
  val buffer = new SpillableBuffer()
  while (iterator.hasNext && iterator.head.userId.isEmpty)
    buffer.append(iterator.next())
  val userId = if (iterator.hasNext) iterator.head.userId else None
  buffer.map(_.setAttributedUserId(userId)) ++ iterator
}
Enrichment 1: User Attribution
Can safely grow in memory while there is enough free Spark Memory; spills to disk otherwise.
Spark Memory Manager & Spillable Collection
Memory Disk
×2 – as the collection grows, it doubles its memory request to the memory manager
Spill to Disk – when the request cannot be granted, the contents go to disk and the memory is released
trait SizeTracker {
def afterUpdate(): Unit = { … }
def estimateSize(): Long = { … }
}
afterUpdate() is called on every append; it periodically samples the actual size. estimateSize() extrapolates from those samples.
SizeTracker
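The sampling idea can be sketched as follows. Names, the sampling schedule, and the 16-bytes-per-element measurement are ours, not Spark's:

```scala
// Measuring a collection's size is expensive, so sample it at
// exponentially spaced update counts and extrapolate in between.
class SizeTrackerSketch(measure: () => Long) {
  private var updates = 0L
  private var nextSampleAt = 1L
  private var bytesPerElem = 0.0

  def afterUpdate(): Unit = {          // call on every append
    updates += 1
    if (updates >= nextSampleAt) {     // periodically take a real sample
      bytesPerElem = measure().toDouble / updates
      nextSampleAt = math.max(updates + 1, (updates * 1.1).toLong)
    }
  }

  def estimateSize(): Long =           // extrapolate from the last sample
    (bytesPerElem * updates).toLong
}

var elems = 0L
val tracker = new SizeTrackerSketch(() => elems * 16L) // pretend 16 bytes/element
for (_ <- 1 to 1000) { elems += 1; tracker.afterUpdate() }
```

With a steady 16 bytes per element the extrapolated estimate lands on 16 × 1000 = 16000 bytes while only a handful of real measurements were taken.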
trait Spillable {
  def spill(inMemCollection: C): Unit  // abstract
  def maybeSpill(currentMemory: Long, inMemCollection: C): Unit = {
    // tries to double (×2) the memory grant; spills if that fails
  }
}
Spillable
call on every append to collection
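The acquire/double/spill protocol can be illustrated with a toy budget in place of Spark's TaskMemoryManager. Everything here (class names, the 1024-byte initial grant) is an illustrative assumption, not Spark's implementation:

```scala
// A toy memory "manager" that grants at most what is free.
class MemoryBudget(var free: Long) {
  def acquire(requested: Long): Long = {
    val granted = math.min(requested, free)
    free -= granted
    granted
  }
  def release(size: Long): Unit = free += size
}

// The Spillable pattern: when the collection reaches its grant, ask to
// double it (×2); if the manager cannot satisfy that, spill and release.
class SpillableSketch(budget: MemoryBudget) {
  private var granted: Long = budget.acquire(1024)
  var spills = 0
  def maybeSpill(currentMemory: Long): Boolean = {
    if (currentMemory >= granted) {
      granted += budget.acquire(granted)   // try to double the grant
      if (currentMemory >= granted) {      // denied (or partial): spill
        spills += 1
        budget.release(granted)            // disk now holds the data
        granted = budget.acquire(1024)
        return true
      }
    }
    false
  }
}

val coll = new SpillableSketch(new MemoryBudget(free = 4096))
val spilled = (1 to 6).toList.map(i => coll.maybeSpill(i * 1024L))
```

With a 4096-byte budget the collection doubles twice, then every further growth step spills: the first three checks succeed in memory and the rest go to disk.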
public long acquireExecutionMemory(long required, …)
public void releaseExecutionMemory(long size, …)
TaskMemoryManager
● Be safe with outliers
● Get outside User Memory (25%), use Spark Memory (75%)
● Spark APIs: could be a bit friendlier and higher-level
Custom Spillable Collection
Enrichment 2: Calculable Props
Enrichment Phase 2: Calculable Props
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
"fingerprint": "abc",
"attributedUserId": 123,
…
}
{
"eventName": "subscribe",
"userId": 123,
"fingerprint": "abc",
…
}
Enrichment Phase 2: Calculable Props
{
"eventName": "page-visit",
"url": "...?utm_medium=ad",
"fingerprint": "abc",
"attributedUserId": 123,
…
}
{
"eventName": "subscribe",
"userId": 123,
"fingerprint": "abc",
"firstUtmMedium": "ad",
…
}
val firstUtmMedium: CalcProp[String] =
  (E \ "url").as[Url]
    .map(_.param("utm_medium"))
    .forEvent("page-visit")
    .first
Enrichment Phase 2: Calculable Props Engine & DSL
● Type-safe, functional, composable
● Familiar: similar to Scala collections API
● Batch & Stream (incremental)
Enrichment Phase 2: Calculable Props Engine & DSL
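What a calculable prop like `firstUtmMedium` boils down to is a function over a user's ordered event stream. The `Ev` case class, the crude `param` helper, and the field names below are our illustrative stand-ins, not the real engine or DSL:

```scala
case class Ev(name: String, fields: Map[String, String])

// Crude query-string parsing, for illustration only.
def param(url: String, key: String): Option[String] =
  url.split("[?&]").drop(1)
    .map(_.split("=", 2))
    .collectFirst { case Array(k, v) if k == key => v }

// firstUtmMedium: utm_medium of the first page-visit that carries one
def firstUtmMedium(events: Seq[Ev]): Option[String] = {
  val it = events.iterator
    .filter(_.name == "page-visit")
    .flatMap(e => e.fields.get("url").flatMap(u => param(u, "utm_medium")))
  if (it.hasNext) Some(it.next()) else None
}

val medium = firstUtmMedium(Seq(
  Ev("app-open", Map.empty),
  Ev("page-visit", Map("url" -> "https://example.com/?utm_medium=ad&utm_source=g")),
  Ev("page-visit", Map("url" -> "https://example.com/?utm_medium=email"))))
```

Because it is a fold over an ordered stream, the same definition can run over a full batch or be applied incrementally as new events arrive.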
Enrichment Pipeline with Spark
Raw
Kafka
Spark Pipeline
Stream:
Save Raw Kafka
Stream:
User Attr.
User-attributed
Kafka
Stream:
Calc Props
Enriched and
Queryable
Batch:
User Attr.
Batch:
Calc Props
batch
micro-batch micro-batch micro-batch
Cassandra
Kafka
Spark Pipeline
Kafka
Cassandra
Kafka
Parquet on
AWS S3
batch
● Connectors for everything
● Great for batch
○ Shuffle with spilling
○ Failure recovery
● Great for streaming
○ Fast
○ Low overhead
Spark Pipeline
batch
micro-batch micro-batch micro-batch
Cassandra
Kafka
Spark Pipeline
Kafka
Cassandra
Kafka
Parquet on
AWS S3
batch
job
Multiple Output Destinations
Kafka Kafka
Cassandra Cassandra
val rdd: RDD[T]
rdd.sendToKafka("topic_x")
rdd.saveToCassandra("table_foo")
rdd.saveToCassandra("table_bar")
Multiple Output Destinations: Try 1
rdd.saveToCassandra(...)
→ rdd.foreachPartition(...)
→ sc.runJob(...)
Multiple Output Destinations: Try 1
job
Multiple Output Destinations: Try 1
Kafka Kafka
Cassandra Cassandra
job
Multiple Output Destinations: Try 1 = 3 Jobs
Kafka Kafka
job
Kafka
Table 1
job
Kafka
Table 2
val rdd: RDD[T]
rdd.cache()
rdd.sendToKafka("topic_x")
rdd.saveToCassandra("table_foo")
rdd.saveToCassandra("table_bar")
Multiple Output Destinations: Try 2
job
Multiple Output Destinations: Try 2 = Read Once, 3 Jobs
Kafka Kafka
job
Cache
Table 1
job
Cache
Table 2
rdd.foreachPartition { iterator =>
  val writer = new BufferedWriter(
    new OutputStreamWriter(new FileOutputStream(...))
  )
  iterator.foreach { el =>
    writer.write(el.toString)
    writer.newLine()
  }
  writer.close() // flushes the buffer so everything is written
}
Writer
● Buffer
● Non-blocking
● Idempotent / Dedupe
Writer
def andWriteToX = rdd.mapPartitions { iterator =>
  val writer = new XWriter()
  iterator.map { el =>
    writer.write(el)
    el // pass the element through
  }.closing(() => writer.close())
}
AndWriter
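The `.closing(...)` call is not a standard `Iterator` method; presumably it is a custom extension. A sketch of such a wrapper, with illustrative names, runs a callback exactly once when the underlying iterator is exhausted, so the writer is closed at the moment the downstream consumer finishes the partition:

```scala
import scala.collection.mutable.ArrayBuffer

class ClosingIterator[A](underlying: Iterator[A], close: () => Unit)
    extends Iterator[A] {
  private var closed = false
  def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !closed) { closed = true; close() }  // close once, on exhaustion
    more
  }
  def next(): A = underlying.next()
}

// demo: elements are "written" lazily as the consumer pulls them
val written = new ArrayBuffer[Int]()
var closedAfter = -1
val it = new ClosingIterator[Int](
  Iterator(1, 2, 3).map { el => written += el; el },
  () => closedAfter = written.size)
val out = it.toList
```

Because the write happens inside `map`, the whole chain stays lazy: nothing is written until a downstream action pulls the partition, which is what lets several `andWriteToX` stages share a single pass over the data.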
val rdd: RDD[T]
rdd.andSaveToCassandra("table_foo")
  .andSaveToCassandra("table_bar")
  .sendToKafka("topic_x")
Multiple Output Destinations: Try 3
job
Multiple Output Destinations: Try 3
Kafka Kafka
Cassandra Cassandra
● Kafka
● Cassandra
● HDFS
Important! Each andWriter will consume resources
● Memory (buffers)
● IO
And Writer
capture → enrich → index → query
Build Step 3: Index
Index
● Parquet on AWS S3
● Custom partitioning: By eventName and time interval
● Append changes, compact + merge on the fly when querying
● Randomized names to maximize S3 parallelism
● Use s3a for max performance and tweak for S3 read-after-write consistency
● Support flexible schema, even with conflicts!
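A hedged sketch of what such a key layout might look like: a short random prefix first, so S3 spreads keys across its internal partitions, then the eventName and time-interval partitions used when querying. The layout and names are illustrative, not the production scheme:

```scala
import java.time.Instant
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter
import scala.util.Random

val dayFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

def partitionPath(eventName: String, ts: Instant, rng: Random): String = {
  val prefix = rng.alphanumeric.take(4).mkString  // randomized prefix for S3 parallelism
  s"$prefix/event=$eventName/day=${dayFmt.format(ts)}"
}

val path = partitionPath("page-visit",
  Instant.parse("2017-06-06T10:00:00Z"), new Random(42))
```

The query side then lists keys by event and day regardless of prefix, which is what makes append-plus-compact-on-read workable.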
Some Stats
● Thousands of events per second
● Terabytes of compressed data
capture → enrich → index → query
Build Step 4: Query
● DataFrames
● Spark SQL Scala dsl / Pure SQL
● Zeppelin
Hardcore Query
● Plot by day
● Unique visitors
● Filter by country
● Split by traffic source (top 20)
● Time from 2 weeks ago to today
Casual Query
Option 1: SQL – quickly gets complex
Option 2: UI – too expensive to build, extend, and support
SEGMENT "eventName"
WHERE foo = "bar" AND x.y IN ("a", "b", "c")
UNIQUE
BY m IS NOT NULL
TIME from 2 months ago to today
STEP 1 month SPAN 1 week
Option 3: Custom Query Language
Expressions
● Segment, Funnel, Retention
● UI & as DataFrame in Zeppelin
● Spark <= 1.6 – Scala Parser Combinators
● Reuse most complex part of expression parser
● Relatively extensible
Option 3: Custom Query Language
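The talk's parser was built with Scala Parser Combinators; here is a hand-rolled sketch for a tiny subset of such a language, just to show the shape: a query string becomes a typed AST that a DataFrame planner could consume. The grammar and AST are illustrative, not Grammarly's:

```scala
case class Segment(eventName: String, where: Map[String, String])

val SegmentPat = """SEGMENT\s+"([^"]+)"(?:\s+WHERE\s+(.+))?""".r
val CondPat = """(\w+)\s*=\s*"([^"]+)"""".r

def parse(query: String): Option[Segment] = query.trim match {
  case SegmentPat(name, null) => Some(Segment(name, Map.empty))
  case SegmentPat(name, conds) =>
    // collect every `key = "value"` condition after WHERE
    val where = CondPat.findAllMatchIn(conds)
      .map(m => m.group(1) -> m.group(2)).toMap
    Some(Segment(name, where))
  case _ => None
}

val q = parse("""SEGMENT "page-visit" WHERE foo = "bar" AND country = "US"""")
```

A real combinator-based parser composes pieces like the condition rule into larger expressions, which is what makes the most complex parts reusable across SEGMENT, FUNNEL, and RETENTION queries.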
● Custom versatile analytics is doable and enjoyable
● Spark is a great platform to build analytics on top of
○ Enrichment Pipeline: Batch / Streaming, Query, ML
● It would be great to see even the deep internals become slightly more extensible
Conclusion
We are hiring!
olivia@grammarly.com
https://www.grammarly.com/jobs
Thank you!
Questions?