Kafka and Kafka Streams in the Global Schibsted Data Platform
Fredrik Vraalsen, 16.10.2018
WHO AM I?
Fredrik Vraalsen
• Data Platform Architect, City of Oslo
• Former Data Engineer, Data Platform
SCHIBSTED
22 countries
200 million users/month
20 billion pageviews/month
SCHIBSTED PRODUCTS & TECHNOLOGY
A BIT OF HISTORY…
DATA PIPELINE
Collector
Kinesis
Batch
Storage S3
Piper
STREAM PROCESSING
Collector
Kinesis
Batch
Storage S3
Piper
INCOMING EVENTS
THE ISSUES
• “High” latency (~30 seconds)
• Delivery guarantees
• On-boarding
• Manual configuration
• Homegrown solution
VISION
• Data-driven applications
• Self-serve
• GDPR
• Performance
• State of the art
WHY ?
STREAM PROCESSING
KAFKA STREAMS
• Lightweight library
• Streams and Tables
• High-level DSL
• Low-level API
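The "Streams and Tables" bullet refers to the idea that a table can be seen as the latest value per key of a changelog stream, and that aggregating a stream produces such a changelog. The following is a minimal sketch of that idea in plain Python, not the Kafka Streams API; the record shape, the function names and the sample pageview keys are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# A changelog record is modelled as (key, value); a value of None is treated
# as a tombstone that deletes the key. Illustrative names, not Kafka APIs.
Record = Tuple[str, Optional[int]]


def count_by_key(events: Iterable[str]) -> Iterator[Record]:
    """Turn a stream of keys into a changelog of running counts per key."""
    counts: Dict[str, int] = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def materialize(changelog: Iterable[Record]) -> Dict[str, int]:
    """Fold a changelog stream into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: remove the row
        else:
            table[key] = value    # upsert: later records overwrite earlier ones
    return table


pageviews = ["site-a", "site-b", "site-a", "site-a"]  # hypothetical events
changelog = list(count_by_key(pageviews))
table = materialize(changelog)
```

Here `changelog` is the stream view of the aggregation and `table` is its table view; replaying the changelog from the start always rebuilds the same table.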
YGGDRASIL WAS BORN
OLD STREAMING PIPELINE
Storage Piper
NEW STREAMING PIPELINE
Storage Piper Yggdrasil Storage
GETTING DATA IN & OUT
https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
GETTING DATA IN & OUT
http://kafka.apache.org/documentation.html
GETTING DATA IN & OUT
Duratro Yggdrasil Storage
Event firehose
Sink topics
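One way to picture the step from an event firehose to per-consumer sink topics is as a list of routes, each with a filter and a transformation. The sketch below is plain Python under that assumption; the `Route` type, the topic names and the sample events are hypothetical and do not describe how Yggdrasil or Duratro is actually implemented.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple

Event = Dict[str, Any]


class Route(NamedTuple):
    """One consumer's subscription: which events it wants and where they go."""
    sink_topic: str                      # hypothetical topic name
    matches: Callable[[Event], bool]     # filter applied to each firehose event
    transform: Callable[[Event], Event]  # reshaping applied before delivery


def route(event: Event, routes: List[Route]) -> List[Tuple[str, Event]]:
    """Return (sink topic, transformed event) for every route the event matches."""
    return [(r.sink_topic, r.transform(event)) for r in routes if r.matches(event)]


routes = [
    # Analytics only wants view events, reduced to two fields.
    Route("sink.analytics",
          lambda e: e.get("type") == "View",
          lambda e: {"type": e["type"], "site": e["site"]}),
    # Storage receives every event unchanged.
    Route("sink.storage", lambda e: True, lambda e: e),
]

firehose = [
    {"type": "View", "site": "site-a", "user": "u1"},
    {"type": "Click", "site": "site-b", "user": "u2"},
]

deliveries = [d for event in firehose for d in route(event, routes)]
```

Keeping each route as data (a topic name plus two functions) means a new consumer is added by appending a route rather than by changing the routing loop.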
3RD PARTY ANALYTICS
DATA QUALITY
DATA DRIVEN APPLICATIONS
https://www.slideshare.net/DataStax/c-for-deep-learning-andrew-jefferson-tracktable-cassandra-summit-2016
https://www.flickr.com/photos/rahulrodriguez/14683524180
https://pixabay.com/en/map-photoshop-geolocation-journey-947471/
GROWING PAINS
SCALING UP
BUMPY RIDE
https://pixabay.com/no/veien-pukler-fremover-veiskilt-246/
CHALLENGES & EXPERIENCES
http://www.publicdomainpictures.net/view-image.php?image=6884
https://pixabay.com/en/software-testing-service-762486/
https://pixabay.com/no/brett-l%C3%A6r-note-ferdigheter-597190/
99.5%
99.99976%
CHALLENGES & EXPERIENCES
http://www.publicdomainpictures.net/view-image.php?image=6884
https://pixabay.com/en/software-testing-service-762486/
https://pixabay.com/no/brett-l%C3%A6r-note-ferdigheter-597190/
100%
99.99992%
SELF SERVE
SELF SERVE
• Challenge: Transformations and routing
• Multiple configurations
• Who maintains?
• Required Scala knowledge
{
  "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
  "device_manufacturer": .device.manufacturer,
  "device_model": .device.model,
  "language": .device.acceptLanguage,
  "os_name": .device.osType,
  "os_version": .device.osVersion,
  "platform": .device.platformType,
  "user_properties": {
    "is_logged_in": boolean(.actor."spt:userId")
  }
}
SELF SERVE
• JSLT – DSL for JSON transformation & queries
https://github.com/schibsted/jslt
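To make the JSLT template shown earlier more concrete, here is a rough Python rendering of what it computes for one input event. It is a sketch under assumptions: the sample event and its values are invented, `jslt_boolean` only approximates JSLT's `boolean()` by using Python truthiness, and JSLT's own handling of missing fields is not reproduced.

```python
from datetime import datetime
from typing import Any, Dict


def jslt_boolean(value: Any) -> bool:
    # Approximation: treat None, False, 0, "" and empty containers as false.
    return bool(value)


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the template: rename device fields and derive two values."""
    device = event.get("device", {})
    actor = event.get("actor", {})
    # "yyyy-MM-dd'T'HH:mm:ssX" corresponds to this strptime pattern
    # (Python 3.7+ accepts "Z" and "+02:00" style offsets for %z).
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        "time": round(published.timestamp() * 1000),  # epoch milliseconds
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        "user_properties": {
            "is_logged_in": jslt_boolean(actor.get("spt:userId")),
        },
    }


# Hypothetical input event; field values are made up for illustration.
sample = {
    "published": "2018-10-16T12:00:00Z",
    "device": {
        "manufacturer": "ExampleCorp",
        "model": "X1",
        "acceptLanguage": "nb-NO",
        "osType": "Android",
        "osVersion": "8.1",
        "platformType": "mobile",
    },
    "actor": {"spt:userId": "user-123"},
}

result = transform(sample)
```

The Python version needs a function, imports and explicit lookups, while the template states only the output shape, which is the kind of gap the "Required Scala knowledge" bullet points at.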
SELF SERVE
Data Platform – Oslo Origo
Oslo Origo
• Digital transformation
• Smarter services
• Data driven
City of Oslo
673,469
53,000
50+
200,000
Data Platform – Oslo Origo
https://www.flickr.com/photos/boaski/8079390195
Data Platform – Oslo Origo
• Creating value from our data
• Awareness of opportunities
• Simple and safe data access
• Insights and decision making
• Collaboration and sharing
THANK YOU!
Fredrik Vraalsen
@fredriv
fredrik@vraalsen.no
