Using Apache Cassandra and
Apache Kafka to Scale Next Gen
Applications
Adam Zegelin
Founding Software Engineer, Instaclustr
Introduction
• Adam Zegelin
• Co-founded Instaclustr 5 years ago
• In Canberra, Australia
• Current focus is Cassandra on Kubernetes
• Instaclustr
• Managed Apache Cassandra, Spark and Kafka in the ☁️
○ AWS, GCP, Azure & IBM
○ 3000 nodes under management
○ 24×7×365 support
• Consulting
○ Schema & application design
○ Workshops & Training
• 2nd-level on-call support for on-premise deployments
Agenda
• Introduction to Cassandra and Kafka
• Real-world Use Cases
• Worldpay
• Lendi
• Instaclustr
• Partitioning: the key to scale
• Fitting and architecting for your use case
• Linearly Scalable
• Always Available
• Multi-Region Data Store
• Apache Cassandra is the leading NoSQL operational
database for high-scale and high-reliability applications.
• Shared nothing peer-to-peer architecture provides
reliability up to 100% (with Instaclustr SLAs).
• replicated data and multiple nodes capable of fulfilling queries
○ Node outage? Service just keeps running
• full online maintenance and in-place upgrades
• Low latency for operational applications
• Sub-10ms P95 reads and writes achievable
• Native active-active multi data center support
• Geographic distribution (to meet latency requirements)
• Disaster resilience
• Workload isolation (analytics)
• Cassandra is a data storage system, not an
analytics/query engine or place to run logic
Typical Use Cases
• High write to read ratio
• Data is rarely updated
• Including explicit deletes
• The Primary Key is known at read time
• Limited filtering & aggregation
• No JOINs or referential integrity
• Transaction logging
• Time series data
• IoT status and event history
• Health tracker data
• Order & package statuses & tracking
• Weather service history
• Messages and email envelopes
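The time-series fit described above can be sketched in plain Python. This is a toy in-memory model, not Cassandra client code: the `SensorStore` class, its method names and the `(sensor_id, day)` bucketing are assumptions made for illustration. It shows rows grouped under a partition key that is fully known at read time and kept ordered by timestamp inside each partition.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class SensorStore:
    """Toy model of a table partitioned by (sensor_id, day) and
    ordered by timestamp within each partition."""

    def __init__(self):
        # partition key -> sorted list of (timestamp, value) rows
        self._partitions = defaultdict(list)

    @staticmethod
    def _partition_key(sensor_id, ts):
        # Bucketing by day stops a single partition from growing forever.
        return (sensor_id, ts.strftime("%Y-%m-%d"))

    def write(self, sensor_id, ts, value):
        rows = self._partitions[self._partition_key(sensor_id, ts)]
        bisect.insort(rows, (ts, value))  # keep rows ordered by timestamp

    def read_day(self, sensor_id, day):
        # The whole partition key is known up front, so this is one lookup.
        return list(self._partitions.get((sensor_id, day), []))
```

Reading a day for one sensor touches exactly one partition; a query such as "all sensors above 30 degrees" has no partition key to start from, which is the kind of filtering this model does not serve well.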
Queuing, Pub/Sub and
Streaming at Scale
• Apache Kafka is a distributed streaming platform
• Publish and subscribe to streams of records
○ Similar to a message queue or EMS
• Store streams of records
○ Fault-tolerant
○ Durable
• Process streams of records
○ as they occur
○ randomly, any position in the stream
• Replicated architecture
• High-level similarities to Cassandra
• Scalability
• Reliability
Typical Use Cases
• As a message bus
• Loose coupling between producers and consumers
• Basis for micro-services
• As a commit log
• A store of logical transactions
• Populating analytical data stores or edge caches
• As a buffer
• Manage backpressure & workload spikes
And when combined with Kafka Streams/Spark Streaming…
• As the basis of a streaming architecture
• (near) real-time analytics
• Data processing pipelines
Typical Use Cases
cont’d
• Website activity tracking
• Page views
• Searches
• Other user actions
• Metrics
• Operational monitoring data
• Log aggregation
• Centralized logging
• Event sourcing
• Application state changes
• “we don't just want to see where we are, we also want to know
how we got there”
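Event sourcing, as quoted above, can be illustrated with a small Python sketch. The `Event` type, the account names and the `replay` function are hypothetical; the point is only that current state is derived by replaying an append-only log of state changes, the kind of log a Kafka topic could hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str    # "deposited" or "withdrawn"
    amount: int


def replay(events):
    """Rebuild current balances by folding over the event history."""
    balances = {}
    for e in events:
        delta = e.amount if e.kind == "deposited" else -e.amount
        balances[e.account] = balances.get(e.account, 0) + delta
    return balances


# The log records how we got there; the balances are where we are.
log = [
    Event("acct-1", "deposited", 100),
    Event("acct-1", "withdrawn", 30),
    Event("acct-2", "deposited", 50),
]
```

Replaying a prefix of the log gives the state at that earlier point, which is what makes the history itself useful.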
Case study: Worldpay
• Payment processor
• spun out of RBS in 2010
• merged with Vantiv in US in Jan 2018 for USD 10.4B to form
Worldpay, Inc.
• Processes
• >40 Million transactions per day
• for 400,000 merchants
• 42% of all UK non-cash transactions
Case study
cont’d
• Re-architecting of WorldPay’s XML Payment API
• facilitates ~40M transactions per month
• New architecture based on open source technologies
• including Cassandra and Kafka
• to provide scalability, availability and reduced costs
• New Idempotency Service
• first project to use the new architecture
• provides capabilities to ensure payments are not repeated
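The slides do not describe how the Idempotency Service works internally. As a generic illustration of the idea only, the sketch below records the outcome of each request under a caller-supplied idempotency key and replays that outcome on a retry. `IdempotencyStore`, `process_payment` and the `charge` callback are hypothetical names, and the dictionary stands in for a keyed store such as a Cassandra table.

```python
class IdempotencyStore:
    """In-memory stand-in for a keyed store with put-if-absent semantics."""

    def __init__(self):
        self._results = {}

    def get(self, key):
        return self._results.get(key)

    def put_if_absent(self, key, value):
        # Returns whatever ends up stored: the new value if the key was
        # free, otherwise the previously recorded one.
        return self._results.setdefault(key, value)


def process_payment(store, idempotency_key, amount, charge):
    """Call charge(amount) at most once per idempotency key.

    Single-threaded sketch: a real service would need to reserve the key
    before charging so that concurrent retries cannot both charge.
    """
    previous = store.get(idempotency_key)
    if previous is not None:
        return previous  # retry: replay the recorded outcome
    result = charge(amount)
    return store.put_if_absent(idempotency_key, result)
```

Lookups and writes are by a single known key, which is the access pattern the deck describes as fitting Cassandra well.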
Case study
cont’d
• Challenges
• Tight deployment timeframe
• Very high availability expectations
• Low latency requirements
• Utilises Cassandra to provide highest levels of availability
and scalability
• 18 node cluster
• 3 AWS regions (in Europe)
• Leverages Cassandra's tuneable consistency
○ QUORUM = strong consistency across regions
○ still able to operate with a whole region unavailable
○ Latency is tolerable (restricted to EU)
• Simple data model with atomic reads/writes
○ fits well with Cassandra capability
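The QUORUM claim above can be checked with a little arithmetic. The sketch assumes three replicas of each row per region, which the slides do not state (they give only 18 nodes across 3 regions). QUORUM needs a majority of all replicas across regions, so the question is whether a majority is still reachable once one region is lost.

```python
def quorum(total_replicas):
    """Replicas that must respond for a QUORUM read or write."""
    return total_replicas // 2 + 1


def survives_region_loss(replicas_per_region, regions):
    """True if QUORUM is still reachable with one whole region down."""
    total = replicas_per_region * regions
    remaining = total - replicas_per_region
    return remaining >= quorum(total)


# Assumed layout: 3 replicas in each of 3 regions = 9 replicas in total.
# QUORUM is 5, and losing one region still leaves 6 replicas reachable.
```

With the same assumed replica count in only two regions, QUORUM would be 4 of 6 and a region outage would leave just 3, which is why the third region matters. Reads and writes at QUORUM also overlap in at least one replica (5 + 5 > 9), which is where the strong consistency comes from.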
Case study
cont’d
• Worked with Instaclustr to accelerate development and
time to stable service:
• Consulting engagement assisted with data model design
• Cassandra cluster run on Instaclustr managed service
○ production ready in weeks
• Initial preference was to run on-prem
• security compliance
• did not expect cloud to meet latency requirements
• However, timeframes did not allow establishment of
internal deployment
• Used Instaclustr’s managed Cassandra service on AWS for
initial go-live.
• Now satisfied as a long-term solution
Case study: Lendi
• Australia’s leading online home loan lender
• Processing over 90% of Australia’s online lending enquiries.
• Re-architecture of their platform following a major
funding round
• customer and data-centric
Case study
cont’d
• Integration-heavy environment
• Bespoke interfaces with banks, etc.
• Moving to a micro-services architecture
• Kafka as a message bus
• New architecture
• Decoupled application code from embedded data sets from
various business applications
• Unified data models from the various point solutions and
market segments
• Enabled extensive scale
○ supports rapid and large growth in data as the consumer base
grows
Case study: Instaclustr
• Cassandra
• Storage for monitoring metrics & events
• Custom collector
• RabbitMQ transport
○ Will eventually move to Kafka as the transport
• Metrics are processed by Riemann
○ Raises PagerDuty alerts, tickets, emails
○ Writes to Cassandra
• Kafka
• Centralised logging
• Events are collected by fluentd
• Pumped into LogStash via Kafka
• Indexed via ElasticSearch
• Viewed with Kibana
Partitioning
The key to scale
• Partitioning
• using a key in your data to split the data across multiple
servers
• Manual partitioning is possible but painful
• Cassandra and Kafka make partitioning transparent
• needs conscious consideration
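Splitting data across servers by key can be sketched as a simplified token ring. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 is used as a stand-in hash (Cassandra's default partitioner is Murmur3-based), each node gets a single token rather than many virtual nodes, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a partition key to a 64-bit token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes):
        # Each node owns the token range that ends at its own token.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]
```

The same key always lands on the same node, so a read that knows the key goes straight to its owner. The "conscious consideration" is in choosing a key whose values spread evenly and match how the data is read.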
Cassandra Cluster
Cluster
Data Center (optional)
Rack (optional, recommended)
Node
Partitioning
Cassandra Partitions
Queuing and Streaming at Scale
● Broker
○ Node/server/VM
● Topic
○ Logical grouping of data (category/feed/name)
○ Settings:
○ Replication
○ Partition count
○ Retention
○ Compaction
○ …
Kafka Brokers, Topics and Partitions
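The topic settings listed above can be written down as a small specification. The `TopicSpec` class and the `page-views` topic are hypothetical; `retention.ms` and `cleanup.policy` are Kafka topic-level configuration names, while partition count and replication factor are supplied when the topic is created. This is a sketch of the shape of the settings, not a client call.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSpec:
    name: str
    partitions: int
    replication_factor: int
    config: dict = field(default_factory=dict)

    def validate(self, broker_count):
        if self.partitions < 1:
            raise ValueError("a topic needs at least one partition")
        if not 1 <= self.replication_factor <= broker_count:
            raise ValueError("replication factor cannot exceed broker count")
        return self


page_views = TopicSpec(
    name="page-views",  # hypothetical topic name
    partitions=12,
    replication_factor=3,
    config={
        "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep records 7 days
        "cleanup.policy": "delete",  # "compact" keeps the latest record per key
    },
)
```

The numbers are placeholders; the partition count in particular deserves thought up front because it bounds how many consumers in a group can work in parallel.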
● Partition
○ Subset of messages in a topic
■ Has a single leader broker
■ Guarantees ordered delivery within that
subset
○ Number of partitions is set on topic creation
Kafka Topics and Partitions (cont’d)
• Messages are mapped to a partition by the Producer
• Randomly/round-robin
• Hash of record key
• Consumers are members of Consumer Groups
• Consumer Groups register to consume records from
Topics
• Each Consumer in a Consumer Group is the exclusive
consumer of a “fair share” of partitions in the topic.
Kafka Partitions in Action
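The producer-side mapping and the consumer-group "fair share" described above can be sketched as follows. This is not Kafka client code: `zlib.crc32` is a stand-in for the client's real key hash, and the `Partitioner` class and `assign` function are hypothetical names.

```python
import zlib


class Partitioner:
    """Producer side: choose a partition for each record."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = 0

    def partition(self, key=None):
        if key is None:
            # No key: spread records round-robin across partitions.
            p = self._next % self.num_partitions
            self._next += 1
            return p
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


def assign(partitions, consumers):
    """Consumer side: give each group member an exclusive, near-equal share."""
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, p in enumerate(partitions):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment
```

Because each partition has exactly one consumer within a group, adding consumers beyond the partition count leaves the extra ones idle.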
Fitting and architecting for your use case: Cassandra
• Big data
• one or more individually big (>1TB) tables
• Need to pre-determine read pattern
• at least to partition key
• Very low cost writes
• great for high write / read ratio use cases
• Ideal for small reads
• 1, 10, 100, 1000 rows at a time
• No limits to horizontal scaling (data size or ops/sec)
• provided you can find a partition that fits.
• No relational integrity
• No Foreign Keys, no JOINs
• Limited filtering, aggregation
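Whether "a partition that fits" exists can be estimated with back-of-envelope arithmetic before any schema is written. The sketch below is an assumption-laden illustration: the write rate, row size and the 100 MB target are made-up inputs, not figures from the talk, and the function names are hypothetical.

```python
def partition_size_bytes(rows_per_second, row_bytes, bucket_seconds):
    """Approximate size of one time-bucketed partition."""
    return rows_per_second * row_bytes * bucket_seconds


def largest_bucket(rows_per_second, row_bytes, target_bytes, candidates):
    """Largest candidate bucket (in seconds) that stays under the target."""
    fitting = [
        b for b in candidates
        if partition_size_bytes(rows_per_second, row_bytes, b) <= target_bytes
    ]
    return max(fitting) if fitting else None


HOUR, DAY, WEEK = 3600, 86400, 604800
TARGET = 100 * 1024 ** 2  # illustrative 100 MB ceiling per partition

# 10 rows/s of 200 bytes each: an hourly bucket is 7.2 MB, while a
# daily bucket is 172.8 MB and would exceed the illustrative target.
bucket = largest_bucket(10, 200, TARGET, [HOUR, DAY, WEEK])
```

If no candidate fits, the partition key needs another component (for example a shard number) rather than a larger cluster.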
Fitting and architecting for your use case: Kafka
• Big data
• 5k+ messages/topic/second
• Not transactional
• unlike traditional MQ tech
• although exactly-once delivery semantics are now available
• Kafka Streams very powerful tool for analysis and
mutations on data streams
Adam Zegelin
adam@instaclustr.com
Founding Software
Engineer
Editor's Notes

  • #11: Lower throughput system