Using MongoDB with Kafka
Percona Live Online
20-21 October 2020
Antonios Giannopoulos
Senior Database Administrator
Pedro Albuquerque
Principal Database Engineer
Agenda
● Definitions
● Use cases
● Using MongoDB as a source
● Using MongoDB as a sink
● Real-world use case: TransferWise
● MongoDB to Kafka Connectors
● Takeaways
What is MongoDB?
● Document-oriented Database
● Flexible JSON-style schema
Use-Cases:
● Pretty much any workload
● Multi-document ACID transactions since version 4.0 (replica sets) / 4.2 (sharded clusters)
● Frequent schema changes
What is Apache Kafka?
● Distributed event streaming platform
Use-Cases:
● Publish and subscribe to streams of events
● Async RPC-style calls between services
● Log replay
● CQRS and Event Sourcing
● Real-time analytics
How can they work together?
Use cases - Topologies
MongoDB as a sink
MongoDB as a source
MongoDB as a source/sink
MongoDB as a Source
Selective Replication/EL/ETL
MongoDB doesn’t support selective replication
Oplog or Change Streams (preferred method)
Kafka cluster, with one topic per collection
MongoDB to Kafka connectors
Debezium
Supports both Replica-set and Sharded clusters
Uses the oplog to capture and create events
Selective Replication: [database|collection].[include|exclude].list
EL: field.exclude.list & field.renames
snapshot.mode = initial | never
tasks.max
initial.sync.max.threads
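A minimal Debezium source configuration sketch (hostnames, the rs0 replica-set name, and the database/collection names are placeholders):

name=mongodb-source-debezium
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=rs0/mongod1:27017,mongod2:27017,mongod3:27017
mongodb.name=perconalive
collection.include.list=perconalive.slides
field.exclude.list=perconalive.slides.internal_notes
snapshot.mode=initial
tasks.max=1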
MongoDB Kafka Source Connector
- Supports both Replica-set and Sharded clusters
- Uses MongoDB Change Streams to create events
- Selective Replication:
- MongoDB db.collection -> db.collection Kafka topic
- Multi-source replication:
- multiple collections to single kafka topic
- EL: Filter or modify change events with a MongoDB aggregation pipeline
- Sync historical data (copy.existing=true)
- copy.existing.max.threads
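A minimal source configuration sketch (connection string, database, and collection are placeholders; the pipeline keeps only inserts and updates):

name=mongodb-source-example
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://mongod1:27017,mongod2:27017,mongod3:27017
database=perconalive
collection=slides
pipeline=[{"$match": {"operationType": {"$in": ["insert", "update"]}}}]
copy.existing=true
copy.existing.max.threads=4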
MongoDB as a Sink
Throttling
Throttling* (a forbidden word, but) is extremely useful:
- During MongoDB scaling
- Planned or unplanned maintenance
- Unexpected growth events
- Provides workload priorities
The need for throttling: MongoDB 4.2 Flow Control
You can configure Flow Control at the replica-set level
(Config settings: enableFlowControl, flowControlTargetLagSeconds)
Kafka provides a more flexible “flow control” that you can easily manage
* Throttling may not be suitable for every workload
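Flow Control can also be toggled at runtime from the mongo shell, for example (a sketch; the lag target value is arbitrary):

> db.adminCommand({ setParameter: 1, enableFlowControl: true })
> db.adminCommand({ setParameter: 1, flowControlTargetLagSeconds: 10 })
> db.serverStatus().flowControl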
Throttling
The aim is to rate limit write operations
Kafka supports higher write throughput & scales faster
Kafka scales:
- Adding partitions
- Adding brokers
- Adding clusters
- Minimal application changes
MongoDB scales as well:
- Adding shards
- Balancing takes time
- Balancing affects performance
Throttling
Quotas can be applied to (user, client-id), user or client-id groups
producer_byte_rate : The total rate limit for the user’s producers without a client-id quota override
consumer_byte_rate : The total rate limit for the user’s consumers without a client-id quota override
Static changes: /config/users/ & /config/clients (watch out for the override order)
Dynamic changes:
> bin/kafka-configs.sh --bootstrap-server <host>:<port> --describe --entity-type users|clients --entity-name user|client-id
> bin/kafka-configs.sh --bootstrap-server <host>:<port> --alter --add-config
'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users|clients --entity-name user|client-id
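For example, to throttle a specific producer while MongoDB catches up, and lift the quota afterwards (broker address and client id are placeholders):

> bin/kafka-configs.sh --bootstrap-server kafka1:9092 --alter --add-config 'producer_byte_rate=1048576' --entity-type clients --entity-name activity-producer
> bin/kafka-configs.sh --bootstrap-server kafka1:9092 --alter --delete-config 'producer_byte_rate' --entity-type clients --entity-name activity-producer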
Throttling
Evaluate a MongoDB metric - read/write queues, latency, etc.
> db.serverStatus().globalLock.currentQueue.writers
0
Prometheus Alert Manager
- Tons of integrations
- Groups alerts
- Notify on resolution
[Diagram: Prometheus monitors the production MongoDB cluster; Alert Manager (or your favorite integration) triggers kafka-configs.sh to throttle the producers and consumers]
Workload isolation
Kafka handles specific workloads better
A successful event website (for example: Percona Live 2020)
- Contains a stream of social media interactions
- Kafka serves the raw stream - all interactions
- MongoDB serves aggregated data - for example, top tags
The raw stream is native to Kafka, as it is a commit log
MongoDB’s rich aggregation framework provides the aggregated data
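For example, the top tags view could be served by a simple aggregation (a sketch; the interactions collection and its tags field are hypothetical):

> db.interactions.aggregate([
    { $unwind: "$tags" },
    { $group: { _id: "$tags", hits: { $sum: 1 } } },
    { $sort: { hits: -1 } },
    { $limit: 10 }
  ])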
Workload isolation
Continuous aggregations
Useful for use cases where the raw data is useless (or not very useful) on its own
Kafka Streams is your friend - windowing
Examples:
Meteo (weather) stations sending metrics every second
MongoDB serves the min()/max() for every hour
Website statistics - counters
MongoDB gets updated every N seconds with a hits summary
MongoDB gets updated with hits per minute/hour
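A minimal Kafka Streams windowing sketch for the meteo example (imports omitted; topic names are hypothetical, and only the hourly max is tracked to keep the serdes simple):

StreamsBuilder builder = new StreamsBuilder();
builder.stream("meteo-metrics", Consumed.with(Serdes.String(), Serdes.Double()))
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofHours(1)))
    .aggregate(
        () -> Double.NEGATIVE_INFINITY,                // initializer
        (station, value, max) -> Math.max(max, value), // hourly max per station
        Materialized.with(Serdes.String(), Serdes.Double()))
    .toStream()
    .map((windowedKey, max) -> KeyValue.pair(windowedKey.key(), max))
    .to("meteo-hourly-max", Produced.with(Serdes.String(), Serdes.Double()));
// A sink connector can then upsert meteo-hourly-max into MongoDB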
Journal
Data recovery is a common request in the database world
Human error, application bugs, and hardware failures are some of the reasons
Kafka can help with partial or point-in-time recovery
A partial data recovery may require restoring a full backup
Restore the full backup, then replay the changes from Kafka
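For example, the sink's consumer group can be rewound to just after the backup timestamp before replaying (a sketch; the group, topic, and timestamp are placeholders, and the group must be inactive):

> bin/kafka-consumer-groups.sh --bootstrap-server kafka1:9092 --group mongodb-sink-example --topic topicA --reset-offsets --to-datetime 2020-10-20T02:00:00.000 --execute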
Journal
TransferWise:
Activity Service
● Customer action
● Many types
● Different status
● Variety of categories
● Repository of all activities
● List of customer’s actions
● Activity list
● Ability to search and filter
TransferWise:
Activity Service
[Architecture diagram: the Balance, Plastic, and Transfer services produce to the Activity Updates, Activity Group Aggrs, and Activity Deletes topics; each topic has a dedicated consumer feeding the Updates, Aggrs, and Deletes processors]
spring-kafka
Producer configuration
private ProducerFactory<Object, Object> producerFactory(KafkaProperties kafkaProperties) {
return new DefaultKafkaProducerFactory<>(
Map.of(
ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getServers(),
ProducerConfig.CLIENT_ID_CONFIG, kafkaProperties.getClientId(),
ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, JsonSerializer.class,
ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class
)
);
}
public KafkaTemplate<Object, Object> kafkaTemplate(KafkaProperties kafkaProperties) {
return new KafkaTemplate<>(producerFactory(kafkaProperties));
}
spring-kafka
Send message
public void send(String key, Object value, Runnable successCallback) {
String jsonBody = value.getClass() == String.class ? (String) value : JSON_SERIALIZER.writeAsJson(value);
kafkaTemplate.send(topic, key, jsonBody)
.addCallback(new ListenableFutureCallback<>() {
@Override
public void onFailure(Throwable ex) {
log.error("Failed sending message with key {} to {}", key, topic, ex);
}
@Override
public void onSuccess(SendResult<Object, Object> result) {
successCallback.run();
}
});
}
spring-kafka
Consumer configuration
@EnableKafka // on the @Configuration class
private ConsumerFactory<String, String> consumerFactory(KafkaProperties kafkaProperties) {
return new DefaultKafkaConsumerFactory<>(
Map.of(
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getServers(),
ConsumerConfig.CLIENT_ID_CONFIG, kafkaProperties.getClientId(),
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class,
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class
));}
// Listener container factory and retry configuration
ConcurrentKafkaListenerContainerFactory<String, String> factory =
    buildListenerContainerFactory(objectMapper, kafkaProperties);
KafkaRetryConfig retryConfig = new KafkaRetryConfig(KafkaProducerFactory.kafkaTemplate(kafkaProperties));
@KafkaListener(topics = "${activity-service.kafka.topics.activityUpdates}", containerFactory =
ActivityUpdatesKafkaListenersConfig.ACTIVITY_UPDATES_KAFKA_LISTENER_FACTORY)
TransferWise:
Activity Service
[Architecture diagram repeated: producers (Balance, Plastic, Transfer) -> topics (Activity Updates, Activity Group Aggrs, Activity Deletes) -> consumers -> processors (Updates, Aggrs, Deletes)]
MongoDB Kafka Sink Connector
name=mongodb-sink-example
topics=topicA,topicB
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://mongod1:27017,mongod2:27017,mongod3:27017
database=perconalive
collection=slides
MongoDB Kafka Sink connector: Configuration
# Message types
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
MongoDB Kafka Sink connector: Configuration
## Document manipulation settings
[key|value].projection.type=AllowList
[key|value].projection.list=name,age,address.post_code
## Id Strategy
document.id.strategy=com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy
post.processor.chain=com.mongodb.kafka.connect.sink.processor.DocumentIdAdder
MongoDB Kafka Sink connector: Configuration
## Dead letter queue
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true
errors.deadletterqueue.topic.name=perconalive.deadletterqueue
errors.deadletterqueue.context.headers.enable=true
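When the sink replays change events produced by a CDC source such as Debezium, a CDC handler can apply them as the corresponding inserts, updates, and deletes (a sketch; the handler class depends on the connector version and the event source):

## CDC event replay
change.data.capture.handler=com.mongodb.kafka.connect.sink.cdc.debezium.mongodb.MongoDbHandler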
Recap/Takeaways
There are tons of use cases for MongoDB & Kafka
We described a couple of them:
● Selective replication/ETL
● Throttling/Journaling/Workload isolation
Kafka has a rich ecosystem that can expand the use cases
Connectors are your friends, but you can also build your own
Large orgs like TransferWise use MongoDB & Kafka for complex projects
- Thank you!!! -
- Q&A -
Big thanks to:
John Moore, Principal Engineer @Eventador
Diego Furtado, Senior Software Engineer @TransferWise
for their guidance