Surinder
2nd March 2022
Apache Ignite
Agenda
 Setting up context
 Cache Evolution
 Apache Ignite
 Data Queries
 Compute
 Data Partitioning
 Eviction policies
 Performance Comparison
Stream Consuming Application
Too many reads, writes, and updates to the database
Limited connections
Can slow down the stream under load
Stream Consuming Application: 1
Cache serves as the first data layer
Cache manages persisting data to the database
Processing is much faster because there is no direct DB access
Stream Consuming Application cont…
Cache serves as a first-class in-memory database
Cache manages persisting data to native storage
No DB connection or mechanism overhead
Cache Evolution
 Distributed caches
 Shared cache for app instances
 Beyond local RAM capacity
 Ease of maintenance
 No auto sync with DB (yes/no)?
 In App caches
 Cache results
 More responsive application
 Reduce load on DB
 Limited to local RAM size
Cache Evolution : Data grids
 Benefits
 Distributed caches with brains
 Compute capabilities
 DB Read/Write through
 Collocated processing
 Better scalability
Cache Evolution : In memory computing
 Memory centric storage
 Scalable to store data in TBs
 SQL and transaction support
 Collocate related data
 DB Read/Write through
 Pluggable to external databases
 Native storage on disk
 No RAM warm-up
 Compute capabilities
 Map Reduce
 Collocated processing
 Better scalability
What is Apache Ignite ?
 A distributed cache
 A distributed in-memory data grid
 A distributed in-memory database
 High-performance in-memory computing
 ANSI-99 SQL compliant
 Transactional operations
 SQL transactions in beta
Ignite cluster
 Group of nodes
 Types:
 Server : stores data, baseline node
 Thick client node : doesn’t store data
 Thin client node : not part of cluster
 Attribute based grouping possible
 Scalable
 Fault tolerant
 Data consistency
 Demo
Data Grid
 Distributed In-Memory Caching
 Read/Write through
 Data Consistency
 Off-Heap Storage
 Distributed SQL
 ACID Support
 Transactions
Cache Modes
 Partitioned: keep required backups
 Replicated: everyone knows everything
Cache Queries…
 Scan Query : returns entries matching a BiPredicate
 Predicate is sent to each node
 Each node scans its cache
 Results are consolidated by the requesting node
 SQL Query : loads data based on the given SQL
 Needs indexing to be enabled
 Indexing is registered in the config
 Annotations for field visibility
 Other queries:
 Text Query
 Index query
 Continuous query
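The scan-query flow above (predicate sent to every node, local scan, consolidation on the requester) can be sketched in plain Python. This is a toy model, not the Ignite API: `scan_query`, the `nodes` list, and the sample entries are all hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a cluster: each dict is the slice of the cache one node stores.
Node = Dict[int, str]


def scan_query(nodes: List[Node], predicate: Callable[[int, str], bool]) -> List[Tuple[int, str]]:
    """Apply `predicate` to every node's local entries and merge the matches."""
    results: List[Tuple[int, str]] = []
    for node in nodes:
        # Each node scans only the entries it holds locally.
        local_matches = [(key, value) for key, value in node.items() if predicate(key, value)]
        # The requesting side consolidates the partial results.
        results.extend(local_matches)
    return sorted(results)


nodes = [{1: "alice", 4: "dave"}, {2: "bob", 5: "erin"}, {3: "carol"}]
matches = scan_query(nodes, lambda key, value: key % 2 == 1)
print(matches)
```

Because every node scans its whole local slice, the cost grows with cache size, which is one reason an indexed SQL query may be preferable for selective lookups.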
Data Partitioning
 Partitioned caches
 Backups
 Ensure data availability during node failures
 Reads are served from a backup node when the primary node leaves
 Demo
Demo Queries
 Scan Query
 Sql Query
 Data collocation
 Next week : this slide onwards
Data collocation
 Collocate related data for performance
 All Employees of dept. can be stored together
 Affinity on dept. attribute
 Only a key attribute can be used as the affinity key
 Performant CRUD operations
 Avoids network trips
 Reduced latency
 Can cause hot nodes if used inappropriately
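A minimal sketch of the collocation idea, assuming the partition is derived from the dept. part of the key rather than from the whole key. The CRC32 hash, the `partition_for` helper, and the sample employees are assumptions for illustration only; Ignite's real affinity function differs.

```python
import zlib
from collections import defaultdict

PARTITIONS = 1024  # the default partition count quoted in this deck


def partition_for(affinity_value: str) -> int:
    """Map an affinity value to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(affinity_value.encode("utf-8")) % PARTITIONS


# Hypothetical employee keys: (employee_id, dept). Only dept drives placement.
employees = [(1, "sales"), (2, "sales"), (3, "hr"), (4, "sales"), (5, "hr")]

by_partition = defaultdict(list)
for emp_id, dept in employees:
    by_partition[partition_for(dept)].append(emp_id)

# Every "sales" employee lands in the same partition, so a per-dept operation
# touches one partition instead of fanning out across the cluster.
sales_partition = partition_for("sales")
print(sales_partition, by_partition[sales_partition])
```

The same property explains the hot-node warning: if one dept. holds most of the employees, its partition (and the node that owns it) carries most of the load.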
Compute Tasks
 Run distributed computations on grid
 Tasks can be run on selected nodes
 Ignite handles task management
 E.g. node-specific aggregates
 List each dept.'s students stored on each node
 Can be parallelized
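The node-specific aggregate example can be sketched as a small map/reduce. Threads stand in for remote nodes here, and `node_data` and `count_by_dept` are hypothetical names; a real grid would ship the job to each node instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: student name -> department.
node_data = {
    "node-a": {"ann": "physics", "raj": "math"},
    "node-b": {"li": "math", "sam": "math", "zoe": "physics"},
}


def count_by_dept(students: dict) -> Counter:
    """The job executed on one node, against that node's local data only."""
    return Counter(students.values())


# Map: run the job for every node in parallel (threads stand in for remote nodes).
with ThreadPoolExecutor() as pool:
    per_node = dict(zip(node_data, pool.map(count_by_dept, node_data.values())))

# Reduce: combine the per-node aggregates on the caller.
total = sum(per_node.values(), Counter())
print(per_node["node-b"], total)
```

Only the small per-node counters travel back to the caller, which is the point of running the computation where the data lives.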
Continuous Queries
 Exactly-once processing semantics
 3 basic components
 Cache to monitor updates
 Remote filter to look for data changes
 Local listener to act upon data changes
 Optional initial query to process initial data
 Used to capture data changes on cache
 Use case: Reacting to cache entry change
 Listen for particular state of cache value
 Process the state
 Move to next state
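The three components (cache, remote filter, local listener) and the state-transition use case can be sketched with a toy observable cache. `ObservableCache` and its methods are hypothetical and run in one process; the sketch does not model delivery guarantees such as exactly-once.

```python
from typing import Callable, Dict, List, Optional, Tuple

Filter = Callable[[str, Optional[str], str], bool]
Listener = Callable[[str, Optional[str], str], None]


class ObservableCache:
    """Toy cache that delivers updates accepted by a filter to a listener."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._subscribers: List[Tuple[Filter, Listener]] = []

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        old = self._data.get(key)
        self._data[key] = value
        for remote_filter, local_listener in self._subscribers:
            # The filter decides which changes are worth delivering.
            if remote_filter(key, old, value):
                local_listener(key, old, value)

    def continuous_query(self, remote_filter: Filter, local_listener: Listener,
                         initial_query: Optional[Callable[[str, str], bool]] = None) -> None:
        # Optional initial query: replay matching entries that already exist.
        if initial_query is not None:
            for key, value in list(self._data.items()):
                if initial_query(key, value):
                    local_listener(key, None, value)
        self._subscribers.append((remote_filter, local_listener))


orders = ObservableCache()
orders.put("o-1", "NEW")
seen = []


def on_paid(key: str, old: Optional[str], new: str) -> None:
    seen.append((key, old, new))   # process the state
    orders.put(key, "SHIPPED")     # move the entry to its next state


orders.continuous_query(
    remote_filter=lambda key, old, new: new == "PAID",
    local_listener=on_paid,
)
orders.put("o-1", "PAID")
print(seen, orders.get("o-1"))
```

The listener's own write to `SHIPPED` does not re-trigger it, because the filter only passes entries whose new value is `PAID`.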
Eviction Policies
 On Heap [cache level]
 LRU : recommended when in doubt
 FIFO : ignores the element access order
 Sorted : eviction order follows the sorted order of keys
 Off Heap [data region level]
 Random-LRU
 Random-2-LRU
 Persistence On [Page replacement]
 Random-LRU
 Segmented-LRU
 Clock
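The difference between LRU and FIFO ("ignores the element access order") is easiest to see on a tiny bounded cache. `BoundedCache` is a hypothetical teaching class built on `OrderedDict`, not an Ignite eviction policy.

```python
from collections import OrderedDict
from typing import List, Optional


class BoundedCache:
    """Size-bounded cache that evicts by LRU or FIFO order."""

    def __init__(self, max_size: int, policy: str = "LRU") -> None:
        self.max_size = max_size
        self.policy = policy
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        if self.policy == "LRU":
            # A read refreshes the entry under LRU; FIFO leaves the order untouched.
            self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        if self.policy == "LRU":
            self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            # Evict from the front: least recently used (LRU) or oldest insert (FIFO).
            self._entries.popitem(last=False)

    def keys(self) -> List[str]:
        return list(self._entries)


def run(policy: str) -> List[str]:
    cache = BoundedCache(max_size=2, policy=policy)
    cache.put("a", "1")
    cache.put("b", "2")
    cache.get("a")       # touch "a" before the cache overflows
    cache.put("c", "3")  # forces one eviction
    return cache.keys()


lru_keys = run("LRU")
fifo_keys = run("FIFO")
print(lru_keys, fifo_keys)
```

Under LRU the recently read `a` survives and `b` is evicted; under FIFO the read is ignored and `a`, the oldest insert, is evicted.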
Persistent Store
 CacheStoreAdapter extendable
 Read through
 Write through
 Write behind
 Works behind the cache APIs
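A rough sketch of the three store behaviours, assuming a dict stands in for the external database. `DictStore` and `StoreBackedCache` are hypothetical names and do not mirror the `CacheStoreAdapter` interface; they only illustrate when the backing store is touched.

```python
from typing import Dict, Optional


class DictStore:
    """Hypothetical backing store; a dict stands in for the external database."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def load(self, key: str) -> Optional[str]:
        return self.rows.get(key)

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


class StoreBackedCache:
    def __init__(self, store: DictStore, write_behind: bool = False) -> None:
        self.store = store
        self.write_behind = write_behind
        self._memory: Dict[str, str] = {}
        self._pending: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._memory:
            value = self.store.load(key)  # read-through on a cache miss
            if value is not None:
                self._memory[key] = value
        return self._memory.get(key)

    def put(self, key: str, value: str) -> None:
        self._memory[key] = value
        if self.write_behind:
            self._pending[key] = value    # write-behind: queue, persist later
        else:
            self.store.write(key, value)  # write-through: persist immediately

    def flush(self) -> None:
        """Persist queued write-behind updates in one batch."""
        for key, value in self._pending.items():
            self.store.write(key, value)
        self._pending.clear()


db = DictStore()
db.write("k1", "v1")

write_through = StoreBackedCache(db)
loaded = write_through.get("k1")       # miss in memory, loaded from the store
write_through.put("k2", "v2")          # reaches the store right away

write_behind = StoreBackedCache(db, write_behind=True)
write_behind.put("k3", "v3")
pending_before_flush = "k3" in db.rows
write_behind.flush()
print(loaded, db.rows)
```

The caller only ever uses `get` and `put`; the store is reached behind the cache API, which is the point the slide makes.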
Data Distribution
 Why distribute data?
 Data size can go beyond node limits
 Load can go beyond node processing limits
 Solutions:
 Partition the dataset
 Migrate to a distributed database
 Both involve a set of nodes : the topology
Data Distribution Soln.
 Distribution Requirements:
 Algorithm
 Distribution Uniformity
 Minimal disruption
 Approaches:
 Mod N
 Consistent Hashing
 Rendezvous(HRW)
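The "minimal disruption" requirement can be checked numerically by counting how many keys change owner when a fourth node joins. This is a sketch under assumptions: SHA-256 as the hash, made-up node names `n1` to `n4`, and 10,000 synthetic keys.

```python
import hashlib
from typing import Callable, List


def stable_hash(text: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def mod_n_owner(key: str, nodes: List[str]) -> str:
    # Mod N: the owner index depends directly on the number of nodes.
    return nodes[stable_hash(key) % len(nodes)]


def rendezvous_owner(key: str, nodes: List[str]) -> str:
    # Rendezvous (HRW): every node gets a score for the key, the highest score wins.
    return max(nodes, key=lambda node: stable_hash(f"{node}:{key}"))


def moved(owner_fn: Callable[[str, List[str]], str], keys: List[str],
          before: List[str], after: List[str]) -> int:
    """Count keys whose owner differs between two topologies."""
    return sum(owner_fn(key, before) != owner_fn(key, after) for key in keys)


keys = [f"key-{i}" for i in range(10_000)]
three_nodes = ["n1", "n2", "n3"]
four_nodes = three_nodes + ["n4"]

mod_moved = moved(mod_n_owner, keys, three_nodes, four_nodes)
hrw_moved = moved(rendezvous_owner, keys, three_nodes, four_nodes)
print(f"mod N moved {mod_moved} keys, rendezvous moved {hrw_moved} keys")
```

With rendezvous hashing a key moves only if the new node wins its ranking, so every moved key goes to `n4`; with Mod N most keys are reshuffled because the divisor itself changed.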
Data Distribution in Ignite
 Mapping partition to node
 Rendezvous Hashing
 Cluster changes move partitions
 Mapping key to partition
 Mod N
 Partitions are fixed
 1024 by default
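The two-step mapping can be sketched as follows: a key maps to a fixed partition by modulo, and each partition is mapped to nodes by ranking them. Taking the top-ranked node as primary and the next one as backup is an assumption of this sketch, as are the SHA-256 hash and the node names; it is not Ignite's actual affinity code.

```python
import hashlib
from typing import List

PARTITIONS = 1024  # fixed partition count; the deck quotes 1024 as the default


def stable_hash(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def partition_of(key: str) -> int:
    # Step 1: key -> partition. Depends only on the key, never on the topology.
    return stable_hash(key) % PARTITIONS


def owners(partition: int, nodes: List[str], backups: int = 1) -> List[str]:
    # Step 2: partition -> nodes. Rank nodes by score; first is primary, rest are backups.
    ranked = sorted(nodes, key=lambda node: stable_hash(f"{node}:{partition}"), reverse=True)
    return ranked[: backups + 1]


nodes = ["n1", "n2", "n3"]
partition = partition_of("employee-42")
primary, backup = owners(partition, nodes)

# The primary leaves: the key's partition is unchanged, only its owners are recomputed.
survivors = [node for node in nodes if node != primary]
new_primary = owners(partition, survivors)[0]
print(partition, primary, backup, new_primary)
```

Because the ranking of the remaining nodes is unaffected by a departure, the former backup becomes the new primary, which matches the idea of reading from the backup when the primary leaves.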
Data Rebalancing
 Used when a new node joins the grid
 In-memory grids start rebalancing immediately
 Enabled manually when persistence is enabled
 Possibly more backups than configured in such scenarios
 Rebalance Modes
 SYNC: cache calls are blocked until rebalancing is completed
 ASYNC: rebalancing happens in the background; the cache responds immediately
 NONE: no rebalancing; the cache is loaded on demand or by explicit loading
Partition Map Exchange
 Triggered when partitions need to be moved across nodes
 A node joins/leaves the cluster
 A cache is created/stopped
 An index is created, etc.
 Cluster waits for ongoing operations
 Oldest/youngest node is coordinator
Native Storage Architecture
 Work directory
 Binary data : internal metadata
 Marshaller : marshaller info
 DB
 Lock file : used to ensure node lock
 Node dir(s) : cache partitions
 cp dir : checkpoint start/end markers
 WAL dir
 Node dir(s) : WAL segments
 Archive dir
 Node dir(s) : WAL segments
Dirty Pages
 Pages are always on disk, optionally in RAM
 Each cache update is written to RAM and appended to the WAL
 Cache operations cause dirty pages
 Dirty pages accumulate in RAM
 Checkpoint: a batch of dirty pages is written to disk
 WAL file is cleared after a checkpoint
 Updates between checkpoints are logged
 Node crashes between checkpoints?
 WAL to the rescue
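The interplay of dirty pages, checkpoints, and the WAL can be sketched with a toy page store. `PageStore` is hypothetical: `disk` and `wal` play the role of durable state, `ram` is lost on a crash, and real checkpointing is far more involved.

```python
from typing import Dict, List, Optional, Set, Tuple


class PageStore:
    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}        # durable pages (survive a crash)
        self.wal: List[Tuple[str, str]] = []  # durable append-only log (survives a crash)
        self.ram: Dict[str, str] = {}         # volatile page copies
        self.dirty: Set[str] = set()          # pages changed since the last checkpoint

    def update(self, page_id: str, value: str) -> None:
        self.wal.append((page_id, value))     # log the change first
        self.ram[page_id] = value             # then change the page in RAM
        self.dirty.add(page_id)

    def checkpoint(self) -> None:
        # Write the accumulated dirty pages to disk as one batch.
        for page_id in self.dirty:
            self.disk[page_id] = self.ram[page_id]
        self.dirty.clear()
        # Logged changes are now on disk, so the log can be cleared.
        self.wal.clear()

    def crash_and_recover(self) -> None:
        # A crash wipes RAM; disk and WAL remain.
        self.ram = dict(self.disk)
        self.dirty = set()
        # Replay the updates logged since the last checkpoint.
        for page_id, value in self.wal:
            self.ram[page_id] = value
            self.dirty.add(page_id)

    def read(self, page_id: str) -> Optional[str]:
        return self.ram.get(page_id)


store = PageStore()
store.update("p1", "a")
store.checkpoint()
store.update("p1", "b")   # not yet checkpointed
store.update("p2", "c")   # not yet checkpointed
store.crash_and_recover()
print(store.read("p1"), store.read("p2"), store.disk)
```

After recovery the reads return the post-checkpoint values even though `disk` still holds only the checkpointed page, which is the "WAL to the rescue" case.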
Apache Ignite ~ Cassandra
 Insert and update performance is comparable
 Reads and mixed (read + update) workloads are 2x+ better in Ignite
 Cassandra UPDATE outperforms under high load
 Cassandra demands upfront query patterns
 Major model changes/new tables are needed if
 Query changes are required
 New queries with different requirements are needed
 Ignite supports collocated and non-collocated joins, and hence
 Queries can be written just like old-school SQL
 No major changes are required except creating a few indexes if needed
 Check the reference slide for more
Next steps
 Read docs
 Get hands dirty with ignite
 Explore queries
 Ignite compute tasks
 Native persistence
 Third party persistence
References
 https://ignite.apache.org/docs/latest/
 https://www.youtube.com/watch?v=eMs_2vEsbBk
 https://dzone.com/articles/apache-ignite-client-connectors-variety
 https://apacheignite.readme.io/docs/leader-election
 https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
 https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/
 https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/
 https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
Questions



Editor's Notes

  • #21: https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies