Slides from the Meetup on Monday, March 7, 2016, just before the beginning of #GeodeSummit, where we cover an introduction to the technology and community that is Apache Geode, the in-memory data grid.
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement - VMware Tanzu
This document provides an agenda for a hands-on introduction and hackathon kickoff for Apache Geode. The agenda includes details about the hackathon, an introduction to Apache Geode including its history and key features, a hands-on lab to build, run, and use Geode, and a Q&A session. It also outlines how to contribute to the Geode project through code, documentation, issue tracking, and mailing lists.
This document discusses implementing a Redis adaptor using Apache Geode. It provides an overview of Redis data structures and commands, describes how Geode partitioned regions and indexes can be used to store and access Redis data, outlines advantages like scalability and high availability, and presents a roadmap for further development including supporting additional commands and performance optimization.
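As a rough illustration of the idea, the sketch below shows how Redis-style GET/SET semantics map naturally onto a Geode partitioned region. It is not the adaptor's actual code: the region name and bootstrap are illustrative, and the package names assume a recent Geode release.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: backing Redis-style GET/SET with a Geode partitioned region.
public class RedisStringsSketch {
    public static void main(String[] args) {
        Cache cache = new CacheFactory()
                .set("log-level", "warn")
                .create();

        // One partitioned region holds all "string" keys; partitioning spreads
        // the keyspace across servers and gives horizontal scalability.
        Region<String, String> strings = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION)
                .create("redisStrings");

        strings.put("user:42:name", "Ada");          // roughly SET user:42:name Ada
        String name = strings.get("user:42:name");   // roughly GET user:42:name
        System.out.println(name);

        cache.close();
    }
}
```

Hashes, lists, and sorted sets would map onto further regions or structured values in a similar fashion, which is where the roadmap items above (additional commands, indexing, performance work) come in.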
Apache Geode is an open source in-memory data grid that provides data distribution, replication and high availability. It can be used for caching, messaging and interactive queries. The presentation discusses Geode concepts like cache, region and member. It provides examples of how large companies use Geode for applications requiring real-time response, high concurrency and global data visibility. Geode's performance comes from minimizing data copying and contention through flexible consistency and partitioning. The project is now hosted by Apache and the community is encouraged to get involved through mailing lists, code contributions and example applications.
Apache Geode Meetup, Cork, Ireland at CIT - Apache Geode
This document provides an introduction to Apache Geode (incubating), including:
- A brief history of Geode and why it was developed
- An overview of key Geode concepts such as regions, caching, and functions
- Examples of interesting large-scale use cases from companies like Indian Railways
- A demonstration of using Geode with Apache Spark and Spring XD for a stock prediction application
- Information on how to get involved with the Geode open source project community
How to use the WAN Gateway feature of Apache Geode to implement multi-site and active-active failover, disaster recovery, and global scale applications.
Build your first Internet of Things app today with Open Source - Apache Geode
This document provides an overview of Apache Geode, an in-memory data management platform. It discusses using Geode for high-performance and scalable applications that require fast access to critical datasets. Key concepts explained include regions, caching of data, and the use of functions to enable distributed processing across a Geode cluster. The document also mentions integrations with Spark and Cloud Foundry that allow persisting RDDs in Geode and exposing regions as RDDs.
1. The document discusses Project Geode, an open source distributed in-memory database for big data applications. It provides scale-out performance, consistent operations across nodes, high availability, powerful developer features, and easy administration of distributed nodes.
2. The document outlines Geode's architecture and roadmap. It also discusses why the project is being open sourced under Apache and describes some key use cases and customers of Geode.
3. The presentation includes a demo of Geode's capabilities including partitioning, queries, indexing, colocation, and transactions.
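For a flavor of what such a demo involves, here is a minimal sketch of an OQL query against an indexed partitioned region. It assumes an embedded cache and an illustrative Customer value type, not the presentation's actual demo code.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.query.SelectResults;

// Minimal sketch of querying and indexing a region with OQL.
public class QueryDemoSketch {
    public static void main(String[] args) throws Exception {
        Cache cache = new CacheFactory().create();
        Region<String, Customer> customers = cache
                .<String, Customer>createRegionFactory(RegionShortcut.PARTITION)
                .create("customers");

        customers.put("c1", new Customer("c1", "Cork"));
        customers.put("c2", new Customer("c2", "Dublin"));

        QueryService qs = cache.getQueryService();
        // An index on the queried field avoids a full region scan.
        qs.createIndex("cityIdx", "city", "/customers");

        SelectResults<Customer> results = (SelectResults<Customer>) qs
                .newQuery("SELECT * FROM /customers WHERE city = $1")
                .execute(new Object[] { "Cork" });
        System.out.println(results.size() + " customer(s) in Cork");
        cache.close();
    }

    public static class Customer {
        public final String id;
        public final String city;
        Customer(String id, String city) { this.id = id; this.city = city; }
        public String getCity() { return city; }
    }
}
```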
An Introduction to Apache Geode (incubating) - Anthony Baker
Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.
Geode pools memory (along with CPU, network and optionally local disk) across multiple processes to manage application objects and behavior. It uses dynamic replication and data partitioning techniques for high availability, improved performance, scalability, and fault tolerance. Geode is both a distributed data container and an in-memory data management system providing reliable asynchronous event notifications and guaranteed message delivery.
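A minimal sketch of how that partitioning and replication is expressed in the Java API, assuming a recent Geode release; the region name and redundancy level are illustrative.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.PartitionAttributesFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: a partitioned region whose buckets keep one redundant copy
// on another member, so a member failure does not lose data or availability.
public class HaRegionSketch {
    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        Region<Long, String> orders = cache
                .<Long, String>createRegionFactory(RegionShortcut.PARTITION)
                .setPartitionAttributes(new PartitionAttributesFactory<Long, String>()
                        .setRedundantCopies(1)   // each bucket has a primary plus one backup
                        .create())
                .create("orders");

        orders.put(1L, "widget");
        System.out.println(orders.get(1L));
        cache.close();
    }
}
```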
Pivotal GemFire has had a long and winding journey, starting in 2002, winding through VMware and Pivotal, and finding its way to Apache in 2015. Companies using GemFire have deployed it in some of the most mission-critical, latency-sensitive applications in their enterprises, making sure tickets are purchased in a timely fashion, hotel rooms are booked, trades are made, and credit card transactions are cleared. This presentation discusses:
- A brief history of GemFire
- Architecture and use cases
- Why we are taking GemFire Open Source
- Design philosophy and principles
But most importantly: how you can join this exciting community to work on the bleeding edge in-memory platform.
In April 2015, Apache Geode (incubating) was born from Pivotal's GemFire, the distributed in-memory database. However, the donation of over 1M LOC was just the beginning of the journey. In this talk we discuss how the GemFire engineering team has adapted their development infrastructure, processes, and culture to embrace the "Apache Way". We present lessons learned and best practices for new and incubating open source projects in areas of initial code submission, IP clearance, governance policies, code review, and community building. We discuss the challenges the team faced and how we changed internal communication and software design processes to a community-driven model. In particular, we highlight effective strategies for growing a project community and embracing new members. Finally, we show how changing to the open source model has increased both productivity and quality.
#GeodeSummit - Where Does Geode Fit in Modern System Architectures - PivotalOpenSourceHub
The document discusses how Apache Geode fits into modern system architectures using the Command Query Responsibility Segregation (CQRS) pattern. CQRS separates reads and writes so that each can be optimized independently. Geode is well-suited as the read store in a CQRS system due to its ability to efficiently handle queries and cache data through regions. The document provides references on CQRS and related patterns to help understand how they can be applied with Geode.
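As a sketch of what the read side could look like, the example below registers a Geode continuous query from a client so a read model is refreshed as the write side changes data. The locator address, region name, and query are assumptions, not the document's own code.

```java
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqListener;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

// Minimal sketch: the read side of a CQRS system subscribes to a continuous
// query so its view is refreshed as the write side puts new events.
public class ReadSideSketch {
    public static void main(String[] args) throws Exception {
        ClientCache client = new ClientCacheFactory()
                .addPoolLocator("localhost", 10334)
                .setPoolSubscriptionEnabled(true)   // required for CQ event delivery
                .create();

        CqAttributesFactory caf = new CqAttributesFactory();
        caf.addCqListener(new CqListener() {
            @Override public void onEvent(CqEvent e) {
                // update the local read model / projection here
                System.out.println("order changed: " + e.getKey());
            }
            @Override public void onError(CqEvent e) { System.err.println(e.getThrowable()); }
            @Override public void close() { }
        });

        QueryService qs = client.getQueryService();
        CqQuery openOrders = qs.newCq("openOrders",
                "SELECT * FROM /orders o WHERE o.status = 'OPEN'", caf.create());
        openOrders.execute();   // the servers now push matching changes to this client
    }
}
```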
The document discusses security models in Apache Kafka. It describes the PLAINTEXT, SSL, SASL_PLAINTEXT and SASL_SSL security models, covering authentication, authorization, and encryption capabilities. It also provides tips on troubleshooting security issues, including enabling debug logs, and common errors seen with Kafka security.
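For reference, a client using the SASL_SSL model typically needs configuration along these lines; the broker address, truststore path, and credentials below are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal sketch of a producer configured for SASL_SSL with the PLAIN mechanism.
public class SecureProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Authentication + encryption: SASL over TLS
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"app\" password=\"app-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "key-1", "hello"));
        }
    }
}
```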
Apache Geode is a distributed, memory-based data management platform that provides high performance, scalability, resiliency and continuous availability for data-oriented applications. It originated from Pivotal's open sourcing of GemFire in 2015. Some key features of Geode include fast access to critical datasets, location-aware distributed data processing, and an event-driven data architecture. It has been used in many large-scale production systems, and its adoption continues to increase.
Highly available databases are essential to organizations depending on mission-critical, 24/7 access to data. Postgres is widely recognized as an excellent open-source database, with critical maturity and features that allow organizations to scale and achieve high availability.
This webinar will explore:
- Evolution of replication in Postgres
- Streaming replication
- Logical replication
- Replication for high availability
- Important high availability parameters
- Options to monitor high availability (a minimal JDBC monitoring sketch follows this list)
- HA infrastructure to patch the database with minimal downtime
- EDB Postgres Failover Manager (EFM)
- EDB tools to create a highly available Postgres architecture
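One simple way to cover the monitoring point above is to poll pg_stat_replication on the primary. The sketch below assumes a JDBC connection and illustrative credentials; the exact view columns vary slightly across Postgres versions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal monitoring sketch: ask the primary which standbys are streaming
// and what state they are in.
public class ReplicationMonitorSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://primary-host:5432/postgres", "monitor", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT client_addr, state, sync_state FROM pg_stat_replication")) {
            while (rs.next()) {
                System.out.printf("standby=%s state=%s sync=%s%n",
                        rs.getString("client_addr"), rs.getString("state"),
                        rs.getString("sync_state"));
            }
        }
    }
}
```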
Apache Hive 3 introduces new capabilities for data analytics including materialized views, default columns, constraints, and improved JDBC and Kafka connectors to enable real-time streaming and integration with external systems like Druid; Hive 3 also improves performance and query optimization through a new query result cache, workload management, and cloud storage optimizations. Data Analytics Studio provides self-service analytics on top of Hive 3 through a visual interface to optimize queries, monitor performance, and manage data lifecycles.
Development of concurrent services using In-Memory Data Grids - jlorenzocima
Prepared as part of OTN Tour 2014, this presentation covers the basics of an in-memory data grid (IMDG) solution, explains how it works and how it can be used within an architecture, and shows some use cases. Enjoy.
This document provides an overview of GemFire, an in-memory data grid that pools memory across processes to manage application data and behavior. Some key points:
- GemFire allows distributed applications to achieve low-latency data access through an in-memory shared cache. It supports features like caching, querying, transactions, and event notifications.
- Data in GemFire is organized into regions, which allow data to be stored across multiple servers without regard to location. Region types include replicated, partitioned, and local.
- The CAP theorem states that at most two of the three properties - consistency, availability, and partition tolerance - can be guaranteed simultaneously in a distributed system. GemFire aims to balance availability and partition tolerance.
This presentation will be useful to those who would like to get acquainted with the lifetime history of a successful monolithic Java application.
It shows the architectural and technical evolution of one Java web startup beyond the daily coding routine, and contains a lot of simplifications, Captain Obvious moments, and internet memes.
This presentation is not, however, intended as a comparison of monolithic vs. microservices architectures.
Running secured Spark job in Kubernetes compute cluster and integrating with ... - DataWorks Summit
This presentation provides technical design and development insights for running a secured Spark job in a Kubernetes compute cluster that accesses job data from a Kerberized HDFS cluster. Joy will show how to run a long-running machine learning or ETL Spark job in Kubernetes and how to access data from HDFS using a Kerberos principal and delegation token.
The first part of this presentation presents the design and best practices for deploying and running Spark in Kubernetes integrated with HDFS: creating an on-demand multi-node Spark cluster during job submission, installing and resolving software dependencies (packages), executing and monitoring the workload, and finally disposing of the resources on job completion. The second part covers the design and development details for setting up a Spark+Kubernetes cluster that supports long-running jobs accessing data from secured HDFS storage by creating and renewing Kerberos delegation tokens seamlessly from the end user's Kerberos principal.
All the techniques covered in this presentation are essential for setting up a Spark+Kubernetes compute cluster that accesses data securely from a distributed storage cluster such as HDFS in a corporate environment. No prior knowledge of any of these technologies is required to attend this presentation.
Speaker
Joy Chakraborty, Data Architect
This document discusses database as a service and cloud computing. It introduces concepts like software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). It also covers topics like virtualization, multi-tenancy, service level agreements, storage models, distributed storage, replication, and security in the context of database as a service. The document will be covering these topics in more depth throughout the seminar.
Deploying MariaDB databases with containers at Nokia Networks - MariaDB plc
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
HBaseCon 2013: High-Throughput, Transactional Stream Processing on Apache HBase - Cloudera, Inc.
This document discusses high-throughput transactional stream processing on HBase using Continuuity Reactor. Continuuity Reactor is a platform built on Hadoop and HBase that allows collecting, processing, storing and querying data with ACID guarantees using a flow-based programming model. Flows in Reactor are composed of flowlets connected by queues, with all processing occurring within transactions for consistency. The document outlines how Reactor implements transactions using an optimistic concurrency control approach based on multi-version concurrency control and HBase timestamps. It also discusses queue design and optimizations for performance.
An Expert Guide to Migrating Legacy Databases to PostgreSQL - EDB
This webinar will review the challenges teams face when migrating from Oracle databases to PostgreSQL. We will share insights gained from running large-scale Oracle compatibility assessments over the last two years, including the over 2,200,000 Oracle DDL constructs that were assessed through EDB’s Migration Portal in 2020.
During this session we will address:
Storage definitions
Packages
Stored procedures
PL/SQL code
Proprietary database APIs
Large scale data migrations
We will end the session demonstrating migration tools that significantly simplify and aid in reducing the risk of migrating Oracle databases to PostgreSQL.
The document discusses eBay's cloud configuration management system (CMS). It provides an overview of eBay's scale and need for cloud technologies. It then describes the architecture and functionality of CMS, including its use of MongoDB for data storage. CMS uses a metadata-driven model and provides APIs and services for configuration persistence, querying, and management. The document also addresses some challenges in using MongoDB and how CMS resolves issues related to performance and scalability.
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti... - Yahoo Developer Network
Monte Zweben, Co-Founder and CEO of Splice Machine, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
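To make the observer idea concrete, here is a minimal sketch of a RegionObserver that maintains a secondary-index table on every put. It uses the HBase 1.x-era API, the table and column names are illustrative, and it is not Splice Machine's implementation; a production version must also handle deletes, failures, and transactional consistency.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal observer sketch: after every put to the primary table, write an
// index row keyed by the indexed column value.
public class SecondaryIndexObserver extends BaseRegionObserver {
    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] EMAIL = Bytes.toBytes("email");

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, Durability durability) throws IOException {
        if (!put.has(CF, EMAIL)) {
            return;
        }
        byte[] email = CellUtil.cloneValue(put.get(CF, EMAIL).get(0));
        try (Table index = ctx.getEnvironment().getTable(TableName.valueOf("users_by_email"))) {
            // index row key = indexed value; cell value = primary row key
            Put indexPut = new Put(email);
            indexPut.addColumn(CF, Bytes.toBytes("row"), put.getRow());
            index.put(indexPut);
        }
    }
}
```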
Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing.
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth... - Christian Tzolov
When working with Big Data and IoT systems we often feel the need for a common query language. System-specific languages usually require longer adoption time and are harder to integrate into existing stacks.
To fill this gap, some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC with your NoSQL system.
We will walk through the process of building a SQL access layer for Apache Geode (an in-memory data grid). I will share my experience, pitfalls, and technical considerations such as balancing SQL/RDBMS semantics against the design choices and limitations of the data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
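To give a feel for the result, the sketch below connects through Calcite's JDBC driver using a JSON model that names a Geode schema factory. The model path, factory class, and region names are illustrative rather than the talk's exact code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: Calcite's JDBC driver loads a JSON model that names a schema
// factory which, in turn, talks to Geode.
//
// geode-model.json (illustrative):
// {
//   "version": "1.0",
//   "defaultSchema": "geode",
//   "schemas": [{
//     "name": "geode",
//     "type": "custom",
//     "factory": "org.apache.calcite.adapter.geode.rel.GeodeSchemaFactory",
//     "operand": { "locatorHost": "localhost", "locatorPort": "10334",
//                  "regions": "BookMaster" }
//   }]
// }
public class CalciteOverGeodeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:calcite:model=src/main/resources/geode-model.json");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT \"title\" FROM \"geode\".\"BookMaster\" WHERE \"retailCost\" < 20")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```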
Introducing Apache Geode and Spring Data GemFire - John Blum
This document introduces Apache Geode, an open source distributed in-memory data management platform. It discusses what Geode is, how it is implemented, and some key features like high availability, scalability and low latency. It also introduces Spring Data GemFire, which simplifies using Geode with Spring applications through features like repositories and caching. Finally, it outlines the project roadmap and opportunities to get involved in the Geode community.
The document discusses the evolution of Pivotal Gemfire, now known as Apache Geode, from a proprietary product to an open source project. It provides an overview of Gemfire/Geode's capabilities including elastic scalability, high performance, and flexibility for developers. It also outlines Geode's role as a potential in-memory data exchange layer and integration point across modern data infrastructure technologies. Key aspects of Geode like its PDX serialization and asynchronous events are highlighted as building blocks that position it well for this role.
Here are the slides for Greenplum Chat #8. You can view the replay here: https://www.youtube.com/watch?v=FKFiyJDgdQk
The increased frequency and sophistication of high-profile data breaches and malicious hacking is putting organizations at continued risk of data theft and significant business disruption. Complicating this scenario is the unbounded growth of Big Data and petabyte-scale data storage, new open source database and distribution schemes, and the continued adoption of cloud services by enterprises.
Pivotal Greenplum customers often look for additional encryption of data-at-rest and data-in-motion. The massively parallel processing (MPP) architecture of Pivotal Greenplum is unlike traditional OLAP on an RDBMS for data warehousing, and encryption capabilities must address this scale-out architecture.
The Zettaset Big Data Encryption Suite has been designed for optimal performance and scalability in distributed Big Data systems like Greenplum Database and Apache HAWQ.
Here is a replay of our recent Greenplum Chat with Zettaset:
00:59 What is Greenplum’s approach for encryption and why Zettaset?
02:17 Results of field testing Zettaset with Greenplum
03:50 Introduction to Zettaset, the security company
05:36 Overview of Zettaset and their solutions
14:51 Different layers for encrypting data at rest
16:50 Encryption key management for big data
20:51 Zettaset BD Encrypt for data at rest and data in motion
22:19 How to mitigate encryption overhead with an MPP scale-out system
24:12 How to deploy BD Encrypt
25:50 Deep dive on data at rest encryption
30:44 Deep dive on data in motion encryption
36:72 Q: How does Zettaset deal with encrypting Greenplums multiple interfaces?
38:08 Q: Can I encrypt data for a particular column?
40:26 How Zettaset fits into a security strategy
41:21 Q: What is the performance impact on queries by encrypting the entire database?
43:28 How Zettaset helps Greenplum meet IT compliance requirements
45:12 Q: How authentication for keys is obtained
48:50 Q: How can Greenplum users try out Zettaset?
50:53 Q: What is a ‘Zettaset Security Coach’?
The document introduces the JBoss Community, which was started in 1999 by Marc Fleury with a focus on middleware. It grew popular as the EJB container as Java grew, and now has over 100 open source projects focused on Java standards and middleware development. Red Hat acquired JBoss in 2006 and supports enterprise middleware subscriptions. The community's principles are standards-driven innovation and rapid technology adoption, as seen in technologies like parallel loading in JBoss 7. Major projects include Hibernate and Drools.
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347 - Manik Surtani
Manik Surtani is the founder and project lead of Infinispan, an open source data grid platform. He discussed data grids, NoSQL, and their role in cloud storage. Data grids evolved from distributed caches to provide features like querying, task execution, and co-location control. NoSQL systems are alternative data stores that are scalable and distributed but lack relational structure. JSR 347 aims to standardize data grid APIs for the Java platform. Infinispan implements JSR 107 and will support JSR 347, acting as the reference backend for Hibernate OGM.
Building Wall St Risk Systems with Apache Geode - Andre Langevin
In this talk from the 2016 Apache Geode Summit, I discuss how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev... - Christian Tzolov
Slides from ApacheCon BigData 2015 HAWQ/GEODE talk: http://sched.co/3zut
In the Big Data space, two powerful data processing tools complement each other, namely HAWQ and Geode. HAWQ is a scalable OLAP SQL-on-Hadoop system, while Geode is an OLTP-like in-memory data grid and event processing system. This presentation shows different approaches that allow integration and data exchange between HAWQ and Geode. It walks you through the implementation of the different integration strategies, demonstrating the power of combining various OSS technologies for processing big and fast data, and touches upon OSS technologies like HAWQ, Geode, Spring XD, Hadoop and Spring Boot.
Infinispan Servers: Beyond peer-to-peer data grids - Galder Zamarreño
In this session, Infinispan developer Galder Zamarreño will:
- Provide a brief introduction to peer-to-peer and client/server architectures.
- Describe the benefits of using Infinispan in a client/server mode, particularly in cloud-style environments.
- Introduce the audience to Infinispan’s selection of server modules that provide varied access methods: REST and WebSocket for HTTP access, Memcached protocol access, and Hot Rod, Infinispan’s very own highly efficient binary protocol which supports smart clients (a minimal Hot Rod client sketch follows this list).
- Demonstrate an Infinispan client/server example showing how geographically separated Infinispan data grids could be linked together via Hot Rod client/server modules in order to provide different disaster recovery strategies.
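A minimal Hot Rod client sketch, where the host, port, and cache name are placeholders:

```java
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

// Minimal sketch: a smart client connects to a remote Infinispan server and
// uses a cache over the Hot Rod binary protocol.
public class HotRodClientSketch {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("datagrid.example.com").port(11222);

        RemoteCacheManager manager = new RemoteCacheManager(builder.build());
        RemoteCache<String, String> cache = manager.getCache("sessions");

        cache.put("session-1", "user=ada");
        System.out.println(cache.get("session-1"));
        manager.stop();
    }
}
```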
Infinispan: In-memory data grid meets NoSQL - Manik Surtani
The document discusses Infinispan, an open source in-memory data grid. It describes Infinispan's features like its map-like API, persistence, querying support and transactional capabilities. It also discusses how Infinispan can be used to build elastic, scalable data services and compares data grids to NoSQL databases.
Keeping Infinispan In Shape: Highly-Precise, Scalable Data Eviction - Galder Zamarreño
The Java Collections Framework represents one of the key building blocks of any Java application. Although the standard JDK devoted a lot of attention to developing a coherent and easy-to-use collections framework, one important aspect remains overlooked: collection element eviction. A collection's memory footprint cannot grow indefinitely because we would eventually run out of memory; we either have to remove elements from a collection or periodically evict certain elements according to a chosen eviction algorithm. Since day one, eviction has been a key feature of Infinispan, and in the latest 4.1 release, thanks to event update batching, Infinispan has reduced the eviction overhead to such an extent that it hardly affects application performance. On top of that, Infinispan implements LIRS, a more precise eviction algorithm compared to the traditional LRU, making it the first open source project to implement this revolutionary algorithm in the data grid space. In this session, Galder and Vladimir will present the details behind these changes, performance measurements and third-party use case testimonies.
This document discusses in-memory data grids and JBoss Infinispan. It begins with an overview of in-memory data grids, their uses for caching, performance boosting, scalability, and high availability. It then discusses Infinispan specifically, describing it as an open-source, distributed in-memory key-value data grid and cache. The document outlines Infinispan's architecture, features like persistence, transactions, querying, distributed execution, and map-reduce capabilities. It also provides a case study on using Infinispan for session clustering in a web application.
Infinispan is an in-memory data grid that provides a distributed key-value store. It allows for data replication across nodes for high availability and partitions data using consistent hashing to enable horizontal scalability. Infinispan supports transactions, caching, querying and more. It can be configured programmatically or via XML and integrates with various Java technologies like JPA, CDI and Spring.
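As a small illustration of the programmatic route, the sketch below defines a clustered, distributed cache with two owners per entry; the cache name and settings are illustrative.

```java
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// Minimal sketch: a clustered cache manager hosting a distributed cache where
// each entry is owned by two nodes (consistent hashing decides which).
public class DistributedCacheSketch {
    public static void main(String[] args) {
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        Configuration distributed = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        manager.defineConfiguration("orders", distributed);

        Cache<String, String> orders = manager.getCache("orders");
        orders.put("o-1", "widget");
        System.out.println(orders.get("o-1"));
        manager.stop();
    }
}
```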
The document summarizes the journey of HAWQ and MADlib from being proprietary Pivotal technologies to becoming Apache open source projects. It provides an overview of HAWQ, including its key features like SQL compliance, performance advantages over other SQL-on-Hadoop systems, and flexible deployment options. It also summarizes MADlib, describing its machine learning functions and advantages of scalable in-database machine learning. Both projects are now available on open source platforms like Hadoop and aim to advance SQL and machine learning on big data through open collaboration.
Capital One Delivers Risk Insights in Real Time with Stream Processing - confluent
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One
Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operations teams and bank tellers to assist with assessing risk and protecting customers in a myriad of ways.
Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space.
Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.
-Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers.
-Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions.
-Understand how Capital One manages Kafka Docker containers using Kubernetes.
Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
This document provides an overview of best practices and lessons learned for deploying XenDesktop in an enterprise environment. It discusses the scalability of various components in the XenDesktop architecture including the Web Interface, XenDesktop Controllers, SQL database, hypervisors, Provisioning Server, storage, and virtual desktop operating systems. Key recommendations include load balancing critical services, properly sizing SQL and storage infrastructure to handle load, and testing to determine optimal virtual desktop density based on workload.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc... - Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk dives into the technical details of Spark SQL, spanning the entire lifecycle of query execution. The audience will get a deeper understanding of Spark SQL and understand how to tune Spark SQL performance.
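As a tiny illustration of the tuning theme, the sketch below runs a SQL aggregation and adjusts one common knob, the number of shuffle partitions; the input path and view name are placeholders, and the right values depend on the workload.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal sketch: run a SQL query through Spark SQL and tune shuffle parallelism.
public class SparkSqlTuningSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-tuning-sketch")
                .master("local[*]")
                .getOrCreate();

        // Fewer shuffle partitions than the default 200 can help small datasets.
        spark.conf().set("spark.sql.shuffle.partitions", "50");

        Dataset<Row> events = spark.read().parquet("/data/events");
        events.createOrReplaceTempView("events");

        Dataset<Row> perUser = spark.sql(
                "SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id");
        perUser.explain();   // inspect the optimized physical plan
        perUser.show(10);

        spark.stop();
    }
}
```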
The document discusses various concepts related to Apache Geode including clients, functions, serialization, transactions, and data colocation. It describes how clients can maintain local caches, run queries, and register for notifications. The document also covers how functions allow for distributed concurrent processing by pushing code to servers and executing across the distributed system.
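A minimal sketch of that function-execution model, assuming a recent Geode release (where Function is a generic interface with default methods) and illustrative region and function ids:

```java
import java.util.List;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.execute.ResultCollector;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Minimal sketch of "pushing code to the data": a function registered on the
// servers counts the entries it hosts locally, and the caller sums one partial
// result per member.
public class CountLocalEntries implements Function<Object> {

    @Override
    public String getId() { return "countLocalEntries"; }

    @Override
    public boolean hasResult() { return true; }

    @Override
    public void execute(FunctionContext<Object> context) {
        RegionFunctionContext rfc = (RegionFunctionContext) context;
        Region<?, ?> localData = PartitionRegionHelper.getLocalDataForContext(rfc);
        context.getResultSender().lastResult(localData.size());
    }

    // Caller side; the function must be registered on each server, e.g. via
    // FunctionService.registerFunction(new CountLocalEntries()).
    @SuppressWarnings("unchecked")
    public static long totalSize(Region<?, ?> region) {
        ResultCollector<?, ?> collector =
                FunctionService.onRegion(region).execute("countLocalEntries");
        List<Integer> partials = (List<Integer>) collector.getResult();
        return partials.stream().mapToLong(Integer::longValue).sum();
    }
}
```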
MySQL is commonly used as the default database in OpenStack. It provides high availability through options like Galera and MySQL Group Replication. Galera is a third party active/active cluster that provides synchronous replication, while Group Replication is a native MySQL plugin that also enables active/active clusters with built-in conflict detection. MySQL NDB Cluster is an alternative that provides in-memory data storage with automatic sharding and strong consistency across shards. Both Galera/Group Replication and NDB Cluster can be used to implement highly available MySQL services in OpenStack environments.
This document provides an overview of Kea DHCP, an open source DHCP server. It discusses Kea's modular design, configuration options at different levels, database backend support for information storage, high availability capabilities including load balancing and hot standby, and extension points through hook libraries. The webinar aims to introduce administrators to Kea's features and capabilities.
Patterns and Pains of Migrating Legacy Applications to Kubernetes - Josef Adersberger
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue chip company onto a Kubernetes cluster within one year.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way.
Patterns and Pains of Migrating Legacy Applications to Kubernetes - QAware GmbH
Open Source Summit 2018, Vancouver (Canada): Talk by Josef Adersberger (@adersberger, CTO at QAware), Michael Frank (Software Architect at QAware) and Robert Bichler (IT Project Manager at Allianz Germany)
Abstract:
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud-native apps. But what to do if you’ve no shiny new cloud-native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue chip company onto a Kubernetes cluster within one year.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way.
OpenStack Days East -- MySQL Options in OpenStack - Matt Lord
In most production OpenStack installations, you want the backing metadata store to be highly available. For this, the de facto standard has become MySQL+Galera. In order to help you meet this basic use case even better, I will introduce you to the brand new native MySQL HA solution called MySQL Group Replication. This allows you to easily go from a single instance of MySQL to a MySQL service that's natively distributed and highly available, while eliminating the need for any third-party libraries or implementations.
If you have an extremely large OpenStack installation in production, then you are likely to eventually run into write scaling issues and the metadata store itself can become a bottleneck. For this use case, MySQL NDB Cluster can allow you to linearly scale the metadata store as your needs grow. I will introduce you to the core features of MySQL NDB Cluster--which include in-memory OLTP, transparent sharding, and support for active/active multi-datacenter clusters--that will allow you to meet even the most demanding of use cases with ease.
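Once Group Replication is up, each member reports the group's health through performance_schema; a minimal JDBC check (connection details are placeholders) might look like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: list the members of a MySQL Group Replication group and
// their current state (ONLINE, RECOVERING, ...).
public class GroupReplicationCheckSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://mysql-node1:3306/performance_schema", "monitor", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE "
                     + "FROM replication_group_members")) {
            while (rs.next()) {
                System.out.printf("%s:%d is %s%n",
                        rs.getString("MEMBER_HOST"), rs.getInt("MEMBER_PORT"),
                        rs.getString("MEMBER_STATE"));
            }
        }
    }
}
```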
Database failover from client perspective - Priit Piipuu
In this presentation we will look deep into high availability technologies Oracle RAC provides for database clients, what actually happens during database instance failover or planned maintenance and how to configure database services so that Java applications experience no or minimal disruption during planned maintenance or unplanned downtime. This presentation will mainly focus on JDBC and UCP clients.
Lyft's streaming platform uses Apache Flink for stream processing and Apache Kafka for messaging. Flink was chosen for its capabilities around state management, exactly-once processing, and flexible APIs. Kafka was chosen for its durability, low latency, and consumer fanout. However, open problems remain around rescaling Kafka while preserving per-key ordering, enabling dynamic stream computations, long-term event storage, and zero downtime deployments. Lyft is working to solve these challenges as it builds out its next generation streaming platform.
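The core consume-and-process pattern looks roughly like the sketch below, with checkpointing enabled so Flink can restore operator state after a failure. The broker, topic, and group id are placeholders, and the Kafka connector class name differs across Flink versions.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// Minimal sketch: a Flink job reading a Kafka topic with periodic checkpoints.
public class RideEventsJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);   // checkpoint every minute

        Properties kafka = new Properties();
        kafka.setProperty("bootstrap.servers", "kafka:9092");
        kafka.setProperty("group.id", "ride-events-job");

        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("ride-events", new SimpleStringSchema(), kafka));

        events
            .map(value -> "seen: " + value)
            .print();

        env.execute("ride-events-sketch");
    }
}
```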
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... - Josef Adersberger
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a major German insurance company onto a Kubernetes cluster within one year. We're now close to the finish line and it worked pretty well so far.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way. We'll provide our answers to life, the universe and a cloud native journey like:
- What technical constraints of Kubernetes can be obstacles for applications and how to tackle these?
- How to architect a landscape of hundreds of containerized applications with their surrounding infrastructure like DBs, MQs, and IAM, and with heavy requirements on security?
- How to industrialize and govern the migration process?
- How to leverage the possibilities of a cloud native platform like Kubernetes without challenging the tight timeline?
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ... - QAware GmbH
CloudNativeCon North America 2017, Austin (Texas, USA): Talk by Josef Adersberger (@adersberger, CTO at QAware)
Abstract:
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a major German insurance company onto a Kubernetes cluster within one year. We're now close to the finish line and it worked pretty well so far.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way. We'll provide our answers to life, the universe and a cloud native journey like:
- What technical constraints of Kubernetes can be obstacles for applications and how to tackle these?
- How to architect a landscape of hundreds of containerized applications with their surrounding infrastructure like DBs, MQs, and IAM, and with heavy requirements on security?
- How to industrialize and govern the migration process?
- How to leverage the possibilities of a cloud native platform like Kubernetes without challenging the tight timeline?
Introducing new features in Confluent Platform 5.4 and Apache Kafka 2.4...
CP 5.4 (based on AK 2.4)
Security:
Role-Based Access Control (RBAC)
Structured Audit Logs
Resilience:
Multi-Region Clusters (MRC)
Data Compatibility:
Server-side Schema Validation
Management & Monitoring:
Control Center enhancements
RBAC management
Replicator monitoring
Performance & Elasticity:
Tiered Storage (preview)
Stream Processing:
New ksqlDB features like Pull Queries and Kafka Connect Integration (preview)
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... - ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Announcing the next-generation dA Platform 2, which includes open source Apache Flink and the new Application Manager. dA Platform 2 makes it easier than ever to operationalize your Flink-powered stream processing applications in production.
Google and Intel speak on NFV and SFC service delivery
The slides are as presented at the meet up "Out of Box Network Developers" sponsored by Intel Networking Developer Zone
Here is the Agenda of the slides:
How DPDK, RDT and gRPC fit into SDI/SDN, NFV and OpenStack
Key Platform Requirements for SDI
SDI Platform Ingredients: DPDK, IntelⓇ RDT
gRPC Service Framework
IntelⓇ RDT and gRPC service framework
Here are the details of the new security framework for Apache Geode, based on Apache Shiro. Watch the video at: https://youtu.be/AhUPT3wfAMM
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode - PivotalOpenSourceHub
The document provides steps for becoming a contributor to the Apache Geode project, beginning with joining online conversations about the project, then test-driving it by building and running examples, and finally improving the project by reporting findings, fixing bugs, or adding new features through submitting code. The key steps are to join mailing lists or chat forums to participate in discussions, quickly get started with the project by building and testing examples in 5 minutes, and then test release candidates and report any issues found on the project's issue tracker or documentation pages. Contributions to the codebase are also welcomed by forking the GitHub repository and submitting pull requests with bug fixes or new features.
#GeodeSummit Keynote: Creating the Future of Big Data Through "The Apache Way" - PivotalOpenSourceHub
Keynote at Geode Summit 2016 by Dr. Justin Erenkrantz, Bloomberg LP: Creating the Future of Big Data Through "The Apache Way", and why this matters to the community.
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R... - PivotalOpenSourceHub
This document discusses combining stream processing and in-memory data grids for near-real-time aggregation and notifications. It describes storing immutable event data and filtering and aggregating events in real-time based on requested perspectives. Perspectives can be requested at any time for historical or real-time event data. The solution aims to be scalable, resilient, and low latency using Apache Storm for stream processing, Apache Geode for the event log and storage, and deployment patterns to collocate them for better performance.
In this session we review the design of the newly released off heap storage feature in Apache Geode, and discuss use cases and potential direction for additional capabilities of this feature.
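In configuration terms, using the feature amounts to giving each member an off-heap pool and marking regions off-heap. The sketch below is illustrative; the property and attribute names assume a recent Geode release.

```java
import java.util.Properties;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: reserve an off-heap pool for the member, then place a region's
// entry values in it so they live outside the JVM heap (reducing GC pressure).
public class OffHeapRegionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("off-heap-memory-size", "2g");   // per-member off-heap pool

        Cache cache = new CacheFactory(props).create();

        Region<String, byte[]> blobs = cache
                .<String, byte[]>createRegionFactory(RegionShortcut.PARTITION)
                .setOffHeap(true)                            // store values off-heap
                .create("blobs");

        blobs.put("b1", new byte[1024]);
        System.out.println(blobs.get("b1").length);
        cache.close();
    }
}
```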
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode - PivotalOpenSourceHub
In this session we review the design of the current state of support for Apache Geode by Spring Cloud Data Flow, and explore additional use cases and future direction that Spring Cloud Data Flow and Apache Geode might evolve.
In this session we review the design of the current capabilities of the Spring Data GemFire API that supports Geode, and explore additional use cases and future direction that the Spring API and underlying Geode support might evolve.
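A minimal sketch of that programming model, an annotated entity, a repository, and an annotation-driven client configuration, assuming a recent Spring Data GemFire release and a running Geode cluster that hosts the region; all names are illustrative.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.data.annotation.Id;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.data.gemfire.config.annotation.EnableEntityDefinedRegions;
import org.springframework.data.gemfire.mapping.annotation.Region;
import org.springframework.data.gemfire.repository.config.EnableGemfireRepositories;
import org.springframework.data.repository.CrudRepository;

// Entity mapped to a Geode region named "Customers".
@Region("Customers")
class Customer {
    @Id Long id;
    String name;
    Customer(Long id, String name) { this.id = id; this.name = name; }
}

// Spring Data repository backed by the region; no implementation needed.
interface CustomerRepository extends CrudRepository<Customer, Long> { }

// Client cache + entity-defined regions + repositories, all via annotations.
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Customer.class)
@EnableGemfireRepositories
class GeodeClientConfig { }

public class SpringDataGeodeSketch {
    public static void main(String[] args) {
        try (AnnotationConfigApplicationContext ctx =
                     new AnnotationConfigApplicationContext(GeodeClientConfig.class)) {
            CustomerRepository customers = ctx.getBean(CustomerRepository.class);
            customers.save(new Customer(1L, "Ada"));
            System.out.println(customers.findById(1L).map(c -> c.name).orElse("missing"));
        }
    }
}
```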
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode - PivotalOpenSourceHub
This document summarizes a presentation about how TEKsystems Global Services helps modern manufacturing industries address challenges through big data solutions. It outlines TEKsystems' services and capabilities, as well as real-world applications for manufacturing, financial services, and life sciences. The presentation describes reference architectures and customer success stories in marine seismic data and gaming industries. It positions TEKsystems as having expertise, proven track records, and packaged offerings to provide big data solutions from pilot to production.
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...PivotalOpenSourceHub
One of the largest retailers in North America is considering Apache Geode for its new mobile loyalty application, to support its digital transformation effort. It would use Geode to provide operational data services for its mobile cloud service. This retailer needs to replace sluggish response times with sub-second response, which will improve conversion rates. It also wants to be able to close the loop between data science findings and the app experience, so that the right customer interaction is suggested when it is needed, such as when customers are looking at the mobile app while walking in the store, or when sending notifications at an individual's most likely shopping times. The final benefits of using Geode will include faster development cycles, increased customer loyalty, and higher revenue.
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...PivotalOpenSourceHub
In this session we explore a case study of a large-scale government fraud detection program that prevents billions of dollars in fraudulent payments each year leveraging the beta release of the GemFire+Greenplum Connector, which is planned for release in GemFire 9. Topics will include an overview of the system architecture and a review of the new GemFire+Greenplum Connector features that simplify use cases requiring a blend of massively parallel database capabilities and accelerated in-memory data processing.
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)PivotalOpenSourceHub
Today, if events change the decision model, we wait until the next batch model build for new insights. By extending fast “time-to-decisions” into the world of Big Data Analytics to get fast “time-to-insights”, apps will get what used to be batch insights in near real time. The technology enabling this includes smart in-memory data storage, new storage class memory, and products designed to do one or more parts of an analysis pipeline very well. In this talk we describe how Ampool is building on Apache Geode to allow Big Data analysis solutions to work together with a scalable smart storage class memory layer to allow fast and complex end-to-end pipelines to be built -- closing the loop and providing dramatically lower time to critical insights.
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...PivotalOpenSourceHub
This talk introduces an open-source solution that integrates cloud native apps running on Cloud Foundry with an open-source hybrid transactions + analytics real-time solution. The architecture is based on the fastest scalable, highly available and fully consistent In-Memory Data Grid (Apache Geode / GemFire), natively integrated to the first open-source massive parallel data warehouse (Greenplum Database) in a hybrid transactional and analytical architecture that is extremely fast, horizontally scalable, highly resilient and open source. This session also features a live demo running on Cloud Foundry, showing a real case of real-time closed-loop analytics and machine learning using the featured solution.
Apache Apex and Apache Geode are two of the most promising incubating open source projects. Combined, they promise to fill gaps in existing big data analytics platforms. Apache Apex is an enterprise-grade, native YARN, big-data-in-motion platform that unifies stream and batch processing. Apex is highly scalable, performant, fault tolerant, and strong in operability. Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency with high-concurrency processing. We will also look at some use cases showing how these two projects can be used together to form a distributed, fault-tolerant, reliable in-memory data processing layer.
How Southwest Airlines Uses Geode
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore: common development pitfalls, environment capacity planning, streaming data patterns like consumer checkpointing, support roles, and production lessons learned.
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
#GeodeSummit - Wall St. Derivative Risk Solutions Using GeodePivotalOpenSourceHub
In this talk, Andre Langevin discusses how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
GPORCA is a newly open-sourced advanced query optimizer that is a subproject of the Greenplum Database open source project. GPORCA is the query optimizer used in commercial distributions of both Greenplum and HAWQ. In these distributions GPORCA has achieved up to 1000x performance improvement across TPC-DS queries by focusing on three distinct areas: Dynamic Partition Elimination, SubQuery Unnesting, and Common Table Expressions.
Now that GPORCA is open source, we are looking for collaborators to help us realize the ultimate dream for GPORCA - to work with any database.
The new breed of data management systems in Big Data have to process so much data that optimization mistakes are magnified in traditional optimizers. Furthermore, coding and manual optimization of complex queries has proven to be hard.
In this session, Venkatesh will discuss:
- Overview of GPORCA
- How to add GPORCA to HAWQ with a build option
- How GPORCA could be made to work with any database
- Future vision for GPORCA and more immediate plans
- How to work with GPORCA, and how to contribute to GPORCA
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ and that, when run together, form a distributed stream processing application. This allows you to scale, version and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Motivation and goals for off-heap storage
Off-heap features and usage
Implementation overview
Preliminary benchmarks: off-heap vs. heap
Tips and best practices
Zeppelin Interpreters
PSQL (to become JDBC in 0.6.x)
Geode
SpringXD
Apache Ambari
Zeppelin Service
Geode, HAWQ and Spring XD services
Webpage Embedder View
5. What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• High performance, scalability, resiliency and continuous availability
• Fast access to critical data sets
• Location-aware distributed data processing
• Event-driven data architecture
6. High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: a peer-to-peer distributed system of Servers coordinated by a Locator, accessed by Java and REST clients]
* Experimental and awaiting community feedback
7. Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting edge use cases
Timeline (<2000, 2004, 2008, 2012, 2016):
• Early drivers: data volumes, margins/transactions, IT maintenance costs, elasticity needs
• Real-time needs: real-time response, time to market, flexible data models, persistent + in-memory
• Global data: visibility across data centers, fast ingest, device to enterprise, uptime (always on)
• Open source: Apache Incubation, GemFire → Geode, M1 release, 1st Geode Summit
Example domains: financial services, US DoD, trade clearing, travel portals, online gambling, telcos, manufacturing, auto insurance, payroll processing, rail systems
8. …with both SCALE and SPEED, …
• 40K transactions per second
• 3TB of data in-memory
• 17B records in-memory
• 120K concurrent users
9. … and impacting a LOT of people!
• China Railway Corporation (19% of the world population) and Indian Railways (17%) together serve about 36% of the world population
11. …and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
12. What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
13. Let’s talk about a few (basic) CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
• Events & Listeners
• High Availability
• Serialization
14. What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI (a Java API sketch follows below)
• A collection of Regions
[Diagram: a Cache inside a JVM containing multiple Regions]
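As a minimal sketch of the Java API route (the class and member name are illustrative; recent releases use the org.apache.geode packages, while incubator-era builds used com.gemstone.gemfire), creating a server-side cache programmatically looks roughly like this:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

public class CacheExample {
  public static void main(String[] args) {
    // Joins the distributed system and creates this member's Cache
    Cache cache = new CacheFactory()
        .set("name", "example-member")
        .create();
    // ... create Regions and work with data here ...
    cache.close();
  }
}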
15. What is a REGION?
• A distributed java.util.Map on steroids (Key/Value); a short example follows below
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
[Diagram: a Region inside a Cache (JVM), backed by a java.util.Map of key/value pairs such as K01→May, K02→Tim]
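As a rough illustration (the region name and values are taken from the slides; package names assume a recent Geode release), creating and using a replicated Region on a server member might look like:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class RegionExample {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();
    // A replicated region: every member hosting it keeps a full copy
    Region<String, String> region = cache
        .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
        .create("myRegion");
    region.put("K01", "May");               // Map-style write, distributed to peers
    System.out.println(region.get("K01"));  // Map-style read: prints "May"
    cache.close();
  }
}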
16. Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region shortcuts:
LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
17. Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
[Diagram: a put (k4→v7) on a region is appended as a modify record to the member's operation logs (Oplog2.crf, Oplog3.crf) on each server hosting the data; a gfsh setup sketch follows below]
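As a hedged sketch of how a persistent region can be set up from gfsh (the disk-store name, directory, and region name here are invented for the example):

gfsh> create disk-store --name=myDiskStore --dir=/data/geode/disk
gfsh> create region --name=myPersistentRegion --type=PARTITION_PERSISTENT --disk-store=myDiskStore

Entries written to the region are appended to the disk-store's operation logs, so the data survives a restart and can be recovered consistently.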
18. What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator and Server processes participating as members]
19. What is a CLIENT CACHE?
• A process connected to the Geode server(s); a client sketch follows below
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers
[Diagram: an application's Client Cache holding a local Region, connected to Regions on a GemFire Server]
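A minimal client sketch (the locator host/port and region name are assumptions for illustration; the server-side region must already exist):

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class ClientExample {
  public static void main(String[] args) {
    // Connect to the cluster through a locator
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .create();
    // PROXY keeps no local copy; use CACHING_PROXY to hold one locally
    Region<String, String> region = cache
        .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("myRegion");
    region.put("K01", "May");               // forwarded to the servers
    System.out.println(region.get("K01"));  // fetched from the servers
    cache.close();
  }
}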
20. • Clone & Build
•
• Start Services
• Create & Monitor Region
How to START? Easy as !!
20
git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true
cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion —type=REPLICATE
gfsh> start [pulse | jconsole]
1
2
3
'
1 2 3
34. Events & Notifications
• Register Interest
  • Individual keys OR a regex for keys
  • Updates the local copy
  • Examples:
    region.registerInterest("key-1");
    region1.registerInterestRegex("[a-z]+");
• Continuous Query
  • Receive a notification when the query condition is met on the server
  • Example: SELECT * FROM /tradeOrder t WHERE t.price > 100.00 (a client-side sketch follows below)
• Can be DURABLE
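A hedged sketch of registering the continuous query above from a client (assumes a subscription-enabled client pool and an existing /tradeOrder region; the CQ name is invented for the example):

import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqListener;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

public class TradeOrderCq {
  public static void start(ClientCache cache) throws Exception {
    QueryService qs = cache.getQueryService();
    CqAttributesFactory caf = new CqAttributesFactory();
    caf.addCqListener(new CqListener() {
      public void onEvent(CqEvent e) {   // fired on the client when a server-side entry matches
        System.out.println("Matching trade: " + e.getNewValue());
      }
      public void onError(CqEvent e) {
        System.err.println("CQ error: " + e.getThrowable());
      }
      public void close() { }
    });
    CqQuery cq = qs.newCq("bigTrades",
        "SELECT * FROM /tradeOrder t WHERE t.price > 100.00", caf.create());
    cq.execute();                        // the server starts pushing matching events
  }
}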
35. Listeners
• CacheWriter / CacheListener (a listener sketch follows below)
• AsyncEventListener (queue / batch)
  • Parallel or Serial
  • Conflation
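As a small illustrative sketch (the region name and entry values are made up), a CacheListener can react to entry creation on any member hosting the region:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.util.CacheListenerAdapter;

public class ListenerExample {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();
    Region<String, String> orders = cache
        .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
        .addCacheListener(new CacheListenerAdapter<String, String>() {
          @Override
          public void afterCreate(EntryEvent<String, String> event) {
            System.out.println("New entry: " + event.getKey() + " -> " + event.getNewValue());
          }
        })
        .create("orders");
    orders.put("K01", "May");   // triggers afterCreate on members hosting the region
    cache.close();
  }
}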
40. But HOW to serialize data?
Benchmark: https://github.com/eishay/jvm-serializers
41. Schema Evolution
[Diagram: Application #1 writes v1 objects and Application #2 writes v2 objects on Members A and B, sharing distributed type definitions]
• v2 objects preserve data from missing fields
• v1 objects use default values to fill in new fields
• PDX provides forwards and backwards compatibility, no code required (a PDX sketch follows below)
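Although PDX can be used with no code at all (via auto-serialization), a class can also implement PdxSerializable directly. A rough sketch, where the TradeOrder class and its fields are invented for the example:

import org.apache.geode.pdx.PdxReader;
import org.apache.geode.pdx.PdxSerializable;
import org.apache.geode.pdx.PdxWriter;

public class TradeOrder implements PdxSerializable {
  private String symbol;
  private double price;

  public TradeOrder() { }                    // no-arg constructor needed for deserialization

  public void toData(PdxWriter writer) {     // write fields by name so readers can evolve
    writer.writeString("symbol", symbol);
    writer.writeDouble("price", price);
  }

  public void fromData(PdxReader reader) {   // fields missing from older data get default values
    symbol = reader.readString("symbol");
    price = reader.readDouble("price");
  }
}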
43. How to CONTRIBUTE?
Code
• New features
• Bug fixes
• Writing tests
Documentation
• Wiki
• Web site
• User guide
Community
• Join the mailing list
• Ask or answer questions
• Join our HipChat
• Become a speaker
• Find bugs
• Test an RC/Beta