CrateDB & PostgreSQL
OldSQL to NewSQL
11th July 2017
@claus__m
About
~2yrs at Crate.io
DevRel/Field Engineering/Support/
Integrations/…
Speaking
Conferences, meetups, ...
Working with customers
Consulting, pre- and post-sales
Agenda
Failures
What, how, and when?
PostgreSQL
Concept overview
CrateDB
Concept overview
Discussion
NewSQL or not? Benefits and drawbacks.
Use Cases
Wrap up
OldSQL to NewSQL
Failures
Database Failures
Consequences
Data loss
Lost updates, dirty reads, ...
Service interruptions
Services can’t work without their database
Slow performance
Users may lose interest
Pressure
DBAs in the spotlight
What Makes Databases Fail?
Overloaded
Insufficient hardware (RAM, CPU, disk),
swapping, inefficient queries
Failure
Hardware may fail on many levels, e.g.
network, disk, RAM
Platform
Configuration errors, updates, resource
sharing, bugs
People
Malicious intent, sloppiness, ...
Assessment
Overview
Concepts and other things
Index and data
How the database creates indices, stores and
retrieves data
Search and scans
How the data is found
Replication and high availability
Distribution and achieving zero downtime
PostgreSQL
Overview
Multi-process System
fork() to clone processes from postmaster to
postgres instances with shared memory
Technology
Written in C, natively compiled
Optimization
Cost-based optimizer
Transactional
ACID compliant
Index And Data
Tree-based
A B-tree, created with CREATE INDEX or implicitly
by a PRIMARY KEY or UNIQUE constraint in CREATE
TABLE or ALTER TABLE
In Memory & On Disk
8K data pages in shared buffer cache and on
disk
Item Pointers
Index entries point to row versions in the heap;
updates that change no indexed column can skip
the index (heap-only tuples)
Source: http://use-the-index-luke.com/sql/anatomy/the-tree
Searches And Scans
Sequential
Go over every block and execute a predicate
Index-based
Find something using an index on that column,
or a full index scan
Bitmap-based
Build a bitmap of matching rows per index, combine
the bitmaps with AND/OR, then fetch the rows
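The three strategies can be contrasted with a small in-memory sketch. This is a toy model, not PostgreSQL's implementation: the table is a plain list, the "indexes" are dictionaries, and every name (`rows`, `build_index`, `bitmap_scan`, ...) is a hypothetical chosen for illustration.

```python
from collections import defaultdict

# Toy heap: the row id is the list position; columns are (city, status).
rows = [
    ("Berlin", "active"),
    ("Vienna", "inactive"),
    ("Berlin", "inactive"),
    ("Vienna", "active"),
    ("Berlin", "active"),
]


def sequential_scan(predicate):
    """Visit every row and keep the ids of those matching the predicate."""
    return [i for i, row in enumerate(rows) if predicate(row)]


def build_index(column):
    """Map each value of one column to the ids of the rows holding it."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        index[row[column]].append(i)
    return index


city_idx = build_index(0)
status_idx = build_index(1)


def index_scan(index, value):
    """Look the value up directly instead of visiting every row."""
    return index.get(value, [])


def bitmap_scan(*lookups):
    """AND several index lookups by intersecting one bitmap per lookup."""
    combined = (1 << len(rows)) - 1  # start with every bit set
    for index, value in lookups:
        bitmap = 0
        for i in index.get(value, []):
            bitmap |= 1 << i  # mark the matching row
        combined &= bitmap
    return [i for i in range(len(rows)) if (combined >> i) & 1]
```

Calling `bitmap_scan((city_idx, "Berlin"), (status_idx, "active"))` answers a two-column AND query from the two single-column indexes, which is the situation the bitmap strategy is meant for.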
Replication And High Availability
Disk based
By sharing a disk or continuously cloning a disk
Log-shipping
Send the write-ahead-log to the standby server,
which can answer reads
Master/Master
Each master sends its changed rows to the other;
both answer reads and writes and lock rows/tables
Client-sharding
Shard the data on a client/proxy and route
accordingly
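Client-side sharding can be sketched as a stable hash over a routing key. This is only an illustration under assumptions: the server names in `SHARDS` and the helpers `shard_for` and `route` are hypothetical, and real setups usually rely on a proxy or extension rather than hand-written routing.

```python
import hashlib

# Hypothetical server names; a real deployment would hold connection details.
SHARDS = ["pg-node-0", "pg-node-1", "pg-node-2"]


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the routing key.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route differently on each client.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(keys):
    """Group keys by target shard so each server receives one batch."""
    batches = {name: [] for name in SHARDS}
    for key in keys:
        batches[shard_for(key)].append(key)
    return batches
```

Plain modulo routing is easy to reason about, but changing the number of shards remaps most keys, so resharding means moving data.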
CrateDB
Overview
Multi-threaded System
Thread-pools to read/write Lucene segments
Technology
Java/JVM based
Optimization
Naive optimization at the query level
Eventually Consistent
Atomic operations per row, optimistic
concurrency only
Distributed By Default
Transparent partitioning and sharding
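Optimistic concurrency means a writer states which version of a row it read, and the write is rejected if the row has changed since. The sketch below is a toy in-memory model of that idea, not the CrateDB client API; `Store`, `VersionConflict`, and `update_with_retry` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when a row changed since the caller read it."""


class Store:
    """Toy single-threaded table: key -> (version, value)."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        # Rows that do not exist yet are reported as version 0.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(key)
        self._rows[key] = (current_version + 1, value)


def update_with_retry(store, key, change, attempts=3):
    """Re-read and retry when another writer got there first."""
    for _ in range(attempts):
        version, value = store.read(key)
        try:
            store.write(key, version, change(value))
            return True
        except VersionConflict:
            continue
    return False
```

The retry loop is the client-side handling: nothing blocks, but the caller has to decide what to do when every attempt loses.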
Index And Data
Inverted index
Term dictionary where field values point to
rows (posting list)
Field cache
“Inverted inverted index”, column names point
to the possible values and their rows
On disk, cached in memory
Immutable segments on disk, binary search in
each segment, mapped into RAM pages with
mmap()
Example Posting List
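A posting list can be sketched in a few lines. This is a simplified illustration, not Lucene's data structure: the sample `rows`, the whitespace tokenization, and the helpers `build_inverted_index` and `search_all` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows: row id -> value of a text column.
rows = {
    1: "fast distributed sql",
    2: "distributed search engine",
    3: "fast search",
}


def build_inverted_index(rows):
    """Map each term to a sorted posting list of row ids."""
    postings = defaultdict(set)
    for row_id, text in rows.items():
        for term in text.split():
            postings[term].add(row_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, *terms):
    """Return the row ids containing every term (posting list intersection)."""
    lists = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []
```

With the sample rows, the term `fast` points to rows 1 and 3, and a query for both `fast` and `search` intersects two posting lists instead of scanning the text of every row.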
Index And Data
Shards
Compounds of multiple immutable segments,
merged occasionally
Rows are documents, columns are fields
Vector space model to weight and score
searches (_score field)
Multi-threaded index access
A shard consists of multiple segments, each read
by its own thread
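The relationship between a shard and its immutable segments can be sketched as follows. This is a toy model under assumptions: segments hold only sorted terms, lookups run sequentially rather than one thread per segment, and the function names are hypothetical.

```python
from bisect import bisect_left


def make_segment(terms):
    """A segment is an immutable, sorted tuple of unique terms."""
    return tuple(sorted(set(terms)))


def segment_contains(segment, term):
    """Binary search inside one sorted segment."""
    pos = bisect_left(segment, term)
    return pos < len(segment) and segment[pos] == term


def shard_contains(segments, term):
    """A shard lookup has to consult every segment it is made of."""
    return any(segment_contains(segment, term) for segment in segments)


def merge_segments(segments):
    """Merging writes one new segment; the old ones are never modified."""
    merged = set()
    for segment in segments:
        merged.update(segment)
    return make_segment(merged)
```

Because segments never change after they are written, readers need no locks; the cost is that lookups touch more segments until a merge reduces their number.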
Replication And High Availability
Shared nothing architecture
Every node handles every task
Shard-based
Replicas are copies of shards that are
distributed evenly across the cluster
Consistency
Elected leader maintains and distributes a
consistent cluster state
CAP
Tunable consistency with synchronous inserts
Discussion
PostgreSQL: Strengths
Single-Node-Performance
Predictable and fast
SQL Sophistication
Lots of features, many of them heavily
optimized
Transactions
ACID compliance, concurrency control
PostgreSQL: Weaknesses
Distribution
High availability or working with huge data sets
requires third-party software or partitioning
Ingest speed
ACID compliance slows down inserts
Operational Complexity/DevOps Readiness
Highly configurable features make it hard to
manage
Schema Flexibility
Schema evolution management required
CrateDB: Strengths
Distribution
Distributed by nature, with tunable consistency
Ingest speed
Solid insert speeds with bulk inserts
Operational Complexity/DevOps Readiness
High flexibility, containerization, sane defaults
Schema Flexibility
Schema evolution on the fly
Built-in Search
Fulltext capabilities
CrateDB: Weaknesses
Single-Node-Performance
Distribution overhead requires a certain cluster
size to be efficient
SQL Features
Many features are still missing or hard to
implement in a distributed system
Transactions
No ACID compliance, eventual
consistency/optimistic concurrency requires
client-side handling
Use Cases
Use Cases: PostgreSQL
ORMs
Broad integration in various object-relational
mappers in frameworks (Hibernate, …)
Transaction-based workloads
Single, high-value transactions
Extensive SQL compliance
Required support for views, stored procedures,
…
Small data sets
Hundreds of MBs to several GB
Use Cases: CrateDB
DevOps
Flexible schemas, ad-hoc queries, easy
maintenance
Analytics, machine learning
Large scale inserts/queries, high concurrency,
SQL
Fulltext search
Built-in tools for text-mining/analysis, built on
the de-facto standard of search
Thanks!
Links
https://github.com/crate
https://crate.io
Follow us on Twitter
@crateio @claus__m
Next webinar: Scale your SQL database
with Docker, 27th July
Q & A
