SlideShare a Scribd company logo
MariaDB 10.2 News
and
MariaDB ColumnStore
Anders Karlsson
Sales Engineer, MariaDB
anders@mariadb.com
Agenda
• About Anders Karlsson
• MariaDB 10.2 news
– MariaDB 10.2 overview
– JSON support
– Window functions
– Common Table Expressions
• MariaDB ColumnStore
– What is Columnar storage
– MariaDB ColumnStore Architecture
– MariaDB ColumnStore FAQ
• Questions and Answers
About Anders Karlsson
• I have been in the database business
for more then 30 years
• I have worked as a support engineer, porting
engineer, performance consultant, trainer
but mainly as a Sales Engineer
• I have worked for Oracle, Informix, TimesTen, MySQL etc.
• I joined MySQL in 2004 and MariaDB in 2012
• Besides work I am also a beer aficionado, have an interest
in old cars and computers of dubious quality and I try to spend
as much time as possible with my 4-year old twins, Freddy and
Freja
MariaDB Server 10.2
Overview
What’s New in MariaDB Server 10.2
Analytics SQL ● Window Functions
● Common Table Expressions (CTE)
JSON ● JSON Functions
● GeoJSON Functions
Replication ● Delayed Replication
● Restrict the speed of reading binlog from Master
● Compressed Binary Log
Database Compatibility ● Multi-Trigger Support
● CHECK Constraint Expression Support
● EXECUTE IMMEDIATE statement
● Full Support for DEFAULT
● DECIMAL increased from 30 to 38 digits
Storage Engine
Enhancements
● New MyRocks based Storage Engine
● Enhancements from MySQL InnoDB 5.7
● Enable InnoDB NUMA interleave for InnoDB
What’s New in MariaDB Server 10.2
Security ● Per User Server Load Limitations
● Enforced TLS Connections
Administration ● New functions for User Management
● Enhanced Information from EXPLAIN
● User defined variables added to Information Schema
● Binary Log based Flashback
Performance ● Indexes for non-Persistent Virtual Columns
● Support bulk inserts with MariaDB Connectors
● New Option to define a directory for InnoDB temporary files
Server-Internal
Optimisations
● Internal Use of MariaDB Connector/C
● Optimizer Enhancements
● Non-Locking ANALYZE TABLE
Other Enhancements ● Lifted limitations for Virtual Computed Columns
● Subquery-Support for Views
● Multiple Use of Temporary Tables in Query
CSV
JSON
• Not standardized
• Undefined format
• Character set issues
• Undefined escaping
• Tab, Coma separator
• Multi line data issues
• Standardized
• Format defined by standard
• Self-describing format
• UTF8 enforced
• Fixed set of known datatypes
• Object
MariaDB Server 10.2 JSON functions
• No JSON data type in MariaDB 10.2, use VARCHAR / TEXT instead
– No performance benefit from a JSON datatype
– MariaDB Server 10.2 CHECK constraints can be used to validate JSON
• There are functions to
– Validate JSON
– Extract members from a JSON object
– Add members to a JSON object
– Work with JSON arrays
– Search JSON objects
– And more…
Why I like JSON!
• JSON (JavaScript Object Notation) – But not only for objects!
• JSON is easy to use, read, write and edit
• JSON is standardized
– But not the extent to make the standard more important than the data itself
– The standard fits, mor or less, on a single web page (http://www.json.org)
– JSON is independent on the application, operating system, database system
• JSON types are few, simple and obvious
– NUMBER, UTF8 STRING, BOOLEAN, NULL, ARRAY and OBJECT
JSON Datatypes
• A JSON Object is a collection of named elements which should be unique
– {”color”: ”red”, ”size”: [”S”,”M”.”L”]}
– Non uniquely named elements are not strictly forbidden in the standard, but most
implementations
• A JSON Array is an unordered list of elements
– [”A”, 57, ”B”] is the same as [57, ”B”, ”A”]
• JSON Strings are UTF8 and UTF8 only!
• JSON Numbers should be limited to IEEE754 64-bit floating point values
– JSON in itself actually doesn’t impose a limitation
– But many implementations do, including Java Script
JSON in Practice
• Tables containing current stock for a garment reseller
– Note CHECK constraint in stock table
MariaDB> CREATE TABLE product(prod_id INTEGER NOT NULL PRIMARY KEY,
prod_name VARCHAR(256) NOT NULL);
MariaDB> CREATE TABLE stock(stock_id INTEGER NOT NULL PRIMARY KEY,
prod_id INTEGER NOT NULL,
price DECIMAL(8,2) NOT NULL,
quantity INTEGER NOT NULL,
attr_json TEXT,
CHECK(JSON_VALID(attr_json) OR attr_json IS NULL),
FOREIGN KEY(prod_id) REFERENCES product(prod_id));
JSON in Practice – Adding data
• Inserting data into a JSON column is no different than other columns, it’s not even a
separate data type.
MariaDB> INSERT INTO stock VALUES(1, 1, 20.99, 57,
'{"color": "denim", "size": ["S","M","XL"]}');
MariaDB> INSERT INTO stock VALUES(2, 2, 12.99, 123, '{"size": ["S","M","XL"],
"color schemes": [{"name": "bold", "color": ["blue", "red"]},
{"name": "plain", "color": ["white"]}]}');
MariaDB> INSERT INTO stock VALUES(3, 2, 10.99, 156, '"color": "blue"');
ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `inventory`.`stock`
JSON in Practice – Extracting data
• The JSON_xxx functions introduced in MariaDB 10.2 allows extensive updates,
modification, validation and querying of JSON data
• The JSON extraction functions use a “JSON path” to determine what to extract
• Find all clothes that are available in size XL
MariaDB> SELECT p.prod_name, s.quantity
FROM stock s JOIN product p ON s.prod_id = p.prod_id
WHERE JSON_CONTAINS(JSON_EXTRACT(attr_json, '$.size'), '"XL"');
+-----------+----------+
| prod_name | quantity |
+-----------+----------+
| Jeans | 57 |
+-----------+----------+
JSON and Indexing
• So, can I create an index on a JSON element?
• Yes, but currently that is achieved by using virtual columns
MariaDB> ALTER TABLE stock ADD color VARCHAR(255)
AS (JSON_VALUE(JSON_EXTRACT(attr_json, '$**.color'), '$[0][0]'));
MariaDB> CREATE INDEX stock_ix1 ON stock(color);
MariaDB> EXPLAIN SELECT * FROM stock WHERE color = 'denim';
+------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | stock | ref | stock_ix1 | stock_ix1 | 258 | const | 1 | Using where |
+------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+
MariaDB 10.2 Window Functions
Groups within groups and ranking
Window functions
• Window functions allow you to operate on groups of rows
• Window functions allow you to create subgroups
• Window functions allow you to relate rows to eachother
MariaDB> CREATE TABLE tx(tx_no INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
acct_no INTEGER NOT NULL,
tx_amt DECIMAL(10,2) NOT NULL,
tx_date DATETIME NOT NULL);
MARIADB> INSERT INTO tx VALUES(NULL, 100, 1680.56, '2017-01-15 10:30:00');
MARIADB> INSERT INTO tx VALUES(NULL, 100, 20000.00, '2017-01-15 11:00:00');
MARIADB> INSERT INTO tx VALUES(NULL, 100, -188.60, '2017-01-16 21:00:00');
MARIADB> INSERT INTO tx VALUES(NULL, 100, -256.80, '2017-01-17 16:00:00');
MARIADB> INSERT INTO tx VALUES(NULL, 110, 20000.00, '2017-01-12 11:20:00');
…
Subgroups using Window functions
• Select from the transaction table and include the current account balance
– The usual SUM() aggregate function is used as the Window function
– The group is specified by the PARTITION keyword
– First using old style JOIN with aggregate
MariaDB> SELECT t1.acct_no, t1.tx_amt, t2.balance
FROM tx t1 JOIN
(SELECT acct_no, SUM(tx_amt) balance FROM tx GROUP BY acct_no) t2
ON t1.acct_no = t2.acct_no;
+---------+----------+----------+
| acct_no | tx_amt | balance |
+---------+----------+----------+
| 100 | 1680.56 | 39144.16 |
| 100 | 20000.00 | 39144.16 |
| 100 | -188.60 | 39144.16 |
| 100 | -256.80 | 39144.16 |
| 110 | 20000.00 | 37963.30 |
| 110 | -185.00 | 37963.30 |
…
Subgroups using Window functions
• Using Window functions instead of a JOIN with an aggregate
MariaDB> SELECT acct_no, tx_amt,
SUM(tx_amt) OVER (PARTITION BY acct_no) balance FROM tx;
+---------+----------+----------+
| acct_no | tx_amt | balance |
+---------+----------+----------+
| 100 | 1680.56 | 39144.16 |
| 100 | 20000.00 | 39144.16 |
| 100 | -188.60 | 39144.16 |
| 100 | -256.80 | 39144.16 |
| 110 | 20000.00 | 37963.30 |
| 110 | -185.00 | 37963.30 |
…
Ranking rows
• Rank transactions by amount, grouped by month
• The ROW_NUMBER(), RANK() or DENSE_RANK() provide ranking
– ROW_NUMBER() as 1,2,3,4
– RANK() as 1,2,2,4
– DENSE_RANK() as 1,2,2,3
• Ranking row by row as defined by ORDER BY
• Groups of ranking as defined by PARTITION
RANK() OVER (PARTITION BY <part> ORDER BY <rank expr>)
Rank and group rows
MariaDB> SELECT acct_no, MONTH(tx_date), tx_amt,
DENSE_RANK() OVER(PARTITION BY MONTH(tx_date) ORDER BY -tx_amt) rank
FROM tx ORDER BY MONTH(tx_date), rank;
+---------+----------------+----------+------+
| acct_no | MONTH(tx_date) | tx_amt | rank |
+---------+----------------+----------+------+
| 110 | 1 | 20000.00 | 1 |
| 100 | 1 | 20000.00 | 1 |
| 120 | 1 | 18000.00 | 2 |
| 100 | 1 | 1680.56 | 3 |
…
| 110 | 1 | -271.00 | 10 |
| 110 | 2 | 20000.00 | 1 |
| 100 | 2 | 20000.00 | 1 |
| 120 | 2 | 18000.00 | 2 |
| 110 | 2 | -170.70 | 3 |
…
SUBGROUP by
MONTH
RANK by transaction
amount
MariaDB 10.2 Common Table Expressions
Say goodbye to temporary tables
Say hello to recursive SQL
MariaDB 10.2 – Common Table Expressions
• Introduced in the SQL Standard in SQL:1999
• CTE are like temporary tables that are part of a single SQL SELECT query
• CTEs can be recursive or non-recursive
• A single SELECT may contain one or more CTEs
WITH <alias> AS (<SELECT query>)[, <alias> AS (<SELECT query>)]
SELECT ...;
WITH RECURSIVE <alias> AS (<SELECT query>)[, <alias> AS (<SELECT query>)]
SELECT ...;
MariaDB 10.2 – Non recursive CTEs
• Useful in most cases where a temporary table would be used
• For example when using different levels or types of aggregation
• SELECT the average transaction amount by month for each account
– We need to compute the total transaction amount for all accounts, group by month
– Then we need to average that across all months
• First, using temporary tables
MariaDB> CREATE TEMPORARY TABLE tmp1 AS
SELECT acct_no, MONTH(tx_date), SUM(tx_amt) amt FROM tx
GROUP BY acct_no, MONTH(tx_date);
MariaDB> SELECT acct_no, AVG(amt) FROM tmp1 GROUP BY acct_no;
+---------+--------------+
| acct_no | avg(amt) |
+---------+--------------+
| 100 | 19572.080000 |
| 110 | 18981.650000 |
| 120 | 18368.000000 |
+---------+--------------+
MariaDB 10.2 – Non recursive CTEs
• Now, using a single SQL statement using a Common Table Expression
MariaDB> WITH cte1 AS (SELECT acct_no, MONTH(tx_date), SUM(tx_amt) amt FROM tx
GROUP BY acct_no, MONTH(tx_date))
SELECT acct_no, AVG(amt) FROM cte1 GROUP BY acct_no;
+---------+--------------+
| acct_no | AVG(amt) |
+---------+--------------+
| 100 | 19572.080000 |
| 110 | 18981.650000 |
| 120 | 18368.000000 |
+---------+--------------+
MariaDB 10.2 – Recursive CTEs
• Allows a query to reference the result of "the same" query
• Has two components
– An "anchor" where the query "starts"
– A second part, using a UNION, that joins with the previous results
• The query "stops" recursion when no more rows are returned
• An outer query queries that result of the recursive query
• Sounds complicated? It's not, let's use an example
MariaDB 10.2 – Recursive CTEs
• Let's say we have a table with parts that make up other parts in a hierarchy
MariaDB> CREATE TABLE parts(part_id INTEGER NOT NULL PRIMARY KEY,
part_name VARCHAR(50) NOT NULL, part_of INTEGER,
part_count INTEGER);
MariaDB> ALTER TABLE parts ADD FOREIGN KEY(part_of)
REFERENCES parts(part_id);
MariaDB> INSERT INTO parts VALUES(1, 'Engine', NULL, NULL);
MariaDB> INSERT INTO parts VALUES(2, 'Cylinder', 1, 4);
MariaDB> INSERT INTO parts VALUES(3, 'Bolt', 2, 12);
MariaDB> INSERT INTO parts VALUES(4, 'Carburettor', 1, 2);
MariaDB> INSERT INTO parts VALUES(5, 'Screw', 4, 12);
MariaDB> INSERT INTO parts VALUES(6, 'Screw', 2, 8);
• Now, how many do I need of each of these parts to build an engine?
MariaDB 10.2 – Recursive CTEs
MariaDB> WITH RECURSIVE cte1 AS(
SELECT part_id, part_name, 1 part_count FROM parts WHERE part_name = 'Engine'
UNION
SELECT p.part_id, p.part_name, p.part_count * cte1.part_count
FROM parts p JOIN cte1 ON p.part_of = cte1.part_id)
SELECT part_name, SUM(part_count) FROM cte1 GROUP BY part_name;
• The recursive query starts with the engine of which we have stated we need 1
• The recursive query joins the part_of column to the part_id of the "parent"
• We compute the part count for each level by multiplying the count of parent component
with the count of the current component
• The outer query aggregates the part count by part
Anchor query
Recursion
specification
MariaDB ColumnStore
Column storage performance!
ColumnStore vs. Existing approaches
Limited real time analytics
Slow releases of product innovation
Expensive hardware and software
Data Warehouses
Hadoop / NoSQL
LIMITED SQL
SUPPORT
DIFFICULT TO
INSTALL/MANAGE
LIMITED TALENT POOL
DATA LAKE W/ NO DATA
MANAGEMENT
Hard to use
MariaDB ColumnStore
High performance columnar storage engine that support wide variety of analytical use
cases with SQL in a highly scalable distributed environments
Parallel query
processing for
distributed
environments
Faster, More Efficient
Queries
Single SQL
Interface for OLTP
and analytics
Easier Enterprise
Analytics
Power of SQL
and Freedom of
Open Source to
Big Data
Analytics
Better Price
Performance
Easier
Enterprise
Analytics
ANSI SQL
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB Security features - Encryption for
data in motion , role based access and auditability
Full ANSI SQL
• Support complex join, aggregation and window
function
Easy to manage and scale
• Eliminate needs for indexes and views
• Automated partitioning
• Linear scalable by adding new nodes as data grows
• Out of box connection with BI tools
• High compression level
Row oriented vs. Column oriented
Row-oriented: rows stored sequentially in a file
Key Fname Lname State Zip Phone Age Sales
1 Bugs Bunny NJ 11217 (123) 938-3235 34 100
2 Yosemite Sam CT 95389 (234) 375-6572 52 500
3 Daffy Duck IA 10013 (345) 227-1810 35 200
4 Elmer Fudd CT 04578 (456) 882-7323 43 10
5 Witch Hazel CT 01970 (567) 744-0991 57 250
Column-oriented: each column is stored in a separate file. Each column for a
given row is at the same offset.
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NJ
CT
IA
CT
CT
Zip
11217
95389
10013
04578
01970
Phone
(123) 938-3235
(234) 375-6572
(345) 227-1810
(456) 882-7323
(567) 744-0991
Age
34
52
35
43
57
Sales
100
500
200
10
250
MariaDB ColumnStore - Architecture
Columnar Distributed Data Storage
Local Storage | SAN | EBS | Gluster FS
BI Tool SQL Client Custom
Big Data
App
Application
MariaDB
SQL Front
End
Distributed
Query
Engine
Data
Storage
MariaDB ColumnStore – Disk storage
• Vertical Partitioning by Column
– Each column in its own column file
– Only do I/O for columns requested
Logical Layer Physical Layer
Table
Column1 ColumnN
Extent 1
(8MB~64MB
8 million rows)
Extent N
(8MB~64MB
8 million rows)
Segment
File1 (Extent)
Segment
FileN (Extent)
Server
DB Root
Blocks
(8KB)
• Horizontal Partitioning by range of
rows
– Logical grouping of 8 million rows of
each column file
– In-memory mapping of extent to
physical layer
MariaDB ColumnStore FAQ
• Is MariaDB ColumnStore also good for OLTP applications?
– No, not when using the ColumnStore Storage Engine, but the InnoDB Engine is also included
with MariaDB ColumnStore
• Does MariaDB ColumnStore support parallel processing?
– Yes, the processing on each PM (Performance Module) is parallelized
• Can I use DML, i.e. INSERT, UPDATE and DELETE with the ColumnStore Storage
Engine?
– Yes you can, but it is slow, use the cpimport tool instead. Also LOAD DATA INFILE and
INSERT INTO … SELECT is fast, as these are mapped to cpimport
• Can I use an OLTP Schema with MariaDB ColumnStore?
– Yes, but for the kind of analytical queries the ColumnStore Storage Engine is aimed at, a star
or snowflake schema is recommended
MariaDB ColumnStore FAQ
• Is data loading in MariaDB ColumnStore parallelized?
– Yes, but it depends on how you load data. Using cpimport you can load in parallel across the
PMs
• What is the compression ratio that can be achieved?
– Typically 5 – 15 times
• Can you join a ColumnStore table with an InnoDB table?
– Yes, this will always work. To acheve best performance you can have this be performed by the
PMs by having the UM push the InnoDB table contents to PMs. This is not enabled by default
and has to be configured in MariaDB ColumnStore separately
• Is ColumnStore a Storage Engine?
– Yes and no. It is a Storage Engine, but it is more than that. The PMs are completely separate,
and the MariaDB Server has had some adjustments to the optimizer and some other parts of
the MariaDB code. It is planned for ColumnStore to become a Storage Engine eventually.
MariaDB ColumnStore - Summary
• High level of compression
• Fixed data position per column
– Longer columns have extents
• No indexes
– No data loading slowdown
– Fast loading of data
– Consistent data load speeds
• Meta data stored in memory
• Scalable and distributed architecture
Questions?
Answers!
Thank you
anders@mariadb.com

More Related Content

PDF
M|18 Understanding the Query Optimizer
PDF
Using JSON with MariaDB and MySQL
PDF
JSON Support in MariaDB: News, non-news and the bigger picture
PDF
Modern solutions for modern database load: improvements in the latest MariaDB...
PPTX
BGOUG15: JSON support in MySQL 5.7
PDF
Advanced fulltext search with Sphinx
PDF
MariaDB 10.3 Optimizer - where does it stand
PDF
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama
M|18 Understanding the Query Optimizer
Using JSON with MariaDB and MySQL
JSON Support in MariaDB: News, non-news and the bigger picture
Modern solutions for modern database load: improvements in the latest MariaDB...
BGOUG15: JSON support in MySQL 5.7
Advanced fulltext search with Sphinx
MariaDB 10.3 Optimizer - where does it stand
[D15] 最強にスケーラブルなカラムナーDBよ、Hadoopとのタッグでビッグデータの地平を目指せ!by Daisuke Hirama

What's hot (20)

PDF
Fulltext engine for non fulltext searches
PDF
Oracle Database Advanced Querying
PDF
Window functions in MySQL 8.0
PPTX
Exploring Advanced SQL Techniques Using Analytic Functions
PDF
Swift Study #2
PPTX
Php forum2015 tomas_final
PDF
MariaDB for Developers and Operators (DevOps)
PDF
Need for Speed: Mysql indexing
PDF
Oracle Database Advanced Querying (2016)
PDF
Advanced PL/SQL Optimizing for Better Performance 2016
PDF
The art of querying – newest and advanced SQL techniques
PDF
PL/SQL New and Advanced Features for Extreme Performance
PDF
MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013
PDF
MySQL Performance Schema in Action: the Complete Tutorial
PDF
OSDC 2012 | Scaling with MongoDB by Ross Lawley
PDF
The Ring programming language version 1.5.1 book - Part 43 of 180
KEY
Postgres rules
PDF
The Ring programming language version 1.7 book - Part 41 of 196
PDF
Cassandra 3.0 - JSON at scale - StampedeCon 2015
PDF
MySQL Performance Schema in Action
Fulltext engine for non fulltext searches
Oracle Database Advanced Querying
Window functions in MySQL 8.0
Exploring Advanced SQL Techniques Using Analytic Functions
Swift Study #2
Php forum2015 tomas_final
MariaDB for Developers and Operators (DevOps)
Need for Speed: Mysql indexing
Oracle Database Advanced Querying (2016)
Advanced PL/SQL Optimizing for Better Performance 2016
The art of querying – newest and advanced SQL techniques
PL/SQL New and Advanced Features for Extreme Performance
MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013
MySQL Performance Schema in Action: the Complete Tutorial
OSDC 2012 | Scaling with MongoDB by Ross Lawley
The Ring programming language version 1.5.1 book - Part 43 of 180
Postgres rules
The Ring programming language version 1.7 book - Part 41 of 196
Cassandra 3.0 - JSON at scale - StampedeCon 2015
MySQL Performance Schema in Action
Ad

Similar to 5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with MariaDB ColumnStore (20)

PDF
Big Data Analytics with MariaDB ColumnStore
PDF
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
PDF
Improving MariaDB’s Query Optimizer with better selectivity estimates
PDF
What’s New in MariaDB Server 10.2
PDF
Big Data Analytics with MariaDB ColumnStore
PPT
Applied Partitioning And Scaling Your Database System Presentation
PDF
What is MariaDB Server 10.3?
PPTX
In memory databases presentation
PDF
Big Data Analytics with MariaDB ColumnStore
PDF
MySQL SQL Tutorial
PDF
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
PDF
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
PDF
MariaDB ColumnStore
PDF
MariaDB ColumnStore
PDF
Optimizer percona live_ams2015
PDF
How to Use JSON in MySQL Wrong
PDF
Modern query optimisation features in MySQL 8.
PPTX
Alasql JavaScript SQL Database Library: User Manual
PDF
MySQL 5.7 Tutorial Dutch PHP Conference 2015
PDF
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
Big Data Analytics with MariaDB ColumnStore
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
Improving MariaDB’s Query Optimizer with better selectivity estimates
What’s New in MariaDB Server 10.2
Big Data Analytics with MariaDB ColumnStore
Applied Partitioning And Scaling Your Database System Presentation
What is MariaDB Server 10.3?
In memory databases presentation
Big Data Analytics with MariaDB ColumnStore
MySQL SQL Tutorial
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
MariaDB ColumnStore
MariaDB ColumnStore
Optimizer percona live_ams2015
How to Use JSON in MySQL Wrong
Modern query optimisation features in MySQL 8.
Alasql JavaScript SQL Database Library: User Manual
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
Ad

More from Kangaroot (20)

PPTX
So you think you know SUSE?
PDF
Live demo: Protect your Data
PDF
RootStack - Devfactory
PDF
Welcome at OPEN'22
PDF
EDB Postgres in Public Sector
PDF
Deploying NGINX in Cloud Native Kubernetes
PDF
Cloud demystified, what remains after the fog has lifted.
PDF
Zimbra at Kangaroot / OPEN{virtual}
PDF
NGINX Controller: faster deployments, fewer headaches
PDF
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
PDF
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
PDF
Red Hat multi-cluster management & what's new in OpenShift
PDF
There is no such thing as “Vanilla Kubernetes”
PDF
Elastic SIEM (Endpoint Security)
PDF
Hashicorp Vault - OPEN Public Sector
PDF
Kangaroot - Bechtle kadercontracten
PDF
Red Hat Enterprise Linux 8
PDF
Kangaroot open shift best practices - straight from the battlefield
PDF
Kubecontrol - managed Kubernetes by Kangaroot
PDF
OpenShift 4, the smarter Kubernetes platform
So you think you know SUSE?
Live demo: Protect your Data
RootStack - Devfactory
Welcome at OPEN'22
EDB Postgres in Public Sector
Deploying NGINX in Cloud Native Kubernetes
Cloud demystified, what remains after the fog has lifted.
Zimbra at Kangaroot / OPEN{virtual}
NGINX Controller: faster deployments, fewer headaches
Kangaroot EDB Webinar Best Practices in Security with PostgreSQL
Do you want to start with OpenShift but don’t have the manpower, knowledge, e...
Red Hat multi-cluster management & what's new in OpenShift
There is no such thing as “Vanilla Kubernetes”
Elastic SIEM (Endpoint Security)
Hashicorp Vault - OPEN Public Sector
Kangaroot - Bechtle kadercontracten
Red Hat Enterprise Linux 8
Kangaroot open shift best practices - straight from the battlefield
Kubecontrol - managed Kubernetes by Kangaroot
OpenShift 4, the smarter Kubernetes platform

Recently uploaded (20)

PDF
Getting Started with Data Integration: FME Form 101
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PPTX
Spectroscopy.pptx food analysis technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Machine learning based COVID-19 study performance prediction
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Tartificialntelligence_presentation.pptx
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Getting Started with Data Integration: FME Form 101
NewMind AI Weekly Chronicles - August'25-Week II
SOPHOS-XG Firewall Administrator PPT.pptx
Spectroscopy.pptx food analysis technology
Unlocking AI with Model Context Protocol (MCP)
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Accuracy of neural networks in brain wave diagnosis of schizophrenia
The Rise and Fall of 3GPP – Time for a Sabbatical?
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Advanced methodologies resolving dimensionality complications for autism neur...
MYSQL Presentation for SQL database connectivity
MIND Revenue Release Quarter 2 2025 Press Release
20250228 LYD VKU AI Blended-Learning.pptx
Machine learning based COVID-19 study performance prediction
Empathic Computing: Creating Shared Understanding
Tartificialntelligence_presentation.pptx
Assigned Numbers - 2025 - Bluetooth® Document
Programs and apps: productivity, graphics, security and other tools
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...

5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with MariaDB ColumnStore

  • 1. MariaDB 10.2 News and MariaDB ColumnStore Anders Karlsson Sales Engineer, MariaDB [email protected]
  • 2. Agenda • About Anders Karlsson • MariaDB 10.2 news – MariaDB 10.2 overview – JSON support – Window functions – Common Table Expressions • MariaDB ColumnStore – What is Columnar storage – MariaDB ColumnStore Architecture – MariaDB ColumnStore FAQ • Questions and Answers
  • 3. About Anders Karlsson • I have been in the database business for more then 30 years • I have worked as a support engineer, porting engineer, performance consultant, trainer but mainly as a Sales Engineer • I have worked for Oracle, Informix, TimesTen, MySQL etc. • I joined MySQL in 2004 and MariaDB in 2012 • Besides work I am also a beer aficionado, have an interest in old cars and computers of dubious quality and I try to spend as much time as possible with my 4-year old twins, Freddy and Freja
  • 5. What’s New in MariaDB Server 10.2 Analytics SQL ● Window Functions ● Common Table Expressions (CTE) JSON ● JSON Functions ● GeoJSON Functions Replication ● Delayed Replication ● Restrict the speed of reading binlog from Master ● Compressed Binary Log Database Compatibility ● Multi-Trigger Support ● CHECK Constraint Expression Support ● EXECUTE IMMEDIATE statement ● Full Support for DEFAULT ● DECIMAL increased from 30 to 38 digits Storage Engine Enhancements ● New MyRocks based Storage Engine ● Enhancements from MySQL InnoDB 5.7 ● Enable InnoDB NUMA interleave for InnoDB
  • 6. What’s New in MariaDB Server 10.2 Security ● Per User Server Load Limitations ● Enforced TLS Connections Administration ● New functions for User Management ● Enhanced Information from EXPLAIN ● User defined variables added to Information Schema ● Binary Log based Flashback Performance ● Indexes for non-Persistent Virtual Columns ● Support bulk inserts with MariaDB Connectors ● New Option to define a directory for InnoDB temporary files Server-Internal Optimisations ● Internal Use of MariaDB Connector/C ● Optimizer Enhancements ● Non-Locking ANALYZE TABLE Other Enhancements ● Lifted limitations for Virtual Computed Columns ● Subquery-Support for Views ● Multiple Use of Temporary Tables in Query
  • 7. CSV JSON • Not standardized • Undefined format • Character set issues • Undefined escaping • Tab, Coma separator • Multi line data issues • Standardized • Format defined by standard • Self-describing format • UTF8 enforced • Fixed set of known datatypes • Object
  • 8. MariaDB Server 10.2 JSON functions • No JSON data type in MariaDB 10.2, use VARCHAR / TEXT instead – No performance benefit from a JSON datatype – MariaDB Server 10.2 CHECK constraints can be used to validate JSON • There are functions to – Validate JSON – Extract members from a JSON object – Add members to a JSON object – Work with JSON arrays – Search JSON objects – And more…
  • 9. Why I like JSON! • JSON (JavaScript Object Notation) – But not only for objects! • JSON is easy to use, read, write and edit • JSON is standardized – But not the extent to make the standard more important than the data itself – The standard fits, mor or less, on a single web page (http://www.json.org) – JSON is independent on the application, operating system, database system • JSON types are few, simple and obvious – NUMBER, UTF8 STRING, BOOLEAN, NULL, ARRAY and OBJECT
  • 10. JSON Datatypes • A JSON Object is a collection of named elements which should be unique – {”color”: ”red”, ”size”: [”S”,”M”.”L”]} – Non uniquely named elements are not strictly forbidden in the standard, but most implementations • A JSON Array is an unordered list of elements – [”A”, 57, ”B”] is the same as [57, ”B”, ”A”] • JSON Strings are UTF8 and UTF8 only! • JSON Numbers should be limited to IEEE754 64-bit floating point values – JSON in itself actually doesn’t impose a limitation – But many implementations do, including Java Script
  • 11. JSON in Practice • Tables containing current stock for a garment reseller – Note CHECK constraint in stock table MariaDB> CREATE TABLE product(prod_id INTEGER NOT NULL PRIMARY KEY, prod_name VARCHAR(256) NOT NULL); MariaDB> CREATE TABLE stock(stock_id INTEGER NOT NULL PRIMARY KEY, prod_id INTEGER NOT NULL, price DECIMAL(8,2) NOT NULL, quantity INTEGER NOT NULL, attr_json TEXT, CHECK(JSON_VALID(attr_json) OR attr_json IS NULL), FOREIGN KEY(prod_id) REFERENCES product(prod_id));
  • 12. JSON in Practice – Adding data • Inserting data into a JSON column is no different than other columns, it’s not even a separate data type. MariaDB> INSERT INTO stock VALUES(1, 1, 20.99, 57, '{"color": "denim", "size": ["S","M","XL"]}'); MariaDB> INSERT INTO stock VALUES(2, 2, 12.99, 123, '{"size": ["S","M","XL"], "color schemes": [{"name": "bold", "color": ["blue", "red"]}, {"name": "plain", "color": ["white"]}]}'); MariaDB> INSERT INTO stock VALUES(3, 2, 10.99, 156, '"color": "blue"'); ERROR 4025 (23000): CONSTRAINT `CONSTRAINT_1` failed for `inventory`.`stock`
  • 13. JSON in Practice – Extracting data • The JSON_xxx functions introduced in MariaDB 10.2 allows extensive updates, modification, validation and querying of JSON data • The JSON extraction functions use a “JSON path” to determine what to extract • Find all clothes that are available in size XL MariaDB> SELECT p.prod_name, s.quantity FROM stock s JOIN product p ON s.prod_id = p.prod_id WHERE JSON_CONTAINS(JSON_EXTRACT(attr_json, '$.size'), '"XL"'); +-----------+----------+ | prod_name | quantity | +-----------+----------+ | Jeans | 57 | +-----------+----------+
  • 14. JSON and Indexing • So, can I create an index on a JSON element? • Yes, but currently that is achieved by using virtual columns MariaDB> ALTER TABLE stock ADD color VARCHAR(255) AS (JSON_VALUE(JSON_EXTRACT(attr_json, '$**.color'), '$[0][0]')); MariaDB> CREATE INDEX stock_ix1 ON stock(color); MariaDB> EXPLAIN SELECT * FROM stock WHERE color = 'denim'; +------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+ | 1 | SIMPLE | stock | ref | stock_ix1 | stock_ix1 | 258 | const | 1 | Using where | +------+-------------+-------+------+---------------+-----------+---------+-------+------+-------------+
  • 15. MariaDB 10.2 Window Functions Groups within groups and ranking
  • 16. Window functions • Window functions allow you to operate on groups of rows • Window functions allow you to create subgroups • Window functions allow you to relate rows to eachother MariaDB> CREATE TABLE tx(tx_no INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT, acct_no INTEGER NOT NULL, tx_amt DECIMAL(10,2) NOT NULL, tx_date DATETIME NOT NULL); MARIADB> INSERT INTO tx VALUES(NULL, 100, 1680.56, '2017-01-15 10:30:00'); MARIADB> INSERT INTO tx VALUES(NULL, 100, 20000.00, '2017-01-15 11:00:00'); MARIADB> INSERT INTO tx VALUES(NULL, 100, -188.60, '2017-01-16 21:00:00'); MARIADB> INSERT INTO tx VALUES(NULL, 100, -256.80, '2017-01-17 16:00:00'); MARIADB> INSERT INTO tx VALUES(NULL, 110, 20000.00, '2017-01-12 11:20:00'); …
  • 17. Subgroups using Window functions • Select from the transaction table and include the current account balance – The usual SUM() aggregate function is used as the Window function – The group is specified by the PARTITION keyword – First using old style JOIN with aggregate MariaDB> SELECT t1.acct_no, t1.tx_amt, t2.balance FROM tx t1 JOIN (SELECT acct_no, SUM(tx_amt) balance FROM tx GROUP BY acct_no) t2 ON t1.acct_no = t2.acct_no; +---------+----------+----------+ | acct_no | tx_amt | balance | +---------+----------+----------+ | 100 | 1680.56 | 39144.16 | | 100 | 20000.00 | 39144.16 | | 100 | -188.60 | 39144.16 | | 100 | -256.80 | 39144.16 | | 110 | 20000.00 | 37963.30 | | 110 | -185.00 | 37963.30 | …
  • 18. Subgroups using Window functions • Using Window functions instead of a JOIN with an aggregate MariaDB> SELECT acct_no, tx_amt, SUM(tx_amt) OVER (PARTITION BY acct_no) balance FROM tx; +---------+----------+----------+ | acct_no | tx_amt | balance | +---------+----------+----------+ | 100 | 1680.56 | 39144.16 | | 100 | 20000.00 | 39144.16 | | 100 | -188.60 | 39144.16 | | 100 | -256.80 | 39144.16 | | 110 | 20000.00 | 37963.30 | | 110 | -185.00 | 37963.30 | …
  • 19. Ranking rows • Rank transactions by amount, grouped by month • The ROW_NUMBER(), RANK() or DENSE_RANK() provide ranking – ROW_NUMBER() as 1,2,3,4 – RANK() as 1,2,2,4 – DENSE_RANK() as 1,2,2,3 • Ranking row by row as defined by ORDER BY • Groups of ranking as defined by PARTITION RANK() OVER (PARTITION BY <part> ORDER BY <rank expr>)
  • 20. Rank and group rows MariaDB> SELECT acct_no, MONTH(tx_date), tx_amt, DENSE_RANK() OVER(PARTITION BY MONTH(tx_date) ORDER BY -tx_amt) rank FROM tx ORDER BY MONTH(tx_date), rank; +---------+----------------+----------+------+ | acct_no | MONTH(tx_date) | tx_amt | rank | +---------+----------------+----------+------+ | 110 | 1 | 20000.00 | 1 | | 100 | 1 | 20000.00 | 1 | | 120 | 1 | 18000.00 | 2 | | 100 | 1 | 1680.56 | 3 | … | 110 | 1 | -271.00 | 10 | | 110 | 2 | 20000.00 | 1 | | 100 | 2 | 20000.00 | 1 | | 120 | 2 | 18000.00 | 2 | | 110 | 2 | -170.70 | 3 | … SUBGROUP by MONTH RANK by transaction amount
  • 21. MariaDB 10.2 Common Table Expressions Say goodbye to temporary tables Say hello to recursive SQL
  • 22. MariaDB 10.2 – Common Table Expressions • Introduced in the SQL Standard in SQL:1999 • CTE are like temporary tables that are part of a single SQL SELECT query • CTEs can be recursive or non-recursive • A single SELECT may contain one or more CTEs WITH <alias> AS (<SELECT query>)[, <alias> AS (<SELECT query>)] SELECT ...; WITH RECURSIVE <alias> AS (<SELECT query>)[, <alias> AS (<SELECT query>)] SELECT ...;
  • 23. MariaDB 10.2 – Non recursive CTEs • Useful in most cases where a temporary table would be used • For example when using different levels or types of aggregation • SELECT the average transaction amount by month for each account – We need to compute the total transaction amount for all accounts, group by month – Then we need to average that across all months • First, using temporary tables MariaDB> CREATE TEMPORARY TABLE tmp1 AS SELECT acct_no, MONTH(tx_date), SUM(tx_amt) amt FROM tx GROUP BY acct_no, MONTH(tx_date); MariaDB> SELECT acct_no, AVG(amt) FROM tmp1 GROUP BY acct_no; +---------+--------------+ | acct_no | avg(amt) | +---------+--------------+ | 100 | 19572.080000 | | 110 | 18981.650000 | | 120 | 18368.000000 | +---------+--------------+
  • 24. MariaDB 10.2 – Non recursive CTEs • Now, using a single SQL statement using a Common Table Expression MariaDB> WITH cte1 AS (SELECT acct_no, MONTH(tx_date), SUM(tx_amt) amt FROM tx GROUP BY acct_no, MONTH(tx_date)) SELECT acct_no, AVG(amt) FROM cte1 GROUP BY acct_no; +---------+--------------+ | acct_no | AVG(amt) | +---------+--------------+ | 100 | 19572.080000 | | 110 | 18981.650000 | | 120 | 18368.000000 | +---------+--------------+
  • 25. MariaDB 10.2 – Recursive CTEs • Allows a query to reference the result of "the same" query • Has two components – An "anchor" where the query "starts" – A second part, using a UNION, that joins with the previous results • The query "stops" recursion when no more rows are returned • An outer query queries that result of the recursive query • Sounds complicated? It's not, let's use an example
  • 26. MariaDB 10.2 – Recursive CTEs • Let's say we have a table with parts that make up other parts in a hierarchy MariaDB> CREATE TABLE parts(part_id INTEGER NOT NULL PRIMARY KEY, part_name VARCHAR(50) NOT NULL, part_of INTEGER, part_count INTEGER); MariaDB> ALTER TABLE parts ADD FOREIGN KEY(part_of) REFERENCES parts(part_id); MariaDB> INSERT INTO parts VALUES(1, 'Engine', NULL, NULL); MariaDB> INSERT INTO parts VALUES(2, 'Cylinder', 1, 4); MariaDB> INSERT INTO parts VALUES(3, 'Bolt', 2, 12); MariaDB> INSERT INTO parts VALUES(4, 'Carburettor', 1, 2); MariaDB> INSERT INTO parts VALUES(5, 'Screw', 4, 12); MariaDB> INSERT INTO parts VALUES(6, 'Screw', 2, 8); • Now, how many do I need of each of these parts to build an engine?
  • 27. MariaDB 10.2 – Recursive CTEs MariaDB> WITH RECURSIVE cte1 AS( SELECT part_id, part_name, 1 part_count FROM parts WHERE part_name = 'Engine' UNION SELECT p.part_id, p.part_name, p.part_count * cte1.part_count FROM parts p JOIN cte1 ON p.part_of = cte1.part_id) SELECT part_name, SUM(part_count) FROM cte1 GROUP BY part_name; • The recursive query starts with the engine of which we have stated we need 1 • The recursive query joins the part_of column to the part_id of the "parent" • We compute the part count for each level by multiplying the count of parent component with the count of the current component • The outer query aggregates the part count by part Anchor query Recursion specification
  • 29. ColumnStore vs. Existing approaches Limited real time analytics Slow releases of product innovation Expensive hardware and software Data Warehouses Hadoop / NoSQL LIMITED SQL SUPPORT DIFFICULT TO INSTALL/MANAGE LIMITED TALENT POOL DATA LAKE W/ NO DATA MANAGEMENT Hard to use
  • 30. MariaDB ColumnStore High performance columnar storage engine that support wide variety of analytical use cases with SQL in a highly scalable distributed environments Parallel query processing for distributed environments Faster, More Efficient Queries Single SQL Interface for OLTP and analytics Easier Enterprise Analytics Power of SQL and Freedom of Open Source to Big Data Analytics Better Price Performance
  • 31. Easier Enterprise Analytics ANSI SQL Single SQL Front-end • Use a single SQL interface for analytics and OLTP • Leverage MariaDB Security features - Encryption for data in motion , role based access and auditability Full ANSI SQL • Support complex join, aggregation and window function Easy to manage and scale • Eliminate needs for indexes and views • Automated partitioning • Linear scalable by adding new nodes as data grows • Out of box connection with BI tools • High compression level
  • 32. Row oriented vs. Column oriented Row-oriented: rows stored sequentially in a file Key Fname Lname State Zip Phone Age Sales 1 Bugs Bunny NJ 11217 (123) 938-3235 34 100 2 Yosemite Sam CT 95389 (234) 375-6572 52 500 3 Daffy Duck IA 10013 (345) 227-1810 35 200 4 Elmer Fudd CT 04578 (456) 882-7323 43 10 5 Witch Hazel CT 01970 (567) 744-0991 57 250 Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset. Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NJ CT IA CT CT Zip 11217 95389 10013 04578 01970 Phone (123) 938-3235 (234) 375-6572 (345) 227-1810 (456) 882-7323 (567) 744-0991 Age 34 52 35 43 57 Sales 100 500 200 10 250
  • 33. MariaDB ColumnStore - Architecture Columnar Distributed Data Storage Local Storage | SAN | EBS | Gluster FS BI Tool SQL Client Custom Big Data App Application MariaDB SQL Front End Distributed Query Engine Data Storage
  • 34. MariaDB ColumnStore – Disk storage • Vertical Partitioning by Column – Each column in its own column file – Only do I/O for columns requested Logical Layer Physical Layer Table Column1 ColumnN Extent 1 (8MB~64MB 8 million rows) Extent N (8MB~64MB 8 million rows) Segment File1 (Extent) Segment FileN (Extent) Server DB Root Blocks (8KB) • Horizontal Partitioning by range of rows – Logical grouping of 8 million rows of each column file – In-memory mapping of extent to physical layer
  • 35. MariaDB ColumnStore FAQ • Is MariaDB ColumnStore also good for OLTP applications? – No, not when using the ColumnStore Storage Engine, but the InnoDB Engine is also included with MariaDB ColumnStore • Does MariaDB ColumnStore support parallel processing? – Yes, the processing on each PM (Performance Module) is parallelized • Can I use DML, i.e. INSERT, UPDATE and DELETE with the ColumnStore Storage Engine? – Yes you can, but it is slow, use the cpimport tool instead. Also LOAD DATA INFILE and INSERT INTO … SELECT is fast, as these are mapped to cpimport • Can I use an OLTP Schema with MariaDB ColumnStore? – Yes, but for the kind of analytical queries the ColumnStore Storage Engine is aimed at, a star or snowflake schema is recommended
  • 36. MariaDB ColumnStore FAQ • Is data loading in MariaDB ColumnStore parallelized? – Yes, but it depends on how you load data. Using cpimport you can load in parallel across the PMs • What is the compression ratio that can be achieved? – Typically 5 – 15 times • Can you join a ColumnStore table with an InnoDB table? – Yes, this will always work. To acheve best performance you can have this be performed by the PMs by having the UM push the InnoDB table contents to PMs. This is not enabled by default and has to be configured in MariaDB ColumnStore separately • Is ColumnStore a Storage Engine? – Yes and no. It is a Storage Engine, but it is more than that. The PMs are completely separate, and the MariaDB Server has had some adjustments to the optimizer and some other parts of the MariaDB code. It is planned for ColumnStore to become a Storage Engine eventually.
  • 37. MariaDB ColumnStore - Summary • High level of compression • Fixed data position per column – Longer columns have extents • No indexes – No data loading slowdown – Fast loading of data – Consistent data load speeds • Meta data stored in memory • Scalable and distributed architecture