Histograms in MariaDB,
MySQL and PostgreSQL
Sergei Petrunia, MariaDB
Santa Clara, California | April 24th – 27th, 2017
What this talk is about
● Data statistics histograms in
– MariaDB
– MySQL (status so far)
– PostgreSQL
● This is not a competitive comparison
– Rather, a survey
Histograms and query optimizers
Query optimizer needs data statistics
● Which query plan enumerates fewer rows
– orders->customers or customers->orders?
● It depends on row counts and condition selectivities
● Condition selectivity has a big impact on query speed
select *
from
  customers join orders on customers.cust_id=orders.customer_id
where
  customers.balance<1000 and
  orders.total>10000
Data statistics have a big impact on the optimizer
● A paper "How good are query optimizers, really?"
– Leis et al., VLDB 2015
● Conclusions section:
– "In contrast to cardinality estimation, the contribution of the cost
model to the overall query performance is limited."
● This matches our experience
Data statistics usage
● Need a *cheap* way to answer questions about
– Number of rows in the table
– Condition selectivity
– Column widths
– Number of distinct values
– …
● Condition selectivity is the most challenging
Histogram as a compact data summary
● Partition the value space into buckets
● Keep an array of (bucket_bounds, n_values)
– Takes O(#buckets) space
Histogram and condition selectivity
col BETWEEN 'a' AND 'b'
● Sum row counts in the covered buckets
● Partially covered bucket?
– Assume a fraction of rows match
– This is a source of inaccuracy
● More buckets – more accurate estimates
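The estimation steps above can be sketched as follows. This is a minimal illustration, not any server's actual code: the `Bucket` class, the numeric half-open ranges, and the uniform-distribution assumption inside a partially covered bucket are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    low: float    # inclusive lower bound (hypothetical layout)
    high: float   # exclusive upper bound
    n_rows: int   # rows whose value falls in [low, high)

def estimate_between(buckets, a, b):
    """Estimate the fraction of rows with a <= col < b."""
    total = sum(bk.n_rows for bk in buckets)
    if total == 0:
        return 0.0
    matched = 0.0
    for bk in buckets:
        overlap = min(b, bk.high) - max(a, bk.low)
        if overlap <= 0:
            continue  # bucket is not covered at all
        # Fully covered buckets contribute all rows; partially covered
        # ones contribute a proportional share (the source of inaccuracy).
        matched += bk.n_rows * overlap / (bk.high - bk.low)
    return matched / total

hist = [Bucket(0, 10, 100), Bucket(10, 20, 300), Bucket(20, 30, 600)]
sel = estimate_between(hist, 5, 20)
```

Here the first bucket is half covered and the second fully covered, so the estimate is (50 + 300) / 1000 of the rows.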
Histogram types
● Different strategies for choosing buckets
– Equi-width
– Equi-height
– Most Common Values
– ...
Equi-width histogram
● Bucket bounds pre-defined
– Equal, log-scale, etc
● Easy to understand, easy to collect.
● Not very efficient
– Densely and sparsely-populated regions have the same #buckets
– What if densely-populated regions had more buckets?
Equi-height histogram
● Pick the bucket bounds such that each bucket has the same #rows
– Densely populated areas get more buckets
– Sparsely populated areas get fewer buckets
● Estimation error is limited by bucket size
– Which is now limited.
Most Common Values histogram
● Suitable for enum-type domains
● All possible values fit in the histogram
● Just a list of values and frequencies
value1 count1
value2 count2
value3 count3
... ...
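A value/count list like the one above can be sketched in a few lines. The function names and the `max_entries` cap are hypothetical, chosen only for this illustration.

```python
from collections import Counter

def most_common_values(values, max_entries):
    """Return up to max_entries (value, count) pairs, most frequent first."""
    return Counter(values).most_common(max_entries)

def mcv_selectivity(mcv, n_rows, value):
    """Fraction of rows equal to value, or None if value is not in the list."""
    for v, count in mcv:
        if v == value:
            return count / n_rows
    return None

colors = ["red"] * 5 + ["green"] * 3 + ["blue"] * 2
mcv = most_common_values(colors, 10)
```

For an equality condition the estimate is an exact lookup, which is why this form works well when every possible value fits in the list.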
Histogram collection algorithms
● Equi-width
– Find (or guess) min and max value
– For each value
● Find which histogram bin it falls into
● Increment bin’s counter
● Equi-height
– Sort the values
– First value starts bin #0
– Value at n_values * (1/n_bins) starts bin #1
– Value at n_values * (2/n_bins) starts bin #2
– ...
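Both collection algorithms above fit in a short sketch. It assumes numeric values held in memory; the function names and return shapes are illustrative, not taken from any of the three databases.

```python
def equi_width(values, n_bins):
    """Count values per bin; bins split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against hi == lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, hi, counts

def equi_height(values, n_bins):
    """Return the value that starts each bin after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    # Bin #i starts at position n * (i / n_bins) in the sorted data.
    return [ordered[(i * n) // n_bins] for i in range(n_bins)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100, 200]
lo, hi, counts = equi_width(data, 5)
starts = equi_height(data, 5)
```

On this skewed sample the equi-width bins are mostly empty, while the equi-height bin starts follow the dense region.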
Histogram collection strategies
● Scan the whole dataset
– Used by MariaDB
– Produces a “perfect” histogram
– May be expensive
● Do random sampling
– Used by PostgreSQL (MySQL going to do it, too?)
– Produces imprecise histograms
– Non-deterministic results
● Incremental updates
– hard to do, not used
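The difference between the first two strategies can be sketched as below. This is an illustration under simple assumptions (values in memory, a fixed-size sample drawn with `random.sample`), not how MariaDB or PostgreSQL implement collection; `sampled_bounds` is a hypothetical helper.

```python
import random

def equi_height_bounds(values, n_bins):
    """Bin start values computed from the data that was passed in."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(n_bins)]

def sampled_bounds(values, n_bins, sample_size, seed):
    """Same histogram, but built from a random sample of the values."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return equi_height_bounds(sample, n_bins)

data = list(range(10_000))
exact = equi_height_bounds(data, 4)             # full scan
approx = sampled_bounds(data, 4, 500, seed=1)   # 5% sample
```

The full scan always yields the same bounds; the sampled bounds are close to them but change with the seed, which is the non-determinism mentioned above.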
Summary so far
● Query optimizers need condition selectivities
● These are provided by histograms
● Histograms are compact data summaries
● Histogram types
– Width-balanced
– Height-balanced (better)
– Most-Common-Values
● Histogram collection methods
– Scan the whole dataset
– Do random sampling.
Histograms in MariaDB
Histograms in MariaDB
● Available in MariaDB 10.0
– (Stable since March, 2014)
● Used in the real world
● Good for common use cases
– has some limitations
● Sometimes called “Engine-Independent Table Statistics”
– Although being engine-independent is not the primary point.
Histogram storage in MariaDB
● Stored in the mysql.column_stats table
CREATE TABLE mysql.column_stats (
db_name varchar(64) NOT NULL,
table_name varchar(64) NOT NULL,
column_name varchar(64) NOT NULL,
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
nulls_ratio decimal(12,4) DEFAULT NULL,
avg_length decimal(12,4) DEFAULT NULL,
avg_frequency decimal(12,4) DEFAULT NULL,
hist_size tinyint unsigned,
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
PRIMARY KEY (db_name,table_name,column_name)
);
● Very compact: max 255 bytes (per column)
Collecting a histogram
set histogram_size=255;
set histogram_type='DOUBLE_PREC_HB';
analyze table tbl persistent for all;
analyze table tbl persistent for columns (col1, col2) indexes ();
+----------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------+---------+----------+-----------------------------------------+
| test.tbl | analyze | status | Engine-independent statistics collected |
| test.tbl | analyze | status | OK |
+----------+---------+----------+-----------------------------------------+
● Manual collection only
● Make the optimizer use it
set use_stat_tables='preferably';
set optimizer_use_condition_selectivity=4;
<query>;
Examining a histogram
select * from mysql.column_stats
where table_name='pop1980_cp' and column_name='firstname'
*************************** 1. row ***************************
db_name: babynames
table_name: pop1980_cp
column_name: firstname
min_value: Aaliyah
max_value: Zvi
nulls_ratio: 0.0000
avg_length: 6.0551
avg_frequency: 194.4642
hist_size: 32
hist_type: DOUBLE_PREC_HB
histogram: (binary data, not printable)
select decode_histogram(hist_type,histogram)
from mysql.column_stats where table_name='pop1980_cp' and column_name='firstname'
*************************** 1. row ***************************
decode_histogram(hist_type,histogram):
0.00201,0.04048,0.03833,0.03877,0.04158,0.11852,0.07912,0.00218,0.00093,0.03940,
0.07710,0.00124,0.08035,0.11992,0.03877,0.03989,0.24140
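One way to read the decoded output, offered as an assumption rather than a statement of MariaDB internals: each number is the width of a bucket as a fraction of the [min_value, max_value] range, and every bucket holds an equal share of the rows. Under that assumption the widths should sum to about 1, and a "fraction of rows below a position" estimate can be sketched like this (`fraction_below` is a hypothetical helper):

```python
from itertools import accumulate

decoded = ("0.00201,0.04048,0.03833,0.03877,0.04158,0.11852,0.07912,0.00218,"
           "0.00093,0.03940,0.07710,0.00124,0.08035,0.11992,0.03877,0.03989,0.24140")
widths = [float(x) for x in decoded.split(",")]
bounds = list(accumulate(widths))      # right edge of each bucket, in [0, 1]
rows_per_bucket = 1.0 / len(widths)    # assumed: equal share of rows per bucket

def fraction_below(pos):
    """Estimated fraction of rows below pos, a position in [0, 1] of the value range."""
    covered = 0.0
    start = 0.0
    for width, end in zip(widths, bounds):
        if pos >= end:
            covered += rows_per_bucket                        # bucket fully below pos
        elif pos > start:
            covered += rows_per_bucket * (pos - start) / width  # partial bucket
        start = end
    return covered
```

Narrow buckets (such as 0.00093) mark regions of the value range where names are densely packed.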
Histograms in MariaDB - summary
● Available since MariaDB 10.0
● Special ANALYZE command to collect stats
– Does a full table scan
– May require a lot of space for big VARCHARs: MDEV-6529 “EITS ANALYZE uses disk space inefficiently for VARCHAR columns”
● Not used by the optimizer by default
– Special settings to get optimizer to use them.
Histograms in PostgreSQL
Histograms in PostgreSQL
● Data statistics
– Fraction of NULL-values
– Most common value (MCV) list
– Height-balanced histogram (excludes MCV values)
– A few other parameters
● avg_length
● n_distinct_values
● ...
● Collection algorithm
– One-pass random sampling
Collecting histograms in PostgreSQL
-- Global parameter specifying number of buckets
-- the default is 100
set default_statistics_target=N;
-- Can also override for specific columns
alter table tbl alter column_name set statistics N;
-- Collect the statistics
analyze tablename;
# postgresql.conf, or per-table:
# number of inserted/updated/deleted tuples to trigger an ANALYZE
autovacuum_analyze_threshold = N
# fraction of the table size to add to autovacuum_analyze_threshold
# when deciding whether to trigger ANALYZE
autovacuum_analyze_scale_factor=N.N
Examining the histogram
select * from pg_stats where tablename='pop1980';
tablename | pop1980
attname | firstname
inherited | f
null_frac | 0
avg_width | 7
n_distinct | 9320
most_common_vals | {Michael,Jennifer,Christopher,Jason,David,James,
Matthew,John,Joshua,Amanda}
most_common_freqs | {0.0201067,0.0172667,0.0149067,0.0139,0.0124533,
0.01164,0.0109667,0.0107133,0.0106067,0.01028}
histogram_bounds | {Aaliyah,Belinda,Christine,Elsie,Jaron,Kamia,
Lindsay,Natasha,Robin,Steven,Zuriel}
correlation | 0.0066454
most_common_elems |
Histograms are collected by doing sampling
● src/backend/commands/analyze.c, std_typanalyze() refers to
"Random Sampling for Histogram Construction: How much is enough?"
– Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.
● Random sample size is computed from
– Histogram size
– Rows in table (=10^6)
– Max relative error in bin (=0.5)
– Error probability (=0.01)
● 100 buckets = 30,000 rows sample
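The sample size can be reproduced with a short calculation. The formula below is the bound quoted in the comment in analyze.c as recalled here, r = 4 * k * ln(2n / gamma) / f^2; treat the exact form as an assumption to check against the source. The parameter names are chosen for this sketch.

```python
import math

def min_sample_rows(k, n=10**6, f=0.5, gamma=0.01):
    """Sample size bound for a histogram of k buckets.

    k     - histogram size (number of buckets)
    n     - rows in the table
    f     - max relative error in a bin
    gamma - error probability
    """
    return 4 * k * math.log(2 * n / gamma) / f**2

per_bucket = min_sample_rows(1)   # about 305.82 rows per bucket
rounded = 300 * 100               # the rounded-down factor, for 100 buckets
```

With these defaults the bound is roughly 305.82 rows per bucket, which rounds down to the 300 rows per bucket behind the 30,000-row figure. Because of the logarithm, the table size n has only a weak effect.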
Histogram sampling in PostgreSQL
● 30K rows are sampled from random locations in the table
– Does a skip scan forward
– “Randomly chosen rows in randomly chosen blocks”
● Choice of Most Common Values
– Sample values that are 25% more common than average
– Values that would take more than one histogram bucket.
– All seen values are MCVs? No histogram is built.
Beyond single-column histograms
● Conditions can be correlated
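select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light'
-- or: item_name='swimsuit'
● Correlation can have a big effect on the combined selectivity
– MIN(1/n, 1/m) if the conditions are fully correlated
– (1/n) * (1/m) if they are independent
– 0 if they never hold together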
● Multi-column “histograms” are hard
● “Possible PostgreSQL 10.0 feature: multivariate statistics”
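The three candidate selectivities listed above can be compared numerically. The values of n and m here (365 ship dates, 1000 item names) are made up for the illustration, and the dictionary keys are labels chosen for this sketch.

```python
def combined_selectivity(sel_a, sel_b):
    """Selectivity of (A AND B) under three different assumptions."""
    return {
        "fully_correlated": min(sel_a, sel_b),   # one condition implies the other
        "independent": sel_a * sel_b,            # the usual optimizer default
        "mutually_exclusive": 0.0,               # the conditions never co-occur
    }

# Hypothetical column cardinalities: n ship dates, m item names.
n, m = 365, 1000
est = combined_selectivity(1 / n, 1 / m)
```

The spread between the three answers is several orders of magnitude, which is why single-column histograms alone cannot settle the estimate.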
PostgreSQL: Conclusions
● Collects and uses both
– Height-balanced histogram
– Most Common Values list
● Uses sampling for collection
● Can run ANALYZE yourself
– Or autovacuum will do it automatically
● Multivariate stats are in the plans
Histogram test - MariaDB
● Real world data, people born in 1980
MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Jennifer';
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| 1 | SIMPLE | pop1980 | ALL | NULL | NULL | NULL | NULL | 3444156 | 3444156.00 | 4.69 | 1.70 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Allison';
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| 1 | SIMPLE | pop1980 | ALL | NULL | NULL | NULL | NULL | 3444156 | 3444156.00 | 2.89 | 0.14 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Jennice';
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
| 1 | SIMPLE | pop1980 | ALL | NULL | NULL | NULL | NULL | 3444156 | 3444156.00 | 4.69 | 0.00 | Using where |
+------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
Name      Actual rows  filtered / r_filtered
Jennifer  58,381       2.75x
Allison   4,868        20x
Jennice   7            ?x
Histogram test - PostgreSQL
● Real world data, people born in 1980
Name      Actual rows
Jennifer  58,381
Allison   4,868
Jennice   7
test=# explain analyze select count(*) from pop1980 where firstname='Jennifer';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=68456.71..68456.71 rows=1 width=8) (actual time=372.593..372.593 rows=1 loops=1)
-> Seq Scan on pop1980 (cost=0.00..68312.62 rows=57632 width=0) (actual time=0.288..366.058 rows=58591 loops=1)
Filter: ((firstname)::text = 'Jennifer'::text)
Rows Removed by Filter: 3385539
Planning time: 0.098 ms
Execution time: 372.625 ms
test=# explain analyze select count(*) from pop1980 where firstname='Allison';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=68313.66..68313.67 rows=1 width=8) (actual time=372.415..372.415 rows=1 loops=1)
-> Seq Scan on pop1980 (cost=0.00..68312.62 rows=413 width=0) (actual time=119.238..372.023 rows=4896 loops=1)
Filter: ((firstname)::text = 'Allison'::text)
Rows Removed by Filter: 3439234
Planning time: 0.086 ms
Execution time: 372.447 ms
test=# explain analyze select count(*) from pop1980 where firstname='Jennice';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=68313.66..68313.67 rows=1 width=8) (actual time=345.966..345.966 rows=1 loops=1)
-> Seq Scan on pop1980 (cost=0.00..68312.62 rows=413 width=0) (actual time=190.896..345.961 rows=7 loops=1)
Filter: ((firstname)::text = 'Jennice'::text)
Rows Removed by Filter: 3444123
Planning time: 0.388 ms
Execution time: 346.010 ms
Name      Estimate error
Jennifer  0.9x
Allison   0.08x
Jennice   103x
Histograms in MySQL
Histograms in MySQL
● Not available for use in MySQL 8.0.1
● There are pieces of histogram code, still
– This gives some clues
● Another feature that uses histograms: P_S statement latencies
– P_S.events_statements_histogram_global
P_S.events_statements_histogram_by_digest
– These are a totally different kind of histogram
● Buckets are log-scale equi-width.
Sampling
● Currently has only a default implementation
– Which does a full table scan and “rolls the dice” for each row
● Assume there will be an InnoDB implementation
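enum class enum_sampling_method { SYSTEM };
class handler {
  ...
  // Start a sampling scan over roughly sampling_percentage percent of the rows
  int ha_sample_init(double sampling_percentage, int sampling_seed,
                     enum_sampling_method sampling_method);
  // Fetch the next sampled row into buf
  int ha_sample_next(uchar *buf);
  // Finish the sampling scan
  int ha_sample_end();
  ...
};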
● New methods for storage engine API
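The "rolls the dice for each row" behaviour of the default implementation can be sketched as a per-row Bernoulli trial. This is a Python illustration of the idea only, not the MySQL code; `bernoulli_sample` is a hypothetical name, and its two parameters mirror `sampling_percentage` and `sampling_seed` from the declaration above.

```python
import random

def bernoulli_sample(rows, sampling_percentage, sampling_seed):
    """Scan every row and keep each one with the given probability."""
    rng = random.Random(sampling_seed)
    threshold = sampling_percentage / 100.0
    for row in rows:
        # rng.random() is in [0.0, 1.0), so 100% keeps every row
        # and 0% keeps none.
        if rng.random() < threshold:
            yield row

table = range(1000)
picked = list(bernoulli_sample(table, 10.0, sampling_seed=42))
```

The scan still reads the whole table, so it saves histogram-building work but not I/O; an engine-specific implementation could skip pages instead.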
Histogram storage
● Will be stored in mysql.column_stats table
CREATE TABLE mysql.column_stats (
database_name varchar(64) COLLATE utf8_bin NOT NULL,
table_name varchar(64) COLLATE utf8_bin NOT NULL,
column_name varchar(64) COLLATE utf8_bin NOT NULL,
histogram json NOT NULL,
PRIMARY KEY (database_name,table_name,column_name)
);
● Will be stored as JSON
– No limits on size?
“Singleton” histograms
● This is what PostgreSQL calls “Most Common Values”
{
"last-updated": "2015-11-04 15:19:51.000000",
"histogram-type": "singleton",
"null-values": 0.1, // Fraction of NULL values
"buckets":
[
[
42, // Value, data type depends on the source column.
0.001978728666831561 // "Cumulative" frequency
],
…
]
}
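If the frequencies are a running total, as the "Cumulative" annotation suggests, the selectivity of an equality condition is the difference between a bucket's value and the previous one. The sketch below assumes that reading; the JSON document and its numbers are invented for the example (the comments in the slide's sample are not valid JSON and are left out).

```python
import json

doc = json.loads("""
{
  "histogram-type": "singleton",
  "null-values": 0.1,
  "buckets": [[10, 0.3], [20, 0.45], [42, 0.9]]
}
""")

def equality_selectivity(hist, value):
    """Fraction of rows equal to value, from cumulative frequencies."""
    prev = 0.0
    for bucket_value, cumulative in hist["buckets"]:
        if bucket_value == value:
            return cumulative - prev
        prev = cumulative
    return 0.0  # value not in the histogram

sel_20 = equality_selectivity(doc, 20)
```

In this made-up document the last cumulative frequency plus the NULL fraction adds up to 1.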
Height-balanced histograms
{
"last-updated": "2015-11-04 15:19:51.000000",
"histogram-type": "equi-height",
"null-values": 0.1, // Fraction of NULL values
"buckets":
[
[
"bar", // Lower inclusive value
"foo", // Upper inclusive value
0.001978728666831561, // Cumulative frequency
10 // Number of distinct values in this bucket
],
...
]
}
Height-balanced histograms
...
"buckets":
[
[
"bar", // Lower inclusive value
"foo", // Upper inclusive value
0.001978728666831561, // Cumulative frequency
10 // Number of distinct values in this bucket
],
...
]
}
● Why “upper inclusive value”? To support holes? At the cost of 2x histogram size?
● Why a frequency in each bucket? It’s equi-height, so frequencies should be the same?
● Per-bucket #distinct is interesting but doesn’t seem to be in high demand.
Histograms
● “Singleton”
● Height-balanced
● Fraction of NULLs is stored in both kinds of histograms
– So you can’t have both at the same time?
● Height-balanced histograms allow for “gaps”
● Each bucket has #distinct (non-optional?)
MySQL histograms summary
● Seem to be coming in MySQL 8.0
● Support two types
– “Singleton”
– “Height-balanced”
● Both kinds store null-values so they are not used together?
● “Height-balanced”
– May have “holes”?
– Stores “frequency” for each bin (?)
● Collection will probably use sampling
– Which has only full scan implementation ATM
Conclusions
Conclusions
● Histograms are compact data summaries for use by the optimizer
● PostgreSQL
– Has a mature implementation
– Uses sampling and auto-collection
● MariaDB
– Supports histograms since MariaDB 10.0
● Compact
● Height-balanced only
– Need to run ANALYZE manually and set the optimizer to use them
● MySQL
– Doesn’t have histograms yet.
– Preparing to have them in 8.0
– Will support two kinds
● Most common values
● Height-balanced “with gaps” (?)
Thanks!
44
Rate My Session

More Related Content

What's hot (20)

Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
MariaDB plc
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Altinity Ltd
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
Altinity Ltd
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
Sergey Petrunya
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
MariaDB plc
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
MariaDB plc
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
Sveta Smirnova
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0
MariaDB plc
 
SQL Tuning 101
SQL Tuning 101SQL Tuning 101
SQL Tuning 101
Carlos Sierra
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
oysteing
 
Achieving compliance With MongoDB Security
Achieving compliance With MongoDB Security Achieving compliance With MongoDB Security
Achieving compliance With MongoDB Security
Mydbops
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
Frederic Descamps
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
rpolat
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query Tuning
Alexander Rubin
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & Cassandra
Quentin Ambard
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
MariaDB plc
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Altinity Ltd
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
Altinity Ltd
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
Sergey Petrunya
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
MariaDB plc
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
MariaDB plc
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
Sveta Smirnova
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0
MariaDB plc
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
oysteing
 
Achieving compliance With MongoDB Security
Achieving compliance With MongoDB Security Achieving compliance With MongoDB Security
Achieving compliance With MongoDB Security
Mydbops
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
Frederic Descamps
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
rpolat
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query Tuning
Alexander Rubin
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & Cassandra
Quentin Ambard
 

Similar to Histograms in MariaDB, MySQL and PostgreSQL (20)

Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
Sergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
Sergey Petrunya
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
Sergey Petrunya
 
Understanding histogramppt.prn
Understanding histogramppt.prnUnderstanding histogramppt.prn
Understanding histogramppt.prn
Leyi (Kamus) Zhang
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
Sergey Petrunya
 
How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performance
MariaDB plc
 
Histograms: Pre-12c and now
Histograms: Pre-12c and nowHistograms: Pre-12c and now
Histograms: Pre-12c and now
Anju Garg
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
oysteing
 
Histograms in 12c era
Histograms in 12c eraHistograms in 12c era
Histograms in 12c era
Mauro Pagano
 
Cardinality Estimation through Histogram in Apache Spark 2.3 with Ron Hu and ...
Cardinality Estimation through Histogram in Apache Spark 2.3 with Ron Hu and ...Cardinality Estimation through Histogram in Apache Spark 2.3 with Ron Hu and ...
Cardinality Estimation through Histogram in Apache Spark 2.3 with Ron Hu and ...
Databricks
 
Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?Billion Goods in Few Categories: How Histograms Save a Life?
Billion Goods in Few Categories: How Histograms Save a Life?
Sveta Smirnova
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
DataStax
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra Metrics
Chris Lohfink
 
MariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histogramsMariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histograms
Sergey Petrunya
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hive
vshreepadma
 
DB
DBDB
DB
Samchu Li
 
Histograms : Pre-12c and Now
Histograms : Pre-12c and NowHistograms : Pre-12c and Now
Histograms : Pre-12c and Now
Anju Garg
 
MariaDB Optimizer
MariaDB OptimizerMariaDB Optimizer
MariaDB Optimizer
JongJin Lee
 
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
Dave Stokes
 
DB2 Workload Manager Histograms
DB2 Workload Manager HistogramsDB2 Workload Manager Histograms
DB2 Workload Manager Histograms
Keith McDonald
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
Sergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
Sergey Petrunya
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
Sergey Petrunya
 
Understanding histogramppt.prn
Understanding histogramppt.prnUnderstanding histogramppt.prn
Understanding histogramppt.prn
Leyi (Kamus) Zhang
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
Sergey Petrunya
 
How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performance
MariaDB plc
 
Histograms: Pre-12c and now
Histograms: Pre-12c and nowHistograms: Pre-12c and now
Histograms: Pre-12c and now
Anju Garg
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
oysteing
 
Histograms in 12c era
Histograms in 12c eraHistograms in 12c era
Histograms in 12c era
Mauro Pagano
 
Histograms in MariaDB, MySQL and PostgreSQL

  • 1. Histograms in MariaDB, MySQL and PostgreSQL. Sergei Petrunia, MariaDB. Santa Clara, California | April 24th – 27th, 2017
  • 2. What this talk is about ● Data statistics histograms in – MariaDB – MySQL (status so far) – PostgreSQL ● This is not a competitive comparison – Rather, a survey
  • 3. Histograms and query optimizers
  • 4. Query optimizer needs data statistics ● Which query plan enumerates fewer rows – orders->customers or customers->orders? ● It depends on row counts and condition selectivities ● Condition selectivity has a big impact on query speed
      select *
      from customers join orders on customers.cust_id=orders.customer_id
      where customers.balance<1000 and orders.total>10000
  • 5. Data statistics have a big impact on the optimizer ● A paper "How good are query optimizers, really?" – Leis et al, VLDB 2015 ● Conclusions section: – "In contrast to cardinality estimation, the contribution of the cost model to the overall query performance is limited." ● This matches our experience
  • 6. Data statistics usage ● Need a *cheap* way to answer questions about – Number of rows in the table – Condition selectivity – Column widths – Number of distinct values – … ● Condition selectivity is the most challenging
  • 7. Histogram as a compact data summary ● Partition the value space into buckets ● Keep an array of (bucket_bounds, n_values) – Takes O(#buckets) space
  • 8. Histogram and condition selectivity: col BETWEEN ‘a’ AND ‘b’ ● Sum row counts in the covered buckets ● Partially covered bucket? – Assume a fraction of rows match – This is a source of inaccuracy ● More buckets – more accurate estimates
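The estimation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any of the three databases: the function name `range_selectivity`, numeric bucket bounds, and the linear interpolation inside a partially covered bucket are all assumptions made for the example.

```python
def range_selectivity(bounds, counts, lo, hi):
    """Estimate the fraction of rows with lo <= col <= hi.

    bounds: sorted bucket edges (len(counts) + 1 numbers, all distinct)
    counts: number of rows in each bucket
    Hypothetical helper: assumes numeric values spread evenly inside a bucket.
    """
    total = sum(counts)
    matched = 0.0
    for i, n_rows in enumerate(counts):
        b_lo, b_hi = bounds[i], bounds[i + 1]
        # Length of the part of this bucket that the range covers
        overlap = min(hi, b_hi) - max(lo, b_lo)
        if overlap <= 0:
            continue  # bucket is not covered at all
        # Fully covered buckets contribute all rows; partially covered
        # buckets contribute a proportional fraction (the inaccuracy source)
        matched += n_rows * overlap / (b_hi - b_lo)
    return matched / total


# Three buckets: [0,10) holds 100 rows, [10,20) holds 50, [20,30] holds 50
estimate = range_selectivity([0, 10, 20, 30], [100, 50, 50], 5, 20)
```

With narrower buckets the interpolated part shrinks, which is why more buckets give more accurate estimates.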
  • 9. Histogram types ● Different strategies for choosing buckets – Equi-width – Equi-height – Most Common Values – ...
  • 10. Equi-width histogram ● Bucket bounds pre-defined – Equal, log-scale, etc ● Easy to understand, easy to collect. ● Not very efficient – Densely and sparsely-populated regions have the same #buckets – What if densely-populated regions had more buckets?
  • 11. Equi-height histogram ● Pick the bucket bounds such that each bucket has the same #rows – Densely populated areas get more buckets – Sparsely populated get fewer buckets ● Estimation error is limited by bucket size – Which is now limited.
  • 12. Most Common Values histogram ● Suitable for enum-type domains ● All possible values fit in the histogram ● Just a list of (value, count) pairs: (value1, count1), (value2, count2), (value3, count3), ...
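As a rough illustration of the value/frequency list described on this slide, the sketch below builds one with Python's standard `collections.Counter`. The function name `most_common_values` and the choice to store frequencies as fractions rather than raw counts are assumptions made for the example.

```python
from collections import Counter


def most_common_values(values, k):
    """Return up to k (value, frequency) pairs, most frequent first.

    Hypothetical helper: frequency is the fraction of all rows with that value.
    """
    counts = Counter(values)
    total = len(values)
    return [(value, count / total) for value, count in counts.most_common(k)]


# An enum-like column with three possible values
column = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
mcv = most_common_values(column, 2)
```

When k is at least the number of distinct values, the list describes the column exactly, which is the enum-type case the slide mentions.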
  • 13. Histogram collection algorithms
      ● Equi-width
        – Find (or guess) min and max value
        – For each value: find which histogram bin it falls into, then increment the bin’s counter
      ● Equi-height
        – Sort the values
        – First value starts bin #0
        – Value at n_values * (1/n_bins) starts bin #1
        – Value at n_values * (2/n_bins) starts bin #2
        – ...
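Both collection algorithms on this slide are short enough to sketch directly. The Python below is a minimal illustration with hypothetical function names, assuming numeric values held in memory, at least two distinct values for the equi-width case, and 0-based bin numbers.

```python
def equi_width(values, n_bins):
    """Count values per bin when all bins span the same value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # Find the bin the value falls into; the maximum goes in the last bin
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


def equi_height_starts(values, n_bins):
    """Return the value that starts each bin when bins hold equal row counts."""
    ordered = sorted(values)
    n_values = len(ordered)
    # Bin #i starts at position n_values * (i / n_bins) in the sorted data
    return [ordered[(i * n_values) // n_bins] for i in range(n_bins)]


data = list(range(100))
width_counts = equi_width(data, 4)
height_starts = equi_height_starts(data, 4)
```

The equi-width pass needs only the minimum and maximum up front, while the equi-height version pays for a sort, which matches the "easy to collect" remark about equi-width histograms.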
  • 14. Histogram collection strategies ● Scan the whole dataset – Used by MariaDB – Produces a “perfect” histogram – May be expensive ● Do random sampling – Used by PostgreSQL (MySQL going to do it, too?) – Produces imprecise histograms – Non-deterministic results ● Incremental updates – hard to do, not used
  • 15. Summary so far ● Query optimizers need condition selectivities ● These are provided by histograms ● Histograms are compact data summaries ● Histogram types – Width-balanced – Height-balanced (better) – Most-Common-Values ● Histogram collection methods – Scan the whole dataset – Do random sampling.
  • 16. Histograms in MariaDB
  • 17. Histograms in MariaDB ● Available in MariaDB 10.0 – (Stable since March, 2014) ● Used in the real world ● Good for common use cases – has some limitations ● Sometimes called “Engine-Independent Table Statistics” – Although being engine-independent is not the primary point.
  • 18. Histogram storage in MariaDB ● Are stored in mysql.column_stats table ● Very compact: max 255 bytes (per column)
      CREATE TABLE mysql.column_stats (
        db_name varchar(64) NOT NULL,
        table_name varchar(64) NOT NULL,
        column_name varchar(64) NOT NULL,
        min_value varbinary(255) DEFAULT NULL,
        max_value varbinary(255) DEFAULT NULL,
        nulls_ratio decimal(12,4) DEFAULT NULL,
        avg_length decimal(12,4) DEFAULT NULL,
        avg_frequency decimal(12,4) DEFAULT NULL,
        hist_size tinyint unsigned,
        hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
        histogram varbinary(255),
        PRIMARY KEY (db_name,table_name,column_name)
      );
  • 19. Collecting a histogram
      ● Manual collection only
        set histogram_size=255;
        set histogram_type='DOUBLE_PREC_HB';
        analyze table tbl persistent for all;
        analyze table tbl persistent for columns (col1, col2) indexes ();
        +----------+---------+----------+-----------------------------------------+
        | Table    | Op      | Msg_type | Msg_text                                |
        +----------+---------+----------+-----------------------------------------+
        | test.tbl | analyze | status   | Engine-independent statistics collected |
        | test.tbl | analyze | status   | OK                                      |
        +----------+---------+----------+-----------------------------------------+
      ● Make the optimizer use it
        set use_stat_tables='preferably';
        set optimizer_use_condition_selectivity=4;
        <query>;
  • 20. Examining a histogram
      select * from mysql.column_stats
      where table_name='pop1980_cp' and column_name='firstname'
      *************************** 1. row ***************************
            db_name: babynames
         table_name: pop1980_cp
        column_name: firstname
          min_value: Aaliyah
          max_value: Zvi
        nulls_ratio: 0.0000
         avg_length: 6.0551
      avg_frequency: 194.4642
          hist_size: 32
          hist_type: DOUBLE_PREC_HB
          histogram: (binary data, not printable)

      select decode_histogram(hist_type,histogram) from mysql.column_stats
      where table_name='pop1980_cp' and column_name='firstname'
      *************************** 1. row ***************************
      decode_histogram(hist_type,histogram):
      0.00201,0.04048,0.03833,0.03877,0.04158,0.11852,0.07912,0.00218,0.00093,0.03940,
      0.07710,0.00124,0.08035,0.11992,0.03877,0.03989,0.24140
  • 21. Histograms in MariaDB - summary ● Available since MariaDB 10.0 ● Special ANALYZE command to collect stats – Does a full table scan – May require a lot of space for big VARCHARs: MDEV-6529 “EITS ANALYZE uses disk space inefficiently for VARCHAR columns” ● Not used by the optimizer by default – Special settings to get optimizer to use them.
  • 22. Histograms in PostgreSQL
  • 23. Histograms in PostgreSQL ● Data statistics – Fraction of NULL-values – Most common value (MCV) list – Height-balanced histogram (excludes MCV values) – A few other parameters (avg_length, n_distinct_values, ...) ● Collection algorithm – One-pass random sampling
  • 24. Collecting histograms in PostgreSQL
      -- Global parameter specifying number of buckets
      -- the default is 100
      set default_statistics_target=N;
      -- Can also override for specific columns
      alter table tbl alter column_name set statistics N;
      -- Collect the statistics
      analyze tablename;
      ● Automatic collection, set in postgresql.conf or per-table:
      # number of inserted/updated/deleted tuples to trigger an ANALYZE
      autovacuum_analyze_threshold = N
      # fraction of the table size to add to autovacuum_analyze_threshold
      # when deciding whether to trigger ANALYZE
      autovacuum_analyze_scale_factor = N.N
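The two autovacuum settings on this slide combine into a single trigger point: the PostgreSQL documentation gives it as the base threshold plus the scale factor times the number of tuples in the table. The sketch below just evaluates that formula; the function name is hypothetical, and the defaults of 50 and 0.1 are the documented PostgreSQL defaults rather than values from the slide.

```python
def analyze_trigger_threshold(n_tuples, base_threshold=50, scale_factor=0.1):
    """Number of changed tuples after which autovacuum runs ANALYZE.

    base_threshold -> autovacuum_analyze_threshold
    scale_factor   -> autovacuum_analyze_scale_factor
    """
    return base_threshold + scale_factor * n_tuples


# A table with one million rows is re-analyzed after roughly 100,050 changes
threshold = analyze_trigger_threshold(1_000_000)
```

On large tables the scale factor dominates, so statistics can go stale for a long time unless the per-table setting is lowered.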
  • 25. Examining the histogram
      select * from pg_stats where tablename='pop1980';
      tablename         | pop1980
      attname           | firstname
      inherited         | f
      null_frac         | 0
      avg_width         | 7
      n_distinct        | 9320
      most_common_vals  | {Michael,Jennifer,Christopher,Jason,David,James,
                          Matthew,John,Joshua,Amanda}
      most_common_freqs | {0.0201067,0.0172667,0.0149067,0.0139,0.0124533,
                          0.01164,0.0109667,0.0107133,0.0106067,0.01028}
      histogram_bounds  | {Aaliyah,Belinda,Christine,Elsie,Jaron,Kamia,
                          Lindsay,Natasha,Robin,Steven,Zuriel}
      correlation       | 0.0066454
      most_common_elems |
  • 26. Histograms are collected by doing sampling ● src/backend/commands/analyze.c, std_typanalyze() refers to “Random Sampling for Histogram Construction: How much is enough?” – Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998. ● [Formula image: random sample size as a function of histogram size, rows in table (=10^6), max relative error in bin (=0.5) and error probability (=0.01)] ● 100 buckets = 30,000 rows sample
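The sample size quoted on this slide can be reproduced numerically. The formula below, r = 4 * k * ln(2n / gamma) / f^2, is the one given in the comment above std_typanalyze() in analyze.c as far as I recall it, so treat it as an assumption to check against the source; the function and parameter names are hypothetical.

```python
import math


def min_sample_size(k, n=10**6, f=0.5, gamma=0.01):
    """Minimum random sample size for a k-bucket histogram.

    k: histogram size (number of buckets)
    n: rows in table
    f: max relative error in bin size
    gamma: error probability
    """
    return 4 * k * math.log(2 * n / gamma) / f**2


per_bucket = min_sample_size(1)      # about 305.8 rows per bucket
for_default = min_sample_size(100)   # about 30,582 rows for 100 buckets
```

PostgreSQL rounds the per-bucket factor down to 300, which gives the 30,000-row sample for 100 buckets. Because the table size only enters through a logarithm, the sample barely grows for much larger tables.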
  • 27. Histogram sampling in PostgreSQL ● 30K rows are sampled from random locations in the table – Does a skip scan forward – “Randomly chosen rows in randomly chosen blocks” ● Choice of Most Common Values – Sample values that are 25% more common than average – Values that would take more than one histogram bucket. – All seen values are MCVs? No histogram is built.
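  • 28. Beyond single-column histograms ● Conditions can be correlated: select ... from order_items where shipdate='2015-12-15' AND item_name='christmas light' (compare with item_name='swimsuit') ● Correlation can have a big effect. For two conditions with selectivities 1/n and 1/m, the combined selectivity can be – MIN(1/n, 1/m) if they are fully correlated – (1/n) * (1/m) if they are independent – 0 if they never hold together ● Multi-column “histograms” are hard ● “Possible PostgreSQL 10.0 feature: multivariate statistics”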
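To make the spread between the three cases concrete, the short sketch below evaluates them for a pair of single-column selectivities. The function name, the dictionary keys, and the example numbers (one ship date in 365, one item name in 1000) are illustrative assumptions, not figures from the talk.

```python
def combined_selectivity_cases(sel_a, sel_b):
    """Selectivity of (A AND B) under three correlation assumptions."""
    return {
        # Every row matching the rarer condition also matches the other one
        "fully_correlated": min(sel_a, sel_b),
        # The usual optimizer assumption when only per-column stats exist
        "independent": sel_a * sel_b,
        # The two conditions never hold for the same row
        "mutually_exclusive": 0.0,
    }


cases = combined_selectivity_cases(1 / 365, 1 / 1000)
```

Here the fully correlated estimate is 365 times larger than the independent one, which is the kind of gap that single-column histograms cannot detect.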
  • 29. PostgreSQL: Conclusions ● Collects and uses both – Height-balanced histogram – Most Common Values list ● Uses sampling for collection ● Can run ANALYZE yourself – Or autovacuum will do it automatically ● Multivariate stats are in the plans
  • 30. Histogram test - MariaDB ● Real world data, people born in 1980
      MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Jennifer';
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      | id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      |    1 | SIMPLE      | pop1980 | ALL  | NULL          | NULL | NULL    | NULL | 3444156 | 3444156.00 |     4.69 |       1.70 | Using where |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Allison';
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      | id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      |    1 | SIMPLE      | pop1980 | ALL  | NULL          | NULL | NULL    | NULL | 3444156 | 3444156.00 |     2.89 |       0.14 | Using where |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      MariaDB [babynames]> analyze select count(*) from pop1980 where firstname='Jennice';
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      | id   | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | r_rows     | filtered | r_filtered | Extra       |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      |    1 | SIMPLE      | pop1980 | ALL  | NULL          | NULL | NULL    | NULL | 3444156 | 3444156.00 |     4.69 |       0.00 | Using where |
      +------+-------------+---------+------+---------------+------+---------+------+---------+------------+----------+------------+-------------+
      Name      Actual rows  Estimate / actual
      Jennifer  58,381       2.75x
      Allison   4,868        20x
      Jennice   7            ?x
  • 31. Histogram test - PostgreSQL ● Real world data, people born in 1980: Jennifer 58,381; Allison 4,868; Jennice 7
      test=# explain analyze select count(*) from pop1980 where firstname='Jennifer';
                                   QUERY PLAN
      ----------------------------------------------------------------------
       Aggregate  (cost=68456.71..68456.71 rows=1 width=8) (actual time=372.593..372.593 rows=1 loops=1)
         ->  Seq Scan on pop1980  (cost=0.00..68312.62 rows=57632 width=0) (actual time=0.288..366.058 rows=58591 loops=1)
               Filter: ((firstname)::text = 'Jennifer'::text)
               Rows Removed by Filter: 3385539
       Planning time: 0.098 ms
       Execution time: 372.625 ms
      test=# explain analyze select count(*) from pop1980 where firstname='Allison';
                                   QUERY PLAN
      ----------------------------------------------------------------------
       Aggregate  (cost=68313.66..68313.67 rows=1 width=8) (actual time=372.415..372.415 rows=1 loops=1)
         ->  Seq Scan on pop1980  (cost=0.00..68312.62 rows=413 width=0) (actual time=119.238..372.023 rows=4896 loops=1)
               Filter: ((firstname)::text = 'Allison'::text)
               Rows Removed by Filter: 3439234
       Planning time: 0.086 ms
       Execution time: 372.447 ms
      test=# explain analyze select count(*) from pop1980 where firstname='Jennice';
                                   QUERY PLAN
      ----------------------------------------------------------------------
       Aggregate  (cost=68313.66..68313.67 rows=1 width=8) (actual time=345.966..345.966 rows=1 loops=1)
         ->  Seq Scan on pop1980  (cost=0.00..68312.62 rows=413 width=0) (actual time=190.896..345.961 rows=7 loops=1)
               Filter: ((firstname)::text = 'Jennice'::text)
               Rows Removed by Filter: 3444123
       Planning time: 0.388 ms
       Execution time: 346.010 ms
      Estimate vs. actual, as given on the slide: Jennifer 0.9x, Allison 0.08x, Jennice 103x
  • 32. Histograms in MySQL
  • 33. Histograms in MySQL ● Not available for use in MySQL 8.0.1 ● There are pieces of histogram code, still – This gives some clues ● Another feature that uses histograms: P_S statement latencies – P_S.events_statements_histogram_global, P_S.events_statements_histogram_by_digest – These are a totally different kind of histogram – Buckets are log-scale equi-width.
  • 34. Sampling ● New methods for storage engine API
      enum class enum_sampling_method { SYSTEM };
      class handler {
        ...
        int ha_sample_init(double sampling_percentage, int sampling_seed,
                           enum_sampling_method sampling_method);
        int ha_sample_next(uchar *buf);
        int ha_sample_end();
        ...
      };
      ● Currently has a default implementation only – Which does a full table scan and “rolls the dice” for each row ● Assume there will be an InnoDB implementation
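The “rolls the dice for each row” behaviour can be pictured with a small Python sketch. It is not the MySQL implementation: `sample_rows` is a hypothetical function whose `sampling_percentage` and `seed` parameters merely mirror the arguments of `ha_sample_init()` shown on the slide.

```python
import random


def sample_rows(rows, sampling_percentage, seed):
    """Yield each row with probability sampling_percentage / 100.

    Every row is still read (a full scan); only the output is thinned out.
    """
    rng = random.Random(seed)  # fixed seed makes the sample repeatable
    for row in rows:
        # rng.random() is in [0, 1), so 100% keeps every row and 0% keeps none
        if rng.random() * 100.0 < sampling_percentage:
            yield row


sample = list(sample_rows(range(1000), 10.0, seed=42))
```

This keeps the cost of a full table scan, which is presumably why the slide expects a cheaper engine-specific implementation to follow.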
  • 35. Histogram storage
    ● Will be stored in mysql.column_stats table

        CREATE TABLE mysql.column_stats (
          database_name varchar(64) COLLATE utf8_bin NOT NULL,
          table_name varchar(64) COLLATE utf8_bin NOT NULL,
          column_name varchar(64) COLLATE utf8_bin NOT NULL,
          histogram json NOT NULL,
          PRIMARY KEY (database_name,table_name,column_name)
        );

    ● Will be stored as JSON
      – No limits on size?
  • 36. “Singleton” histograms
    ● This is what PostgreSQL calls “Most Common Values”

        {
          "last-updated": "2015-11-04 15:19:51.000000",
          "histogram-type": "singleton",
          "null-values": 0.1,            // Fraction of NULL values
          "buckets": [
            [
              42,                        // Value, data type depends on the source column.
              0.001978728666831561       // "Cumulative" frequency
            ],
            …
          ]
        }
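Because the stored frequency is cumulative, a per-value frequency has to be recovered by subtracting the previous bucket's value, while a `col <= value` estimate can be read off directly. The sketch below assumes that buckets are sorted by value and that a value absent from the histogram does not occur in the column. Both are assumptions, and the bucket contents are made up.

```python
def singleton_eq_selectivity(buckets, value):
    """Fraction of rows with col = value, from (value, cumulative) pairs."""
    previous_cumulative = 0.0
    for bucket_value, cumulative in buckets:
        if bucket_value == value:
            # Per-value frequency is the step in the cumulative series.
            return cumulative - previous_cumulative
        previous_cumulative = cumulative
    return 0.0  # assumption: the histogram lists every distinct value


def singleton_le_selectivity(buckets, value):
    """Fraction of rows with col <= value: the last cumulative frequency
    whose bucket value does not exceed `value`."""
    result = 0.0
    for bucket_value, cumulative in buckets:
        if bucket_value > value:
            break
        result = cumulative
    return result


# Hypothetical buckets; the remaining 0.1 of the rows would be NULL.
buckets = [(1, 0.1), (5, 0.4), (42, 0.9)]
print(singleton_eq_selectivity(buckets, 5))
print(singleton_le_selectivity(buckets, 10))
```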
  • 37. Height-balanced histograms

        {
          "last-updated": "2015-11-04 15:19:51.000000",
          "histogram-type": "equi-height",
          "null-values": 0.1,            // Fraction of NULL values
          "buckets": [
            [
              "bar",                     // Lower inclusive value
              "foo",                     // Upper inclusive value
              0.001978728666831561,      // Cumulative frequency
              10                         // Number of distinct values in this bucket
            ],
            ...
          ]
        }
  • 38. Height-balanced histograms

          ...
          "buckets": [
            [
              "bar",                     // Lower inclusive value
              "foo",                     // Upper inclusive value
              0.001978728666831561,      // Cumulative frequency
              10                         // Number of distinct values in this bucket
            ],
            ...
          ]
        }

    ● Why “upper inclusive value”? To support holes? At the cost of 2x histogram size?
    ● Why a frequency in each bucket? It’s equi-height, so frequencies should be the same?
    ● Per-bucket #distinct is interesting but doesn’t seem to be in high demand.
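One way to see how the four-field bucket layout could be used is a `col <= value` estimate. The following sketch is not MySQL's implementation. It assumes numeric bounds, buckets sorted by lower bound, and a uniform distribution inside a partially covered bucket, and all bucket values are invented.

```python
def equi_height_le_selectivity(buckets, value):
    """Estimate the fraction of rows with col <= value.

    buckets: sorted list of (lower, upper, cumulative_freq, n_distinct).
    """
    previous_cumulative = 0.0
    for lower, upper, cumulative, _n_distinct in buckets:
        if value >= upper:
            # Bucket fully covered: take its cumulative frequency and go on.
            previous_cumulative = cumulative
            continue
        if value < lower:
            # The value falls into a gap before this bucket.
            return previous_cumulative
        # Partially covered bucket: interpolate linearly inside it.
        bucket_freq = cumulative - previous_cumulative
        covered = (value - lower) / (upper - lower)
        return previous_cumulative + bucket_freq * covered
    return previous_cumulative


# Hypothetical buckets with a gap between 10 and 20.
buckets = [(0, 10, 0.3, 5), (20, 30, 0.6, 8), (30, 100, 0.9, 40)]
for v in (5, 15, 25, 1000):
    print(v, equi_height_le_selectivity(buckets, v))
```

With explicit upper bounds, a value in a gap (15 here) contributes no rows beyond the preceding bucket, which may be one motivation for storing both bounds. With cumulative frequencies, fully covered buckets need no summation.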
  • 39. Histograms
    ● “Singleton”
    ● Height-balanced
    ● Both kinds store nulls_fraction (the fraction of NULLs)
      – So you can’t have both at the same time?
    ● Height-balanced histograms allow for “gaps”
    ● Each bucket has #distinct (non-optional?)
  • 40. MySQL histograms summary
    ● Seem to be coming in MySQL 8.0
    ● Support two types
      – “Singleton”
      – “Height-balanced”
    ● Both kinds store null-values, so they are not used together?
    ● “Height-balanced”
      – May have “holes”?
      – Stores “frequency” for each bin (?)
    ● Collection will probably use sampling
      – Which has only a full-scan implementation at the moment
  • 41. Conclusions
  • 42. Conclusions
    ● Histograms are compact data summaries for use by the optimizer
    ● PostgreSQL
      – Has a mature implementation
      – Uses sampling and auto-collection
    ● MariaDB
      – Supports histograms since MariaDB 10.0
        ● Compact
        ● Height-balanced only
      – Need to run ANALYZE manually and set the optimizer to use them
    ● MySQL
      – Still doesn’t have histograms
      – Preparing to have them in 8.0
      – Will support two kinds
        ● Most common values
        ● Height-balanced “with gaps” (?)