MySQL Indexing
Best Practices for MySQL 5.6
Peter Zaitsev
CEO, Percona
MySQL Connect
Sep 22, 2013
San Francisco, CA
For Those Who Do Not Know Us…
• Percona – Helping businesses to be successful with MySQL
– Support, Consulting, RemoteDBA, Training

• Creators of Open Source Software for MySQL
– Percona Server, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster

• MySQL Ecosystem Educators
– MySQLPerformanceBlog, Books, Webinars, Meetups, Conferences
www.percona.com
You’ve Made a Great Choice!
• Understanding indexing is crucial for both developers and DBAs
• Poor index choices are responsible for a large portion of production problems
• Indexing is not rocket science

MySQL Indexing: Agenda
• Understanding indexing
• Setting up the best indexes for your applications
• Working around common MySQL limitations

Indexing in a Nutshell
• What are indexes for?
– Speed up access in the database
– Help to enforce constraints (UNIQUE, FOREIGN KEY)
– Queries can be run without any indexes
• But they can take a really long time

Types of Indexes You Might Have Heard About
• BTREE Indexes
– The majority of indexes you deal with in MySQL are of this type

• RTREE Indexes
– MyISAM only, for GIS

• HASH Indexes
– MEMORY, NDB

• FULLTEXT Indexes
– MyISAM; InnoDB starting with 5.6

Family of BTREE-like Indexes
• A lot of different implementations
– They share the same properties in terms of which operations they can speed up
– Memory vs. disk is a life changer

• B+ Trees are typically used for disk storage
– Data is stored in leaf nodes

B+Tree Example
[Figure: B+ tree diagram; recoverable labels: Branch/Root Node, “Less than 3”, Leaf Node, Data Pointers]

Indexes in MyISAM vs InnoDB
• In MyISAM, data pointers point to a physical offset in the data file
– All indexes are essentially equivalent

• In InnoDB
– PRIMARY KEY (explicit or implicit) stores the data in the leaf pages of the index, not a pointer
– Secondary indexes store the primary key as the data pointer

What Operations Can a BTREE Index Do?
• Find all rows with KEY=5 (point lookup)
• Find all rows with KEY>5 (open range)
• Find all rows with 5<KEY<10 (closed range)
• NOT find all rows where the last digit of the KEY is zero
– This can’t be defined as a “range” operation

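The operations above can be sketched in SQL. This is a minimal illustration, assuming a hypothetical table `demo_keys` with an indexed integer column `k` (the table, column, and index names are made up for this example and are not from the talk). Plans depend on the data and statistics, so run it against a populated table.

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_keys (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  k       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_k (k)
) ENGINE=InnoDB;

-- Point lookup: one position in the BTREE.
EXPLAIN SELECT * FROM demo_keys WHERE k = 5;

-- Open range: everything to the right of 5 in index order.
EXPLAIN SELECT * FROM demo_keys WHERE k > 5;

-- Closed range: one contiguous slice of the index.
EXPLAIN SELECT * FROM demo_keys WHERE k > 5 AND k < 10;

-- Not a range: matching keys (10, 20, 30, ...) are scattered across the
-- index order, so the index cannot narrow the search to a single slice.
EXPLAIN SELECT * FROM demo_keys WHERE k % 10 = 0;
```

With representative data, the first three queries are candidates for `ref` or `range` access on `idx_k`, while the last one cannot be turned into a range on that index.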
String Indexes
• There is no difference… really
– Sort order is defined for strings (collation)
• “AAAA” < “AAAB”

• Prefix LIKE is a special type of range
– LIKE “ABC%” means
• “ABC[LOWEST]” < KEY < “ABC[HIGHEST]”

– LIKE “%ABC” can’t be optimized by use of the index

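A short sketch of the two LIKE cases, assuming a hypothetical table `demo_names` with an index on `name` (names invented for illustration):

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_names (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  bio  TEXT,
  KEY idx_name (name)
) ENGINE=InnoDB;

-- Prefix match: all matches are adjacent in index order, so this can be
-- treated as a range over idx_name.
EXPLAIN SELECT * FROM demo_names WHERE name LIKE 'ABC%';

-- Leading wildcard: matches can start with any character, so there is
-- no contiguous range in the index to read.
EXPLAIN SELECT * FROM demo_names WHERE name LIKE '%ABC';
```

Comparing the `type` and `key` columns of the two EXPLAIN outputs on a populated table should make the difference visible.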
Multiple Column Indexes
• Sort order is defined by comparing the leading column, then the second, etc.
– KEY(col1,col2,col3)
– (1,2,3) < (1,3,1)

• It is still one BTREE index, not a separate BTREE index for each level

Overhead of Indexing
• Indexes are costly; do not add more than you need
– In most cases extending an index is better than adding a new one

• Writes – Updating indexes is often the major cost of database writes
• Reads – Wasted space on disk and in memory; additional overhead during query optimization

Impact on Cost of Indexing
• Long PRIMARY KEY for InnoDB
– Makes all secondary keys longer and slower

• “Random” PRIMARY KEY for InnoDB
– Insertion causes a lot of page splits

• Longer indexes are generally slower
• Index with insertion in random order
– SHA1(‘password’)

• Low-selectivity index is cheap for inserts
– Index on gender

• Correlated indexes are less expensive
– insert_time is correlated with auto_increment id

Indexing InnoDB Tables
• Data is clustered by primary key
– Pick the PRIMARY KEY that suits you best
– For comments, (POST_ID,COMMENT_ID) can be a good PRIMARY KEY, storing all comments for a single post close together
• Alternatively, “pack” it into a single BIGINT

• PRIMARY KEY is implicitly appended to all indexes
– KEY (A) is really KEY (A,ID) internally
– Useful for sorting and covering indexes

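The comments example can be sketched as a table definition. The table and column names below (`demo_comments`, `author_id`, and so on) are hypothetical and only illustrate the clustering idea:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_comments (
  post_id    INT UNSIGNED NOT NULL,
  comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  author_id  INT UNSIGNED NOT NULL,
  body       TEXT,
  -- Clustered by post first, so one post's comments are stored together.
  PRIMARY KEY (post_id, comment_id),
  -- InnoDB requires the AUTO_INCREMENT column to lead some index.
  KEY idx_comment_id (comment_id),
  -- Internally this behaves like (author_id, post_id, comment_id).
  KEY idx_author (author_id)
) ENGINE=InnoDB;

-- Reads one contiguous slice of the clustered index.
SELECT * FROM demo_comments WHERE post_id = 42 ORDER BY comment_id;

-- Primary key columns come "for free" with the secondary index, so this
-- query only needs idx_author.
EXPLAIN SELECT post_id, comment_id FROM demo_comments WHERE author_id = 7;
```

The trade-off is the one named on the previous slide: a longer primary key makes every secondary index entry longer too.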
How MySQL Uses Indexes
• Data lookups
• Sorting
• Avoiding reading “data”
• Special optimizations

Using Indexes for Data Lookups
• SELECT * FROM EMPLOYEES WHERE LAST_NAME='Smith'
– The classical use of an index on (LAST_NAME)

• Can use multiple-column indexes
– SELECT * FROM EMPLOYEES WHERE LAST_NAME='Smith' AND DEPT='Accounting'
– Will use an index on (DEPT,LAST_NAME)

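A minimal sketch of the two-column lookup, assuming a hypothetical `demo_employees` table (names and column types are invented for illustration):

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_employees (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  last_name VARCHAR(50) NOT NULL,
  dept      VARCHAR(50) NOT NULL,
  hired     DATE,
  KEY idx_dept_last_name (dept, last_name)
) ENGINE=InnoDB;

-- Both key parts are compared by equality, so the whole index is usable.
-- The order of the conditions in the WHERE clause does not matter.
EXPLAIN SELECT * FROM demo_employees
WHERE last_name = 'Smith' AND dept = 'Accounting';
```

Note that a query filtering on `last_name` alone would not be able to use this index for a lookup, because `dept` is the leading column.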
It Gets Tricky With Multiple Columns
• Index (A,B,C) – order of columns matters
• Will use the index for lookup (all listed key parts)
– A>5
– A=5 AND B>6
– A=5 AND B=6 AND C=7
– A=5 AND B IN (2,3) AND C>5

• Will NOT use the index
– B>5 – leading column is not referenced
– B=6 AND C=7 – leading column is not referenced

• Will use part of the index
– A>5 AND B=2 – range on first column; only this key part is used
– A=5 AND B>6 AND C=2 – range on second column; 2 key parts are used

The First Rule of MySQL Optimizer
• MySQL stops using key parts in a multi-part index as soon as it meets a real range (<, >, BETWEEN). It can, however, continue using key parts further to the right if an IN(…) range is used

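One way to observe this rule is the `key_len` column of EXPLAIN, which grows with the number of key parts used. The sketch below assumes a hypothetical table `demo_abc` (names invented for illustration) with three `INT NOT NULL` key parts, each of which contributes 4 bytes to `key_len`:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_abc (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  c       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_abc (a, b, c)
) ENGINE=InnoDB;

-- Equality on every key part: all three parts are usable.
EXPLAIN SELECT * FROM demo_abc WHERE a = 5 AND b = 6 AND c = 7;

-- Real range on b: key parts to the right of b (here c) are not used.
EXPLAIN SELECT * FROM demo_abc WHERE a = 5 AND b > 6 AND c = 2;

-- Real range on the leading column: only the first key part is used.
EXPLAIN SELECT * FROM demo_abc WHERE a > 5 AND b = 2;

-- IN() list on b: the optimizer can continue on to c.
EXPLAIN SELECT * FROM demo_abc WHERE a = 5 AND b IN (2, 3) AND c > 5;

-- Leading column not referenced: no lookup on idx_abc.
EXPLAIN SELECT * FROM demo_abc WHERE b = 6 AND c = 7;
```

Run these against a populated table and compare `key` and `key_len` across the queries; the optimizer may still pick a different plan if it estimates a scan to be cheaper.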
Using Index for Sorting
• SELECT * FROM PLAYERS ORDER BY SCORE DESC LIMIT 10
– Will use an index on the SCORE column
– Without an index MySQL will do a “filesort” (external sort), which is very expensive

• Often combined with using an index for lookup
– SELECT * FROM PLAYERS WHERE COUNTRY='US' ORDER BY SCORE DESC LIMIT 10
• Best served by an index on (COUNTRY,SCORE)

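The combined lookup-and-sort case can be sketched as follows, assuming a hypothetical `demo_players` table (names invented for illustration):

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_players (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(50) NOT NULL,
  country CHAR(2) NOT NULL,
  score   INT NOT NULL,
  KEY idx_country_score (country, score)
) ENGINE=InnoDB;

-- Equality on country, then rows are read in score order from the index
-- (backwards for DESC), so the scan can stop after 10 rows.
EXPLAIN SELECT * FROM demo_players
WHERE country = 'US'
ORDER BY score DESC
LIMIT 10;
```

If the Extra column of the EXPLAIN output shows "Using filesort", the sort was not resolved through the index.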
Multi-Column Indexes for Efficient Sorting
• It becomes even more restricted!
• KEY(A,B)
• Will use the index for sorting
– ORDER BY A – sorting by the leading column
– A=5 ORDER BY B – equality filtering by the 1st column and sorting by the 2nd
– ORDER BY A DESC, B DESC – sorting by 2 columns in the same order
– A>5 ORDER BY A – range on the column, sorting on the same column

• Will NOT use the index for sorting
– ORDER BY B – sorting by the second column in the index
– A>5 ORDER BY B – range on the first column, sorting by the second
– A IN(1,2) ORDER BY B – IN-range on the first column
– ORDER BY A ASC, B DESC – sorting in different orders

MySQL Using Index for Sorting Rules
• You can’t sort by 2 columns in different orders
• You can only have equality comparisons (=) for columns which are not part of ORDER BY
– Not even IN() works in this case

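These sorting rules can be checked with EXPLAIN. The sketch below assumes a hypothetical table `demo_sort` with KEY(a,b) (names invented for illustration) and the MySQL 5.6 behavior described in the talk:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_sort (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a_b (a, b)
) ENGINE=InnoDB;

-- Equality on a, sort by b: rows come out of the index already ordered.
EXPLAIN SELECT * FROM demo_sort WHERE a = 5 ORDER BY b LIMIT 10;

-- Both columns in the same direction: the index can be read backwards.
EXPLAIN SELECT * FROM demo_sort ORDER BY a DESC, b DESC LIMIT 10;

-- Range on a, sort by b: b is only ordered within each value of a.
EXPLAIN SELECT * FROM demo_sort WHERE a > 5 ORDER BY b LIMIT 10;

-- Mixed directions: index order matches neither a forward nor a backward scan.
EXPLAIN SELECT * FROM demo_sort ORDER BY a ASC, b DESC LIMIT 10;
```

On a populated table, the last two queries are the ones where "Using filesort" would be expected in the Extra column.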
Avoiding Reading the Data
• “Covering Index”
– Applies to index use for a specific query, not to a type of index

• Reading the index ONLY and not accessing the “data”
• SELECT STATUS FROM ORDERS WHERE CUSTOMER_ID=123
– KEY(CUSTOMER_ID,STATUS)

• Index is typically smaller than the data
• Access is a lot more sequential
– Access through data pointers is often quite “random”

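A minimal sketch of a covered versus a non-covered query, assuming a hypothetical `demo_orders` table (names invented for illustration):

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  status      VARCHAR(20) NOT NULL,
  notes       TEXT,
  KEY idx_customer_status (customer_id, status)
) ENGINE=InnoDB;

-- Covered: every referenced column is in the index, so the rows
-- themselves never need to be read ("Using index" in Extra).
EXPLAIN SELECT status FROM demo_orders WHERE customer_id = 123;

-- Not covered: notes is stored only in the row, so each match
-- requires an additional row lookup.
EXPLAIN SELECT status, notes FROM demo_orders WHERE customer_id = 123;
```

The same index is "covering" for the first query and not for the second, which is why the term describes index use for a query rather than a kind of index.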
Min/Max Optimizations
• Indexes help MIN()/MAX() aggregate functions
– But only these

• SELECT MAX(ID) FROM TBL;
• SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY DEPT_ID
– Will benefit from a (DEPT_ID,SALARY) index
– “Using index for group-by”

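Both cases can be tried out with a sketch like the following, assuming a hypothetical `demo_employee_pay` table (names invented for illustration). Whether the optimizer picks the group-by optimization depends on the data distribution, so treat the comments as what to look for rather than guaranteed output.

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_employee_pay (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  dept_id INT UNSIGNED NOT NULL,
  salary  DECIMAL(10,2) NOT NULL,
  name    VARCHAR(50) NOT NULL,
  KEY idx_dept_salary (dept_id, salary)
) ENGINE=InnoDB;

-- MAX on an indexed column: the answer is the last entry of the index.
EXPLAIN SELECT MAX(id) FROM demo_employee_pay;

-- MAX per group: with (dept_id, salary) the largest salary of each
-- department is the last entry of that department's slice of the index.
-- Look for "Using index for group-by" in Extra.
EXPLAIN SELECT dept_id, MAX(salary)
FROM demo_employee_pay
GROUP BY dept_id;
```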
Indexes and Joins
• MySQL performs joins as “Nested Loops”
– SELECT * FROM POSTS,COMMENTS WHERE AUTHOR='Peter' AND COMMENTS.POST_ID=POSTS.ID
• Scan table POSTS, finding all posts which have Peter as the author
• For every such post, go to the COMMENTS table to fetch all comments

• Very important to have all JOINs indexed
• An index is only needed on the table which is being looked up
– The index on POSTS.ID is not needed for this query’s performance

• Redesign JOIN queries which can’t be well indexed

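The join above can be sketched with two hypothetical tables, `demo_posts` and `demo_post_comments` (names invented for illustration). The index that matters for the nested loop is the one on the looked-up side, `post_id`:

```sql
-- Hypothetical demo tables; all names are assumptions for illustration.
CREATE TABLE demo_posts (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  author VARCHAR(50) NOT NULL,
  title  VARCHAR(200) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE demo_post_comments (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  post_id INT UNSIGNED NOT NULL,
  body    TEXT,
  -- Used once per matching post to find that post's comments.
  KEY idx_post_id (post_id)
) ENGINE=InnoDB;

-- Outer loop: posts by Peter. Inner loop: index lookup by post_id.
EXPLAIN
SELECT *
FROM demo_posts p
JOIN demo_post_comments c ON c.post_id = p.id
WHERE p.author = 'Peter';
```

Without `idx_post_id`, every matching post would require scanning the comments table. An additional index on `author` (not shown) would be a separate decision about how the posts themselves are found.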
Using Multiple Indexes for the Table
• MySQL can use more than one index
– “Index Merge”

• SELECT * FROM TBL WHERE A=5 AND B=6
– Can often use indexes on (A) and (B) separately
– Index on (A,B) is much better

• SELECT * FROM TBL WHERE A=5 OR B=6
– 2 separate indexes are as good as it gets
– Index (A,B) can’t be used for this query

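A sketch of both cases, assuming a hypothetical table `demo_merge` (names invented for illustration). When index merge is chosen, EXPLAIN reports `index_merge` in the type column and a note such as "Using union(...)" or "Using intersect(...)" in Extra:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_merge (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a),
  KEY idx_b (b)
) ENGINE=InnoDB;

-- OR across two columns: each index finds its own rows, results are unioned.
EXPLAIN SELECT * FROM demo_merge WHERE a = 5 OR b = 6;

-- AND across two columns: the two single-column indexes may be intersected.
EXPLAIN SELECT * FROM demo_merge WHERE a = 5 AND b = 6;

-- A composite index answers the AND query with a single lookup.
ALTER TABLE demo_merge ADD KEY idx_a_b (a, b);
EXPLAIN SELECT * FROM demo_merge WHERE a = 5 AND b = 6;
```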
Prefix Indexes
• You can build an index on the leftmost prefix of the column
– ALTER TABLE TITLE ADD KEY(TITLE(20));
– Needed to index BLOB/TEXT columns
– Can be significantly smaller
– Can’t be used as a covering index
– Choosing the prefix length becomes the question

Choosing Prefix Length
• Prefix should be “selective enough”
– Check the number of distinct prefixes vs the total number of distinct values
mysql> select count(distinct(title)) total,
       count(distinct(left(title,10))) p10,
       count(distinct(left(title,20))) p20
       from title;
+--------+--------+--------+
| total  | p10    | p20    |
+--------+--------+--------+
| 998335 | 624949 | 960894 |
+--------+--------+--------+
1 row in set (44.19 sec)

Choosing Prefix Length
• Check for outliers
– Ensure there are not too many rows sharing the same prefix
Most common titles
mysql> select count(*) cnt, title tl
       from title group by tl order by cnt desc
       limit 3;
+-----+-----------------+
| cnt | tl              |
+-----+-----------------+
| 136 | The Wedding     |
| 129 | Lost and Found  |
| 112 | Horror Marathon |
+-----+-----------------+
3 rows in set (27.49 sec)

Most common title prefixes
mysql> select count(*) cnt, left(title,20) tl
       from title group by tl order by cnt desc
       limit 3;
+-----+----------------------+
| cnt | tl                   |
+-----+----------------------+
| 184 | Wetten, dass..? aus  |
| 136 | The Wedding          |
| 129 | Lost and Found       |
+-----+----------------------+
3 rows in set (33.23 sec)

What Is New With MySQL 5.6?
• Many optimizer improvements
– Most of them will make your queries better automatically
– join_buffer_size variable has a whole new meaning
• Values of 32MB+ can make sense

• Focus on index design practices for this presentation
– Most important one: ICP (Index Condition Pushdown)

Understanding ICP
• Push WHERE clause “conditions” to the storage engine to filter
– Think name LIKE “%ill%” (will not convert to range)

• “Much more flexible covering index”
– Plus filtering is done at the engine level, which is efficient

• In MySQL 5.5 and earlier
– All or none. Either everything is resolved through the index, or the “row” is read if it is within the range

ICP Examples
• SELECT A … WHERE B=2 AND C LIKE “%ill%”
– MySQL 5.5 and below
• Index (B) – traditional. Using index for range only
• Index (B,C,A) – covering. All involved columns included

– MySQL 5.6
• Index (B,C)
– Range access by B; filter clause on C; only read the full row if it matches

• More cases
– SELECT * … WHERE A=5 AND C=6; Index (A,B,C)
• Will scan all index entries with A=5, not all rows

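The first ICP example can be tried with a sketch like this, assuming MySQL 5.6 and a hypothetical table `demo_icp` (names invented for illustration). The `index_condition_pushdown` flag of `optimizer_switch` lets you compare plans with and without the feature:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_icp (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  c       VARCHAR(50) NOT NULL,
  payload VARCHAR(100),
  KEY idx_b_c (b, c)
) ENGINE=InnoDB;

-- Lookup by b; the LIKE on c cannot become a range, but it can be checked
-- against the index entry before the full row is fetched.
-- Look for "Using index condition" in Extra.
EXPLAIN SELECT a FROM demo_icp WHERE b = 2 AND c LIKE '%ill%';

-- For comparison: the same query with ICP turned off for this session.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT a FROM demo_icp WHERE b = 2 AND c LIKE '%ill%';
SET optimizer_switch = 'index_condition_pushdown=on';
```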
How Does MySQL Pick Which Index to Use?
• Performs dynamic picking for every query execution
– The constants in query texts matter a lot

• Estimates the number of rows it needs to access for a given index by doing a “dive” in the table
• Uses “Cardinality” statistics if that is impossible
– This is what ANALYZE TABLE updates

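A small sketch of how to look at these inputs, assuming a hypothetical table `demo_stats` (names invented for illustration). The constants 5 and 500 are arbitrary; the point is that different constants can lead to different row estimates and plans on skewed data.

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_stats (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a)
) ENGINE=InnoDB;

-- Refresh the stored index statistics.
ANALYZE TABLE demo_stats;

-- The Cardinality column shows the estimated number of distinct values.
SHOW INDEX FROM demo_stats;

-- Same query shape, different constants: compare the rows estimates.
EXPLAIN SELECT * FROM demo_stats WHERE a = 5;
EXPLAIN SELECT * FROM demo_stats WHERE a = 500;
```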
More on Picking the Index
• Not just minimizing the number of scanned rows
• Lots of other heuristics and hacks
– PRIMARY key is special for InnoDB
– Covering index benefits
– Full table scan is faster, all else being equal
– Can we also use the index for sorting

• Things to know
– Verify the plan MySQL is actually using
– Note it can change dynamically based on constants and data

Use EXPLAIN
• EXPLAIN is a great tool to see how MySQL plans to execute the query
– http://dev.mysql.com/doc/refman/5.6/en/using-explain.html
– Remember real execution might be different
mysql> explain select max(season_nr) from title group by production_year;
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key             | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | title | range | NULL          | production_year | 5       | NULL |  201 | Using index for group-by |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

Indexing Strategy
• Build indexes for the set of your performance-critical queries
– Look at them together, not just one by one

• Best if all WHERE clauses and JOIN clauses are using indexes for lookups
– At least the most selective parts are

• Generally extend an index if you can, instead of creating new indexes
• Validate the performance impact as you’re making changes

Indexing Strategy Example
• Build the index order which benefits more queries
– SELECT * FROM TBL WHERE A=5 AND B=6
– SELECT * FROM TBL WHERE A>5 AND B=6
– KEY (B,A) is better for such a query mix

• All else being equal, put the more selective key part first
• Do not add indexes for non-performance-critical queries
– Many indexes slow the system down

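The query mix above can be sketched as follows, assuming a hypothetical table `demo_mix` (names invented for illustration). Putting the equality column `b` first means the range on `a` is the last key part used, so neither query loses a key part:

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_mix (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_b_a (b, a)
) ENGINE=InnoDB;

-- Equality on both columns: both key parts are usable in either order.
EXPLAIN SELECT * FROM demo_mix WHERE a = 5 AND b = 6;

-- Equality on b, range on a: with (b, a) the range comes last,
-- so both key parts can still be used.
EXPLAIN SELECT * FROM demo_mix WHERE a > 5 AND b = 6;
```

With KEY (a, b) instead, the second query would stop at the range on `a` and could not use `b` as a key part.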
Trick #1: Enumerating Ranges
• KEY (A,B)
• SELECT * FROM TBL WHERE A BETWEEN 2 AND 4 AND B=5
– Will only use the first key part of the index

• SELECT * FROM TBL WHERE A IN (2,3,4) AND B=5
– Will use both key parts

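A sketch for comparing the two forms, assuming a hypothetical table `demo_ranges` with an integer column `a` (names invented for illustration). The rewrite is only equivalent when the range covers a small set of discrete values, as it does for integers here.

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_ranges (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a_b (a, b)
) ENGINE=InnoDB;

-- Real range on a: all index entries with a from 2 to 4 are scanned,
-- whatever their value of b.
EXPLAIN SELECT * FROM demo_ranges WHERE a BETWEEN 2 AND 4 AND b = 5;

-- Enumerated values: treated as several (a, b) point lookups.
EXPLAIN SELECT * FROM demo_ranges WHERE a IN (2, 3, 4) AND b = 5;
```

On a populated table, compare the `rows` estimates of the two plans to see how much less of the index the second form has to read.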
Trick #2: Adding a Fake Filter
• KEY (GENDER,CITY)
• SELECT * FROM PEOPLE WHERE CITY='NEW YORK'
– Will not be able to use the index at all

• SELECT * FROM PEOPLE WHERE GENDER IN ('M','F') AND CITY='NEW YORK'
– Will be able to use the index

• The trick works best with low-selectivity columns
– Gender, status, boolean types, etc.

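A minimal sketch of the fake filter, assuming a hypothetical table `demo_people` where `gender` can only hold the two listed values (names and the ENUM definition are invented for illustration):

```sql
-- Hypothetical demo table; all names are assumptions for illustration.
CREATE TABLE demo_people (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name   VARCHAR(50) NOT NULL,
  gender ENUM('M','F') NOT NULL,
  city   VARCHAR(50) NOT NULL,
  KEY idx_gender_city (gender, city)
) ENGINE=InnoDB;

-- Leading column not referenced: no lookup on idx_gender_city.
EXPLAIN SELECT * FROM demo_people WHERE city = 'NEW YORK';

-- Listing every possible gender adds no real filtering, but gives the
-- optimizer a condition on the leading column.
EXPLAIN SELECT * FROM demo_people
WHERE gender IN ('M','F') AND city = 'NEW YORK';
```

The IN list has to enumerate every value the column can actually contain; otherwise the rewritten query silently returns fewer rows than the original.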
Trick #3: Unionizing Filesort
• KEY(A,B)
• SELECT * FROM TBL WHERE A IN (1,2) ORDER BY B LIMIT 5;
– Will not be able to use the index for sorting

• (SELECT * FROM TBL WHERE A=1 ORDER BY B LIMIT 5) UNION ALL (SELECT * FROM TBL WHERE A=2 ORDER BY B LIMIT 5) ORDER BY B LIMIT 5;
– Will use the index for sorting. “filesort” will be needed only to sort over 10 rows.

Thank You!
• pz@percona.com
• http://www.percona.com
• @percona on Twitter
• http://www.facebook.com/Percona


More Related Content

PDF
How to Design Indexes, Really
PDF
Advanced MySQL Query Tuning
PDF
MySQL Query And Index Tuning
PDF
MySQL Index Cookbook
PDF
The MySQL Query Optimizer Explained Through Optimizer Trace
PDF
MySQL Performance Tuning: Top 10 Tips
PDF
MySQL: Indexing for Better Performance
PDF
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
How to Design Indexes, Really
Advanced MySQL Query Tuning
MySQL Query And Index Tuning
MySQL Index Cookbook
The MySQL Query Optimizer Explained Through Optimizer Trace
MySQL Performance Tuning: Top 10 Tips
MySQL: Indexing for Better Performance
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013

What's hot (20)

PDF
More mastering the art of indexing
PDF
What is new in PostgreSQL 14?
PDF
How to Use JSON in MySQL Wrong
PDF
How to Analyze and Tune MySQL Queries for Better Performance
PDF
Sql query patterns, optimized
PDF
PostgreSQL Performance Tuning
PPTX
Indexing the MySQL Index: Key to performance tuning
PDF
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
PDF
PostgreSQL WAL for DBAs
PDF
Recursive Query Throwdown
PDF
Oracle Performance Tuning Fundamentals
PDF
Deep dive into PostgreSQL statistics.
PPTX
PDF
Postgresql database administration volume 1
PPT
Introduction to redis
PDF
The InnoDB Storage Engine for MySQL
PDF
Full Text Search In PostgreSQL
PDF
InnoDB Internal
PDF
Using Optimizer Hints to Improve MySQL Query Performance
PDF
Mysql Explain Explained
More mastering the art of indexing
What is new in PostgreSQL 14?
How to Use JSON in MySQL Wrong
How to Analyze and Tune MySQL Queries for Better Performance
Sql query patterns, optimized
PostgreSQL Performance Tuning
Indexing the MySQL Index: Key to performance tuning
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
PostgreSQL WAL for DBAs
Recursive Query Throwdown
Oracle Performance Tuning Fundamentals
Deep dive into PostgreSQL statistics.
Postgresql database administration volume 1
Introduction to redis
The InnoDB Storage Engine for MySQL
Full Text Search In PostgreSQL
InnoDB Internal
Using Optimizer Hints to Improve MySQL Query Performance
Mysql Explain Explained
Ad

Viewers also liked (20)

PPTX
MySQL Performance Tips & Best Practices
PPT
Explain that explain
PPTX
Optimizing MySQL Queries
PDF
Need for Speed: MySQL Indexing
PPTX
Oracle golden gate 12c New Features
PDF
Advanced Query Optimizer Tuning and Analysis
PDF
BITS: Introduction to MySQL - Introduction and Installation
PDF
MHA (MySQL High Availability): Getting started & moving past quirks
PDF
1 data types
PDF
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
PDF
MySQL for Large Scale Social Games
PDF
3 indexes
PDF
KDC to Kaijeliay....
PPT
Database indexing framework
PDF
ISUCONの話(夏期講習2014)
PPTX
Install oracle binaris or clonse oracle home
PPT
SphinxSearch
PPTX
Fusion-io and MySQL at Craigslist
PDF
Managing Big Data with MySQL
MySQL Performance Tips & Best Practices
Explain that explain
Optimizing MySQL Queries
Need for Speed: MySQL Indexing
Oracle golden gate 12c New Features
Advanced Query Optimizer Tuning and Analysis
BITS: Introduction to MySQL - Introduction and Installation
MHA (MySQL High Availability): Getting started & moving past quirks
1 data types
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
MySQL for Large Scale Social Games
3 indexes
KDC to Kaijeliay....
Database indexing framework
ISUCONの話(夏期講習2014)
Install oracle binaris or clonse oracle home
SphinxSearch
Fusion-io and MySQL at Craigslist
Managing Big Data with MySQL
Ad

Similar to MySQL Indexing - Best practices for MySQL 5.6 (20)

PPTX
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
PDF
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
PDF
Statistics and Indexes Internals
PDF
Scaling MySQL Strategies for Developers
PDF
Webinar 2013 advanced_query_tuning
PDF
Query Optimization with MySQL 5.6: Old and New Tricks
PDF
MySQL Indexing
PDF
Sphinx new
PPTX
New T-SQL Features in SQL Server 2012
PDF
PostgreSQL 9.0 & The Future
PDF
Introduction to MySQL Query Tuning for Dev[Op]s
PDF
Mysql query optimization
PDF
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
PPTX
PostgreSQL - It's kind've a nifty database
PDF
High Performance Rails with MySQL
PDF
Building better SQL Server Databases
PDF
MySQL for beginners
PDF
10 Reasons to Start Your Analytics Project with PostgreSQL
KEY
PostgreSQL
PDF
What is MariaDB Server 10.3?
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
Statistics and Indexes Internals
Scaling MySQL Strategies for Developers
Webinar 2013 advanced_query_tuning
Query Optimization with MySQL 5.6: Old and New Tricks
MySQL Indexing
Sphinx new
New T-SQL Features in SQL Server 2012
PostgreSQL 9.0 & The Future
Introduction to MySQL Query Tuning for Dev[Op]s
Mysql query optimization
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
PostgreSQL - It's kind've a nifty database
High Performance Rails with MySQL
Building better SQL Server Databases
MySQL for beginners
10 Reasons to Start Your Analytics Project with PostgreSQL
PostgreSQL
What is MariaDB Server 10.3?

More from MYXPLAIN (14)

PDF
Advanced MySQL Query and Schema Tuning
PDF
Are You Getting the Best of your MySQL Indexes
PDF
How to Design Indexes, Really
PDF
MySQL 5.6 Performance
PDF
56 Query Optimization
PDF
Tools and Techniques for Index Design
PDF
Powerful Explain in MySQL 5.6
PDF
Optimizing Queries with Explain
PDF
The Power of MySQL Explain
PDF
Improving Performance with Better Indexes
PDF
Explaining the MySQL Explain
PDF
Covering indexes
PDF
MySQL Optimizer Overview
PDF
Advanced query optimization
Advanced MySQL Query and Schema Tuning
Are You Getting the Best of your MySQL Indexes
How to Design Indexes, Really
MySQL 5.6 Performance
56 Query Optimization
Tools and Techniques for Index Design
Powerful Explain in MySQL 5.6
Optimizing Queries with Explain
The Power of MySQL Explain
Improving Performance with Better Indexes
Explaining the MySQL Explain
Covering indexes
MySQL Optimizer Overview
Advanced query optimization

Recently uploaded (20)

PDF
Empathic Computing: Creating Shared Understanding
PDF
How Onsite IT Support Drives Business Efficiency, Security, and Growth.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
PDF
Transforming Manufacturing operations through Intelligent Integrations
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Chapter 2 Digital Image Fundamentals.pdf
PDF
solutions_manual_-_materials___processing_in_manufacturing__demargo_.pdf
PDF
GDG Cloud Iasi [PUBLIC] Florian Blaga - Unveiling the Evolution of Cybersecur...
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
breach-and-attack-simulation-cybersecurity-india-chennai-defenderrabbit-2025....
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
CIFDAQ's Market Wrap: Ethereum Leads, Bitcoin Lags, Institutions Shift
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Modernizing your data center with Dell and AMD
PDF
Electronic commerce courselecture one. Pdf
Empathic Computing: Creating Shared Understanding
How Onsite IT Support Drives Business Efficiency, Security, and Growth.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Advanced methodologies resolving dimensionality complications for autism neur...
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Transforming Manufacturing operations through Intelligent Integrations
Spectral efficient network and resource selection model in 5G networks
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Chapter 2 Digital Image Fundamentals.pdf
solutions_manual_-_materials___processing_in_manufacturing__demargo_.pdf
GDG Cloud Iasi [PUBLIC] Florian Blaga - Unveiling the Evolution of Cybersecur...
20250228 LYD VKU AI Blended-Learning.pptx
MYSQL Presentation for SQL database connectivity
breach-and-attack-simulation-cybersecurity-india-chennai-defenderrabbit-2025....
Review of recent advances in non-invasive hemoglobin estimation
Per capita expenditure prediction using model stacking based on satellite ima...
CIFDAQ's Market Wrap: Ethereum Leads, Bitcoin Lags, Institutions Shift
“AI and Expert System Decision Support & Business Intelligence Systems”
Modernizing your data center with Dell and AMD
Electronic commerce courselecture one. Pdf

MySQL Indexing - Best practices for MySQL 5.6

  • 1. MySQL Indexing Best Practices for MySQL 5.6 Peter Zaitsev CEO, Percona MySQL Connect Sep 22, 2013 San Francisco,CA
  • 2. For those who Does not Know Us… • Percona – Helping Businesses to be Successful with MySQL – Support, Consulting, RemoteDBA, Training • Creators of Open Source Software for MySQL – Percona Server, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster • MySQL Ecosystem Educators – MySQLPerformanceBlog, Books, Webinars, Meetu ps, Conferences www.percona.com
  • 3. You’ve Made a Great Choice ! • Understanding indexing is crucial both for Developers and DBAs • Poor index choices are responsible for large portion of production problems • Indexing is not a rocket science www.percona.com
  • 4. MySQL Indexing: Agenda • Understanding Indexing • Setting up best indexes for your applications • Working around common MySQL limitations www.percona.com
  • 5. Indexing in the Nutshell • What are indexes for ? – Speed up access in the database – Help to enforce constraints (UNIQUE, FOREIGN KEY) – Queries can be ran without any indexes • But it can take a really long time www.percona.com
  • 6. Types of Indexes you might heard about • BTREE Indexes – Majority of indexes you deal in MySQL is this type • RTREE Indexes – MyISAM only, for GIS • HASH Indexes – MEMORY, NDB • FULLTEXT Indexes – MyISAM, Innodb starting 5.6 www.percona.com
  • 7. Family of BTREE like Indexes • A lot of different implementations – Share same properties in what operations they can speed up – Memory vs Disk is life changer • B+ Trees are typically used for Disk storage – Data stored in leaf nodes www.percona.com
  • 8. B+Tree Example Branch/Root Node Less than 3 Data Pointers Leaf Node www.percona.com
  • 9. Indexes in MyISAM vs Innodb • In MyISAM data pointers point to physical offset in the data file – All indexes are essentially equivalent • In Innodb – PRIMARY KEY (Explicit or Implicit) - stores data in the leaf pages of the index, not pointer – Secondary Indexes – store primary key as data pointer www.percona.com
  • 10. What Operations can BTREE Index do ? • • • • Find all rows with KEY=5 (point lookup) Find all rows with KEY>5 (open range) Find all rows with 5<KEY<10 (closed range) NOT find all rows with last digit of the KEY is Zero – This can’t be defined as a “range” operation www.percona.com
  • 11. String Indexes • There is no difference… really – Sort order is defined for strings (collation) • “AAAA” < “AAAB” • Prefix LIKE is a special type of Range – LIKE “ABC%” means • “ABC*LOWEST+”<KEY<“ABC*HIGHEST+” – LIKE “%ABC” can’t be optimized by use of the index www.percona.com
  • 12. Multiple Column Indexes • Sort Order is defined, comparing leading column, then second etc – KEY(col1,col2,col3) – (1,2,3) < (1,3,1) • It is still one BTREE Index; not a separate BTREE index for each level www.percona.com
  • 13. Overhead of The Indexing • Indexes are costly; Do not add more than you need – In most cases extending index is better than adding new one • Writes - Updating indexes is often major cost of database writes • Reads - Wasted space on disk and in memory; additional overhead during query optimization www.percona.com
  • 14. Impact on Cost of Indexing • Long PRIMARY KEY for Innodb – Make all Secondary keys longer and slower • “Random” PRIMARY KEY for Innodb – Insertion causes a lot of page splits • Longer indexes are generally slower • Index with insertion in random order – SHA1(‘password’) • Low selectivity index cheap for insert – Index on gender • Correlated indexes are less expensive – insert_time is correlated with auto_increment id www.percona.com
  • 15. Indexing Innodb Tables • Data is clustered by Primary Key – Pick PRIMARY KEY what suites you best – For comments – (POST_ID,COMMENT_ID) can be good PRIMARY KEY storing all comments for single post close together • Alternatively “pack” to single BIGINT • PRIMARY KEY is implicitly appended to all indexes – KEY (A) is really KEY (A,ID) internally – Useful for sorting, Covering Index. www.percona.com
  • 16. How MySQL Uses Indexes • • • • Data Lookups Sorting Avoiding reading “data” Special Optimizations www.percona.com
  • 17. Using Indexes for Data Lookups • SELECT * FROM EMPLOYEES WHERE LAST_NAME=“Smith” – The classical use of index on (LAST_NAME) • Can use Multiple column indexes – SELECT * FROM EMPLOYEES WHERE LAST_NAME=“Smith” AND DEPT=“Accounting” – Will use index on (DEPT,LAST_NAME) www.percona.com
  • 18. It Gets Tricky With Multiple Columns • Index (A,B,C) - order of columns matters • Will use Index for lookup (all listed keyparts) – – – – A>5 A=5 AND B>6 A=5 AND B=6 AND C=7 A=5 AND B IN (2,3) AND C>5 • Will NOT use Index – B>5 – Leading column is not referenced – B=6 AND C=7 - Leading column is not referenced • Will use Part of the index – A>5 AND B=2 - range on first column; only use this key part – A=5 AND B>6 AND C=2 - range on second column, use 2 parts www.percona.com
  • 19. The First Rule of MySQL Optimizer • MySQL will stop using key parts in multi part index as soon as it met the real range (<,>, BETWEEN), it however is able to continue using key parts further to the right if IN(…) range is used www.percona.com
  • 20. Using Index for Sorting • SELECT * FROM PLAYERS ORDER BY SCORE DESC LIMIT 10 – Will use index on SCORE column – Without index MySQL will do “filesort” (external sort) which is very expensive • Often Combined with using Index for lookup – SELECT * FROM PLAYERS WHERE COUNTRY=“US” ORDER BY SCORE DESC LIMIT 10 • Best served by Index on (COUNTRY,SCORE) www.percona.com
  • 21. Multi Column indexes for efficient sorting • It becomes even more restricted! • KEY(A,B) • Will use Index for Sorting – – – – ORDER BY A - sorting by leading column A=5 ORDER BY B - EQ filtering by 1st and sorting by 2nd ORDER BY A DESC, B DESC - Sorting by 2 columns in same order A>5 ORDER BY A - Range on the column, sorting on the same • Will NOT use Index for Sorting – – – – ORDER BY B - Sorting by second column in the index A>5 ORDER BY B – Range on first column, sorting by second A IN(1,2) ORDER BY B - In-Range on first column ORDER BY A ASC, B DESC - Sorting in the different order www.percona.com
  • 22. MySQL Using Index for Sorting Rules • You can’t sort in different order by 2 columns • You can only have Equality comparison (=) for columns which are not part of ORDER BY – Not even IN() works in this case www.percona.com
  • 23. Avoiding Reading The data • “Covering Index” – Applies to index use for specific query, not type of index. • Reading Index ONLY and not accessing the “data” • SELECT STATUS FROM ORDERS WHERE CUSTOMER_ID=123 – KEY(CUSTOMER_ID,STATUS) • Index is typically smaller than data • Access is a lot more sequential – Access through data pointers is often quite “random” www.percona.com
  • 24. Min/Max Optimizations • Index help MIN()/MAX() aggregate functions – But only these • SELECT MAX(ID) FROM TBL; • SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY DEPT_ID – Will benefit from (DEPT_ID,SALARY) index – “Using index for group-by” www.percona.com
  • 25. Indexes and Joins • MySQL Performs Joins as “Nested Loops” – SELECT * FROM POSTS,COMMENTS WHERE AUTHOR=“Peter” AND COMMENTS.POST_ID=POSTS.ID • Scan table POSTS finding all posts which have Peter as an Author • For every such post go to COMMENTS table to fetch all comments • Very important to have all JOINs Indexed • Index is only needed on table which is being looked up – The index on POSTS.ID is not needed for this query performance • Re-Design JOIN queries which can’t be well indexed www.percona.com
  • 26. Using Multiple Indexes for the table • MySQL Can use More than one index – “Index Merge” • SELECT * FROM TBL WHERE A=5 AND B=6 – Can often use Indexes on (A) and (B) separately – Index on (A,B) is much better • SELECT * FROM TBL WHERE A=5 OR B=6 – 2 separate indexes is as good as it gets – Index (A,B) can’t be used for this query www.percona.com
  • 27. Prefix Indexes • You can build Index on the leftmost prefix of the column – ALTER TABLE TITLE ADD KEY(TITLE(20)); – Needed to index BLOB/TEXT columns – Can be significantly smaller – Can’t be used as covering index – Choosing prefix length becomes the question www.percona.com
  • 28. Choosing Prefix Length • Prefix should be “Selective enough” – Check number of distinct prefixes vs number of total distinct values mysql> select count(distinct(title)) total, count(distinct(left(title,10))) p10, count(distinct(left(title,20))) p20 from title; +--------+--------+--------+ | total | p10 | p20 | +--------+--------+--------+ | 998335 | 624949 | 960894 | +--------+--------+--------+ 1 row in set (44.19 sec) www.percona.com
  • 29. Choosing Prefix Length • Check for Outliers – Ensure there are not too many rows sharing the same prefix Most common Titles mysql> select count(*) cnt, title tl from title group by tl order by cnt desc limit 3; +-----+-----------------+ | cnt | tl | +-----+-----------------+ | 136 | The Wedding | | 129 | Lost and Found | | 112 | Horror Marathon | +-----+-----------------+ 3 rows in set (27.49 sec) Most Common Title Prefixes mysql> select count(*) cnt, left(title,20) tl from title group by tl order by cnt desc limit 3; +-----+----------------------+ | cnt | tl | +-----+----------------------+ | 184 | Wetten, dass..? aus | | 136 | The Wedding | | 129 | Lost and Found | +-----+----------------------+ 3 rows in set (33.23 sec) www.percona.com
  • 30. What is new with MySQL 5.6 ?
    • Many optimizer improvements
    – Most of them will make your queries better automatically
    – join_buffer_size variable has a whole new meaning
    • Values of 32MB+ can make sense
    • Focus on index design practices for this presentation
    – Most important one: ICP (Index Condition Pushdown)
  • 31. Understanding ICP
    • Push WHERE clause “conditions” down to the storage engine to filter
    – Think NAME LIKE '%ill%' (will not convert to a range)
    • “Much more flexible covering index”
    – Plus filtering is done at the engine level, which is efficient
    • MySQL 5.5 and earlier
    – All or none: either everything is resolved through the index, or the “row” is read if it is within the range
  • 32. ICP Examples
    • SELECT A … WHERE B=2 AND C LIKE '%ill%'
    – MySQL 5.5 and below
    • Index (B): traditional. The index is used for the range only
    • Index (B,C,A): covering. All involved columns are included
    – MySQL 5.6
    • Index (B,C): range access by B; the filter on C is applied in the index; the full row is read only on a match
    • More cases
    – SELECT * … WHERE A=5 AND C=6; Index (A,B,C)
    • Will scan all index entries with A=5, not all rows
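The difference ICP makes for Index (B,C) can be illustrated by counting how many full rows have to be fetched. The sketch below is a simplified model under stated assumptions: `table` and `index_bc` are hypothetical in-memory stand-ins, `select_a` is an invented helper, and Python's `in` is case-sensitive, unlike LIKE under MySQL's default collations.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: row_id -> (A, B, C).
table = {
    1: ("a1", 2, "William"),
    2: ("a2", 2, "Peter"),
    3: ("a3", 3, "Jill"),
    4: ("a4", 2, "Bill"),
    5: ("a5", 2, "Vadim"),
}

# Assumed stand-in for KEY(B,C): sorted (B, C, row_id) entries.
index_bc = sorted((b, c, row_id) for row_id, (_, b, c) in table.items())
b_keys = [entry[0] for entry in index_bc]


def select_a(b_value, substring, use_icp):
    """Model of SELECT A ... WHERE B=b_value AND C LIKE '%substring%'.

    Returns (result, rows_read), where rows_read counts full-row fetches.
    """
    lo = bisect_left(b_keys, b_value)   # range access on B
    hi = bisect_right(b_keys, b_value)
    rows_read = 0
    result = []
    for _, c, row_id in index_bc[lo:hi]:
        if use_icp and substring not in c:
            continue                    # rejected using the index entry alone
        rows_read += 1                  # full row fetched from the table
        a, _, row_c = table[row_id]
        if substring in row_c:          # condition re-checked on the row
            result.append(a)
    return result, rows_read


print(select_a(2, "ill", use_icp=False))
print(select_a(2, "ill", use_icp=True))
```

Both calls return the same values of A; only the number of full rows read differs.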
  • 33. How MySQL Picks which Index to Use ?
    • Performs dynamic picking for every query execution
    – The constants in the query text matter a lot
    • Estimates the number of rows it needs to access for a given index by doing a “dive” in the table
    • Uses “Cardinality” statistics if that is impossible
    – This is what ANALYZE TABLE updates
  • 34. More on Picking the Index
    • Not just minimizing the number of scanned rows
    • Lots of other heuristics and hacks
    – PRIMARY KEY is special for InnoDB
    – Covering index benefits
    – Full table scan is faster, all else being equal
    – Can we also use the index for sorting?
    • Things to know
    – Verify the plan MySQL is actually using
    – Note it can change dynamically based on constants and data
  • 35. Use EXPLAIN
    • EXPLAIN is a great tool to see how MySQL plans to execute the query
    – http://dev.mysql.com/doc/refman/5.6/en/using-explain.html
    – Remember real execution might be different

    mysql> explain select max(season_nr) from title group by production_year;
    +----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
    | id | select_type | table | type  | possible_keys | key             | key_len | ref  | rows | Extra                    |
    +----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
    |  1 | SIMPLE      | title | range | NULL          | production_year | 5       | NULL |  201 | Using index for group-by |
    +----+-------------+-------+-------+---------------+-----------------+---------+------+------+--------------------------+
    1 row in set (0.01 sec)
  • 36. Indexing Strategy
    • Build indexes for the set of your performance-critical queries
    – Look at them together, not just one by one
    • Best if all WHERE clauses and JOIN clauses use indexes for lookups
    – At least the most selective parts should
    • Generally extend an index if you can, instead of creating new indexes
    • Validate the performance impact as you make changes
  • 37. Indexing Strategy Example
    • Build the index order which benefits more queries
    – SELECT * FROM TBL WHERE A=5 AND B=6
    – SELECT * FROM TBL WHERE A>5 AND B=6
    – KEY (B,A) is better for such a query mix
    • All else being equal, put the more selective key part first
    • Do not add indexes for queries that are not performance critical
    – Many indexes slow the system down
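Why KEY (B,A) suits this query mix can be seen by counting the index entries each key order has to scan. The sketch below models a composite index as a sorted list of tuples, which is an assumption made for illustration; the data, the `entries_scanned` helper and the use of integer columns (so that A>5 can be written as A>=6) are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of TBL as (A, B) pairs: 10 x 10 = 100 rows.
rows = [(a, b) for a in range(1, 11) for b in range(1, 11)]

index_ab = sorted(rows)                     # assumed stand-in for KEY(A,B)
index_ba = sorted((b, a) for a, b in rows)  # assumed stand-in for KEY(B,A)

INF = float("inf")


def entries_scanned(index, low, high):
    """Number of index entries in the closed key range [low, high]."""
    return bisect_right(index, high) - bisect_left(index, low)


scans = {
    # Equality on both columns: either key order pinpoints the entry.
    "KEY(A,B), A=5 AND B=6": entries_scanned(index_ab, (5, 6), (5, 6)),
    "KEY(B,A), A=5 AND B=6": entries_scanned(index_ba, (6, 5), (6, 5)),
    # A>5 is a range on the first key part, so B=6 cannot narrow the scan.
    "KEY(A,B), A>5 AND B=6": entries_scanned(index_ab, (6, -INF), (INF, INF)),
    # With B first, B=6 is an equality and A>5 narrows the range further.
    "KEY(B,A), A>5 AND B=6": entries_scanned(index_ba, (6, 6), (6, INF)),
}

for label, count in scans.items():
    print(label, count)
```

In this model the equality query costs the same with either order, while the range query scans 50 entries with KEY(A,B) and 5 with KEY(B,A).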
  • 38. Trick #1: Enumerating Ranges
    • KEY (A,B)
    • SELECT * FROM TBL WHERE A BETWEEN 2 AND 4 AND B=5
    – Will only use first key part of the index
    • SELECT * FROM TBL WHERE A IN (2,3,4) AND B=5
    – Will use both key parts
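The effect of rewriting BETWEEN as IN can be sketched with a sorted list of (A, B) tuples playing the role of KEY (A,B). This is an illustrative model only; the data and the `scan` helper are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of TBL as (A, B); sorted order stands in for KEY(A,B).
index_ab = sorted((a, b) for a in range(1, 7) for b in range(1, 11))

INF = float("inf")


def scan(low, high):
    """Return the index entries in the closed key range [low, high]."""
    return index_ab[bisect_left(index_ab, low):bisect_right(index_ab, high)]


# A BETWEEN 2 AND 4 AND B=5: one range on A; B=5 is checked entry by entry.
between_entries = scan((2, -INF), (4, INF))
between_rows = [entry for entry in between_entries if entry[1] == 5]

# A IN (2,3,4) AND B=5: three point lookups that use both key parts.
in_entries = []
for a in (2, 3, 4):
    in_entries.extend(scan((a, 5), (a, 5)))

print(len(between_entries), between_rows)
print(len(in_entries), in_entries)
```

Both forms find the same three rows, but the BETWEEN form walks 30 index entries in this model while the IN form touches only 3.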
  • 39. Trick #2: Adding Fake Filter
    • KEY (GENDER,CITY)
    • SELECT * FROM PEOPLE WHERE CITY='NEW YORK'
    – Will not be able to use the index at all
    • SELECT * FROM PEOPLE WHERE GENDER IN ('M','F') AND CITY='NEW YORK'
    – Will be able to use the index
    • The trick works best with low-selectivity columns
    – Gender, Status, Boolean types, etc.
  • 40. Trick #3: Unionizing Filesort
    • KEY(A,B)
    • SELECT * FROM TBL WHERE A IN (1,2) ORDER BY B LIMIT 5;
    – Will not be able to use the index for sorting
    • (SELECT * FROM TBL WHERE A=1 ORDER BY B LIMIT 5) UNION ALL (SELECT * FROM TBL WHERE A=2 ORDER BY B LIMIT 5) ORDER BY B LIMIT 5;
    – Will use the index for sorting. “filesort” will be needed only to sort at most 10 rows.
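The UNION ALL rewrite works because, within one value of A, KEY(A,B) already returns entries ordered by B, so each branch can stop after 5 entries. A minimal Python sketch of that idea follows, assuming a sorted list of (A, B) tuples as the index; the data and the helper names `first_n_for_a`, `top_union` and `top_filesort` are hypothetical.

```python
from bisect import bisect_left
from itertools import islice, takewhile

# Hypothetical rows of TBL as (A, B); sorted order stands in for KEY(A,B).
index_ab = sorted(
    [(1, b) for b in range(0, 40, 2)]    # A=1 has even B values
    + [(2, b) for b in range(1, 40, 2)]  # A=2 has odd B values
    + [(3, b) for b in range(0, 20)]     # A=3 is not part of the query
)


def first_n_for_a(a, n):
    """Read the first n index entries for one A; they arrive sorted by B."""
    start = bisect_left(index_ab, (a,))
    run = takewhile(lambda entry: entry[0] == a, islice(index_ab, start, None))
    return list(islice(run, n))


def top_union(a_values, limit=5):
    """One LIMIT-ed index read per A value, then a small final sort."""
    candidates = []
    for a in a_values:
        candidates.extend(first_n_for_a(a, limit))
    candidates.sort(key=lambda entry: entry[1])  # sorts at most len(a_values) * limit rows
    return candidates[:limit], len(candidates)


def top_filesort(a_values, limit=5):
    """A IN (...) ORDER BY B: collect every matching row, then sort them all."""
    matching = [entry for entry in index_ab if entry[0] in a_values]
    matching.sort(key=lambda entry: entry[1])
    return matching[:limit], len(matching)


print(top_union((1, 2)))
print(top_filesort((1, 2)))
```

Both functions return the same five rows; the second element of each result shows how many rows had to be sorted to get them.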
  • 41. Thank You !
    • [email protected]
    • http://www.percona.com
    • @percona at Twitter
    • http://www.facebook.com/Percona