Databases
Kodok Márton
■ simb.ro
■ kodokmarton.eu
■ twitter.com/martonkodok
■ facebook.com/marton.kodok
■ stackoverflow.com/users/243782/pentium10
23 May, 2013 @Sapientia
Relational Databases
● A relational database is essentially a group of tables (entities).
● Tables are made up of columns and rows.
● Those tables have constraints, and relationships are defined between them.
● Relational databases are queried using SQL
● Multiple tables accessed in a single query are "joined" together, typically by
criteria defined on the table relationship columns.
● Normalization is a data-structuring model used with relational databases that
improves data consistency and reduces data duplication.
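A minimal sketch of the points above, using Python's built-in sqlite3 module as a stand-in for any relational database. The authors and books tables and their rows are hypothetical: each author name is stored once (normalized), and the JOIN follows the relationship column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books (id, title, author_id) VALUES
        (1, 'SQL Basics', 1),
        (2, 'Index Tuning', 1),
        (3, 'Joins', 2);
""")

# The two tables are joined on the relationship column books.author_id.
rows = conn.execute("""
    SELECT books.title, authors.name
    FROM books
    JOIN authors ON books.author_id = authors.id
    ORDER BY books.id
""").fetchall()
print(rows)
```

Renaming an author here touches a single row in authors, which is the practical payoff of removing duplication.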
Non-Relational Databases (NoSQL)
● Key-value stores are the simplest NoSQL
databases. Every single item in the database is
stored as an attribute name, or key, together with its
value. Examples of key-value stores are Riak and
Membase. Some key-value stores, such as Redis,
allow each value to have a type, such as "integer",
which adds functionality.
● Document databases can contain many different
key-value pairs, or key-array pairs, or even nested
documents
● Graph stores are used to store information about
networks, such as social connections
● Wide-column stores such as Cassandra and
HBase are optimized for queries over large datasets,
and store columns of data together, instead of rows.
SQL vs NoSQL
This presentation is NOT about
SQL vs NoSQL.
Let’s be blunt: neither of them is difficult, and we need both of them.
We evolve.
Spreadsheets/Frontends
Excel and Access are not databases.
Let’s be blunt: Excel does not need more than 256 columns and 65 536 rows.
DDL (Data Definition Language)
CREATE TABLE employees (
id INTEGER(11) PRIMARY KEY,
first_name VARCHAR(50) NULL,
last_name VARCHAR(75) NOT NULL,
dateofbirth DATE NULL
);
ALTER TABLE sink ADD bubbles INTEGER;
ALTER TABLE sink DROP COLUMN bubbles;
DROP TABLE employees;
RENAME TABLE My_table TO Tmp_table;
TRUNCATE TABLE My_table;
CREATE, ALTER, DROP, RENAME, TRUNCATE
SQL (Structured Query Language)
INSERT INTO My_table
(field1, field2, field3)
VALUES
('test', 'N', NULL);
SELECT Book.title AS Title,
COUNT(*) AS Authors
FROM Book JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
UPDATE My_table
SET field1 = 'updated value'
WHERE field2 = 'N';
DELETE FROM My_table
WHERE field2 = 'N';
CRUD (CREATE, READ, UPDATE, DELETE)
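The four statements above can be tried end to end. This is a small sketch using Python's built-in sqlite3 module rather than MySQL; the three-column definition of My_table is an assumption, since the slide does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table definition; the slide only shows the column names.
conn.execute("CREATE TABLE My_table (field1 TEXT, field2 TEXT, field3 TEXT)")

# Create
conn.execute(
    "INSERT INTO My_table (field1, field2, field3) VALUES (?, ?, ?)",
    ("test", "N", None),
)
# Read
after_insert = conn.execute(
    "SELECT field1, field2, field3 FROM My_table"
).fetchall()
# Update
conn.execute("UPDATE My_table SET field1 = 'updated value' WHERE field2 = 'N'")
after_update = conn.execute("SELECT field1 FROM My_table").fetchall()
# Delete
conn.execute("DELETE FROM My_table WHERE field2 = 'N'")
after_delete = conn.execute("SELECT COUNT(*) FROM My_table").fetchone()[0]

print(after_insert, after_update, after_delete)
```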
Indexes (fast lookup + constraints)
Constraint:
a. PRIMARY
b. UNIQUE
c. FOREIGN KEY
● Indexes reduce the amount of data the server has to examine
● They can speed up reads but can slow down inserts and updates
● They are used to enforce constraints
Type:
1. BTREE
- can be used for look-ups and sorting
- can match the full value
- can match a leftmost prefix ( LIKE 'ma%' )
2. HASH
- only supports equality comparisons: =, IN()
- can't be used for sorting
3. FULLTEXT
- only MyISAM tables before MySQL 5.6 (InnoDB supports it from 5.6 on)
- compares words or phrases
- returns a relevance value
Some Queries That Can Use BTREE Index
● point look-up
SELECT * FROM students WHERE grade = 100;
● open range
SELECT * FROM students WHERE grade > 75;
● closed range
SELECT * FROM students WHERE grade > 70 AND grade < 80;
● special range
SELECT * FROM students WHERE name LIKE 'ma%';
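Why these four query shapes suit a BTREE index can be sketched with a sorted Python list and the bisect module. The sorted list is only a stand-in for the ordered entries of a B-tree, not how MySQL stores them, and the sample grades and names are made up.

```python
import bisect

# Sorted keys, standing in for the ordered entries of a B-tree index.
grades = sorted([55, 68, 71, 75, 79, 80, 92, 100, 100])
names = sorted(["anna", "mark", "maria", "martin", "mihai", "zoe"])

def point_lookup(keys, value):
    """Entries equal to value (WHERE grade = value)."""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return keys[lo:hi]

def open_range(keys, low):
    """Entries strictly greater than low (WHERE grade > low)."""
    return keys[bisect.bisect_right(keys, low):]

def closed_range(keys, low, high):
    """Entries with low < key < high (WHERE grade > low AND grade < high)."""
    return keys[bisect.bisect_right(keys, low):bisect.bisect_left(keys, high)]

def prefix_range(keys, prefix):
    """Entries starting with prefix (WHERE name LIKE 'prefix%')."""
    lo = bisect.bisect_left(keys, prefix)
    # The highest code point sorts after any real continuation of the prefix.
    hi = bisect.bisect_left(keys, prefix + chr(0x10FFFF))
    return keys[lo:hi]

print(point_lookup(grades, 100))     # [100, 100]
print(open_range(grades, 75))        # [79, 80, 92, 100, 100]
print(closed_range(grades, 70, 80))  # [71, 75, 79]
print(prefix_range(names, "ma"))     # ['maria', 'mark', 'martin']
```

Each lookup is a binary search for one or two boundaries followed by a contiguous slice. A pattern such as LIKE '%ma' has no such boundary in sorted order, so it cannot be answered this way.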
Multi Column Indexes Useful for sorting/where
CREATE INDEX `salary_name_idx` ON emp(salary, name);
SELECT salary, name FROM emp ORDER BY salary, name;
(5000, 'john') < (5000, 'michael')
(9000, 'philip') < (9999, 'steve')
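The ordering shown above is the same lexicographic order Python uses for tuples, so a rough sketch of what the (salary, name) index looks like is simply a sorted list of pairs. The four rows come from the slide; the bisect lookup is an illustration, not MySQL's actual mechanism.

```python
import bisect

emp = [(9000, "philip"), (5000, "michael"), (9999, "steve"), (5000, "john")]

# Entries are ordered by salary first, then by name within equal salaries.
index_order = sorted(emp)
print(index_order)
# [(5000, 'john'), (5000, 'michael'), (9000, 'philip'), (9999, 'steve')]

# A condition on the leading column alone maps to one contiguous slice.
lo = bisect.bisect_left(index_order, (5000,))
hi = bisect.bisect_left(index_order, (5001,))
salary_5000 = index_order[lo:hi]
print(salary_5000)  # [(5000, 'john'), (5000, 'michael')]
```

Rows with a given name, by contrast, can be scattered anywhere in this order, which is why a condition on name alone gets little help from this index.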
Indexing InnoDB Tables
● data is clustered by primary key
● primary key is implicitly appended to all indexes
CREATE INDEX fname_idx ON emp(firstname);
actually creates KEY(firstname, id) internally
Avoid long primary keys!
TS-09061982110055-12345
349950002348857737488334
supercalifragilisticexpialidocious
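A back-of-the-envelope estimate of why key length matters: since the primary key is appended to every secondary index entry, its size is paid once per row per secondary index. The row count, the number of indexes, the 4-byte integer key and the one-byte-per-character assumption are all hypothetical, and page and record overhead are ignored.

```python
def secondary_index_pk_overhead(pk_bytes, rows, secondary_indexes):
    """Bytes spent storing primary key copies inside secondary index entries."""
    return pk_bytes * rows * secondary_indexes

rows = 1_000_000   # assumed table size
indexes = 3        # assumed number of secondary indexes

int_pk = secondary_index_pk_overhead(4, rows, indexes)
long_pk = secondary_index_pk_overhead(
    len("supercalifragilisticexpialidocious"), rows, indexes
)

print(int_pk)   # 12000000
print(long_pk)  # 102000000
```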
How MySQL Uses Indexes
● looking up data
● joining tables
● sorting
● avoiding reading data
MySQL usually uses only ONE index per table in a query (index_merge is the exception).
DON'Ts
● don't follow optimization rules blindly
● don't create an index for every column in your table
thinking that it will make things faster
● don't create duplicate indexes
ex.
BAD:
create index firstname_ix on Employee(firstname);
create index lastname_ix on Employee(lastname);
GOOD:
create index first_last_ix on Employee(firstname, lastname);
create index id_ix on Employee(id);
DOs
● use index for optimizing look-ups, sorting and retrieval of
data
● use short primary keys if possible when using the
InnoDB storage engine
● extend index if you can, instead of creating new indexes
● validate performance impact as you're doing changes
● remove unused indexes
Speeding it up
● proper table design (3NF)
● understand the query cache (a MySQL internal)
● EXPLAIN syntax
● proper indexes
● MySQL server daemon optimization
● Slow Query Logs
● Stored Procedures
● Profiling
● Redundancy -> Master : Slave
● Sharding
● mix database servers based on business logic
-> Memcache, Redis, MongoDB
EXPLAIN
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table      possible_keys  key   rows
attendees  NULL           NULL  14052
The three most important columns returned by EXPLAIN
1) Possible keys
● All the possible indexes which MySQL could have used
● Based on a series of very quick lookups and calculations
2) Chosen key
3) Rows scanned
● Indication of effort required to identify your result set
-> Interpreting the results
Interpreting the results
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table      possible_keys  key   rows
attendees  NULL           NULL  14052
● No suitable indexes for this query
● MySQL had to do a full table scan
● Full table scans are almost always the slowest query
● Full table scans, while not always bad, are usually an indication that an
index is required
-> Adding indexes
Adding indexes
ALTER TABLE attendees ADD INDEX conf (conference_id);
ALTER TABLE attendees ADD INDEX reg (registration_status);
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table      possible_keys  key   rows
attendees  conf, reg      conf  331
● MySQL had two indexes to choose from, but discarded “reg”
● “reg” isn't sufficiently unique
● The spread of values can also be a factor (e.g when 99% of rows contain
the same value)
● Index “uniqueness” is called cardinality
● There is scope for some performance increase... Lower server load,
quicker response
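The before/after effect of adding an index can be reproduced without a MySQL server. This sketch uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, whose output differs from MySQL's EXPLAIN (no possible_keys or rows columns). The attendees column list is an assumption, and query_plan is a hypothetical helper.

```python
import sqlite3

def query_plan(index_ddl):
    """Return SQLite's plan lines for the slide's query, given index DDL."""
    conn = sqlite3.connect(":memory:")
    # Assumed table layout; the slides do not show the real definition.
    conn.execute(
        "CREATE TABLE attendees ("
        "id INTEGER PRIMARY KEY, conference_id INTEGER, "
        "registration_status INTEGER, surname TEXT)"
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM attendees "
        "WHERE conference_id = 123 AND registration_status > 0"
    ).fetchall()
    conn.close()
    return [row[-1] for row in rows]  # last column is the readable detail

without_index = query_plan([])
with_index = query_plan(["CREATE INDEX conf ON attendees (conference_id)"])

print(without_index)  # a full table scan
print(with_index)     # a search that names the conf index
```

The exact wording of the plan lines varies between SQLite versions, so it is safer to look for "SCAN" versus "USING INDEX" than to compare whole strings.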
-> Choosing a better index
Choosing a better index
ALTER TABLE attendees ADD INDEX reg_conf_index (registration_status, conference_id);
EXPLAIN SELECT * FROM attendees
WHERE registration_status > 0 AND conference_id = 123
table      possible_keys              key             rows
attendees  reg, conf, reg_conf_index  reg_conf_index  204
● reg_conf_index is a much better choice
● Note that the other two keys are still available, just not as effective
● Our query is now served well by the new index
-> Using it wrong
Watch for WHERE column order
ALTER TABLE attendees DROP INDEX conf, DROP INDEX reg;
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
table      possible_keys  key   rows
attendees  NULL           NULL  14052
● Without the “conf” index, we're back to square one
● The order in which fields were defined in a composite index affects whether it is available for
use in a query
● Remember, we defined our index : (registration_status, conference_id)
Potential workaround:
EXPLAIN SELECT * FROM attendees WHERE registration_status >= -1 AND
conference_id = 123
table      possible_keys   key             rows
attendees  reg_conf_index  reg_conf_index  204
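The column-order effect can also be observed in SQLite, used here through Python's sqlite3 module as a stand-in for MySQL. The table layout is assumed and plan_for is a hypothetical helper. The first query filters only on the trailing column of the composite index, the second also constrains the leading column.

```python
import sqlite3

def plan_for(where_clause):
    """Plan lines for a query against a table with only reg_conf_index."""
    conn = sqlite3.connect(":memory:")
    # Assumed table layout; only the index definition comes from the slides.
    conn.execute(
        "CREATE TABLE attendees ("
        "id INTEGER PRIMARY KEY, conference_id INTEGER, "
        "registration_status INTEGER, surname TEXT)"
    )
    conn.execute(
        "CREATE INDEX reg_conf_index "
        "ON attendees (registration_status, conference_id)"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM attendees WHERE " + where_clause
    ).fetchall()
    conn.close()
    return [row[-1] for row in rows]

trailing_only = plan_for("conference_id = 123")
leading_too = plan_for("registration_status >= -1 AND conference_id = 123")

print(trailing_only)  # table scan, the composite index is not used
print(leading_too)    # search using reg_conf_index
```

This relies on SQLite not having statistics from ANALYZE; with statistics, SQLite may apply a skip-scan and use the index even for the first query.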
JOINs
● JOINing together large data sets (>= 10,000) is really where EXPLAIN
becomes useful
● Each JOIN in a query gets its own row in EXPLAIN
● Make sure each JOIN condition is FAST
● Make sure each joined table is getting to its result set as quickly as possible
● The benefits compound if each join requires less effort
Simple JOIN example
EXPLAIN SELECT * FROM conferences c
JOIN attendees a ON c.conference_id = a.conference_id
WHERE c.location_id = 2 AND c.topic_id IN (4,6,1) AND
a.registration_status > 1
table        type  possible_keys     key               rows
conferences  ref   conference_topic  conference_topic  15
attendees    ALL   NULL              NULL              14052
● Looks like I need an index on attendees.conference_id
● The “type” column is another indication of effort, aside from rows scanned
● Here, “ALL” is bad – we should be aiming for “ref”
● There are 13 different values for “type”
● Common values are:
const, eq_ref, ref, fulltext, index_merge, range, all
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
The "extra" column
With every EXPLAIN, you get an “extra” column, which shows additional
operations invoked to get your result set.
Some example “extra” values:
● Using index
● Using where
● Using temporary
● Using filesort
There are many more “extra” values which are discussed in the MySQL manual:
Distinct, Full scan on NULL key, Impossible HAVING, Impossible WHERE, Not exists
http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-join-types
table      type  possible_keys  key   rows  extra
attendees  ref   conf           conf  331   Using where; Using filesort
Using filesort
Avoid, because:
● Doesn't use an index
● Involves a full scan of your result set
● Employs a generic (i.e. one size fits all) algorithm
● Creates temporary tables
● May use the filesystem (seek) when the sort does not fit in memory
● Will get slower with more data
It's not all bad...
● Perfectly acceptable provided you get to your
result set as quickly as possible, and keep it predictably small
● Sometimes unavoidable - ORDER BY RAND()
● ORDER BY operations can use indexes to do the sorting!
Using filesort
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 ORDER BY surname
table      possible_keys  key            rows  extra
attendees  conference_id  conference_id  331   Using filesort
MySQL is using an index, but it's sorting the results slowly
ALTER TABLE attendees ADD INDEX conf_surname (conference_id, surname);
table      possible_keys                key           rows  extra
attendees  conference_id, conf_surname  conf_surname  331
We've avoided a filesort!
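SQLite has an analogous signal to MySQL's "Using filesort": its plan contains a "USE TEMP B-TREE FOR ORDER BY" step when a separate sort is needed. The sketch below, using Python's sqlite3 module with an assumed table layout and a hypothetical order_by_plan helper, compares a single-column index with the composite (conference_id, surname) index.

```python
import sqlite3

def order_by_plan(index_ddl):
    """Plan lines for the ORDER BY query with exactly one index defined."""
    conn = sqlite3.connect(":memory:")
    # Assumed table layout; the slides do not show the real definition.
    conn.execute(
        "CREATE TABLE attendees ("
        "id INTEGER PRIMARY KEY, conference_id INTEGER, "
        "registration_status INTEGER, surname TEXT)"
    )
    conn.execute(index_ddl)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM attendees "
        "WHERE conference_id = 123 ORDER BY surname"
    ).fetchall()
    conn.close()
    return [row[-1] for row in rows]

def needs_sort(plan):
    """True if SQLite adds a separate sorting step for ORDER BY."""
    return any("TEMP B-TREE" in line for line in plan)

single = order_by_plan("CREATE INDEX conf ON attendees (conference_id)")
composite = order_by_plan(
    "CREATE INDEX conf_surname ON attendees (conference_id, surname)"
)

print(needs_sort(single))     # True
print(needs_sort(composite))  # False
```

With the composite index, the entries for one conference_id are already stored in surname order, so the rows can be returned as they are read.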
NoSQL engines
● Redis
● MongoDB
● Cassandra
● CouchDB
● DynamoDB
● Riak
● Membase
● HBase
Data is created, updated, deleted, retrieved using API calls.
All application and data integrity logic is contained in the application code.
Redis
● Written In: C
● Main point: Blazing fast
● License: BSD
● Protocol: Telnet-like
● Disk-backed in-memory database
● Master-slave replication
● Simple values or hash tables by keys, but complex operations like ZREVRANGEBYSCORE
● INCR & co (good for rate limiting or statistics)
● Has sets (also union/diff/inter)
● Has lists (also a queue; blocking pop)
● Has hashes (objects of multiple fields)
● Sorted sets (high score table, good for range queries)
● Redis has transactions (!)
● Values can be set to expire (as in a cache)
● Pub/Sub lets one implement messaging (!)
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
SET uid:1000:username antirez
uid:1000:followers
uid:1000:following
SET foo bar
GET foo => bar
SET foo 10
INCR foo => 11
LPUSH mylist a (now mylist holds 'a')
LPUSH mylist b (now mylist holds 'b,a')
LPUSH mylist c (now mylist holds 'c,b,a')
SADD myset a
SADD myset b
SADD myset foo
SADD myset bar
SCARD myset => 4
SMEMBERS myset => bar,a,foo,b
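To make the semantics of the commands above concrete without a running server, here is a toy in-memory imitation in Python. MiniRedis and its methods are hypothetical names invented for this sketch. It is not a Redis client and only mimics what SET, GET, INCR, LPUSH, SADD and SCARD return.

```python
class MiniRedis:
    """Toy in-memory stand-in for a few Redis commands (not a Redis client)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        # Redis stores plain values as strings.
        self.data[key] = str(value)

    def get(self, key):
        return self.data.get(key)

    def incr(self, key):
        # A missing key counts as 0, so the first INCR yields 1.
        value = int(self.data.get(key, "0")) + 1
        self.data[key] = str(value)
        return value

    def lpush(self, key, value):
        # LPUSH adds at the head of the list and returns the new length.
        self.data.setdefault(key, []).insert(0, value)
        return len(self.data[key])

    def sadd(self, key, member):
        # SADD returns 1 if the member was new, 0 if it was already present.
        members = self.data.setdefault(key, set())
        added = member not in members
        members.add(member)
        return int(added)

    def scard(self, key):
        return len(self.data.get(key, set()))


r = MiniRedis()
r.set("foo", 10)
counter = r.incr("foo")
for item in ("a", "b", "c"):
    r.lpush("mylist", item)
for member in ("a", "b", "foo", "bar"):
    r.sadd("myset", member)

print(counter)            # 11
print(r.data["mylist"])   # ['c', 'b', 'a']
print(r.scard("myset"))   # 4
```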
MongoDB
● Written In: C++
● Main point: Retains some friendly properties of SQL. (Query, index)
● License: AGPL (Drivers: Apache)
● Protocol: Custom, binary (BSON)
● Master/slave replication (auto failover with replica sets)
● Sharding built-in
● Queries are JavaScript expressions / Run arbitrary JavaScript functions server-side
● Uses memory mapped files for data storage
● Performance over features
● Journaling (with --journal) is best turned on
● On 32-bit systems, limited to ~2.5 GB
● An empty database takes up 192 MB
● Has geospatial indexing
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If
you need good performance on a big DB.
For example: For most things that you would do with MySQL or PostgreSQL, but having predefined
columns really holds you back.
MongoDB -> JSON
The MongoDB examples assume a collection named users that contains
documents of the following prototype:
{
_id: ObjectId("509a8fb2f3f4948bd2f983a0"),
user_id: "abc123",
age: 55,
status: 'A'
}
MongoDB -> insert
SQL:
CREATE TABLE users (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
user_id Varchar(30),
age Number,
status char(1),
PRIMARY KEY (id)
)
MongoDB:
db.users.insert( {
user_id: "abc123",
age: 55,
status: "A"
} )
The collection is implicitly created on the first insert operation.
The primary key _id is automatically added if the _id field is not specified.
MongoDB -> Alter, Index, Select
SQL:
ALTER TABLE users
ADD join_date DATETIME
MongoDB:
db.users.update(
{ },
{ $set: { join_date: new Date() } },
{ multi: true }
)
SQL:
CREATE INDEX idx_user_id_asc
ON users(user_id)
MongoDB:
db.users.ensureIndex( { user_id: 1 } )
SQL:
SELECT user_id, status
FROM users
WHERE status = "A"
MongoDB:
db.users.find(
{ status: "A" },
{ user_id: 1, status: 1, _id: 0 }
)
SQL:
SELECT *
FROM users
WHERE status = "A"
OR age = 50
MongoDB:
db.users.find(
{ $or: [ { status: "A" } ,
{ age: 50 } ] }
)
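How a find() filter and projection are interpreted can be sketched in plain Python over a list of dicts. The matches and find functions below are hypothetical helpers written for illustration. They cover only equality and $or, and the projection is simplified to "keep fields flagged 1", unlike a real MongoDB server, which returns _id unless it is excluded. The sample documents other than abc123 are made up.

```python
def matches(doc, query):
    """True if doc satisfies an equality / $or query document."""
    for key, expected in query.items():
        if key == "$or":
            if not any(matches(doc, clause) for clause in expected):
                return False
        elif doc.get(key) != expected:
            return False
    return True

def find(collection, query, projection=None):
    """Return matching documents, optionally reduced to projected fields."""
    results = []
    for doc in collection:
        if not matches(doc, query):
            continue
        if projection is None:
            results.append(dict(doc))
        else:
            results.append(
                {k: doc[k] for k, keep in projection.items() if keep and k in doc}
            )
    return results

users = [
    {"_id": 1, "user_id": "abc123", "age": 55, "status": "A"},
    {"_id": 2, "user_id": "bcd001", "age": 50, "status": "B"},
    {"_id": 3, "user_id": "xyz789", "age": 30, "status": "C"},
]

# SELECT user_id, status FROM users WHERE status = "A"
active = find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0})
# SELECT * FROM users WHERE status = "A" OR age = 50
either = find(users, {"$or": [{"status": "A"}, {"age": 50}]})

print(active)                      # [{'user_id': 'abc123', 'status': 'A'}]
print([d["_id"] for d in either])  # [1, 2]
```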
Job Trends from Indeed.com
Thank you.
Questions?
Introduction to Databases - query optimizations for MySQL

  • 1. Databases Kodok Márton ■ simb.ro ■ kodokmarton.eu ■ twitter.com/martonkodok ■ facebook.com/marton.kodok ■ stackoverflow.com/users/243782/pentium10 23 May, 2013 @Sapientia
  • 2. Relational Databases ● A relational database is essentially a group of tables (entities). ● Tables are made up of columns and rows. ● Those tables have constraints, and relationships are defined between them. ● Relational databases are queried using SQL ● Multiple tables being accessed in a single query are "joined" together, typically by a criteria defined in the table relationship columns. ● Normalization is a data-structuring model used with relational databases that ensures data consistency and removes data duplication.
  • 3. Non-Relational Databases (NoSQL) ● Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name, or key, together with its value. Examples of key-value stores are Riak and MongoDB. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality. ● Document databases can contain many different key-value pairs, or key-array pairs, or even nested documents ● Graph stores are used to store information about networks, such as social connections ● Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
  • 4. SQL vs NoSQL The purpose of this presentation is NOT about SQL vs NoSQL. Let’s be blunt: none of them are difficult, we need both of them. We evolve.
  • 5. Spreadsheets/Frontends Excel and Access are not a database. Let’s be blunt: Excel does not need more than 256 columns and 65 536 rows.
  • 6. DDL (Data Definition Language) CREATE TABLE employees ( id INTEGER(11) PRIMARY KEY, first_name VARCHAR(50) NULL, last_name VARCHAR(75) NOT NULL, dateofbirth DATE NULL ); ALTER TABLE sink ADD bubbles INTEGER; ALTER TABLE sink DROP COLUMN bubbles; DROP TABLE employees; RENAME TABLE My_table TO Tmp_table; TRUNCATE TABLE My_table; CREATE, ALTER, DROP, RENAME, TRUNCATE
  • 7. SQL (Structured Query Language) INSERT INTO My_table (field1, field2, field3) VALUES ('test', 'N', NULL); SELECT Book.title AS Title, COUNT(*) AS Authors FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn GROUP BY Book.title; UPDATE My_table SET field1 = 'updated value' WHERE field2 = 'N'; DELETE FROM My_table WHERE field2 = 'N'; CRUD (CREATE, READ, UPDATE, DELETE)
  • 8. Indexes (fast lookup + constraints) Constraint: a. PRIMARY b. UNIQUE c. FOREIGN KEY ● Indexes reduce the amount of data the server has to examine ● can speed up reads but can slow down inserts and updates ● are used to enforce constraints Type: 1. BTREE - can be used for look-ups and sorting - can match the full value - can match a leftmost prefix ( LIKE 'ma%' ) 2. HASH - only supports equality comparisons: =, IN() - can't be used for sorting 3. FULLTEXT - only MyISAM tables (InnoDB gained FULLTEXT support in MySQL 5.6) - compares words or phrases - returns a relevance value
  • 9. Some Queries That Can Use BTREE Index ● point look-up SELECT * FROM students WHERE grade = 100; ● open range SELECT * FROM students WHERE grade > 75; ● closed range SELECT * FROM students WHERE grade > 70 AND grade < 80; ● special range SELECT * FROM students WHERE name LIKE 'ma%'; Multi Column Indexes Useful for sorting/where CREATE INDEX `salary_name_idx` ON emp(salary, name); SELECT salary, name FROM emp ORDER BY salary, name; (5000, 'john') < (5000, 'michael') (9000, 'philip') < (9999, 'steve')
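A closed range needs two comparisons joined with AND. A chained comparison such as `70 < grade < 80` is parsed left to right as `(70 < grade) < 80`, which compares 0 or 1 with 80 and is therefore always true. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made so the example is self-contained; the sample rows are hypothetical).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("ana", 50), ("mark", 75), ("zoe", 90)],  # hypothetical sample rows
)

# Chained form: evaluated as (70 < grade) < 80, i.e. (0 or 1) < 80.
chained = conn.execute(
    "SELECT name FROM students WHERE 70 < grade < 80 ORDER BY name"
).fetchall()

# Explicit form: both bounds are applied to the column itself.
closed = conn.execute(
    "SELECT name FROM students WHERE grade > 70 AND grade < 80 ORDER BY name"
).fetchall()

print("chained:", chained)
print("closed: ", closed)
```

The chained query returns every row, while the explicit form returns only the rows inside the range.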
  • 10. Indexing InnoDB Tables ● data is clustered by primary key ● primary key is implicitly appended to all indexes CREATE INDEX fname_idx ON emp(firstname); actually creates KEY(firstname, id) internally Avoid long primary keys! TS-09061982110055-12345 349950002348857737488334 supercalifragilisticexpialidocious
  • 11. How MySQL Uses Indexes ● looking up data ● joining tables ● sorting ● avoiding reading data MySQL usually chooses only ONE index per table in a query (the index_merge access type is the exception).
  • 12. DON'Ts ● don't follow optimization rules blindly ● don't create an index for every column in your table thinking that it will make things faster ● don't create duplicate indexes ex. BAD: create index firstname_ix on Employee(firstname); create index lastname_ix on Employee(lastname); GOOD: create index first_last_ix on Employee(firstname, lastname); create index id_ix on Employee(id);
  • 13. DOs ● use index for optimizing look-ups, sorting and retrieval of data ● use short primary keys if possible when using the InnoDB storage engine ● extend index if you can, instead of creating new indexes ● validate performance impact as you're doing changes ● remove unused indexes
  • 14. Speeding it up ● proper table design (3nf) ● understand query cache (internal of MySQL) ● EXPLAIN syntax ● proper indexes ● MySQL server daemon optimization ● Slow Query Logs ● Stored Procedures ● Profiling ● Redundancy -> Master : Slave ● Sharding ● mix database servers based on business logic -> Memcache, Redis, MongoDB
  • 15. EXPLAIN EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 The three most important columns returned by EXPLAIN 1) Possible keys ● All the possible indexes which MySQL could have used ● Based on a series of very quick lookups and calculations 2) Chosen key 3) Rows scanned ● Indication of effort required to identify your result set -> Interpreting the results
  • 16. Interpreting the results EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 ● No suitable indexes for this query ● MySQL had to do a full table scan ● Full table scans are almost always the slowest query ● Full table scans, while not always bad, are usually an indication that an index is required -> Adding indexes
  • 17. Adding indexes ALTER TABLE attendees ADD INDEX conf (conference_id); ALTER TABLE attendees ADD INDEX reg (registration_status); EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees conf, reg conf 331 ● MySQL had two indexes to choose from, but discarded “reg” ● “reg” isn't sufficiently unique ● The spread of values can also be a factor (e.g. when 99% of rows contain the same value) ● Index “uniqueness” is called cardinality ● There is scope for some performance increase... Lower server load, quicker response -> Choosing a better index
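As a rough illustration of cardinality, the ratio of distinct values to total rows can be computed per column. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL; the data distribution (50 conferences, 2 status values over 1000 rows) and the helper name `selectivity` are assumptions made for the example, not figures from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendees ("
    "id INTEGER PRIMARY KEY, conference_id INTEGER, registration_status INTEGER)"
)
# Hypothetical data: 1000 rows spread over 50 conferences and 2 status values.
conn.executemany(
    "INSERT INTO attendees (conference_id, registration_status) VALUES (?, ?)",
    [(i % 50, i % 2) for i in range(1000)],
)

def selectivity(column):
    """Return distinct values / total rows for a column (names are hard-coded below)."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM attendees"
    ).fetchone()
    return distinct / total

conf_selectivity = selectivity("conference_id")
reg_selectivity = selectivity("registration_status")
print(conf_selectivity, reg_selectivity)
```

A column with a higher ratio narrows the result set more per lookup, which is consistent with the optimizer preferring “conf” over “reg” in the slide.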
  • 18. Choosing a better index ALTER TABLE attendees ADD INDEX reg_conf_index (registration_status, conference_id); EXPLAIN SELECT * FROM attendees WHERE registration_status > 0 AND conference_id = 123 table possible_keys key rows attendees reg, conf, reg_conf_index reg_conf_index 204 ● reg_conf_index is a much better choice ● Note that the other two keys are still available, just not as effective ● Our query is now served well by the new index -> Using it wrong
  • 19. Watch for WHERE column order ALTER TABLE attendees DROP INDEX conf; ALTER TABLE attendees DROP INDEX reg; EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 table possible_keys key rows attendees NULL NULL 14052 ● Without the “conf” index, we're back to square one ● The order in which fields were defined in a composite index affects whether it is available for use in a query ● Remember, we defined our index : (registration_status, conference_id) Potential workaround: EXPLAIN SELECT * FROM attendees WHERE registration_status >= -1 AND conference_id = 123 table possible_keys key rows attendees reg_conf_index reg_conf_index 204
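The leftmost-prefix behaviour of a composite index can be reproduced in a small sketch. It uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN` (an assumption: the two planners differ in detail and output format, but both need the leading column of a composite index to seek into it). The `surname` column and the helper name `plan` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendees ("
    "id INTEGER PRIMARY KEY, conference_id INTEGER, "
    "registration_status INTEGER, surname TEXT)"
)
# Same column order as the slide: registration_status first, conference_id second.
conn.execute(
    "CREATE INDEX reg_conf_index ON attendees (registration_status, conference_id)"
)

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

# Only the second index column is constrained: the index cannot be used to seek.
skips_leading = plan("SELECT * FROM attendees WHERE conference_id = 123")

# The leading column is constrained too: the composite index becomes usable.
uses_leading = plan(
    "SELECT * FROM attendees "
    "WHERE registration_status = 1 AND conference_id = 123"
)

print(skips_leading)
print(uses_leading)
```

Defining the index as (conference_id, registration_status) instead would serve both queries, which is usually a cleaner fix than adding an artificial condition on the leading column.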
  • 20. JOINs ● JOINing together large data sets (>= 10,000) is really where EXPLAIN becomes useful ● Each JOIN in a query gets its own row in EXPLAIN ● Make sure each JOIN condition is FAST ● Make sure each joined table is getting to its result set as quickly as possible ● The benefits compound if each join requires less effort
  • 21. Simple JOIN example EXPLAIN SELECT * FROM conferences c JOIN attendees a ON c.conference_id = a.conference_id WHERE c.location_id = 2 AND c.topic_id IN (4,6,1) AND a.registration_status > 1 table type possible_keys key rows conferences ref conference_topic conference_topic 15 attendees ALL NULL NULL 14052 ● Looks like I need an index on attendees.conference_id ● Another indication of effort, aside from rows scanned ● Here, “ALL” is bad – we should be aiming for “ref” ● There are 13 different values for “type” ● Common values are: const, eq_ref, ref, fulltext, index_merge, range, all http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
  • 22. The "extra" column With every EXPLAIN, you get an “extra” column, which shows additional operations invoked to get your result set. Some example “extra” values: ● Using index ● Using where ● Using temporary ● Using filesort There are many more “extra” values which are discussed in the MySQL manual: Distinct, Full scan, Impossible HAVING, Impossible WHERE, Not exists http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-join-types table type possible_keys key rows extra attendees ref conf conf 331 Using where; Using filesort
  • 23. Using filesort Avoid, because: ● Doesn't use an index ● Involves a full scan of your result set ● Employs a generic (i.e. one size fits all) algorithm ● Creates temporary tables ● Uses the filesystem (seek) ● Will get slower with more data It's not all bad... ● Perfectly acceptable provided you get to your result set as quickly as possible, and keep it predictably small ● Sometimes unavoidable - ORDER BY RAND() ● ORDER BY operations can use indexes to do the sorting!
  • 24. Using filesort EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname table possible_keys key rows extra attendees conference_id conference_id 331 Using filesort MySQL is using an index, but it's sorting the results slowly ALTER TABLE attendees ADD INDEX conf_surname (conference_id, surname); table possible_keys key rows extra attendees conference_id, conf_surname conf_surname 331 We've avoided a filesort!
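The effect of an index that matches both the WHERE and the ORDER BY can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here (an assumption): it has no "Using filesort" flag, but its planner reports a comparable extra sorting step as `USE TEMP B-TREE FOR ORDER BY`. The `email` column and the helper name `query_plan` are hypothetical.

```python
import sqlite3

QUERY = "SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname"

def query_plan(index_ddl):
    """Build a fresh in-memory table with the given indexes and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attendees ("
        "id INTEGER PRIMARY KEY, conference_id INTEGER, surname TEXT, email TEXT)"
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

single = ["CREATE INDEX conf ON attendees (conference_id)"]
composite = single + [
    "CREATE INDEX conf_surname ON attendees (conference_id, surname)"
]

before = query_plan(single)    # index finds the rows, sorting is a separate step
after = query_plan(composite)  # index already delivers rows in surname order

print(before)
print(after)
```

With only the single-column index, the plan includes a separate sort step; with the composite index, rows for one conference are already stored in surname order and the sort step disappears.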
  • 25. NoSQL engines ● Redis ● MongoDB ● Cassandra ● CouchDB ● DynamoDB ● Riak ● Membase ● HBase Data is created, updated, deleted, retrieved using API calls. All application and data integrity logic is contained in the application code.
  • 26. Redis ● Written In: C/C++ ● Main point: Blazing fast ● License: BSD ● Protocol: Telnet-like ● Disk-backed in-memory database ● Master-slave replication ● Simple values or hash tables by keys, but complex operations like ZREVRANGEBYSCORE ● INCR & co (good for rate limiting or statistics) ● Has sets (also union/diff/inter) ● Has lists (also a queue; blocking pop) ● Has hashes (objects of multiple fields) ● Sorted sets (high score table, good for range queries) ● Redis has transactions (!) ● Values can be set to expire (as in a cache) ● Pub/Sub lets one implement messaging (!) Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory). For example: Stock prices. Analytics. Real-time data collection. Real-time communication. SET uid:1000:username antirezr uid:1000:followers uid:1000:following GET foo => bar INCR foo => 11 LPUSH mylist a (now mylist holds one element list 'a') LPUSH mylist b (now mylist holds 'b,a') LPUSH mylist c (now mylist holds 'c,b,a') SADD myset a SADD myset b SADD myset foo SADD myset bar SCARD myset => 4 SMEMBERS myset => bar,a,foo,b
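The semantics of the commands on this slide can be modelled with plain Python containers. This is a toy in-memory model, not a Redis client and not how Redis is implemented; the function names simply mirror the commands, and the starting value of 10 for `foo` is an assumption chosen so that INCR yields 11 as on the slide.

```python
from collections import deque

store = {}  # toy keyspace: key -> int, deque (list) or set

def incr(key):
    """INCR: treat a missing key as 0, add 1, return the new value."""
    store[key] = int(store.get(key, 0)) + 1
    return store[key]

def lpush(key, value):
    """LPUSH: insert at the head of the list, return the new length."""
    store.setdefault(key, deque()).appendleft(value)
    return len(store[key])

def sadd(key, member):
    """SADD: add to the set, return 1 if the member was new, else 0."""
    members = store.setdefault(key, set())
    added = member not in members
    members.add(member)
    return int(added)

def scard(key):
    """SCARD: number of members in the set (0 if the key is missing)."""
    return len(store.get(key, set()))

store["foo"] = 10          # assumed starting value
counter = incr("foo")      # 11

for item in ("a", "b", "c"):
    lpush("mylist", item)  # list ends up as c, b, a

for member in ("a", "b", "foo", "bar", "a"):
    sadd("myset", member)  # the duplicate "a" is ignored

print(counter, list(store["mylist"]), scard("myset"))
```

Sets are unordered, which is why SMEMBERS on the slide returns the members in an arbitrary order.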
  • 27. MongoDB ● Written In: C++ ● Main point: Retains some friendly properties of SQL. (Query, index) ● License: AGPL (Drivers: Apache) ● Protocol: Custom, binary (BSON) ● Master/slave replication (auto failover with replica sets) ● Sharding built-in ● Queries are javascript expressions / Run arbitrary javascript functions server-side ● Uses memory mapped files for data storage ● Performance over features ● Journaling (with --journal) is best turned on ● On 32bit systems, limited to ~2.5 GB ● An empty database takes up 192 MB ● Has geospatial indexing Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
  • 28. MongoDB -> JSON The MongoDB examples assume a collection named users that contains documents of the following prototype: { _id: ObjectID("509a8fb2f3f4948bd2f983a0"), user_id: "abc123", age: 55, status: 'A' }
  • 29. MongoDB -> insert SQL: CREATE TABLE users ( id MEDIUMINT NOT NULL AUTO_INCREMENT, user_id Varchar(30), age Number, status char(1), PRIMARY KEY (id) ) MongoDB: db.users.insert( { user_id: "abc123", age: 55, status: "A" } ) The collection is implicitly created on the first insert operation. The primary key _id is automatically added if the _id field is not specified.
  • 30. MongoDB -> Alter, Index, Select SQL: ALTER TABLE users ADD join_date DATETIME MongoDB: db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } ) SQL: CREATE INDEX idx_user_id_asc ON users(user_id) MongoDB: db.users.ensureIndex( { user_id: 1 } ) SQL: SELECT user_id, status FROM users WHERE status = "A" MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } ) SQL: SELECT * FROM users WHERE status = "A" OR age = 50 MongoDB: db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )
  • 31. Job Trends from Indeed.com