Performance Tuning ColumnStore
What will we have today?
● Relevant but rarely used Columnstore.xml settings
● Computer resources overview
● CS monitoring and insights tools
● Query performance tips
...No magic wand certificates though
Query Performance Overview
● Does a query run fast enough?
● If not, why?
● Can we speed the query processing up?
Row-oriented vs. Column-oriented format
Row-oriented layout (one record after another):
ID | Fname    | Lname | State | Zip   | Phone          | Age | Sex
1  | Bugs     | Bunny | NY    | 11217 | (718) 938-3235 | 34  | M
2  | Yosemite | Sam   | CA    | 95389 | (209) 375-6572 | 52  | M
3  | Daffy    | Duck  | NY    | 10013 | (212) 227-1810 | 35  | M
4  | Elmer    | Fudd  | ME    | 04578 | (207) 882-7323 | 43  | M
5  | Witch    | Hazel | MA    | 01970 | (978) 744-0991 | 57  | F
Column-oriented layout (one file per column):
ID:    1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:   34, 52, 35, 43, 57
Sex:   M, M, M, M, F
SELECT Fname FROM Table1 WHERE State = 'NY'
● Row oriented
○ Rows stored sequentially in a file
○ Scans through every record row by row
● Column oriented
○ Each column is stored in a separate file
○ Scans only the relevant columns
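The difference between the two layouts can be sketched with a toy cost model in Python. This is only an illustration under stated assumptions: "values read" is a made-up metric, the function names are hypothetical, and a real engine reads blocks rather than single values.

```python
# Toy cost model (an assumption for illustration): count how many stored
# values each layout has to read to answer the slide's query, i.e. the
# Fname of everyone whose State is 'NY'.

COLUMN_NAMES = ("ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")

ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented copy of the same data: one list per column.
COLUMNS = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMN_NAMES)
}


def scan_row_store(rows, state):
    """Read every record in full, then keep Fname where State matches."""
    state_idx = COLUMN_NAMES.index("State")
    fname_idx = COLUMN_NAMES.index("Fname")
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)  # the whole record is read
        if row[state_idx] == state:
            result.append(row[fname_idx])
    return result, values_read


def scan_column_store(columns, state):
    """Read the State column, then fetch Fname only at matching positions."""
    positions = [i for i, s in enumerate(columns["State"]) if s == state]
    values_read = len(columns["State"]) + len(positions)
    return [columns["Fname"][i] for i in positions], values_read


if __name__ == "__main__":
    print(scan_row_store(ROWS, "NY"))
    print(scan_column_store(COLUMNS, "NY"))
```

Both functions return the same names; only the amount of data touched differs, and the gap widens as the table gets more columns.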
Data Loading and Extents
● First data load: CSV file, data range 1 ~ 200, 16 million rows
❖ Extent 1: Min 1, Max 100 (8 million rows)
❖ Extent 2: Min 105, Max 200 (8 million rows)
● Second data load: new CSV file, data range 150 ~ 210, 16 million rows
❖ Extent 3: Min 150, Max 165 (8 million rows)
❖ Extent 4: Min 162, Max 192 (8 million rows)
Extent Elimination
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY Item
Orders columns: Id, OrderId, Line, Item, Quantity, Price, Supplier, ShipDate, ShipMode
Sample row: 1, 1, 1, Laptop, 5, 1000, Dell, 2016-01-12, G
● Extent 1 (rows 1 to 8M), ShipDate: 2016-01-12 - 2016-03-05: scanned (+8M values)
● Extent 2 (rows 8M+1 to 16M), ShipDate: 2016-03-05 - 2016-09-23: ELIMINATED PARTITION (-8M values)
● Extent 3 (rows 16M+1 to 24M), ShipDate: 2016-09-24 - 2017-01-06: ELIMINATED PARTITION (-8M values)
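The elimination decision on this slide amounts to a range-overlap test between the filter and each extent's min/max. A minimal Python sketch, assuming a hypothetical Extent record and helper name (not ColumnStore's internal structures):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Extent:
    """Hypothetical stand-in for an extent's min/max metadata."""
    name: str
    min_value: date
    max_value: date
    rows: int


def extents_to_scan(extents, low, high):
    """Keep only extents whose [min, max] range overlaps [low, high]."""
    return [e for e in extents if e.max_value >= low and e.min_value <= high]


# ShipDate ranges taken from the slide.
EXTENTS = [
    Extent("Extent 1", date(2016, 1, 12), date(2016, 3, 5), 8_000_000),
    Extent("Extent 2", date(2016, 3, 5), date(2016, 9, 23), 8_000_000),
    Extent("Extent 3", date(2016, 9, 24), date(2017, 1, 6), 8_000_000),
]

# WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
kept = extents_to_scan(EXTENTS, date(2016, 1, 1), date(2016, 1, 31))
skipped_rows = sum(e.rows for e in EXTENTS) - sum(e.rows for e in kept)

if __name__ == "__main__":
    print([e.name for e in kept], skipped_rows)
```

Only the extent metadata is consulted here; the 16 million rows in the two skipped extents are never read.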
Data Ingestion
● Load data ordered by the columns you filter most often for maximum IO elimination
● If you want to drop partitions based on a particular column, order by that column first
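Why load order matters can be sketched in Python with made-up data: the same 100 values are loaded once in scattered order and once sorted, then split into fixed-size extents. The extent size of 10 and all names are assumptions for illustration only.

```python
def build_extents(values, rows_per_extent):
    """Split values in load order into extents; record each (min, max)."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append((min(chunk), max(chunk)))
    return extents


def count_scanned(extents, low, high):
    """Count extents whose (min, max) range overlaps the filter [low, high]."""
    return sum(1 for lo, hi in extents if hi >= low and lo <= high)


# Hypothetical data: 100 distinct values arriving in a scattered order.
unsorted_load = [(i * 7) % 100 for i in range(100)]
sorted_load = sorted(unsorted_load)

unsorted_extents = build_extents(unsorted_load, 10)
sorted_extents = build_extents(sorted_load, 10)

# Filter: value BETWEEN 20 AND 29
scanned_unsorted = count_scanned(unsorted_extents, 20, 29)
scanned_sorted = count_scanned(sorted_extents, 20, 29)

if __name__ == "__main__":
    print("unsorted load scans", scanned_unsorted, "extents")
    print("sorted load scans", scanned_sorted, "extent(s)")
```

With the sorted load each extent covers a narrow, non-overlapping range, so the filter touches a single extent; the scattered load leaves wide, overlapping ranges that mostly cannot be eliminated.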
Data Modeling
● Conservative data typing reduces IO, compute, and memory requirements
❖ Short strings (up to char(8) and varchar(7)) are handled internally as integers
● Star-schema optimizations are generally a good idea
● Break down compound fields into individual fields
❖ Trivializes searching for sub-fields
❖ Can allow greater usage of short strings
Take Advantage of Push-Down Operations
● Filters
● Aggregates
● Functions & expressions
● Joins
What is not Pushed Down
● Having
● Window Functions
● ORDER BY
● LIMIT
Common Pitfalls
● It is OLAP, not OLTP
❖ single-row inserts
● Updating columns that upset the import sort order
● Top-level order by clause
Troubleshooting Queries that are Still Too Slow
● Given what you know about ColumnStore operation, can the query be improved?
● What does your resource usage look like? Are there bottlenecks?
Computer resources & bottlenecks
● CPU
● Storage: SSD, HDD
● Memory
● Network
...and there could be algorithmic bottlenecks
Computer resources utilization
● Utilization is a broad metric and gives no details
● < 100% utilization doesn’t mean you can improve the situation
● 100% utilization doesn’t mean you can’t improve the situation
Computer resources: CPU
● Use top, htop and friends for CPU utilization
❖ Instructions-per-clock rate differs: use Hyper-Threading (perf stat)
● A CPU core could be 100% utilized:
❖ CPU may be busy waiting for data from cache or RAM (perf record)
❖ CPU frequency could be scaled down by the OS (turbostat, dmesg)
Computer resources: CPU
● CPU is 50% utilized
❖ Code is optimized, thus Hyper-Threading won’t give a gain
❖ Algorithmic limitations, or waiting for Storage or Network
Computer resources: Memory
● Default Linux memory allocator doesn’t reuse mmap segments
❖ jemalloc works using madvise, though
● Tooling: free, vmstat, top
❖ top shows both virtual and resident memory
❖ free doesn’t show how much memory is actually available
● Most important: don’t ever use swap on production DBMS systems
Computer resources: Storage
● Data-at-rest compression is important
● Application fully utilizes CPU but Storage is underutilized
❖ The application’s read buffer isn’t big enough if O_DIRECT is used, or readahead isn’t set
❖ There could be very short, undetected 100% spikes
● Tooling: iostat, iotop, dstat, sar
Computer resources: Network
● Data transmission compression is important
● Tooling: iftop, ip, sar, sysstat
Queries and where to find them
● mcsadmin getActiveSQLStatements
mcsadmin> getActiveSQLStatements
getactivesqlstatements Wed Oct 7 08:38:32 2015
Get List of Active SQL Statements
=================================
Start Time        Time (hh:mm:ss)   Session ID   SQL Statement
----------------  ----------------  -----------  ------------------------------------------------------------
Oct 7 08:38:30    00:00:03          73           select c_name,sum(lo_revenue) from customer, lineorder where lo_custkey = c_custkey and c_custkey = 6 group by c_name
https://mariadb.com/kb/en/library/analyzing-queries-in-columnstore/#getactivesqlstatements
Queries and where to find them
● Query log structure
● debug.log produced by syslog
Feb 5 08:36:02 0bc58638bf11 ExeMgr[26783]: 02.772767 |10|0|0| D 16 CAL0041: Start SQL statement: select * from cs1; |test|
Field                                     Meaning
Feb 5 08:36:02                            log timestamp
0bc58638bf11                              hostname
ExeMgr                                    process name
[26783]                                   PID
02.772767                                 log timestamp in microseconds
10                                        session ID
0                                         id1
0                                         id2
D                                         syslog facility
16                                        CS facility ID
CAL0041                                   log message type
Start SQL statement: select * from cs1;   message body
|test|                                    database name
❖ MariaDB show log also could be used
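The field breakdown above can be turned into a small parser. The following Python sketch is an assumption built from the single sample line on this slide: the regular expression is not an official grammar, and the group names simply follow the slide's labels.

```python
import re

# Pattern derived from the one sample line on the slide (an assumption,
# not an official log grammar). Group names follow the slide's labels.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<process>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<subsecond>\d+\.\d+) "
    r"\|(?P<session_id>\d+)\|(?P<id1>\d+)\|(?P<id2>\d+)\| "
    r"(?P<syslog_facility>\w) (?P<cs_facility_id>\d+) "
    r"(?P<message_type>CAL\d+): "
    r"(?P<body>.*?)"
    r"(?: \|(?P<database>[^|]*)\|)?$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


SAMPLE = (
    "Feb 5 08:36:02 0bc58638bf11 ExeMgr[26783]: 02.772767 |10|0|0| "
    "D 16 CAL0041: Start SQL statement: select * from cs1; |test|"
)

if __name__ == "__main__":
    for key, value in parse_log_line(SAMPLE).items():
        print(f"{key:16} {value}")
```

Filtering the parsed lines by session_id or message_type is one way to follow a single query through debug.log; lines that do not fit the pattern are returned as None rather than raising.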
What does the query do?
● Use calgettrace/calsettrace to get actual execution plan
● CS has its internal query representation
IO optimization: read
● Extent partitioning data could be marked valid or invalid
● CS doesn’t consider invalid extents for extent elimination
❖ Becomes valid the next time the extent is scanned
● Use INFORMATION_SCHEMA.COLUMNSTORE_EXTENTS or editem to look at extents
Data insertion. Who is the fastest?
● cpimport (fast, native)
● INSERT..SELECT (uses disabled vtable mode for SELECT)
● mcsimport (works from Windows, uses bulk write API)
● INSERT (Don’t use INSERT. It is slow)
● Try to avoid DELETE and UPDATE for the same reason
But we are going to make them blazingly fast
IO optimization: cpimport writes
● Set RowsPerBatch to reduce per-record cost
● Use a ramdisk for TmpDir because cpimport saves extra data for rollback
❖ A disk path must be used for TempFilePath
HASH GROUP BY operation:
Input rows (name | money):
Vic    | 1.0
Robert | 25.2
Vic    | 999.9
Maria  | 41.1
Kevin  | 90.25
Robert | 2.01
HASH table (hash(name) | sum(money) | name):
● Bucket 1
❖ 1 | 41.1 | Maria
❖ 1 | 25.2 | Robert
❖ 1 | 2.01 | Robert
● Bucket 2
❖ 2 | 999.9 | Vic
❖ 2 | 1.0 | Vic
● Bucket 3
❖ 3 | 90.25 | Kevin
Result (name | sum(money)):
Kevin  | 90.25
Maria  | 41.1
Robert | 27.21
Vic    | 1000.9
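The bucketed aggregation on this slide can be sketched in Python. This is a simplified, single-threaded illustration rather than ColumnStore's implementation: the bucket count of 3, the use of CRC32 as the hash, and the function names are all assumptions, so the bucket assignments will not necessarily match the slide.

```python
import zlib
from decimal import Decimal

ROWS = [
    ("Vic", Decimal("1.0")),
    ("Robert", Decimal("25.2")),
    ("Vic", Decimal("999.9")),
    ("Maria", Decimal("41.1")),
    ("Kevin", Decimal("90.25")),
    ("Robert", Decimal("2.01")),
]


def bucket_of(name, bucket_count):
    """Deterministic hash (CRC32, chosen for illustration) mapped to a bucket."""
    return zlib.crc32(name.encode("utf-8")) % bucket_count


def hash_group_by(rows, bucket_count=3):
    """SUM(money) GROUP BY name, aggregating each key inside its bucket."""
    buckets = [{} for _ in range(bucket_count)]
    for name, money in rows:
        bucket = buckets[bucket_of(name, bucket_count)]
        bucket[name] = bucket.get(name, Decimal(0)) + money
    # Each key lives in exactly one bucket, so merging is a plain union.
    merged = {}
    for bucket in buckets:
        merged.update(bucket)
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    for name, total in hash_group_by(ROWS).items():
        print(name, "|", total)
```

Because a given name always hashes to the same bucket, buckets can be aggregated independently of each other. Decimal is used instead of float so the sums match the slide's figures exactly.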
GROUP BY optimization:
● XML settings
❖ RowAggrThreads
❖ RowAggrBuckets
● Per session: infinidb_um_mem_limit
HASH JOIN operation:
● HASH join settings
❖ PmMaxMemorySmallSide
❖ TotalUmMemory
● Per session: select calsetparms("pmmaxmemorysmallside","2048000000");
● Per session: infinidb_um_mem_limit
QoS: long queries VS short queries
● XML settings:
❖ MaxOutstandingRequests (MOR value is 20 by default)
MariaDB AX (ColumnStore) layout:
● User Module: mysqld, ExeMgr
● Performance Module 1 (MOR): WriteEngine/PrimProc, ColumnStore Storage
● Performance Module 2 (MOR): WriteEngine/PrimProc, ColumnStore Storage
THANK YOU!
