Tips for Data Warehouses and Other Very Large DB's
September 19th, 2018
Bert Scalzo, PhD & Oracle ACE
Email: bertscalzo2@gmail.com
Presenter
New book Q4-2018…
Provide evidence-based answers that can be measured and relied
upon by your business. Database administrators will be able to
make sound architectural decisions in a fast-changing landscape of
virtualized servers and container-based solutions based on the
empirical method presented in this book for answering “what if”
questions about database performance.
Today’s database administrators face numerous questions such as:
• What if we consolidate databases using multitenant features?
• What if we virtualize database servers as Docker containers?
• What if we deploy the latest in NVMe flash disks to speed up IO
access?
• Do features such as compression, partitioning, and in-memory
OLTP earn back their price?
• What if we move our databases to the cloud?
As an administrator, do you know the answers or even how to test
the assumptions?
Abstract
Whether on-premises or in the cloud, DBAs are often asked to create and manage optimal database
designs for data warehouses, data lakes, and many other very large databases (VLDBs) using
relational database management systems. These databases will be used for business intelligence,
data mining, and data analytics. They are radically different from traditional online transaction
processing (OLTP) systems.
• So what special design concerns will be faced?
• Which database editions and features should be relied upon?
• What kind of query execution plans should be sought?
This webcast will cover all pertinent issues, which some may even consider best practices, for such
highly specialized database requirements. While the basic concepts will be universally applicable,
examples will be primarily in Oracle, with some also in MySQL, SQL Server, and PostgreSQL.
Goal
I hope that everyone, no matter how
experienced, walks away with at
least one useful new tip or trick
I’ll call it a really big success if a few
or more of you walk away
with many great new tips & tricks
Ever Growing DATA Demand
• Businesses are addicted to information since they see it
as an edge
• Technology improvements have lowered costs to keep
historical data
• Data mining, data analytics and data science all added
fuel to this fire
• The cloud makes all this quicker and cheaper to deploy …
What is “big” is growing …
That’s 180 billion terabytes of data!!!
Overriding Principle
Partitioning objects
• Partitioning enables you to decompose very large tables and indexes into smaller and more
manageable pieces called partitions.
• Each partition is an independent object with its own name and optionally its own storage
characteristics.
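A minimal Python sketch of the idea, assuming range partitioning on a date column. The partition names, the bounds, and the helpers `partition_for` and `insert_row` are hypothetical and only illustrate how each row lands in exactly one smaller piece; a real database does this routing internally.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partition bounds: each partition holds rows whose sale_date
# is strictly below its upper bound.
PARTITION_BOUNDS = [
    ("sales_2016", date(2017, 1, 1)),
    ("sales_2017", date(2018, 1, 1)),
    ("sales_2018", date(2019, 1, 1)),
]

def partition_for(sale_date):
    """Return the name of the partition that should store sale_date."""
    upper_bounds = [bound for _, bound in PARTITION_BOUNDS]
    position = bisect_right(upper_bounds, sale_date)
    if position == len(PARTITION_BOUNDS):
        raise ValueError(f"no partition accepts {sale_date}")
    return PARTITION_BOUNDS[position][0]

def insert_row(partitions, row):
    """Append row to the list that represents its partition."""
    partitions.setdefault(partition_for(row["sale_date"]), []).append(row)

partitions = {}
insert_row(partitions, {"id": 1, "sale_date": date(2017, 6, 30), "amount": 40})
insert_row(partitions, {"id": 2, "sale_date": date(2018, 1, 1), "amount": 75})
insert_row(partitions, {"id": 3, "sale_date": date(2018, 9, 19), "amount": 10})
```

Each entry in `partitions` can then be managed on its own, which is the property the later slides build on.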
Compare DB Partitioning
• PostgreSQL
  • 9.X offered basic partitioning (lots of manual work)
    • Master table (no data)
    • Child tables (constraints)
    • Triggers to redirect inserts
  • 10.X declarative partitioning
  • 11.X (beta) improved declarative partitioning
• MySQL
  • 5.X works with most storage engines
  • 8.X works with InnoDB only
• SQL Server
  • Debuted in SQL Server 2005, gotten better with each release
  • 2014 required Enterprise Edition
  • Included in all editions from SQL Server 2016 (13.x) SP1 and in 2017
• Oracle
  • Debuted in Oracle 8, gotten better with each release
  • Requires Enterprise Edition + extra fee to add partitioning option
Horizontal partitioning
Vertical partitioning
Partitioning benefits
• Increased availability
• The unavailability of a partition does not entail the unavailability of the object. The query
optimizer automatically removes unreferenced partitions from the query plan so queries
are not affected when the partitions are unavailable.
• Easier administration of schema objects
• A partitioned object has pieces that can be managed either collectively or individually.
DDL statements can manipulate partitions rather than entire tables or indexes. Thus, you
can break up resource-intensive tasks such as rebuilding an index or table. For
example, you can move one table partition at a time. If a problem occurs, then only the
partition move must be redone, not the table move. Also, dropping a partition avoids
executing numerous DELETE statements.
• Reduced contention for shared resources in OLTP systems
• In some OLTP systems, partitions can decrease contention for a shared resource. For
example, DML is distributed over many segments rather than one segment.
• Enhanced query performance in data warehouses
• In a data warehouse, partitioning can speed processing of ad hoc queries. For example,
a sales table containing a million rows can be partitioned by quarter.
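Two of the benefits above can be sketched in Python: a query that names one quarter reads only that partition, and dropping a partition removes its rows in one step. The `sales` layout and the helpers `total_for_quarters` and `drop_partition` are hypothetical stand-ins, not any database's API.

```python
# Hypothetical sales table partitioned by quarter: partition key -> rows.
sales = {
    "2018-Q1": [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}],
    "2018-Q2": [{"id": 3, "amount": 75}],
    "2018-Q3": [{"id": 4, "amount": 300}, {"id": 5, "amount": 20}],
}

def total_for_quarters(table, quarters):
    """Sum amounts, touching only the partitions named in the predicate."""
    scanned = []
    total = 0
    for quarter in quarters:
        rows = table.get(quarter, [])  # partitions not listed are never read
        scanned.append(quarter)
        total += sum(row["amount"] for row in rows)
    return total, scanned

def drop_partition(table, quarter):
    """Remove a whole partition at once instead of deleting row by row."""
    return len(table.pop(quarter, []))

total, scanned = total_for_quarters(sales, ["2018-Q3"])
removed = drop_partition(sales, "2018-Q1")
```

In this sketch the query for `2018-Q3` never looks at the other two quarters, which is the effect partition pruning aims for.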
Basic partitioning
Extended partitioning
Tiered storage
Magnetic Disk - SATA
Magnetic Disk - SAS
SSD - SATA
SSD - SAS
PCIe - NVMe
Tape or Cloud
Partitioning permits this
Tiered Example
Magnetic Disk - SATA
Magnetic Disk - SAS
SSD - SATA
SSD - SAS
PCIe - NVMe
Tape or Cloud
2018 Data
2017 Data
2015-16 Data
2013-14 Data
<= 2012 Data
TEMP (group by, grouping functions, order by)
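One way to express a tiering rule like this is a small age-based policy. The sketch below is an assumption for illustration only: the pairing of partition ages to tiers in `TIER_POLICY`, the `tier_for` helper, and the default `current_year` are hypothetical, since the slide's diagram is not reproduced here.

```python
# Hypothetical policy: (maximum age in years, storage tier). The pairing of
# ages to tiers is an assumption for illustration, not taken from the slide.
TIER_POLICY = [
    (0, "SSD - SAS"),
    (1, "SSD - SATA"),
    (3, "Magnetic Disk - SAS"),
    (5, "Magnetic Disk - SATA"),
]
ARCHIVE_TIER = "Tape or Cloud"

def tier_for(partition_year, current_year=2018):
    """Pick the storage tier for a yearly partition based on its age."""
    age = current_year - partition_year
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER

placement = {year: tier_for(year) for year in range(2011, 2019)}
```

Because each yearly partition is its own object, a rule like this can move old data to cheaper storage without touching current data.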
What is an index (conceptually)
Traditional definition:
• Wikipedia: A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and extra storage space to
maintain the index data structure.
• Oracle Docs: An index is an optional structure, associated with a table or table cluster, that can
sometimes speed data access. Indexes are schema objects that are logically and physically
independent of the data in the objects with which they are associated. Thus, you can drop or
create an index without physically affecting the indexed table.
What is an index (commonly)
B-tree index:
• Wikipedia: In computer science, a B-tree is a self-balancing tree data structure that keeps data
sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The
B-tree is a generalization of a binary search tree in that a node can have more than two children.
• Oracle Docs: B-trees, short for balanced trees, are the most common type of database index. A
B-tree index is an ordered list of values divided into ranges. By associating a key with a row or
range of rows, B-trees provide excellent retrieval performance for a wide range of queries,
including exact match and range searches.
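The exact match and range searches mentioned above can be sketched with a sorted list of (key, rowid) pairs, which stands in for the leaf level of a B-tree. This is a simplification: a real B-tree adds branch blocks above the leaves, and the table contents and function names here are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rowid -> row. col1 is the indexed column.
table = {
    10: {"col1": 222, "col2": "X"},
    11: {"col1": 105, "col2": "A"},
    12: {"col1": 317, "col2": "B"},
    13: {"col1": 222, "col2": "C"},
}

# The "index": (key, rowid) pairs kept in key order, like B-tree leaf entries.
index = sorted((row["col1"], rowid) for rowid, row in table.items())
keys = [key for key, _ in index]

def exact_match(value):
    """Return the rowids whose key equals value."""
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [rowid for _, rowid in index[lo:hi]]

def range_search(low, high):
    """Return the rowids whose key lies in [low, high], in key order."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [rowid for _, rowid in index[lo:hi]]
```

Both lookups use binary search over the ordered keys, so neither has to read every row of the table.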
What is an index (commonly)
Fetch row where col1 = 222
TABLE
col1 col2 col3 col4
222 X Y Z
B-tree index options
• Unique or not unique
• Ascending or descending
• Single column or multi column (composite)
• Reverse Key (can help with some RAC leaf block contention issues)
• Reverse key index is a type of B-tree index that physically reverses the bytes of each
index key while keeping the column order. For example, if the index key is 20, and if the
two bytes stored for this key in hexadecimal are C1,15 in a standard B-tree index, then a
reverse key index stores the bytes as 15,C1.
• Compressed (repeated key values removed)
• Prefix compression (also known as key compression) compresses portions of the
primary key column values in a B-tree index or an index-organized table. Prefix
compression can greatly reduce the space consumed by the index.
• Starting with Oracle 12c, advanced index compression improves on traditional prefix
compression for indexes on heap-organized tables. Unlike prefix compression, which
uses fixed duplicate key elimination for every block, advanced compression uses
adaptive duplicate key elimination on a per-block basis.
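Two of these options are easy to picture in a few lines of Python. The sketch below reverses the two bytes given above for key 20, and shows the general idea of storing a repeated key prefix only once. Both helpers and the sample entries are hypothetical, and the grouping is a simplification rather than Oracle's on-disk block format.

```python
def reverse_key(key_bytes):
    """Reverse the bytes of one index key, as a reverse key index does."""
    return key_bytes[::-1]

def prefix_compress(entries):
    """Group (prefix, suffix) index entries so each prefix is stored once."""
    compressed = {}
    for prefix, suffix in entries:
        compressed.setdefault(prefix, []).append(suffix)
    return compressed

stored = bytes([0xC1, 0x15])  # the bytes the slide gives for key 20
reversed_key = reverse_key(stored)

# Hypothetical composite index entries: (leading column, trailing column).
entries = [("online", 1), ("online", 2), ("online", 3), ("print", 4)]
compressed = prefix_compress(entries)
```

Reversing the bytes spreads consecutive key values across different leaf blocks, which is why it can ease the contention mentioned above.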
Bitmap indexes
• Database stores a bitmap for each index key.
• A mapping function converts each bit in the bitmap to a rowid.
• In a conventional B-tree index, one index entry points to a single row. In a bitmap index, each
index key stores pointers to multiple rows.
• Bitmap indexes are primarily designed for data warehousing or environments in which queries
reference many columns in an ad hoc fashion.
• Situations that may call for a bitmap index include:
• The indexed columns have low cardinality, that is, the number of distinct values is small
compared to the number of table rows.
• The indexed table is either read-only or not subject to significant modification by DML
statements.
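A minimal sketch of the mechanism, using Python integers as bitmaps and list positions as stand-ins for rowids. The sample rows and the helpers `build_bitmap_index` and `to_rowids` are hypothetical; real bitmap indexes are compressed and map bits to physical rowids.

```python
# Hypothetical rows; list position stands in for the rowid.
rows = [
    {"color": "green", "size": "S"},
    {"color": "red",   "size": "M"},
    {"color": "green", "size": "M"},
    {"color": "blue",  "size": "S"},
    {"color": "green", "size": "S"},
]

def build_bitmap_index(rows, column):
    """Return {key: int}, where bit n is set when row n holds that key."""
    index = {}
    for position, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << position)
    return index

def to_rowids(bitmap):
    """Mapping function: convert set bits back to row positions."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

color_idx = build_bitmap_index(rows, "color")
size_idx = build_bitmap_index(rows, "size")

green = to_rowids(color_idx["green"])
green_and_small = to_rowids(color_idx["green"] & size_idx["S"])
```

Combining two low-cardinality predicates becomes a single bitwise AND, which suits ad hoc queries that reference many columns.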
How bitmap indexes work
Fetch row where color = green
Mapping function
Bitmap join indexes
• A bitmap join index is a bitmap index
for the join of two or more tables. For
each value in a table column, the
index stores the rowid of the
corresponding row in the indexed
table. In contrast, a standard bitmap
index is created on a single table.
• A bitmap join index is an efficient
means of reducing the volume of data
that must be joined by performing
restrictions in advance.
• The Red Brick database (Ralph Kimball)
pioneered this and called it “Star Join”
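The idea can be sketched in Python by resolving the join once, at index build time, so that a later query filters fact rows by a dimension attribute without joining again. The tiny star schema and the helper names are hypothetical, and list positions stand in for fact table rowids.

```python
# Hypothetical star schema: a dimension table and a fact table.
customers = {1: {"region": "EAST"}, 2: {"region": "WEST"}, 3: {"region": "EAST"}}
sales = [  # list position stands in for the fact table rowid
    {"cust_id": 1, "amount": 50},
    {"cust_id": 2, "amount": 20},
    {"cust_id": 3, "amount": 70},
    {"cust_id": 2, "amount": 10},
]

def build_bitmap_join_index(fact, dimension, fk, attribute):
    """Map each dimension attribute value to a bitmap of fact row positions."""
    index = {}
    for position, row in enumerate(fact):
        value = dimension[row[fk]][attribute]  # the join happens once, here
        index[value] = index.get(value, 0) | (1 << position)
    return index

region_idx = build_bitmap_join_index(sales, customers, "cust_id", "region")

def total_for_region(region):
    """Sum fact amounts for one region using only the index and fact rows."""
    bitmap = region_idx.get(region, 0)
    return sum(row["amount"] for pos, row in enumerate(sales) if (bitmap >> pos) & 1)
```

At query time `total_for_region` never reads `customers`, which is the sense in which the restriction is performed in advance.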
How bitmap join indexes work
Fetch row where Fact.col1 = X, Dim1.col2 = Y, Dim2.col3 = Z
Index stores rowid for each table in the join
Function based indexes
• Function-based indexes are efficient for evaluating statements that contain functions in their
WHERE clauses.
• The database only uses the function-based index when the function is included in a query.
• When the database processes INSERT and UPDATE statements, however, it must still evaluate
the function to process the statement.
• A function-based index is also useful for indexing only specific rows in a table. For example, the
cust_valid column in the sh.customers table has either I or A as a value. To index only the A
rows, you could write a function that returns a null value for any rows other than the A rows.
via index on computed column
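Both uses described above can be sketched in Python: indexing the result of a function, and indexing only some rows by having the function return a null (here `None`) for the rest. The sample rows and the `build_function_index` helper are hypothetical; the second index mirrors the cust_valid example under the assumption that null results are simply not indexed.

```python
# Hypothetical customer rows; list position stands in for the rowid.
customers = [
    {"last_name": "Smith", "cust_valid": "A"},
    {"last_name": "SMITH", "cust_valid": "I"},
    {"last_name": "jones", "cust_valid": "A"},
]

def build_function_index(rows, func):
    """Index func(row) -> row positions; None results are left out."""
    index = {}
    for position, row in enumerate(rows):
        key = func(row)
        if key is not None:
            index.setdefault(key, []).append(position)
    return index

# Index on an expression: a case-insensitive lookup by last name.
upper_name_idx = build_function_index(customers, lambda r: r["last_name"].upper())

# Index only the A rows: the function returns None for everything else.
active_only_idx = build_function_index(
    customers, lambda r: "A" if r["cust_valid"] == "A" else None
)
```

A query only benefits when it applies the same function, for example by looking up `upper_name_idx["SMITH"]` rather than the raw column value.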
Index organized table (IOT)
• An index-organized table is a table stored in a variation of a B-tree index structure.
• Rows are stored in an index defined on the primary key for the table.
• Each index entry in the B-tree also stores the non-key column values.
• You can also specify a separate segment as a row overflow area.
• Thus, the index is the data, and the data is the index.
• If a row overflow area is specified, then the database can divide a row in an index-organized
table into the following parts:
• The index entry: This part contains column values for all the primary key columns, a
physical rowid that points to the overflow part of the row, and optionally a few of the non-
key columns. This part is stored in the index segment.
• The overflow part: This part contains column values for the remaining non-key columns.
This part is stored in the overflow storage area segment.
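A rough Python sketch of this layout follows. The class `IndexOrganizedTable` and its methods are hypothetical; a sorted key list stands in for the B-tree, and the primary key doubles as the pointer into the overflow area, where the description above uses a physical rowid.

```python
from bisect import bisect_left, insort

class IndexOrganizedTable:
    """Rows live in primary key order; other columns go to an overflow area."""

    def __init__(self, inline_columns):
        self.inline_columns = inline_columns  # non-key columns kept in the index entry
        self.keys = []       # sorted primary keys (stand-in for the B-tree)
        self.entries = {}    # primary key -> inline column values
        self.overflow = {}   # primary key -> remaining column values

    def insert(self, key, row):
        if key in self.entries:
            raise KeyError(f"duplicate primary key {key}")
        insort(self.keys, key)
        self.entries[key] = {c: v for c, v in row.items() if c in self.inline_columns}
        self.overflow[key] = {c: v for c, v in row.items() if c not in self.inline_columns}

    def get(self, key):
        """Reassemble a full row from the index entry and the overflow part."""
        return {**self.entries[key], **self.overflow[key]}

    def key_range(self, low, high):
        """Return primary keys in [low, high] in order, without a sort step."""
        start = bisect_left(self.keys, low)
        return [k for k in self.keys[start:] if k <= high]

iot = IndexOrganizedTable(inline_columns={"name"})
iot.insert(30, {"name": "c", "notes": "long text"})
iot.insert(10, {"name": "a", "notes": "more text"})
iot.insert(20, {"name": "b", "notes": "other text"})
```

Rows come back in primary key order regardless of insertion order, because the index is the table.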
How IOTs work
Branch nodes
Leaf nodes
Overflow area
IOT secondary indexes
• A secondary index is an index on an
index-organized table. In a sense, it is an
index on an index.
• The secondary index is an independent
schema object and is stored separately
from the index-organized table.
• A secondary index on an index-organized
table can be a bitmap index.
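The "index on an index" idea can be sketched in Python as a two-step lookup. This sketch assumes the secondary index stores primary key values rather than physical row addresses; the table contents and the `find_by_city` helper are hypothetical.

```python
# Hypothetical index-organized table: primary key -> row.
iot = {
    101: {"name": "Ada", "city": "Austin"},
    102: {"name": "Bob", "city": "Boston"},
    103: {"name": "Cy", "city": "Austin"},
}

# Secondary index, stored separately from the table: city -> primary keys.
# Assumption: entries point at primary keys, not physical row locations.
city_idx = {}
for pk, row in iot.items():
    city_idx.setdefault(row["city"], []).append(pk)

def find_by_city(city):
    """Step 1: secondary index gives primary keys. Step 2: probe the IOT."""
    return [iot[pk] for pk in city_idx.get(city, [])]
```

Here `find_by_city("Austin")` first reads `city_idx` and then looks each primary key up in `iot`, so two structures are traversed.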
Invisible indexes
• An invisible index is maintained by DML operations, but is not used by default by the optimizer.
• Making an index invisible is an alternative to making it unusable or dropping it.
• Invisible indexes are especially useful for testing the removal of an index before dropping it or
using indexes temporarily without affecting the overall application.
requires extension
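The behavior can be sketched with a toy optimizer in Python: every index is maintained on insert, but an invisible one is skipped when choosing an access path unless the caller opts in. The classes and the `use_invisible` flag are hypothetical names for illustration, not a real database interface.

```python
class Index:
    def __init__(self, column, visible=True):
        self.column = column
        self.visible = visible
        self.entries = {}  # key -> row positions

    def add(self, position, row):
        self.entries.setdefault(row[self.column], []).append(position)

class Table:
    def __init__(self):
        self.rows = []
        self.indexes = []

    def insert(self, row):
        self.rows.append(row)
        for index in self.indexes:
            # DML maintains every index, visible or not.
            index.add(len(self.rows) - 1, row)

    def choose_access_path(self, column, use_invisible=False):
        """Pick an index scan only if a usable index exists on the column."""
        for index in self.indexes:
            if index.column == column and (index.visible or use_invisible):
                return "index scan"
        return "full table scan"

orders = Table()
status_idx = Index("status", visible=False)
orders.indexes.append(status_idx)
orders.insert({"status": "open"})
```

Making the index visible again is a flag change in this sketch, with no rebuild, because the entries were kept current all along.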
How invisible indexes work
Confused yet? There’s more!
Indexing under partitioning
• Local
• (1) Local partitioned index: the index is partitioned on the same columns, with the same
number of partitions and the same partition bounds as its table.
• (2) Local prefixed index: the partition keys are on the leading edge of the index
definition.
• (3) Local nonprefixed index: the partition keys are not on the leading edge of the
indexed column list and need not be in the list at all.
• Global
• (4) Global partitioned index: an index that is partitioned independently of the underlying table on
which it is created.
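The difference between local and global can be sketched in Python. In this hypothetical example the table is partitioned by year and indexed on `cust`, so the local index is a nonprefixed one; the global index is split by its own key range at an arbitrary `split_key`. All names and data are assumptions for illustration.

```python
# Hypothetical table partitioned by year: partition -> {rowid: row}.
table = {
    2017: {1: {"cust": "acme", "amount": 5}, 2: {"cust": "zeta", "amount": 9}},
    2018: {3: {"cust": "acme", "amount": 7}, 4: {"cust": "beta", "amount": 2}},
}

def build_local_index(table, column):
    """One index segment per table partition, with the same bounds."""
    local = {}
    for part, rows in table.items():
        segment = local.setdefault(part, {})
        for rowid, row in rows.items():
            segment.setdefault(row[column], []).append((part, rowid))
    return local

def build_global_index(table, column, split_key):
    """Index partitioned by its own key ranges, independent of the table."""
    global_idx = {"low": {}, "high": {}}
    for part, rows in table.items():
        for rowid, row in rows.items():
            segment = "low" if row[column] < split_key else "high"
            global_idx[segment].setdefault(row[column], []).append((part, rowid))
    return global_idx

local_idx = build_local_index(table, "cust")
global_idx = build_global_index(table, "cust", split_key="m")
```

Dropping the 2017 table partition would remove exactly one local segment, while the global index has entries for 2017 rows spread through its own partitions and would need maintenance.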
It’s very complicated
Use indexes judiciously
How indexes are used
• B-tree
  • Index Unique Scan
  • Index Range Scan
  • Index Full Scan
  • Index Fast Full Scan
  • Index Skip Scan
  • Index Join Scan
• Bitmap
  • Bitmap Index Single Value
  • Bitmap Index Range Scan
  • Bitmap Merge
Too complex to explain today; look for a future webcast on query tuning (indexes vs. exec plans)
Index usage summary
Right Choice Greatly Matters
Thank you!
• Bert Scalzo
– bertscalzo2@gmail.com
– community.idera.com/members/bscalzo/blogs
• http://community.idera.com


Geek Sync | Tips for Data Warehouses and Other Very Large Databases

  • 1. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Tips for Data Warehouses and Other Very Large DB's September 19th, 2018 Bert Scalzo, PhD & Oracle ACE Email: [email protected]
  • 2. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Presenter
  • 3. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level New book Q4-2018… Provide evidence-based answers that can be measured and relied upon by your business. Database administrators will be able to make sound architectural decisions in a fast-changing landscape of virtualized servers and container-based solutions based on the empirical method presented in this book for answering “what if” questions about database performance. Today’s database administrators face numerous questions such as: • What if we consolidate databases using multitenant features? • What if we virtualize database servers as Docker containers? • What if we deploy the latest in NVMe flash disks to speed up IO access? • Do features such as compression, partitioning, and in-memory OLTP earn back their price? • What if we move our databases to the cloud? As an administrator, do you know the answers or even how to test the assumptions?
  • 4. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Abstract Whether on-premise or in the cloud, DBA's are often asked to create and manage optimal database designs for data warehouses, data lakes, and many other very large databases (VLDBs) using relational database management systems. These databases will be used for business intelligence, data mining, and data analytics. They are radically different than traditional online transaction processing (OLTP) systems. • So what special design concerns will be faced? • What database editions and features to rely upon? • What kind of query execution plans should be sought? This webcast will cover all pertinent issues, which some may even consider best practices, for such highly specialized database requirements. While the basic concepts will be universally applicable, examples will be primarily in Oracle, with some also in MySQL, SQL Server, and PostgreSQL.
  • 5. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Goal I hope that everyone, no matter how experienced, walks away with at least one useful new tip & trick I’ll call it a really big success if a few or more of you walk away hopefully with many great new tips & tricks
  • 6. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Ever Growing DATA Demand • Businesses are addicted to information since they see it as an edge • Technology improvements have lowered costs to keep historical data • Data mining, data analytics and data science all added fuel to this fire • The cloud makes all this quicker and cheaper to deploy …
  • 7. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level What is “big” is growing … That’s 180 billion terabytes of data!!!
  • 8. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Overriding Principle
  • 9. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Partitioning objects • Partitioning enables you to decompose very large tables and indexes into smaller and more manageable pieces called partitions. • Each partition is an independent object with its own name and optionally its own storage characteristics.
  • 10. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Compare DB Partitioning • PostgreSQL • 9.X offered basic partitioning (lots manual work) • Master table (no data) • Child tables (constraints) • Triggers to redirect inserts • 10.X declarative partitioning • 11.X (beta) improved declarative partitioning • MySQL • 5.X works with any storage engine • 8.X works with INNODB only • SQL Server • Debuted 2014, gotten better with each release • 2014 required Enterprise Edition • Included SQL Server 2016 (13.x) SP1 and 2017 • Oracle • Debuted 10g, gotten better with each release • Requires Enterprise Edition + extra fee to add partitioning option
  • 11. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Horizontal partitioning
  • 12. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Vertical partitioning
  • 13. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Partitioning benefits • Increased availability • The unavailability of a partition does not entail the unavailability of the object. The query optimizer automatically removes unreferenced partitions from the query plan so queries are not affected when the partitions are unavailable. • Easier administration of schema objects • A partitioned object has pieces that can be managed either collectively or individually. DDL statements can manipulate partitions rather than entire tables or indexes. Thus, you can break up resource-intensive tasks such as rebuilding an index or table. For example, you can move one table partition at a time. If a problem occurs, then only the partition move must be redone, not the table move. Also, dropping a partition avoids executing numerous DELETE statements. • Reduced contention for shared resources in OLTP systems • In some OLTP systems, partitions can decrease contention for a shared resource. For example, DML is distributed over many segments rather than one segment. • Enhanced query performance in data warehouses • In a data warehouse, partitioning can speed processing of ad hoc queries. For example, a sales table containing a million rows can be partitioned by quarter.
  • 14. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Basic partitioning
  • 15. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Extended partitioning
  • 16. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Tiered storage Magnetic Disk - SATA Magnetic Disk - SAS SSD - SATA SSD - SAS PCIe - NVMe Tape or Cloud Partitioning permits this
  • 17. Topics  Click to edit Master text styles • Second level • Third level − Fourth level • Fifth level Tiered Example Magnetic Disk - SATA Magnetic Disk - SAS SSD - SATA SSD - SAS PCIe - NVMe Tape or Cloud 2018 Data 2017 Data 2015-16 Data 2013-14 Data <= 2012 Data TEMP Group by, grouping functions, order by
  • 18. What is an index (conceptually)
      • Traditional definition:
          • Wikipedia: A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and extra storage space to maintain the index data structure.
          • Oracle Docs: An index is an optional structure, associated with a table or table cluster, that can sometimes speed data access. Indexes are schema objects that are logically and physically independent of the data in the objects with which they are associated. Thus, you can drop or create an index without physically affecting the indexed table.
  • 19. What is an index (commonly)
      • B-tree index:
          • Wikipedia: In computer science, a B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.
          • Oracle Docs: B-trees, short for balanced trees, are the most common type of database index. A B-tree index is an ordered list of values divided into ranges. By associating a key with a row or range of rows, B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches.
  • 20. What is an index (commonly)
      • Diagram labels: Fetch row where col1 = 222; TABLE with columns col1, col2, col3, col4 and the matching row 222, X, Y, Z
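The "ordered list of values" idea behind exact-match and range lookups can be sketched with a sorted key list and binary search. This is a stand-in for the ordered leaf level only, not a real multi-level B-tree. The row for 222 comes from the slide; the other rows and the function names are made up for illustration.

```python
import bisect

# Heap table: rows are stored in arrival order, not key order.
table = [
    (305, "A", "B", "C"),   # hypothetical row
    (222, "X", "Y", "Z"),   # row shown on the slide
    (117, "D", "E", "F"),   # hypothetical row
    (410, "G", "H", "I"),   # hypothetical row
]

# Index entries are (key, row position), kept sorted by key.
index = sorted((row[0], pos) for pos, row in enumerate(table))
keys = [key for key, _ in index]


def fetch_equal(value):
    """Exact match: binary-search the keys, then follow pointers to the rows."""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [table[pos] for _, pos in index[lo:hi]]


def fetch_range(low, high):
    """Range search: one binary search per bound, then read entries in key order."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [table[pos] for _, pos in index[lo:hi]]


match = fetch_equal(222)
in_range = fetch_range(200, 350)
```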
  • 21. B-tree index options
      • Unique or not unique
      • Ascending or descending
      • Single column or multi column (composite)
      • Reverse key (can help with some RAC leaf block contention issues)
          • A reverse key index is a type of B-tree index that physically reverses the bytes of each index key while keeping the column order. For example, if the index key is 20, and the two bytes stored for this key in hexadecimal are C1,15 in a standard B-tree index, then a reverse key index stores the bytes as 15,C1.
      • Compressed (repeated key values removed)
          • Prefix compression (also known as key compression) compresses portions of the primary key column values in a B-tree index or an index-organized table. Prefix compression can greatly reduce the space consumed by the index.
          • Starting with Oracle 12c, advanced index compression improves on traditional prefix compression for indexes on heap-organized tables. Unlike prefix compression, which uses fixed duplicate key elimination for every block, advanced compression uses adaptive duplicate key elimination on a per-block basis.
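The byte reversal in the reverse key example is easy to reproduce. The first part below uses the slide's own bytes (C1,15). The second part shows why reversal spreads sequential keys apart; it encodes integers as 4-byte big-endian values, which is an assumption made for illustration and not Oracle's actual NUMBER format.

```python
def reverse_key(key_bytes):
    """Reverse the bytes of one index key."""
    return key_bytes[::-1]


# The slide's example: C1,15 becomes 15,C1.
standard = bytes([0xC1, 0x15])
reversed_key = reverse_key(standard)

# ASSUMPTION: fixed-width big-endian integers stand in for the stored key format.
sequential = [n.to_bytes(4, "big") for n in range(1000, 1004)]
scattered = [reverse_key(k) for k in sequential]

# Sequential keys share a leading prefix, so they sort next to each other.
shared_prefixes = {k[:3] for k in sequential}
# After reversal the leading byte differs, so the keys sort into different regions.
leading_bytes = {k[0] for k in scattered}
```

Since neighboring keys no longer land in the same leaf block, concurrent inserts of increasing values are less likely to collide on one block.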
  • 22. Bitmap indexes
      • Database stores a bitmap for each index key.
      • A mapping function converts each bit in the bitmap to a rowid.
      • In a conventional B-tree index, one index entry points to a single row. In a bitmap index, each index key stores pointers to multiple rows.
      • Bitmap indexes are primarily designed for data warehousing or environments in which queries reference many columns in an ad hoc fashion.
      • Situations that may call for a bitmap index include:
          • The indexed columns have low cardinality, that is, the number of distinct values is small compared to the number of table rows.
          • The indexed table is either read-only or not subject to significant modification by DML statements.
  • 23. How bitmap indexes work
      • Diagram labels: Fetch row where color = green; Mapping Function
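A minimal sketch of the bitmap idea, using Python integers as bitmaps and row positions in place of rowids. The sample rows, the `size` column, and the function names are hypothetical; real bitmap indexes are compressed and store rowid ranges.

```python
rows = [
    {"id": 0, "color": "green", "size": "S"},
    {"id": 1, "color": "red", "size": "M"},
    {"id": 2, "color": "green", "size": "M"},
    {"id": 3, "color": "blue", "size": "S"},
    {"id": 4, "color": "green", "size": "S"},
]


def build_bitmap_index(rows, column):
    """One bitmap per distinct key; bit N is set when row N holds that key."""
    bitmaps = {}
    for pos, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << pos)
    return bitmaps


def bitmap_to_positions(bitmap):
    """Mapping function: convert set bits back to row positions."""
    positions = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            positions.append(pos)
        bitmap >>= 1
        pos += 1
    return positions


color_idx = build_bitmap_index(rows, "color")
size_idx = build_bitmap_index(rows, "size")

green = bitmap_to_positions(color_idx["green"])
# Ad hoc multi-column predicates combine bitmaps with cheap bitwise operations.
green_and_small = bitmap_to_positions(color_idx["green"] & size_idx["S"])
```

With only a few distinct values per column, each bitmap covers many rows, which is why low cardinality suits this structure.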
  • 24. Bitmap join indexes
      • A bitmap join index is a bitmap index for the join of two or more tables. For each value in a table column, the index stores the rowid of the corresponding row in the indexed table. In contrast, a standard bitmap index is created on a single table.
      • A bitmap join index is an efficient means of reducing the volume of data that must be joined by performing restrictions in advance.
      • Red Brick database (Ralph Kimball) pioneered this and called it “Star Join”
  • 25. How bitmap join indexes work
      • Diagram labels: Fetch row where Fact.col1 = X, Dim1.col2 = Y, Dim2.col3 = Z
      • Index stores rowid for each table in the join
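The "restrictions in advance" point can be sketched as follows: the join between a fact table and a dimension is resolved once, when the index is built, so a later query on the dimension column reads fact rows directly. The `customers` and `sales` data and all names are hypothetical, and row positions stand in for rowids.

```python
# Dimension table: customer id -> region (hypothetical data).
customers = {1: "West", 2: "East", 3: "West"}

# Fact table: (customer id, amount) per sale (hypothetical data).
sales = [(1, 10.0), (2, 20.0), (3, 30.0), (2, 40.0), (1, 50.0)]


def build_bitmap_join_index(fact_rows, dim_lookup):
    """Key the bitmaps by the dimension column, but set bits for fact rows."""
    index = {}
    for pos, (cust_id, _amount) in enumerate(fact_rows):
        region = dim_lookup[cust_id]  # the join happens here, at build time
        index[region] = index.get(region, 0) | (1 << pos)
    return index


region_idx = build_bitmap_join_index(sales, customers)


def total_for_region(region):
    """Answer a query on the dimension column without joining at query time."""
    bitmap = region_idx.get(region, 0)
    return sum(
        amount
        for pos, (_cust_id, amount) in enumerate(sales)
        if (bitmap >> pos) & 1
    )


west_total = total_for_region("West")
east_total = total_for_region("East")
```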
  • 26. Function based indexes
      • Function-based indexes are efficient for evaluating statements that contain functions in their WHERE clauses.
      • The database only uses the function-based index when the function is included in a query.
      • When the database processes INSERT and UPDATE statements, however, it must still evaluate the function to process the statement.
      • A function-based index is also useful for indexing only specific rows in a table. For example, the cust_valid column in the sh.customers table has either I or A as a value. To index only the A rows, you could write a function that returns a null value for any rows other than the A rows.
      • Callout: via index on computed column
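The "index only the A rows" trick can be sketched like this. The sketch assumes that keys evaluating to null are simply left out of the index, which is what makes the trick work. The sample rows and function names are hypothetical, and only the `cust_valid` values I and A come from the slide.

```python
customers = [
    {"cust_id": 1, "cust_valid": "A"},
    {"cust_id": 2, "cust_valid": "I"},
    {"cust_id": 3, "cust_valid": "A"},
    {"cust_id": 4, "cust_valid": "I"},
]


def active_only(row):
    """Indexed expression: 'A' for A rows, None (null) for everything else."""
    return "A" if row["cust_valid"] == "A" else None


def build_function_index(rows, func):
    """Index the function's result; rows whose key is None get no entry (assumed)."""
    index = {}
    for pos, row in enumerate(rows):
        key = func(row)  # the function is evaluated on every insert or update
        if key is None:
            continue
        index.setdefault(key, []).append(pos)
    return index


active_idx = build_function_index(customers, active_only)
indexed_rows = sum(len(positions) for positions in active_idx.values())
```

Only two of the four rows get index entries, so the index stays small while still serving queries that use the same expression.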
  • 27. Index organized table (IOT)
      • An index-organized table is a table stored in a variation of a B-tree index structure.
      • Rows are stored in an index defined on the primary key for the table.
      • Each index entry in the B-tree also stores the non-key column values.
      • You can also specify a separate segment as a row overflow area.
      • Thus, the index is the data, and the data is the index.
      • If a row overflow area is specified, then the database can divide a row in an index-organized table into the following parts:
          • The index entry: This part contains column values for all the primary key columns, a physical rowid that points to the overflow part of the row, and optionally a few of the non-key columns. This part is stored in the index segment.
          • The overflow part: This part contains column values for the remaining non-key columns. This part is stored in the overflow storage area segment.
  • 28. How IOTs work
      • Diagram labels: Branch nodes, Leaf nodes, Overflow area
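A toy model of an index-organized table with an overflow area. A sorted key list stands in for the B-tree leaf order, and a list position stands in for the physical rowid that points into the overflow segment. The class, its `inline_columns` parameter, and the sample rows are hypothetical.

```python
import bisect


class IndexOrganizedTable:
    """Rows live inside the primary-key index; extra columns may spill to overflow."""

    def __init__(self, inline_columns):
        self.inline_columns = inline_columns  # non-key columns kept in the index entry
        self.keys = []       # sorted primary keys (stand-in for leaf order)
        self.entries = {}    # key -> (inline values, overflow pointer or None)
        self.overflow = []   # overflow storage area

    def insert(self, key, *values):
        if key in self.entries:
            raise KeyError(f"duplicate primary key {key!r}")
        inline = values[:self.inline_columns]
        rest = values[self.inline_columns:]
        pointer = None
        if rest:
            pointer = len(self.overflow)
            self.overflow.append(rest)
        bisect.insort(self.keys, key)
        self.entries[key] = (inline, pointer)

    def get(self, key):
        inline, pointer = self.entries[key]
        rest = self.overflow[pointer] if pointer is not None else ()
        return (key, *inline, *rest)

    def scan(self):
        """Rows come back in primary-key order with no separate table lookup."""
        return [self.get(key) for key in self.keys]


iot = IndexOrganizedTable(inline_columns=1)
iot.insert(20, "b", "long description")
iot.insert(10, "a")
row_20 = iot.get(20)
all_rows = iot.scan()
```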
  • 29. IOT secondary indexes
      • A secondary index is an index on an index-organized table. In a sense, it is an index on an index.
      • The secondary index is an independent schema object and is stored separately from the index-organized table.
      • A secondary index on an index-organized table can be a bitmap index.
  • 30. Invisible indexes
      • An invisible index is maintained by DML operations, but is not used by default by the optimizer.
      • Making an index invisible is an alternative to making it unusable or dropping it.
      • Invisible indexes are especially useful for testing the removal of an index before dropping it or using indexes temporarily without affecting the overall application.
      • Callout: requires extension
  • 31. How invisible indexes work
  • 32. Confused yet? There’s more!
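The behavior of an invisible index can be sketched with a visibility flag: DML keeps the index current, but the access-path choice skips it unless asked otherwise. The `Table` and `Index` classes and the `use_invisible` argument are hypothetical and only model the idea; they are not a database API.

```python
class Index:
    def __init__(self, column, visible=True):
        self.column = column
        self.visible = visible
        self.entries = {}

    def add(self, row, pos):
        self.entries.setdefault(row[self.column], []).append(pos)


class Table:
    def __init__(self):
        self.rows = []
        self.indexes = []

    def create_index(self, column, visible=True):
        idx = Index(column, visible)
        for pos, row in enumerate(self.rows):
            idx.add(row, pos)
        self.indexes.append(idx)
        return idx

    def insert(self, row):
        self.rows.append(row)
        for idx in self.indexes:
            # Invisible indexes are still maintained by DML.
            idx.add(row, len(self.rows) - 1)

    def choose_access_path(self, column, use_invisible=False):
        # The "optimizer" ignores invisible indexes unless told otherwise.
        for idx in self.indexes:
            if idx.column == column and (idx.visible or use_invisible):
                return "index scan"
        return "full table scan"


orders = Table()
status_idx = orders.create_index("status", visible=False)
orders.insert({"status": "open"})

default_path = orders.choose_access_path("status")
session_path = orders.choose_access_path("status", use_invisible=True)

status_idx.visible = True  # flip the flag; no rebuild is needed
visible_path = orders.choose_access_path("status")
```

Because the entries were maintained all along, making the index visible again takes effect immediately, which is what makes this safer than dropping and recreating it.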
  • 33. Indexing under partitioning
      • Local
          • (1) Local partitioned index: the index is partitioned on the same columns, with the same number of partitions and the same partition bounds as its table.
          • (2) Local prefixed index: the partition keys are on the leading edge of the index definition.
          • (3) Local nonprefixed index: the partition keys are not on the leading edge of the indexed column list and need not be in the list at all.
      • Global
          • (4) Global partitioned index: an index that is partitioned independently of the underlying table on which it is created.
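The difference between local and global partitioned indexes shows up most clearly when a table partition is dropped. In the sketch below the table is partitioned by year and indexed on customer name; the data, the two-way split of the global index, and all function names are hypothetical.

```python
# Table partitioned by year; each row is (customer, amount). Hypothetical data.
table_partitions = {
    2017: [("alice", 10), ("bob", 20)],
    2018: [("alice", 30), ("carol", 40)],
}


def build_local_index(partitions):
    """Local: one index partition per table partition, with the same bounds."""
    local = {}
    for part_key, rows in partitions.items():
        local[part_key] = {}
        for pos, (customer, _amount) in enumerate(rows):
            local[part_key].setdefault(customer, []).append(pos)
    return local


def build_global_index(partitions, split_at="b"):
    """Global: partitioned by customer name, independently of the table's years."""
    global_idx = {"low": {}, "high": {}}
    for part_key, rows in partitions.items():
        for pos, (customer, _amount) in enumerate(rows):
            bucket = "low" if customer < split_at else "high"
            global_idx[bucket].setdefault(customer, []).append((part_key, pos))
    return global_idx


def drop_table_partition(partitions, local, global_idx, part_key):
    del partitions[part_key]
    del local[part_key]  # local: drop exactly one matching index partition
    for bucket in global_idx.values():  # global: every index partition is touched
        for customer in list(bucket):
            bucket[customer] = [e for e in bucket[customer] if e[0] != part_key]
            if not bucket[customer]:
                del bucket[customer]


local_idx = build_local_index(table_partitions)
global_idx = build_global_index(table_partitions)
alice_before = list(global_idx["low"]["alice"])

drop_table_partition(table_partitions, local_idx, global_idx, 2017)
```

One global entry list can point into several table partitions, which is why partition maintenance is cheaper with local indexes.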
  • 34. It’s very complicated
  • 35. Use indexes judiciously
  • 36. How indexes are used
      • B-tree
          • Index Unique Scan
          • Index Range Scan
          • Index Full Scan
          • Index Fast Full Scan
          • Index Skip Scan
          • Index Join Scan
      • Bitmap
          • Bitmap Index Single Value
          • Bitmap Index Range Scan
          • Bitmap Merge
      • Too complex to explain today; look for a future webcast on query tuning – indexes vs. exec plans
  • 37. Index usage summary
  • 38. Right Choice Greatly Matters
  • 39. Thank you!
      • Bert Scalzo – [email protected] – community.idera.com/members/bscalzo/blogs
      • http://community.idera.com