Geek Sync | Tips for Data Warehouses and Other Very Large Databases

Topics
 Click to edit Master text styles
• Second level
• Third level
− Fourth level
• Fifth level
Tips for Data Warehouses
and Other
Very Large DB's
September 19th, 2018
Bert Scalzo, PhD & Oracle ACE
Email: bertscalzo2@gmail.com

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Presenter

Topics
• Second level
• Third level
− Fourth level
• Fifth level
New book Q4-2018…
Provide evidence-based answers that can be measured and relied
upon by your business. Database administrators will be able to
make sound architectural decisions in a fast-changing landscape of
virtualized servers and container-based solutions based on the
empirical method presented in this book for answering “what if”
questions about database performance.
Today’s database administrators face numerous questions such as:
• What if we consolidate databases using multitenant features?
• What if we virtualize database servers as Docker containers?
• What if we deploy the latest in NVMe flash disks to speed up IO
access?
• Do features such as compression, partitioning, and in-memory
OLTP earn back their price?
• What if we move our databases to the cloud?
As an administrator, do you know the answers or even how to test
the assumptions?

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Abstract
Whether on-premise or in the cloud, DBA's are often asked to create and manage optimal database
designs for data warehouses, data lakes, and many other very large databases (VLDBs) using
relational database management systems. These databases will be used for business intelligence,
data mining, and data analytics. They are radically different than traditional online transaction
processing (OLTP) systems.
• So what special design concerns will be faced?
• What database editions and features to rely upon?
• What kind of query execution plans should be sought?
This webcast will cover all pertinent issues, which some may even consider best practices, for such
highly specialized database requirements. While the basic concepts will be universally applicable,
examples will be primarily in Oracle, with some also in MySQL, SQL Server, and PostgreSQL.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Goal
I hope that everyone, no matter how
experienced, walks away with at
least one useful new tip & trick
I’ll call it a really big success if a few
or more of you walk away hopefully
with many great new tips & tricks

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Ever Growing DATA Demand
• Businesses are addicted to information since they see it
as an edge
• Technology improvements have lowered costs to keep
historical data
• Data mining, data analytics and data science all added
fuel to this fire
• The cloud makes all this quicker and cheaper to deploy …

Topics
• Second level
• Third level
− Fourth level
• Fifth level
What is “big” is growing …
That’s 180 billion terabytes of data!!!

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Overriding Principle

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Partitioning objects
• Partitioning enables you to decompose very large tables and indexes into smaller and more
manageable pieces called partitions.
• Each partition is an independent object with its own name and optionally its own storage
characteristics.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Compare DB Partitioning
• PostgreSQL
• 9.X offered basic partitioning (lots manual work)
• Master table (no data)
• Child tables (constraints)
• Triggers to redirect inserts
• 10.X declarative partitioning
• 11.X (beta) improved declarative partitioning
• MySQL
• 5.X works with any storage engine
• 8.X works with INNODB only
• SQL Server
• Debuted 2014, gotten better with each release
• 2014 required Enterprise Edition
• Included SQL Server 2016 (13.x) SP1 and 2017
• Oracle
• Debuted 10g, gotten better with each release
• Requires Enterprise Edition + extra fee to add partitioning option

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Horizontal partitioning

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Vertical partitioning

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Partitioning benefits
• Increased availability
• The unavailability of a partition does not entail the unavailability of the object. The query
optimizer automatically removes unreferenced partitions from the query plan so queries
are not affected when the partitions are unavailable.
• Easier administration of schema objects
• A partitioned object has pieces that can be managed either collectively or individually.
DDL statements can manipulate partitions rather than entire tables or indexes. Thus, you
can break up resource-intensive tasks such as rebuilding an index or table. For
example, you can move one table partition at a time. If a problem occurs, then only the
partition move must be redone, not the table move. Also, dropping a partition avoids
executing numerous DELETE statements.
• Reduced contention for shared resources in OLTP systems
• In some OLTP systems, partitions can decrease contention for a shared resource. For
example, DML is distributed over many segments rather than one segment.
• Enhanced query performance in data warehouses
• In a data warehouse, partitioning can speed processing of ad hoc queries. For example,
a sales table containing a million rows can be partitioned by quarter.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Basic partitioning

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Extended partitioning

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Tiered storage
Magnetic Disk - SATA
Magnetic Disk - SAS
SSD - SATA
SSD - SAS
PCIe - NVMe
Tape or Cloud
Partitioning
permits this

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Tiered Example
Magnetic Disk - SATA
Magnetic Disk - SAS
SSD - SATA
SSD - SAS
PCIe - NVMe
Tape or Cloud
2018 Data
2017 Data
2015-16 Data
2013-14 Data
<= 2012 Data
TEMP
Group by, grouping
functions, order by

Topics
• Second level
• Third level
− Fourth level
• Fifth level
What is an index (conceptually)
Traditional definition:
• Wikipedia: A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and extra storage space to
maintain the index data structure.
• Oracle Docs: An index is an optional structure, associated with a table or table cluster, that can
sometimes speed data access. Indexes are schema objects that are logically and physically
independent of the data in the objects with which they are associated. Thus, you can drop or
create an index without physically affecting the indexed table.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
What is an index (commonly)
B-tree index:
• Wikipedia: In computer science, a B-tree is a self-balancing tree data structure that keeps data
sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The
B-tree is a generalization of a binary search tree in that a node can have more than two children.
• Oracle Docs: B-trees, short for balanced trees, are the most common type of database index. A
B-tree index is an ordered list of values divided into ranges. By associating a key with a row or
range of rows, B-trees provide excellent retrieval performance for a wide range of queries,
including exact match and range searches.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
What is an index (commonly)
Fetch row where
col1 = 222
col1 col2 col3 col4
222 X Y Z
TABLE

Topics
• Second level
• Third level
− Fourth level
• Fifth level
B-tree index options
• Unique or not unique
• Ascending or descending
• Single column or multi column (composite)
• Reverse Key (can help with some RAC leaf block contention issues)
• Reverse key index is a type of B-tree index that physically reverses the bytes of each
index key while keeping the column order. For example, if the index key is 20, and if the
two bytes stored for this key in hexadecimal are C1,15 in a standard B-tree index, then a
reverse key index stores the bytes as 15,C1.
• Compressed (repeated key values removed)
• Prefix compression (also known as key compression) to compress portions of the
primary key column values in a B-tree index or an index-organized table. Prefix
compression can greatly reduce the space consumed by the index.
• Starting with Oracle 12c advanced index compression improves on traditional prefix
compression for indexes on heap-organized tables. Unlike prefix compression, which
uses fixed duplicate key elimination for every block, advanced compression uses
adaptive duplicate key elimination on a per-block basis.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Bitmap indexes
• Database stores a bitmap for each index key.
• A mapping function converts each bit in the bitmap to a rowid.
• In a conventional B-tree index, one index entry points to a single row. In a bitmap index, each
index key stores pointers to multiple rows.
• Bitmap indexes are primarily designed for data warehousing or environments in which queries
reference many columns in an ad hoc fashion.
• Situations that may call for a bitmap index include:
• The indexed columns have low cardinality, that is, the number of distinct values is small
compared to the number of table rows.
• The indexed table is either read-only or not subject to significant modification by DML
statements.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
How bitmap indexes work
Fetch row where
color = green
Mapping
Function

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Bitmap join indexes
• A bitmap join index is a bitmap index
for the join of two or more tables. For
each value in a table column, the
index stores the rowid of the
corresponding row in the indexed
table. In contrast, a standard bitmap
index is created on a single table.
• A bitmap join index is an efficient
means of reducing the volume of data
that must be joined by performing
restrictions in advance.
• Redbrick database (Ralph Kimball)
pioneered this and called it “Star Join”

Topics
• Second level
• Third level
− Fourth level
• Fifth level
How bitmap join indexes work
Fetch row where
Fact.col1 = X
Dim1.col2 = Y
Dim2.col3 = Z
Index stores
rowid for each
table in the join

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Function based indexes
• Function-based indexes are efficient for evaluating statements that contain functions in their
WHERE clauses.
• The database only uses the function-based index when the function is included in a query.
• When the database processes INSERT and UPDATE statements, however, it must still evaluate
the function to process the statement.
• A function-based index is also useful for indexing only specific rows in a table. For example, the
cust_valid column in the sh.customers table has either I or A as a value. To index only the A
rows, you could write a function that returns a null value for any rows other than the A rows.
via index on
computed
column

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Index organized table (IOT)
• An index-organized table is a table stored in a variation of a B-tree index structure.
• Rows are stored in an index defined on the primary key for the table.
• Each index entry in the B-tree also stores the non-key column values.
• You can also specify a separate segment as a row overflow area.
• Thus, the index is the data, and the data is the index.
• If a row overflow area is specified, then the database can divide a row in an index-organized
table into the following parts:
• The index entry: This part contains column values for all the primary key columns, a
physical rowid that points to the overflow part of the row, and optionally a few of the non-
key columns. This part is stored in the index segment.
• The overflow part: This part contains column values for the remaining non-key columns.
This part is stored in the overflow storage area segment.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
How IOTs work
Branch nodes
Leaf nodes
Overflow area

Topics
• Second level
• Third level
− Fourth level
• Fifth level
IOT secondary indexes
• A secondary index is an index on an
index-organized table. In a sense, it is an
index on an index.
• The secondary index is an independent
schema object and is stored separately
from the index-organized table.
• A secondary index on an index-organized
table can be a bitmap index.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Invisible indexes
• An invisible index is maintained by DML operations, but is not used by default by the optimizer.
• Making an index invisible is an alternative to making it unusable or dropping it.
• Invisible indexes are especially useful for testing the removal of an index before dropping it or
using indexes temporarily without affecting the overall application.
requires
extension

Topics
• Second level
• Third level
− Fourth level
• Fifth level
How invisible indexes work

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Confused yet? There’s more!

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Indexing under partitioning
• Local
• (1) Local partitioned index: the index is partitioned on the same columns, with the same
number of partitions and the same partition bounds as its table.
• (2) Local prefixed index: the partition keys are on the leading edge of the index
definition.
• (3) Local nonprefixed index: the partition keys are not on the leading edge of the
indexed column list and need not be in the list at all.
• Global
• (4) Global partitioned index: that is partitioned independently of the underlying table on
which it is created.

Topics
• Second level
• Third level
− Fourth level
• Fifth level
It’s very complicated

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Use indexes judiciously

Topics
• Second level
• Third level
− Fourth level
• Fifth level
How indexes are used
• B-tree
• Index Unique Scan
• Index Range Scan
• Index Full can
• Index Fast Full Scan
• Index Skip Scan
• Index Join Scan
• Bitmap
• Bitmap Index Single Value
• Bitmap Index Range Scan
• Bitmap Merge
• Bitmap Index Range Scan
Too complex to explain
today, look for a future
webcast on query tuning
– indexes vs. exec plans

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Index usage summary

Topics
• Second level
• Third level
− Fourth level
• Fifth level
Right Choice Greatly Matters

Thank you!
• Bert Scalzo
– bertscalzo2@gmail.com
– community.idera.com/members/bscalzo/blogs
• http://community.idera.com

Geek Sync | Tips for Data Warehouses and Other Very Large Databases

More Related Content

What's hot (16)

Similar to Geek Sync | Tips for Data Warehouses and Other Very Large Databases (20)

More from IDERA Software (20)

Recently uploaded (20)

Geek Sync | Tips for Data Warehouses and Other Very Large Databases