PostgreSQL Performance
Table Partitioning vs. Aggregated Data Tables
Here’s a classic scenario. You work on a project that stores data in a
relational database. The application gets deployed to production
and early on the performance is great: selecting data from the
database is snappy and insert latency goes unnoticed. Over days,
weeks, and months the database grows larger and queries slow down.
A Database Administrator (DBA) will take a look and check whether the database
is properly tuned. They will suggest adding certain indexes, moving logging to
separate disk partitions, adjusting database engine parameters, and will verify
that the database is healthy. This buys you more time and may resolve the
issues to a degree.
At a certain point you realize the data in the database is the bottleneck.

There are various approaches that can help you make your application and
database run faster. Let’s take a look at two of them:
• Table partitioning
• Aggregated data tables
Main idea: you take one massive table (master table) and split it into many
smaller tables – these smaller tables are called partitions or child tables.
Master Table
Also referred to as a Master Partition Table, this table is the template child
tables are created from. This is a normal table, but it doesn’t contain any data
and requires a trigger.
Child Table
These tables inherit their structure (in other words, their Data Definition
Language or DDL for short) from the master table and belong to a single
master table. The child tables contain all of the data. These tables are also
referred to as Table Partitions.
Partition Function
A partition function is a stored procedure (in PostgreSQL, a PL/pgSQL trigger
function) that determines which child table should accept a new record. The
master table has a trigger which calls the partition function.
Here’s a summary of what should be done:
1. Create a master table
2. Create a partition function
3. Create a table trigger
Let’s assume that we have a rather large table (~2.5 million rows) containing
reports for different dates.
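For concreteness, here is a minimal, assumed sketch of such a master table; only the "ID" and "RECORD_DATE" columns actually appear in the queries later on, the rest is illustrative:

CREATE TABLE "SOME_LARGE_TABLE" (
    "ID"          uuid NOT NULL,
    "RECORD_DATE" date NOT NULL,
    "PAYLOAD"     text            -- hypothetical report data column
);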
There are two typical methodologies for routing records to child tables:
• By Date Values
• By Fixed Values

The trigger function does the following:
It creates the child table with a dynamically generated “CREATE TABLE”
statement if the child table does not exist yet.
Partitions (child tables) are determined by the values in the “date” column,
with one partition created per calendar month.
The name of each child table follows the pattern
“master_table_name_yYYYY_mMM”, as generated by the code below.
CREATE OR REPLACE FUNCTION partition_function() RETURNS trigger AS
$BODY$
DECLARE
    table_master varchar(255) := 'SOME_LARGE_TABLE';
    table_part   varchar(255) := '';
    …
BEGIN
    -- generate partition name
    table_part := table_master || '_y' || DATE_PART( 'year', rec_date )::TEXT
                               || '_m' || DATE_PART( 'month', rec_date )::TEXT;
    -- check if partition already exists
    …
    -- if not yet, then create a new one
    EXECUTE 'CREATE TABLE public.' || quote_ident(table_part) || ' (
        CHECK( "RECORD_DATE" >= DATE ' || quote_literal(start_date) || ' AND "RECORD_DATE" <
        DATE ' || quote_literal(end_date) || ')) INHERITS ( public.' || quote_ident(table_master) || ')';
    -- create indexes for the current partition
    EXECUTE 'CREATE INDEX …';
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
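The slide elides several details: how rec_date, start_date, and end_date are computed, the existence check, and the routing of the new row itself. Below is a minimal end-to-end sketch of such a trigger function, assuming the master table sketched earlier; the local variables and the index are illustrative, not part of the original:

CREATE OR REPLACE FUNCTION partition_function() RETURNS trigger AS
$BODY$
DECLARE
    table_master varchar(255) := 'SOME_LARGE_TABLE';
    table_part   varchar(255);
    rec_date     date := NEW."RECORD_DATE";
    start_date   date := date_trunc('month', rec_date)::date;
    end_date     date := (date_trunc('month', rec_date) + interval '1 month')::date;
BEGIN
    -- generate the partition name, e.g. SOME_LARGE_TABLE_y2013_m10
    table_part := table_master || '_y' || DATE_PART('year', rec_date)::TEXT
                               || '_m' || DATE_PART('month', rec_date)::TEXT;
    -- create the partition and its index only if it does not exist yet
    IF NOT EXISTS (SELECT 1
                   FROM pg_class c
                   JOIN pg_namespace n ON n.oid = c.relnamespace
                   WHERE c.relname = table_part AND n.nspname = 'public') THEN
        EXECUTE 'CREATE TABLE public.' || quote_ident(table_part) || ' (
            CHECK( "RECORD_DATE" >= DATE ' || quote_literal(start_date) || '
               AND "RECORD_DATE" < DATE ' || quote_literal(end_date) || ' )
            ) INHERITS ( public.' || quote_ident(table_master) || ' )';
        EXECUTE 'CREATE INDEX ' || quote_ident('idx_' || table_part || '_record_date')
             || ' ON public.' || quote_ident(table_part) || ' ( "RECORD_DATE" )';
    END IF;
    -- route the row into the child table and suppress the insert into the master
    EXECUTE 'INSERT INTO public.' || quote_ident(table_part) || ' SELECT ($1).*' USING NEW;
    RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;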
Now that the partition function has been created, an insert trigger needs to be
added to the master table; it will call the partition function whenever new
records are inserted.
CREATE TRIGGER insert_trigger
BEFORE INSERT
ON "SOME_LARGE_TABLE"
FOR EACH ROW
EXECUTE PROCEDURE partition_function();

At this point you can start inserting rows into the master table and watch
them land in the correct child tables.
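For example (a hypothetical check, using the assumed table definition above):

INSERT INTO "SOME_LARGE_TABLE" ("ID", "RECORD_DATE")
VALUES ('0000e124-e7ff-4859-8d4f-a3d7b37b521b', DATE '2013-10-15');

-- the row lands in the child table; the master itself stays empty
SELECT count(*) FROM ONLY "SOME_LARGE_TABLE";        -- 0
SELECT count(*) FROM "SOME_LARGE_TABLE_y2013_m10";   -- 1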
Constraint exclusion is a query-optimization technique that improves
performance for partitioned tables:
SET constraint_exclusion = on;

The default (and recommended) setting of constraint_exclusion is actually
neither on nor off, but an intermediate setting called partition, which causes the
technique to be applied only to queries that are likely to be working on
partitioned tables. The on setting causes the planner to examine CHECK
constraints in all queries, even simple ones that are unlikely to benefit.
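In other words, you can usually rely on the default rather than forcing it on globally:

SET constraint_exclusion = partition;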
SELECT * FROM "SOME_LARGE_TABLE"
WHERE "ID" = '0000e124-e7ff-4859-8d4f-a3d7b37b521b'
  AND "RECORD_DATE" BETWEEN '2013-10-01' AND '2013-10-30';

Without partitioning: the plan has to scan the whole table.

With partitioning: constraint exclusion lets the planner skip every child
table whose CHECK constraint rules out the requested date range, so only the
October 2013 partition is scanned.
Benefits:
• Query performance can be improved dramatically in certain situations;
• Bulk loads and deletes can be accomplished by adding or removing
partitions (see the sketch after this list);
• Seldom-used data can be migrated to cheaper and slower storage media.
Caveats:
• Partitioning should be organized so that queries reference as few tables as
possible.
• The partition key column(s) of a row should never change, or at least should
not change in a way that requires the row to move to another partition.
• Constraint exclusion only works when the query's WHERE clause contains
constants.
• All constraints on all partitions of the master table are examined during
constraint exclusion, so large numbers of partitions are likely to increase
query planning time considerably.
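As an example of partition-level bulk deletion, dropping a whole month is a cheap metadata operation rather than a row-by-row DELETE (child table name assumed from the naming scheme above):

-- detach the October 2013 partition from the master, then drop it
ALTER TABLE "SOME_LARGE_TABLE_y2013_m10" NO INHERIT "SOME_LARGE_TABLE";
DROP TABLE "SOME_LARGE_TABLE_y2013_m10";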
Another approach to boost performance is using pre-aggregated data.
A key feature of relational databases is that complex objects are built from
their atomic components at runtime, but this can put excessive stress on the
database when the same work is done over and over.
Without pre-aggregated data you may see the same large-table full-table scans
repeated again and again as summaries are recomputed.
Data aggregation can be used to pre-join tables, pre-sort solution sets, and
pre-summarize complex data. Because this work is completed in advance, it
gives end users the illusion of instantaneous response time.
You can use a set of ordinary tables maintained by triggers and stored
procedures for this purpose, but there is another solution available out of
the box: materialized views, natively supported since PostgreSQL 9.3. A sketch
of the manual approach follows.
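A minimal sketch of the manual approach, assuming the hypothetical machines/reports schema used later and a summary table named report_totals (concurrency is ignored for brevity):

-- hypothetical summary table kept current by a trigger
CREATE TABLE report_totals (
    machine_id  integer PRIMARY KEY,
    reports_qty bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION bump_report_totals() RETURNS trigger AS
$BODY$
BEGIN
    UPDATE report_totals
    SET reports_qty = reports_qty + 1
    WHERE machine_id = NEW.machine_id;
    IF NOT FOUND THEN
        INSERT INTO report_totals (machine_id, reports_qty)
        VALUES (NEW.machine_id, 1);
    END IF;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER report_totals_trigger
AFTER INSERT ON reports
FOR EACH ROW EXECUTE PROCEDURE bump_report_totals();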

A materialized view is a database object that contains the results of a query.
Materialized views in PostgreSQL use the rule system, like views do, but
persist the results in a table-like form.
Let’s assume that we have two tables: ‘machines’ (2 abstract machines) and
‘reports’, containing reports for each machine (~100k rows).
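The table definitions are not shown in the original; the following assumed sketch is inferred from the columns the views reference:

CREATE TABLE machines (
    id       serial PRIMARY KEY,
    name     text NOT NULL,
    location text NOT NULL
);

CREATE TABLE reports (
    id          serial PRIMARY KEY,
    machine_id  integer NOT NULL REFERENCES machines(id),
    reports_qty integer NOT NULL
);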
Let’s create a materialized view:
CREATE MATERIALIZED VIEW mvw_reports AS
SELECT reports.id,
       machines.name || ' ' || machines.location AS machine_name,
       reports.reports_qty
FROM reports
INNER JOIN machines ON machines.id = reports.machine_id;

And a simple view for comparison:
CREATE VIEW vw_reports AS
SELECT reports.id,
       machines.name || ' ' || machines.location AS machine_name,
       reports.reports_qty
FROM reports
INNER JOIN machines ON machines.id = reports.machine_id;
Executing a query against the simple view:
EXPLAIN ANALYZE SELECT * FROM vw_reports WHERE machine_name = 'Machine1 Location1';

And against the materialized view:
EXPLAIN ANALYZE SELECT * FROM mvw_reports WHERE machine_name = 'Machine1 Location1';
Another advantage over simple views is that we can add indexes to
materialized views, just as for ordinary tables.
CREATE INDEX idx_report_machine_name ON mvw_reports ( machine_name );

Executing the query once more:
EXPLAIN ANALYZE SELECT * FROM mvw_reports WHERE machine_name = 'Machine1 Location1';
To keep the data in a materialized view current, it must be refreshed after
DML operations (INSERT, UPDATE, DELETE) on the underlying tables.
REFRESH MATERIALIZED VIEW mvw_reports;

This can be done using triggers:
CREATE TRIGGER machines_refresh
AFTER INSERT OR UPDATE OR DELETE ON machines
FOR EACH STATEMENT EXECUTE PROCEDURE mvw_reports_refresh();

CREATE TRIGGER reports_refresh
AFTER INSERT OR UPDATE OR DELETE ON reports
FOR EACH STATEMENT EXECUTE PROCEDURE mvw_reports_refresh();
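The slides do not show the body of mvw_reports_refresh(); a minimal sketch of what it would look like, assuming a full refresh on every statement:

CREATE OR REPLACE FUNCTION mvw_reports_refresh() RETURNS trigger AS
$BODY$
BEGIN
    REFRESH MATERIALIZED VIEW mvw_reports;
    RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$BODY$
LANGUAGE plpgsql;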
Benefits:
Query performance can be improved dramatically in situations where there are
relatively few data modifications compared to the number of queries, and the
queries are very complicated and heavyweight.
Caveats:
• Materialized views contain a duplicate of data from base tables;
• Depending on the complexity of the underlying query for each MV, and the
amount of data involved, the computation required for refreshing may be
very expensive, and frequent refreshing of MVs may impose an
unacceptable workload on the database server.
Table partitioning and aggregated data tables can help a lot, but there is no
ideal solution that always works. Both approaches have their own pluses and
minuses; it all depends on the specific situation and circumstances. Hopefully
this overview gave you a few tips on when each technique can be useful.

Any questions?
