Taming ReorderBufferWrite - Boost Logical Decoding in Postgres
Performance bottlenecks in a Postgres logical replication or Change Data Capture (CDC) stream can be subtle, but one specific wait event, ReorderBufferWrite, often points directly at how your application interacts with the database during these processes. Let's unpack this wait event and see how your application's workload patterns can trigger it.
The PostGIS Team is pleased to release PostGIS 3.5.3.
This version requires PostgreSQL 12 - 18beta1, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5+ is needed.
Backfilling data into a TimescaleDB hypertable in production can be very tricky, especially when automated processes like compression policies are involved. From past experience, we have seen that if backfill operations aren’t handled properly, they can interfere with these automated tasks, sometimes causing them to stop working altogether.
This blog covers a safer and more reliable approach to backfilling hypertables, along with best practices to prevent disruptions to compression and other background processes.
What is a Backfill Operation?
Backfilling means adding old or missing data into a database table after that time has already passed.
Imagine you are collecting temperature readings every hour, but your system was down for a day and didn’t save any data. Later, you recover the missing readings from the device’s local storage or from cloud storage and insert them into the right hypertable; that is backfilling.
In TimescaleDB, this is common with time-series data, but it needs to be done carefully. That’s because TimescaleDB might already be doing things in the background, like compressing old data to save space. If we are not careful, backfilling can mess up these automatic tasks.
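As a rough illustration of the kind of precaution the post recommends, here is a minimal sketch of pausing a compression policy around a backfill. It assumes TimescaleDB 2.x job APIs (timescaledb_information.jobs, alter_job, decompress_chunk, show_chunks) and a hypothetical hypertable named conditions; the job id 1000 and the dates are placeholders.

-- Find the compression policy job for the hypertable
SELECT job_id FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression' AND hypertable_name = 'conditions';

-- Pause the policy so it cannot compress chunks mid-backfill
SELECT alter_job(1000, scheduled => false);

-- Decompress already-compressed chunks overlapping the backfill window
SELECT decompress_chunk(c, true)
FROM show_chunks('conditions', newer_than => '2025-04-01'::timestamptz, older_than => '2025-04-03'::timestamptz) AS c;

-- Run the backfill, then re-enable the policy
INSERT INTO conditions (time, device_id, temperature) VALUES ('2025-04-02 10:00+00', 42, 21.5);
SELECT alter_job(1000, scheduled => true);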
With volunteer help from: Andreas Froede, Christoph Moench-Tegeder, Cornelia Biacsics, Devrim Gündüz, Dirk C. Aumueller, Ellyne Phneah, Floor Drees, Franck Pachot, Jan Wieremjewicz, Jim Jones, Johannes Paul, Julian Markwort, Mathias Franz, Matthis Gördel, Michael Banck, Oleksii Kliukin, Pavlo Golub, Priyanka Chatterjee, Tim Westermann.
The week before this year's PGConf.DE, I attended the Debian MiniDebConf in Hamburg. Adding Berlin to the itinerary made for a busy two weeks on the road. Still, it gave me time to prep for my PGConf talk while catching up on Debian-related discussions.
Conference Kickoff & Venue
We hit the road to Berlin the day before the conference. My colleague Bernd Helmle drove, and after a relaxed six-hour ride, we arrived. The venue was familiar — the same as PGConf.EU 2022 — a hotel with a large conference floor built around a sprawling atrium. While rooms were somewhat dispersed, the layout helped accommodate the crowd of 340 attendees comfortably.
Andreas "ads" Scherbaum kicked off the event with characteristic German punctuality: “We are in Germany, let’s start on time!”
Day 1: Talk Highlights
Practical Lessons & Query Life Cycles
First up, I sat in on Bernd Patolla's session on migrating from Oracle to PostgreSQL. Honestly, I was distracted, as I was still polishing slides for my own talk.
I then partially tuned into Sergey Dudoladov’s session on the life cycle of a query in PostgreSQL — again, nerves and last-minute edits dominated.
However, I fully engaged with András Váczi’s practical insights into running PostgreSQL on Windows. From backup headaches to tool compatibility, it was a reality check on cross-platform database management.
Read Andras Vaczi's talk on running PostgreSQL on Windows here.
Modern VACUUM: My Talk
In my own session, Modern VACUUM, I explored the evolution of PostgreSQL’s VACUUM and autovacuum processes. The talk stemmed from internal CYBERTEC consulting team discussions: VACUUM has changed a lot, and I needed to catch up and share that knowledge.
I’m pleased to welcome Mankirat Singh to the Postgres community as a 2025 Google Summer of Code contributor. Mankirat will be developing an ABI compliance checker and reporting system to help identify and prevent unintentional ABI changes in future minor Postgres releases. This follows on the heels of the addition of ABI and API guidance in Postgres 18, as well as the ABI-breaking Postgres 17.1 release. What timing!
Please follow Mankirat’s blog as he develops the project this summer under the mentorship of Pavlo Golub and me. It should also soon be on Planet PostgreSQL. We’ve also set up the #gsoc2025-abi-compliance-checker channel on the community Slack for ad-hoc discussion. Join us!
Physical streaming replication in PostgreSQL allows you to maintain a live copy of your database on a standby server, which continuously receives updates from the primary server’s WAL (Write-Ahead Log). This standby (or hot standby) can handle read-only queries and be quickly promoted to primary in case of failover, providing high availability and disaster recovery.
In this guide, I will walk through provisioning a primary PostgreSQL 16 server and a standby server on Linux, configuring them for streaming replication, and verifying that everything works. I assume you are an experienced engineer familiar with Linux, but new to PostgreSQL replication, so I will keep it friendly and straightforward.
Figure: Real-time data streaming from a primary PostgreSQL server (left) to a standby server (right). The standby constantly applies WAL records received from the primary over a network connection, keeping an up-to-date copy of the database ready for failover.
Step 1: Prepare Two Linux Servers and Install PostgreSQL 16
Before diving into PostgreSQL settings, set up two Linux servers (virtual or physical). One will act as the primary database server, and the other as the standby (read replica). For a smooth replication setup, both servers should be as similar as possible in OS, hardware, and PostgreSQL version. In particular, ensure the following prerequisites:
PostgreSQL 16 is installed on both servers via the official PostgreSQL repositories. Both servers must run the same major PostgreSQL version and architecture (mixing different versions won’t work for physical replication). If you haven’t installed PostgreSQL yet, do so now (e.g., on Ubuntu: sudo apt install postgresql-16, or on RHEL/CentOS: use the PostgreSQL Yum repository). Make sure the PostgreSQL service is running on the primary server.
Network connectivity: The standby must be able to reach the primary on the PostgreSQL port (default 5432). If the servers are in a cloud environment like AWS EC2, configure the security group or firewall to allow inbound traffic on that port from the standby’s address.
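To make that concrete, a typical preparatory step on the primary is creating a dedicated replication role and allowing it in pg_hba.conf. This is a minimal sketch; the role name replicator and the standby address 10.0.0.2 are illustrative assumptions.

-- On the primary: a role the standby will connect as
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'choose-a-strong-password';

-- Then add a line like this to pg_hba.conf on the primary and reload:
--   host  replication  replicator  10.0.0.2/32  scram-sha-256
SELECT pg_reload_conf();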
PGConf.DE 2025, the 9th Annual PostgreSQL Conference Germany, was held on May 8–9, 2025, at the Marriott Hotel near Potsdamer Platz in Berlin. The event brought together PostgreSQL enthusiasts, developers, DBAs, and industry sponsors for two days of fascinating talks across four parallel tracks. It was the biggest edition so far, with 347 attendees. The whole conference was very well organized, and special thanks are due to all the organizers—in particular Andreas Scherbaum, the main organizer—for their efforts and hard work.
Our company, credativ GmbH, independent once again, participated as a gold sponsor. credativ CTO Alexander Wirt, Head of Sales & Marketing Peter Dreuw, and Database team leader Tobias Kauder were available for attendees at the credativ booth. Many thanks to our team colleague Sascha Spettmann for transporting all the stands and billboards to the conference and back again.
In total, we held four talks at the conference. Michael Banck, technical leader of our database team, presented the German-language talk “PostgreSQL Performance Tuning.” He provided a deep and comprehensive overview of the most important performance-tuning parameters in PostgreSQL and explained how they influence the database’s behavior. His talk attracted a large audience and was very well received.
I had the unique opportunity to present three different talks in the English track. In my regular talk “PostgreSQL Connections Memory Usage: How Much, Why and When,” I presented the results of my research and tests on PostgreSQL connections’ memory usage. After explaining the most important aspects of Linux memory management and the memory usage reported by standard commands, I detailed PostgreSQL connection memory usage during query execution based on numbers reported in smaps files. I intend to publish detailed blog posts about my findings soon. My other talk, “Building a Data Lakehouse with PostgreSQL,” was originally c[...]
This week saw the routine quarterly round of PostgreSQL minor version updates, so now is the time to upgrade. Note that PostgreSQL 13 is now officially EOL, so start planning upgrades away from it.
PostgreSQL 18 changes this week
As expected, the PostgreSQL 18 beta1 release is now available, and the PostgreSQL 18 documentation is now available, together with the release notes. Note that as the REL_18_STABLE branch has not yet been created, the PostgreSQL 18 and devel documentation are currently identical.
PostgreSQL 18 articles
PostgreSQL 18 Beta Preview – Export or Amend Statistics with Ease (2025-05-10) - Deepak Mahto
Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O (2025-05-07) - Lukas Fittl / pganalyze
Waiting for PostgreSQL 18 - Add function to get memory context stats for processes (2025-05-05) - Hubert 'depesz' Lubaczewski discusses pg_get_process_memory_contexts()
We’re now in the “feature freeze” phase of Postgres 18 development. That means no new features will get in - only bugfixes and cleanups of already committed changes. The goal is to test and stabilize the code before a release. PG 18 beta1 was released a couple days ago, so it’s a perfect time to do some testing and benchmarking.
One of the fundamental changes in PG 18 is going to be support for asynchronous I/O. With beta1 out, it’s the right time to run your own tests and benchmarks against this new feature, both for correctness and for performance regressions.
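As a minimal sketch of how such a test run might start (the io_method settings below match the PG 18 beta documentation; the pgbench workload is just one illustrative choice):

-- Check which I/O method the server uses ('worker' is the PG 18 default)
SHOW io_method;

-- Switch methods for comparison; this parameter requires a server restart
ALTER SYSTEM SET io_method = 'io_uring';  -- Linux builds with liburing; 'sync' and 'worker' are the alternatives

-- After restarting, drive a read-heavy workload and compare timings, e.g.:
--   pgbench -S -T 300 -c 16 bench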
PostgreSQL 18 beta has been released, and it’s the perfect time to start exploring the new features we can expect in the General Availability (GA) release. One feature that particularly caught my attention relates to improvements in statistics collection and usage.
Here’s an excerpt from the official PostgreSQL release notes:
Add functions to modify per-relation and per-column optimizer statistics (Corey Huinker)
Add pg_dump, pg_dumpall, and pg_restore options --statistics-only, --no-statistics, --no-data, and --no-schema (Corey Huinker, Jeff Davis)
One of the key ingredients in performance analysis is understanding the underlying statistics of tables, columns, and indexes. Often, we encounter cases where queries behave differently in production compared to Pre-Prod or UAT environments. A common reason for this discrepancy is the difference in data distribution and statistics. How many times have we wished for a way to replicate production-like statistics in lower environments—without the overhead of copying data and running manual ANALYZE?
With PostgreSQL 18, we can look forward to features that allow us to export, import, or even modify statistics—making it easier to simulate production behavior in non-production environments without needing actual data loads.
Dump Statistics
pg_dump --statistics-only --table=public.skewed_data_int
Reset table, index, or materialized view stats: pg_restore_relation_stats
Reset attribute-level statistics: pg_restore_attribute_stats
The following is a sample output from pg_dump that includes statistics. This can also be used within the same instance—or in another environment—to overwrite existing statistics and influence execution plans.
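The excerpt ends before the sample itself, so here is an illustrative sketch of the shape that output takes (the numbers are invented, and the exact argument list may differ from real pg_dump output):

SELECT * FROM pg_catalog.pg_restore_relation_stats(
    'version', '180000'::integer,
    'schemaname', 'public',
    'relname', 'skewed_data_int',
    'relpages', '443'::integer,
    'reltuples', '100000'::real,
    'relallvisible', '0'::integer
);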
FOR KEY SHARE optimization and the SLRU Trap
Optimizing database concurrency is a constant balancing act. We often tweak locking strategies in PostgreSQL, aiming to allow more simultaneous operations without compromising data integrity. A common scenario involves shifting from stricter row-level locks to more lenient ones. But sometimes, what seems like a straightforward optimization can backfire.
Henrietta Dombrovskaya organized PGDay Chicago 2025, with help from Devrim Gündüz, Mark Wong, Pat Wright and Carlos Aranibar. Speakers: Ryan Booz, Chelsea Dole, BAOVOLA Marie Anna, Paul Whalen, Henrietta Dombrovskaya, Gabriele Quaresima, Melih Mutlu, Keiko Oda, Shaun Thomas, Gülçin Yıldırım Jelínek, David George Andrew Pitts, Teresa Lopes, Jay Miller, Jimmy Zelinskie and Doğaç Eldenk.
My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.
Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure.
The Big Table Problem
Large tables tend to grow uncontrollably, especially in OLTP or time-series workloads. As millions or billions of rows accumulate, you begin to notice:
Slow queries due to full table scans or massive indexes.
Heavy I/O usage, especially when indexes cannot fit in memory.
Bloated memory usage during operations like sorting or joining.
Increased maintenance cost, with longer VACUUM, ANALYZE, and REINDEX times.
Hard-to-manage retention policies, as purging old rows becomes expensive.
These problems are amplified in cloud-hosted databases, where every IOPS, GB, or CPU upgrade increases cost.
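To make the remedy concrete, here is a minimal sketch of declarative range partitioning by time, using a hypothetical measurements table (all names and date ranges are illustrative):

-- Parent table: rows are routed to partitions by the ts column
CREATE TABLE measurements (
    device_id int NOT NULL,
    ts        timestamptz NOT NULL,
    value     double precision
) PARTITION BY RANGE (ts);

-- One partition per month
CREATE TABLE measurements_2025_05 PARTITION OF measurements
    FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');
CREATE TABLE measurements_2025_06 PARTITION OF measurements
    FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');

-- Retention becomes a cheap metadata operation instead of a huge DELETE
ALTER TABLE measurements DETACH PARTITION measurements_2025_05;
DROP TABLE measurements_2025_05;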
Jenn and I headed to Bellingham April 26 & 27th for LinuxFest Northwest, a 100% volunteer-run, free event. This is always a fun conference and there were lots of familiar faces this year! Some stats:
Over 100 booth visitors
Just wanted to say how much they love Postgres: 18
Asked about migrating from MySQL to Postgres: 1
“We love […]
Earlier this year, I had the incredible opportunity to present at PGConf India, where I delved into the intricacies of Write-Ahead Logging (WAL) in PostgreSQL. My presentation aimed to demystify this crucial database feature that ensures data integrity and enhances performance.
Bridging Java and C combines the strengths of both languages. A C application may rely on Java for modern libraries, cloud APIs, or UI and web capabilities, while a Java app might need C for low-level system access or performance-critical tasks. Sometimes, there’s simply no alternative—certain features only exist in one language. While modern languages like C++ and Go offer both high- and low-level control, many systems aren’t written in them. For existing C or Java codebases, bridging is often the most practical way to extend functionality without a full rewrite.
In my case, the goal was to build a C-based PostgreSQL extension called SynchDB that integrates with the Java-based Debezium Embedded library to enable heterogeneous database replication into PostgreSQL. Debezium already provides mature connectors for databases like MySQL, SQL Server, and Oracle, so rather than reinventing the wheel in C, I chose to bridge the two runtimes using JNI. This approach allows PostgreSQL to consume change data from other systems in real time. However, maintaining both C and Java components within the PostgreSQL extension framework introduces unique challenges—such as cross-language memory management, threading compatibility, signal handling, and debugging across runtime boundaries. Let’s explore some of those next.
SynchDB – A Heterogeneous Database Replication Tool for PostgreSQL
This project serves as a practical case study for the complexities of bridging Java and C, highlighting the technical challenges and design decisions involved in maintaining two runtimes under PostgreSQL’s extension framework. The basic architecture diagram is shown below, where the yellow box represents the Java space (the Java Virtual Machine, or JVM), the blue box represents the PostgreSQL extension (C space), and the orange box represents the PostgreSQL core.
The working principles of the SynchDB extension start by instantiating a JVM and running a p[...]
With the Postgres 18 Beta 1 release this week, a multi-year effort and significant architectural shift in Postgres is taking shape: Asynchronous I/O (AIO). These capabilities are still under active development, but they represent a fundamental change in how Postgres handles I/O, offering the potential for significant performance gains, particularly in cloud environments where latency is often the bottleneck. The post covers: why asynchronous I/O matters; how Postgres 17’s read streams paved the way; the new io_method…
In this blog, we're continuing to explore the power of multi-master replication (MMR) with the pgEdge distributed Postgres platform and its open source extension, Spock. In Part 1 of this blog topic, we discussed different replication methods and deployment models for PostgreSQL replication, the pros and cons of MMR, and how pgEdge Distributed Postgres uses the Spock extension to perform conflict management to ensure data integrity while implementing multi-master replication. In this blog we'll focus on conflict management. Conflict-free delta-apply columns are a distinguishing feature of the pgEdge Spock extension that provides a definitive way to apply data updates in the correct order, preventing data conflicts and facilitating efficient and accurate replication of incremental changes and aggregate values. Effective conflict avoidance tooling is essential to maintain the integrity of your data and ensure smooth operation in an MMR environment.
Understanding Conflict-Free Delta-Apply Columns
To recap the issues we discussed in the first blog: a conflict arises in an MMR cluster when the same data is updated or inserted by concurrent connections on multiple distributed nodes. Unlike a single-master replication (SMR) cluster, where the master node accepts transactions and supporting nodes answer requests, all nodes in a distributed MMR cluster are tasked with handling both read and write operations for improved performance and efficiency. That improved performance and efficiency comes with caveats; foresight and planning are your best defense against data integrity issues. Several scenarios can cause a data conflict in a distributed MMR cluster. The traditional (and most often used) method of managing conflict resolution is last update wins, where the most recent change overwrites an earlier one. By itself, this approach can lead to data inconsistencies[...]
Before we delve into the main subject of this blog, it is essential to understand the benefits of PostgreSQL replication, and the difference between single-master replication (SMR) and multi-master replication (MMR). In every modern business application, the database is becoming a critical part of the architecture and the demand for making the database performant and highly available is growing tremendously.
Planning Ahead for Better Performance
Our goal when designing a system for high performance is to make the database more efficient when handling application requests - this ensures that the database does not become a business bottleneck. If your database resides on a single host, the resources of that host can be easily exhausted; you need a system that supports scaling the database so it can respond more effectively to heavy application load. With pgEdge Distributed Postgres and the power of PostgreSQL, you can perform both horizontal and vertical scaling. The technique of replicating data across multiple PostgreSQL databases running on multiple servers can also be considered horizontal scaling: the data is not distributed, but database changes are replicated to each cluster node so the application load can be divided across multiple machines to achieve better performance. Reliability and high availability are also crucial for a powerful and responsive system:
Reliability means that the database is able to respond to user/application requests at all times with consistency and without any server interruption.
High-availability is also a critical consideration that ensures that database operations are not interrupted and the database downtime is minimized.
Statistically, downtime per year reflects the ability of your database and application to handle failures and outages without user downtime. Often, downtime per year is negotiated into a service level agreement (SLA) for applications that require high-availability; this clause specifies the cumula[...]
In the PostgreSQL world, as in other database systems, data types play a crucial role in ensuring optimal performance, efficiency, and correct semantics. Moreover, some data types are inherently easier and faster to index than others. Many people are not aware that the choice of data type indeed makes a difference, so let us take a look and see how long it takes to index the very same data using different types.
Creating some sample data
To show what difference a data type makes, we first have to create a sample table. In this case, it contains 5 data types that we want to inspect to understand how they behave.
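The excerpt omits the original CREATE TABLE statement, so here is a plausible reconstruction. It assumes five numeric-compatible column types, so the single integer expression in the INSERT below can be assigned to every column; the types in the original post may differ.

CREATE TABLE t_demo (
    c_int4    int4,
    c_int8    int8,
    c_float4  float4,
    c_float8  float8,
    c_numeric numeric
);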
Finally, we use our good old friend (the generate_series function), which has served me very well over the years:
blog=# INSERT INTO t_demo
SELECT x, x, x, x, x FROM (
SELECT (random()*10000000)::int4 AS x
FROM generate_series(1, 50000000)
) AS y;
INSERT 0 50000000
What this does is generate 50 million random rows and put them into my table. Note that all column entries are identical to ensure fairness for our index creation.
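The measurement the post is building toward is timing an index build per column. A minimal sketch, assuming psql and the reconstructed column names above:

-- \timing makes psql report each statement's duration
\timing on
CREATE INDEX idx_int4    ON t_demo (c_int4);
CREATE INDEX idx_int8    ON t_demo (c_int8);
CREATE INDEX idx_numeric ON t_demo (c_numeric);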
The last Extension Ecosystem Mini-Summit is upon us. How did that happen?
Join us for a virtual conference session featuring Gabriele Bartolini, who will be discussing Extension Management in CNPG. I’m psyched for this one, as the PostgreSQL community has contributed quite a lot to improving extension management in CloudNativePG in the past year, some of which we covered previously. If you miss it, the video, slides, and transcript will appear here soon.
It may take a week or two to get the transcripts done, though, considering that PGConf.dev is next week and features the Extension Ecosystem Summit on Tuesday, 13 May in Montréal, CA. Hope to see you there; be sure to say “hi!”
The PostgreSQL 18 release is slowly taking shape, with the first draft of the release notes now available. Corrections and additions can be sent via the pgsql-hackers thread "PG 18 release notes draft committed".
PostgreSQL 18 articles
Postgres 18 Release Notes (2025-05-02) - Bruce Momjian
Waiting for PostgreSQL 18 – Allow NOT NULL constraints to be added as NOT VALID (2025-05-01) - Hubert 'depesz' Lubaczewski
PostgreSQL 18: part 4 or CommitFest 2025-01 (2025-04-29) - Pavel Luzanov / PostgresPro
Update Your Control Files (2025-04-28) - David E. Wheeler
You know I had to do it as soon as I found it was possible. Yes, I installed and enabled AI in the DBeaver Query Editor so I can use AI with my PostgreSQL database work. Let’s face it. It was inevitable. However, the setup isn’t intuitive.
Setting Up in DBeaver
I’m going to assume […]