Replication in PostgreSQL - Deep Dive EnterpriseDB
Table of Contents
1 Objectives
2 Presenter
3 What is Replication
4 Why use Replication
5 Models of Replication (Single Master & Multi Master)
6 Classes of Replication (Unidirectional & Bidirectional)
7 Modes of Replication (Asynchronous & Synchronous)
8 Types of Replication (Physical & Logical)
9 Methods Of Replication
9.1 Disk Based Replication
9.1.1 Introduction
9.1.2 Setup
9.1.3 Configuring PostgreSQL Replication using NAS
9.1.4 Steps to perform Failover
9.2 File System Based
9.2.1 Introduction to DRBD
9.2.2 Setup
9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C
9.2.4 Steps to perform Failover
9.3 Trigger Based
9.3.1 Introduction to Slony-I
9.3.2 Advantages and Disadvantages of Slony
9.3.3 Setup
9.3.4 Configuring PostgreSQL Replication using Slony-I
9.3.5 Steps to perform controlled switchover
9.4 Introduction to WAL
9.4.1 What is WAL and Why is it required
9.4.2 Transaction Log and WAL Segment Files
9.4.3 WAL Writer
9.4.4 WAL Segment File Management
9.4.5 WAL Example
9.4.6 Overview of Replication Options based on WAL
9.5 Log Shipping Based - File Level
9.5.1 Setup
9.5.2 Configuring Replication using Log Shipping
9.5.3 Steps to perform Failover
9.6 Log Shipping Based - Block Level
9.6.1 Physical Streaming Replication
9.6.2 WAL Sender & WAL Receiver
9.6.3 WAL Streaming Protocol Details
9.6.4 Setup
9.6.5 Configuring PostgreSQL Replication using WAL Streaming
9.6.6 Steps to perform Failover
9.7 Logical Decoding Based
9.7.1 What is Logical Replication
9.7.2 Comparison of Physical and Logical Replication
9.7.3 Publication & Subscription
9.7.4 Logical Decoding Plugin
9.7.5 Logical replication slots
9.7.6 test_decoding and pg_recvlogical
9.7.7 Setup
9.7.8 Configuring PostgreSQL Replication using Logical Decoding
9.7.9 Logical Replication Protocol Details
9.8 Statement Based
9.8.1 Introduction to pgpool-II
9.8.2 Setup
9.8.3 Configuring PostgreSQL replication using pgpool-II
9.9 Other possibilities
1 Objectives
A) Become familiar with replication in PostgreSQL.
B) Learn configuration and failover for each method of replication in PostgreSQL using a
two-node cluster.
2 Presenter
My name is Abbas. I have a Master's in Computer Engineering and have spent most of my career
in product development. I work as a Senior Architect at EnterpriseDB. My work highlights are as follows:
• Migration Portal for online schema migration from Oracle to PostgreSQL
• xDB Replication Server
• Schema Cloning with support for parallelism using Background Workers
• Distributed Transactions (XA) Compliance for PostgreSQL using PgBouncer
• Oracle Compatible Packages for IBM DB2 : UTL_ENCODE, UTL_TCP, UTL_SMTP, UTL_MAIL
• HDFS_FDW, Mongo_FDW, MySQL FDW
• Postgres-XC
Email : abbas.butt@enterprisedb.com
Linkedin : https://pk.linkedin.com/in/abbasbutt
Blog : https://abbas-technical.blogspot.com
3 What is Replication
Replication is the process of copying data from one database server to another database
server. The source database server is usually called the master server, whereas the target
database server is called the slave server.
4 Why use Replication
Replication of data has many use cases. For example:
• Offloading reporting queries from the production OLTP system. This improves both
reporting query times and transaction processing performance.
• Fault tolerance: in the event of failure of the master database server, the slave database
server can take over, since it is already up to date with the master server. In this
configuration the slave server can also be called a standby server. This configuration can
also be used for regular maintenance of the primary server.
• Data migration: to upgrade database server hardware, or to deploy the same system for
another customer.
• Testing systems in parallel: in case we decide to port the application from one DBMS
to another, the results from the old and new systems on the same data must be compared to
ensure that the new system works as expected.
[Diagram: Data flowing from Master to Slave]
5 Models of Replication (Single Master & Multi Master)
In Single-Master Replication (SMR), changes to table rows in a designated master database
server are replicated to one or more slave database servers. The replicated tables in the slave
database are not permitted to accept any changes (except from the master), and even if they do,
the changes are not replicated back to the master server.
In Multi-Master Replication (MMR), changes to table rows in more than one designated
master are replicated to their counterpart tables in every other database. In this model conflict
resolution schemes are often employed, for example to avoid duplicate primary keys.
MMR adds to the use cases of replication in the following manner:
• Write availability and scalability.
• Multi-master replication allows you to employ a WAN-connected network of master
databases that can be geographically close to groups of clients, yet maintain data
consistency across master databases.
6 Classes of Replication (Unidirectional & Bidirectional)
Single-Master Replication (SMR) is also termed unidirectional, since replication data flows
in one direction only, from master to slave. In Multi-Master Replication (MMR), replication
data flows in both directions; it is therefore called bidirectional replication.
[Diagram: Data flowing in both directions between Master-I and Master-II]
7 Modes of Replication (Asynchronous & Synchronous)
In the synchronous mode of replication, transactions on the master database are declared
complete only when the changes have been replicated to all the slaves in addition to the master.
All slaves have to be available all the time for transactions to complete on the master.
In asynchronous mode, transactions on the master server are declared complete as soon as the
changes have been made on the master server. The changes are replicated to the slaves later in
time. In this mode the slaves can remain out of sync for a certain duration, which is called the
replication lag.
[Diagram: two timelines for an insert to a replicated table. In synchronous mode the master waits until Slave-I and Slave-II have applied the change; in asynchronous mode the master completes first and the slaves catch up after the replication lag.]
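PostgreSQL's built-in streaming replication (section 9.6) supports both modes. As a minimal
sketch, assuming a standby that has registered itself with application_name standby1, the mode
is chosen on the master in postgresql.conf:

# synchronous replication: commits wait until standby1 confirms the flush
synchronous_standby_names = 'standby1'
synchronous_commit = on
# an empty synchronous_standby_names means replication stays asynchronous

With synchronous_standby_names set, synchronous_commit can still be changed per session or
per transaction (SET LOCAL synchronous_commit = local), trading the synchronous guarantee
for latency where acceptable.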
8 Types of Replication (Physical & Logical)
Before we discuss physical and logical replication, let's first establish what the terms
physical and logical mean in this context.

    Logical Operation    Physical Operation
 1  initdb               Creates a base directory for the cluster
 2  CREATE DATABASE      Creates a sub-directory in the base directory
 3  CREATE TABLE         Creates a file within the sub-directory of the database
 4  INSERT               Changes the file that was created for this particular table
                         and writes new WAL records in the current WAL segment
For example:
ramp=# create table sample_tbl(a int, b varchar(255));
CREATE TABLE
ramp=# SELECT pg_relation_filepath('sample_tbl');
pg_relation_filepath
----------------------
base/34740/706736
(1 row)
ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp';
datname | oid
---------+-------
ramp | 34740
(1 row)
ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl';
relname | oid
------------+--------
sample_tbl | 706736
(1 row)
Physical replication deals with files and directories; it has no knowledge of what these files and
directories represent. It is done at the file system or disk level.
Logical replication, on the other hand, deals with databases, tables and DML operations. It is
therefore possible in logical replication to replicate only a certain set of tables. It is done at the
database cluster level.
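As an illustration, built-in logical replication (PostgreSQL 10 onwards, covered in section 9.7)
can replicate a chosen set of tables. A minimal sketch, with illustrative table names and
connection string, assuming wal_level = logical on the publisher:

-- on the publisher
CREATE PUBLICATION my_pub FOR TABLE student, teacher;
-- on the subscriber; the tables must already exist there
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=172.16.214.163 dbname=for_slony user=postgres'
    PUBLICATION my_pub;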
9 Methods Of Replication
9.1 Disk Based Replication
9.1.1 Introduction
A network attached storage (NAS) with at least two disks can provide transparent replication by
using mirroring, i.e. RAID-1. Mirroring provides replication by copying all data from one disk to
the other, as if the second disk were a mirror image of the first. This configuration provides fault
tolerance in case of a single disk failure.
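Where no NAS is available, the same RAID-1 idea can be applied in software. A minimal sketch
using mdadm, with illustrative device names:

# mirror two partitions into one block device (RAID-1)
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
# build a file system on the mirror and mount it for use as a data directory
sudo mkfs -t ext4 /dev/md0
sudo mount /dev/md0 /mnt/mirror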
9.1.2 Setup
The setup consists of one CentOS 7 machine with PostgreSQL 10.7 installed and a Western
Digital My Cloud Home 4 TB NAS.
9.1.3 Configuring PostgreSQL Replication using NAS
Step 1: Connect the NAS device to the Internet
The device needs a DHCP server running on the network and Internet access for
first-time configuration.
Step 2: Make sure PostgreSQL machine is connected to the same network as your device
Step 3: Make sure you are able to access the device through the web interface
mycloud.com/hello
Create an account with email and password.
[Diagram: the PostgreSQL machine and the WD My Cloud Home 4 TB NAS connected to the same network, with Internet access.]
Step 4: Find the MAC address of your NAS device.
The MAC address for my device is 00:00:c0:08:d7:01
Step 5: Find the IP address of the device
On the PostgreSQL machine run the command
arp -a
and look for an entry like this
? (172.24.37.136) at 00:00:c0:08:d7:01 [ether] on ens160u3u1c2
The IP address of the NAS device is therefore
172.24.37.136
Step 6: Check the public share on the device
smbclient -N -L 172.24.37.136
Sharename Type Comment
--------- ---- -------
Public Disk
IPC$ IPC IPC Service (MyCloudDevice)
Reconnecting with SMB1 for workgroup listing.
Server Comment
--------- -------
Workgroup Master
--------- -------
WORKGROUP BFAS91-WIN
Step 7: Mount the public share of the device on a local folder on the PostgreSQL machine
Step 7.1: Create a local folder
mkdir /home/abbas/mc2
Step 7.2: Edit the /etc/fstab file and add the following entry as a single line (the modes are important)
//172.24.37.136/public/ /home/abbas/mc2/ cifs credentials=/home/abbas/.smbcredentials,uid=abbas,gid=abbas,rw,dir_mode=0700,file_mode=0700 0 0
Step 7.3: Create the credentials file as follows
vim ~/.smbcredentials
username=abbas
password=abc123
Step 7.4: Mount
sudo mount -a
Step 7.5: Check the mounted folder
ls -l /home/abbas/mc2
total 4
-rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt
drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg
Step 8: Create a folder for initdb; note the permissions, which is why the modes in
step 7.2 are important
mkdir /home/abbas/mc2/data
ls -l /home/abbas/mc2
total 4
drwx------. 2 abbas abbas 0 Mar 11 07:42 data
-rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt
drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg
Step 9: Initialize the cluster
./initdb -D /home/abbas/mc2/data/
Step 10: Run the server
./postgres -D /home/abbas/mc2/data -p 7654
Step 11: Create a new table
./psql -p 7654 postgres
create table test_tab(a int, b varchar(10));
SELECT pg_relation_filepath('test_tab');
pg_relation_filepath
----------------------
base/13212/16384
(1 row)
Step 12: Connect to the device using Nautilus: enter the device address smb://172.24.37.136
and press the Connect button
Step 13: Check the relation file and its path under smb://172.24.37.136/public/data/base/13212
9.1.4 Steps to perform Failover
In a two-disk NAS device that has RAID-1 built into it, the user can simply remove the faulty
disk and replace it with a new one; the database server will never notice the absence of the
second disk or its replacement.
9.2 File System Based
9.2.1 Introduction to DRBD
Distributed Replicated Block Device (DRBD) is a software module that provides disk or
partition mirroring between network hosts. DRBD is a virtual block device driver implemented
as a kernel module. It provides a replication solution that is independent of the application
generating the data to be replicated.
PostgreSQL is configured to use a data directory on the DRBD-controlled partition. When
PostgreSQL writes any data, the DRBD module not only writes that data to the disk but also
sends the same data over the network to the connected secondary. The DRBD module on the
secondary receives the data from the network and writes it to its disk.
In DRBD the most commonly used data synchronization mode is Single-Primary. In
single-primary mode only one cluster node manipulates the data at any moment. DRBD can
also support Dual-Primary mode. We are using Single-Primary mode with the ext4 file system.
DRBD Supports three replication protocols:
Protocol A - Asynchronous replication protocol. Local write operations on the primary node
are considered completed as soon as the local disk write has finished, and the replication
packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss
may occur.
[Diagram: PostgreSQL issues writes to the DRBD module on the primary. DRBD writes the data to the local sda2 partition and replicates the same writes over the network to the DRBD module on the secondary, which writes them to its own sda2.]
Protocol B - Memory synchronous (semi-synchronous) replication protocol. Local write
operations on the primary node are considered completed as soon as the local disk write has
occurred, and the replication packet has reached the peer node. Normally, no writes are lost in
case of forced fail-over.
Protocol C - Synchronous replication protocol. Local write operations on the primary node are
considered completed only after both the local and the remote disk write have been confirmed.
As a result, loss of a single node is guaranteed not to lead to any data loss.
The most commonly used replication protocol in DRBD setups is Protocol C.
9.2.2 Setup
The setup consists of two CentOS 7 machines connected via LAN, each installed with a dedicated data partition in addition to the root and swap partitions.
While installing CentOS 7, choose "Installation Destination" option
Deselect "Automatically configure partitioning" and
Select "I will configure partitioning"
After clicking Done, Manual Partitioning screen will appear
Click the + button to add a mount point
Mount Point /
Desired Capacity 15 GiB
File System ext4
Click the + button to add a mount point
For swap Enter Desired Capacity 4GiB
Click the + button to add a mount point
Mount Point /for_data
Desired Capacity 12 GiB
File System ext4
Click Done
Accept Changes
9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C
All steps are performed on both the primary and the secondary node, unless mentioned otherwise.
Step 1: Disable and stop firewall on both the nodes
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Change hostname
sudo hostnamectl set-hostname primary
sudo hostnamectl set-hostname secondary
Step 3: Install Extra Packages for Enterprise Linux (EPEL) repository
sudo yum install epel-release
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
Step 4: Install DRBD
sudo yum install drbd90-utils kmod-drbd90
Step 5: Restart the System
Step 6: Install the kernel module
sudo modprobe drbd
Step 7: Create configuration file
sudo vim /etc/drbd.d/pgconf.res
resource pgconf
{
protocol C;
on primary
{
device /dev/drbd0;
disk /dev/sda2;
address 172.16.214.151:7788;
meta-disk internal;
}
on secondary
{
device /dev/drbd0;
disk /dev/sda2;
address 172.16.214.150:7788;
meta-disk internal;
}
}
Step 8: Unmount the disk
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
/dev/sda2 12G 41M 12G 1% /for_data
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
sudo umount /dev/sda2
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
Step 9: Delete the file system from the disk; DRBD needs a partition without any file system
sudo yum install util-linux
sudo wipefs /dev/sda2
offset type
----------------------------------------------------------------
0x438 ext4 [filesystem]
UUID: 8def5959-4dc9-4605-ad61-bd3b597966a3
sudo wipefs -a /dev/sda2
/dev/sda2: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef
Step 10: Create DRBD device meta data
sudo drbdadm create-md pgconf
md_offset 12883849216
al_offset 12883816448
bm_offset 12883423232
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (384 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
Step 11: Associate the DRBD disk with the backing device on both nodes
sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 15G 0 part /
├─sda2 8:2 0 12G 0 part
└─sda3 8:3 0 3G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
NAME        The device name.
MAJ:MIN     The major and minor device numbers.
RM          Whether the device is removable.
SIZE        The size of the device.
RO          Whether the device is read-only.
TYPE        Whether the block device is a disk or a partition (part) within a
            disk. In this example sda is a disk; sda1, sda2 & sda3 are
            partitions; and sr0 is a read-only device (rom).
MOUNTPOINT  The mount point on which the device is mounted.
sudo drbdadm up pgconf
sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 15G 0 part /
├─sda2 8:2 0 12G 0 part
│ └─drbd0 147:0 0 12G 1 disk
└─sda3 8:3 0 3G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
Step 12: Start drbd on both the nodes
sudo systemctl start drbd
sudo systemctl enable drbd
Step 13: Start initial full synchronization on the primary node
sudo drbdadm primary pgconf --force
Step 14: Build ext4 file system on DRBD device on the primary node
sudo mkfs -t ext4 /dev/drbd0
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
786432 inodes, 3145367 blocks
157268 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
96 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Step 15: Mount the DRBD device on the primary node
sudo mount /dev/drbd0 /for_data
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
/dev/drbd0 12G 41M 12G 1% /for_data
Step 16: Check the connections between primary and secondary nodes
sudo netstat -n | grep 7788
tcp 0 0 172.16.214.151:47609 172.16.214.150:7788 ESTABLISHED
tcp 0 0 172.16.214.151:7788 172.16.214.150:40336 ESTABLISHED
Step 17: Check DRBD processes
ps -ef --forest | grep drbd
root 11109 2 0 13:35 ? 00:00:00 _ [drbd-reissue]
root 88248 2 0 21:37 ? 00:00:00 _ [drbd_w_pgconf]
root 88250 2 0 21:37 ? 00:00:00 _ [drbd0_submit]
root 88256 2 0 21:37 ? 00:00:02 _ [drbd_s_pgconf]
root 88262 2 6 21:37 ? 00:01:29 _ [drbd_r_pgconf]
root 88269 2 0 21:37 ? 00:00:00 _ [drbd_a_pgconf]
root 88270 2 0 21:37 ? 00:00:00 _ [drbd_as_pgconf]
Step 18: Check the output of the drbdmon tool
drbdmon
Step 19: Install PostgreSQL
git clone git://git.postgresql.org/git/postgresql.git
cd postgresql/
git checkout REL_11_STABLE
./configure --prefix=/usr/local/pg11 --enable-debug CFLAGS=-O0
make && make install
Step 20: Initialize the cluster
./initdb -D /for_data/data
Note that we are using the DRBD device to store data, since that device is being
replicated to the secondary.
Step 21: Start the server
./postgres -D /for_data/data/ -p 6543
Step 22: Create a table and insert some rows
./psql -p 6543 postgres
create table for_testing(id int primary key, value varchar(255));
insert into for_testing values(1, 'One');
insert into for_testing values(2, 'Two');
insert into for_testing values(3, 'Three');
Step 23: Simulate disk failure on the primary node
./pg_ctl stop -D /for_data/data/
sudo umount /for_data
9.2.4 Steps to perform Failover
Step 1: Install PostgreSQL on the secondary node
Step 2: Check the data directory replicated by DRBD
sudo mkdir /usr/local/pg11
sudo chown abbas:abbas /usr/local/pg11
sudo drbdadm primary pgconf
sudo mount /dev/drbd0 /for_data/
ls -l /for_data/
total 20
drwx------ 19 abbas abbas 4096 Jan 3 05:08 data
drwx------ 2 root root 16384 Jan 2 21:45 lost+found
ls -l /for_data/data/
total 116
drwx------ 5 abbas abbas 4096 Jan 3 03:59 base
drwx------ 2 abbas abbas 4096 Jan 3 04:00 global
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_commit_ts
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_dynshmem
-rw------- 1 abbas abbas 4513 Jan 3 03:59 pg_hba.conf
-rw------- 1 abbas abbas 1636 Jan 3 03:59 pg_ident.conf
drwx------ 4 abbas abbas 4096 Jan 3 05:08 pg_logical
drwx------ 4 abbas abbas 4096 Jan 3 03:59 pg_multixact
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_notify
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_replslot
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_serial
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_snapshots
drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat
drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat_tmp
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_subtrans
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_tblspc
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_twophase
-rw------- 1 abbas abbas 3 Jan 3 03:59 PG_VERSION
drwx------ 3 abbas abbas 4096 Jan 3 03:59 pg_wal
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_xact
-rw------- 1 abbas abbas 88 Jan 3 03:59 postgresql.auto.conf
-rw------- 1 abbas abbas 23866 Jan 3 03:59 postgresql.conf
-rw------- 1 abbas abbas 64 Jan 3 03:59 postmaster.opts
Step 3: Start the server on this data directory
./postgres -D /for_data/data/ -p 6543
Step 4: Check the table and the data in it
./psql -p 6543 postgres
psql (11.1)
Type "help" for help.
postgres=# select * from for_testing;
id | value
----+-------
1 | One
2 | Two
3 | Three
(3 rows)
9.3 Trigger Based
9.3.1 Introduction to Slony-I
Slony is a master-to-multiple-slaves, trigger-based (AFTER ROW), asynchronous logical
replication system for PostgreSQL. Slony supports cascading: direct subscribers put load on
the master, while indirect subscribers put load on direct subscribers.
Slony uses the following terminology:
Cluster : A named set of PostgreSQL instances between which replication takes place.
Node : A named PostgreSQL instance that participates as master/slave in a replication cluster.
Set : A set of tables that need to be replicated between two nodes.
Origin & Subscriber : Each replication set has an origin (master) and a subscriber. Origin is
where the modifications of the data take place and subscriber is where those changes get
replicated to.
Slon daemon : a slon daemon runs on each node in the cluster and manages the replication
activity for that node. Slon processes replication events. Replication events are of two types:
Configuration events : these occur when the configuration of the cluster is changed, for
example when a table is added to a subscribed set. Slon in this case replicates the changed
configuration to all the nodes.
SYNC events : these occur when replicated tables are updated. Let's look in detail at how an
insert into a table on the origin node gets replicated to a slave node. The following diagram
shows a two-node cluster that is using Slony for replication between one master and one slave.
Each slon daemon establishes connections with both the master and the slave database.
When the Slony replication system is installed, it performs the following steps:
• Creates an AFTER INSERT OR UPDATE OR DELETE row trigger on the table to be
replicated on the master node.
• Creates a trigger to deny any writes to the replicated table on the slave.
• Creates the tables and functions required to support replication, in a separate schema
named after the cluster name.
When the client inserts a row into the table on the master, the following happens to perform
the replication:
• The after-row trigger inserts a log row in the sl_log_1 or sl_log_2 table.
• The slon daemon on the master inserts a row in sl_event and issues a NOTIFY. This
generates a SYNC event.
• The slon daemon on the slave listens for the notification and reads the sl_log_1 or
sl_log_2 table from the remote database.
• The slon daemon constructs the insert statement and executes it locally to replicate the
row to the slave.
9.3.2 Advantages and Disadvantages of Slony
Advantages:
• Slony allows replicating a small subset of the tables in a database.
• Slony works across different PostgreSQL major versions.
• Slony provides the ability to create additional indexes on slaves.
• Slony can be used to upgrade from an older PostgreSQL version to a newer one.
• Barring the tables of the set, Slony allows slaves to be used for read/write activity.
• The load of indirect slaves is not put on the master; only direct slaves load the master.
Disadvantages:
• Slony cannot replicate large objects, DDL commands, users and roles.
• Slony is asynchronous and cannot provide failover with zero transaction loss.
• Slony puts load on the master: the more the slaves, the more the load.
• Slony mandates the use of a primary key on all the tables to be replicated.
[Diagram: a two-node Slony cluster. A slon daemon runs beside each PostgreSQL instance, holding a local connection to its own database and a remote connection to the other node's database.]
9.3.3 Setup
The setup consists of two CentOS 7 machines connected via LAN, on which PostgreSQL
version 10.7 and Slony version 2.2.6 are installed.
9.3.4 Configuring PostgreSQL Replication using Slony-I
Step 1: Disable and stop firewall on both the nodes
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Install PostgreSQL and Slony
Download postgresql-10.7-1-linux-x64.run from EnterpriseDB website and install all the
components.
Run StackBuilder and install Slony 2.2.6.
Step 3: Configure trust authentication in both master and slave
As postgres user do the following
cd /opt/PostgreSQL/10/bin/
./pg_ctl stop -D ../data
vim ../data/pg_hba.conf
host all all 172.16.214.163/24 trust
./pg_ctl start -D ../data
Step 4: Export environment variables
export CLUSTERNAME=slony_example
export MASTERDBNAME=for_slony
export SLAVEDBNAME=for_slony
export MASTERHOST=172.16.214.163
export SLAVEHOST=172.16.214.162
export MASTERPORT=5432
export SLAVEPORT=5432
export REPLICATIONUSER=postgres
export PATH=$PATH:/opt/PostgreSQL/10/bin/
Step 5: Make sure both servers are accessible from both machines
./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
psql.bin (10.7)
Type "help" for help.
postgres=# \q
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
psql.bin (10.7)
Type "help" for help.
postgres=# \q
Step 6: Create database, tables and insert some values on master
./createdb -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
INSERT INTO student VALUES(1, 'Edward', 'Main Campus');
INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel');
INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel');
INSERT INTO teacher VALUES(1, 'Gary', 'Physics');
INSERT INTO teacher VALUES(2, 'Karen', 'Maths');
INSERT INTO teacher VALUES(3, 'Carol', 'History');
Step 7: Create database and tables on slave
./createdb -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
Step 8: Create and execute the Slony setup script, which performs the following steps
./slony_setup.sh
<stdin>:21: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support
<stdin>:36: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support
Step 8.1: Define the schema name that Slony uses to create all Slony objects; in our
example it is _slony_example
cluster name = $CLUSTERNAME;
Step 8.2: Provide connection info that is used by slonik to connect to master and slave
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT
user=$REPLICATIONUSER';
Step 8.3: Initialize the first node. Its id MUST be 1. This creates the schema
_slony_example containing all replication-system-specific database objects. The main
tables that store the change log are _slony_example.sl_log_1 & _slony_example.sl_log_2. The
main function that adds change-log entries to these tables is _slony_example.logtrigger, which
calls the C function _Slony_I_logTrigger
init cluster ( id=1, comment = 'Master Node');
Step 8.4: Create a table set that can be subscribed by slaves
create set (id=1, origin=1, comment='some tables');
set add table (set id=1, origin=1, id=1, fully qualified name =
'public.student', comment='student table');
set add table (set id=1, origin=1, id=2, fully qualified name =
'public.teacher', comment='teacher table');
Step 8.5: Create a slave node
store node (id=2, comment = 'Slave node', event node=1);
Step 8.6: Provide connection info so that the nodes can connect to each other and listen for events
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME
host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER');
store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME
host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER');
The setup script performs the following actions
Action 1: It creates the following triggers on each of the tables in the set on the master
for_slony=# \d+ student
Table "public.student"
Column | Type |
----------+------------------------+
sid | integer |
sname | character varying(255) |
saddress | character varying(255) |
Indexes:
"student_pkey" PRIMARY KEY, btree (sid)
Triggers:
_slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.logtrigger('_slony_example','1','k')
_slony_example_truncatetrigger BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.log_truncate('1')
Disabled user triggers:
_slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.denyaccess('_slony_example')
_slony_example_truncatedeny BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.deny_truncate()
Action 2: It creates the following triggers on each of the tables in the set on the slave
for_slony=# \d+ student
Table "public.student"
Column | Type |
----------+------------------------+
sid | integer |
sname | character varying(255) |
saddress | character varying(255) |
Indexes:
"student_pkey" PRIMARY KEY, btree (sid)
Triggers:
_slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.denyaccess('_slony_example')
_slony_example_truncatedeny BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.deny_truncate()
Disabled user triggers:
_slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.logtrigger('_slony_example', '1', 'k')
_slony_example_truncatetrigger BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.log_truncate('1')
Action 3: It creates the _slony_example schema with the following objects on both nodes
for_slony=# \dtvs _slony_example.*
List of relations
Schema | Name | Type | Owner
----------------+----------------------------+----------+----------
_slony_example | sl_nodelock_nl_conncnt_seq | sequence | postgres
_slony_example | sl_log_status | sequence | postgres
_slony_example | sl_action_seq | sequence | postgres
_slony_example | sl_local_node_id | sequence | postgres
_slony_example | sl_event_seq | sequence | postgres
_slony_example | sl_log_script | table | postgres
_slony_example | sl_registry | table | postgres
_slony_example | sl_apply_stats | table | postgres
_slony_example | sl_nodelock | table | postgres
_slony_example | sl_setsync | table | postgres
_slony_example | sl_table | table | postgres
_slony_example | sl_sequence | table | postgres
_slony_example | sl_node | table | postgres
_slony_example | sl_listen | table | postgres
_slony_example | sl_path | table | postgres
_slony_example | sl_log_1 | table | postgres
_slony_example | sl_log_2 | table | postgres
_slony_example | sl_subscribe | table | postgres
_slony_example | sl_event | table | postgres
_slony_example | sl_confirm | table | postgres
_slony_example | sl_seqlog | table | postgres
_slony_example | sl_components | table | postgres
_slony_example | sl_set | table | postgres
_slony_example | sl_config_lock | table | postgres
_slony_example | sl_event_lock | table | postgres
_slony_example | sl_archive_counter | table | postgres
_slony_example | sl_failover_targets | view | postgres
_slony_example | sl_seqlastvalue | view | postgres
_slony_example | sl_status | view | postgres
(29 rows)
Action 4: Creates the following triggers
List of triggers
 Schema         | Name
----------------+---------------
 _slony_example | logapply
 _slony_example | log_truncate
 _slony_example | deny_truncate
 _slony_example | logtrigger
 _slony_example | denyaccess
 _slony_example | lockedset
Action 5: Creates around 150 functions in the _slony_example schema
Note that the setup script does not run any daemon on the master or slave, i.e. it does
not start the replication process, and it does not copy any data from master to slave.
Step 9: Start the slon daemon on both the master and slave
slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER
host=$MASTERHOST port=$MASTERPORT"
slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST
port=$SLAVEPORT"
slon daemon should emit messages of the sort
INFO remoteWorkerThread_2: SYNC 5000000178 done in 0.003 seconds
INFO remoteWorkerThread_2: SYNC 5000000179 done in 0.003 seconds
NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
INFO cleanupThread: 0.020 seconds for cleanupEvent()
INFO remoteWorkerThread_2: SYNC 5000000180 done in 0.002 seconds
INFO remoteWorkerThread_2: SYNC 5000000181 done in 0.004 seconds
Connection problems result in errors like this
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
Step 10: Start the subscription
./slony_sub.sh
The following script instructs Slony to subscribe the set whose id is 1 and
whose provider (master) id is 1, for the receiver (slave) whose id is 2
#!/bin/sh
slonik <<_EOF_
# ----
# This defines which namespace the replication system uses
# ----
cluster name = $CLUSTERNAME;
# ----
# Admin conninfo’s are used by the slonik program to connect
# to the node databases. So these are the PQconnectdb arguments
# that connect from the administrators workstation (where
# slonik is executed).
# ----
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST
port=$SLAVEPORT user=$REPLICATIONUSER';
# ----
# Node 2 subscribes set 1
# ----
subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_
slon will emit messages similar to the following while it copies the initial data
from master to slave
CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700
CONFIG remoteWorkerThread_1: connected to provider DB
CONFIG remoteWorkerThread_1: prepare to copy table "public"."student"
CONFIG remoteWorkerThread_1: prepare to copy table "public"."teacher"
CONFIG remoteWorkerThread_1: all tables for set 1 found on subscriber
CONFIG remoteWorkerThread_1: copy table "public"."student"
CONFIG remoteWorkerThread_1: Begin COPY of table "public"."student"
NOTICE: truncate of "public"."student" succeeded
CONFIG remoteWorkerThread_1: 62 bytes copied for table "public"."student"
CONFIG remoteWorkerThread_1: 0.082 seconds to copy table "public"."student"
CONFIG remoteWorkerThread_1: copy table "public"."teacher"
CONFIG remoteWorkerThread_1: Begin COPY of table "public"."teacher"
NOTICE: truncate of "public"."teacher" succeeded
CONFIG remoteWorkerThread_1: 45 bytes copied for table "public"."teacher"
CONFIG remoteWorkerThread_1: 0.031 seconds to copy table "public"."teacher"
INFO remoteWorkerThread_1: copy_set SYNC found, use event seqno 5000000311.
INFO remoteWorkerThread_1: 0.018 seconds to build initial setsync status
INFO copy_set 1 done in 0.172 seconds
CONFIG enableSubscription: sub_set=1
CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
CONFIG remoteWorkerThread_1: update provider configuration
CONFIG remoteWorkerThread_1: added active set 1 to provider 1
CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700
INFO remoteWorkerThread_1: SYNC 5000000297 done in 0.082 seconds
INFO remoteWorkerThread_1: SYNC 5000000301 done in 0.005 seconds
INFO remoteWorkerThread_1: SYNC 5000000309 done in 0.038 seconds
INFO remoteWorkerThread_1: SYNC 5000000310 done in 0.030 seconds
INFO remoteWorkerThread_1: SYNC 5000000311 done in 0.005 seconds
Step 11: Check replicated data in slave
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
psql.bin (10.7)
Type "help" for help.
for_slony=# select * from teacher;
tid | tname | tsubject
-----+-------+----------
1 | Gary | Physics
2 | Karen | Maths
3 | Carol | History
(3 rows)
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
Step 12: Check insert operation on slave
for_slony=# insert into student values(4, 'David', 'Kent');
ERROR: Slony-I: Table student is replicated and cannot be modified on a
subscriber node - role=0
Step 13: Try insert on master
insert into student values(4, 'David', 'Kent');
Check the log entry:
select * from _slony_example.sl_log_2;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 5446 | 1 | 1 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+----------------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+----------------------------------
student | I | 0 | {sid,4,sname,David,saddress,Kent}
------------------+-------------+-----------------+----------------------------------
for_slony=# select ctid,xmin, xmax, cmin, * from student;
ctid | xmin | xmax | cmin | sid | sname | saddress
-------+------+------+------+-----+--------+--------------
(0,1) | 560 | 0 | 0 | 1 | Edward | Main Campus
(0,2) | 561 | 0 | 0 | 2 | Linda | Girls Hostel
(0,3) | 562 | 0 | 0 | 3 | Jason | Boys Hostel
(0,4) | 5446 | 0 | 0 | 4 | David | Kent
(4 rows)
Step 14: Try update on master
for_slony=# update student set saddress = 'Whales' where sid = 4;
UPDATE 1
select * from _slony_example.sl_log_1;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 6239 | 1 | 2 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+------------------------
student | U | 1 | {saddress,Whales,sid,4}
------------------+-------------+-----------------+------------------------
Step 15: Check result on slave
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
4 | David | Whales
(4 rows)
Step 16: Try delete on master
for_slony=# delete from student where sid = 4;
DELETE 1
select * from _slony_example.sl_log_1;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 6407 | 1 | 3 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+------------------------
student | D | 0 | {sid,4}
------------------+-------------+-----------------+------------------------
Step 17: Check result on slave
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
9.3.5 Steps to perform controlled switchover
A small slonik script can achieve a controlled switchover in which we swap the roles of the two
nodes completely. The old master becomes the new slave and the old slave becomes the new
master. Please note that this is a planned activity; it has nothing to do with any type of
failure.
#!/bin/sh
slonik <<_EOF_
cluster name = $CLUSTERNAME;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER';
lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2, wait on=1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on=1);
_EOF_
After the command runs, the Slony trigger definitions on the tables in the set have changed
on the new master. The _slony_example_denyaccess & _slony_example_truncatedeny triggers
get disabled and _slony_example_logtrigger & _slony_example_truncatetrigger enabled on the
new master. Changes to the tables in the set are therefore possible on the new master.
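A quick way to verify the switchover is to repeat the write test from step 12 in the opposite
direction. Expected behaviour, based on the deny-access trigger shown earlier (the values are
illustrative):

-- on the old master, now a subscriber, writes are rejected
insert into student values(5, 'Mary', 'Main Campus');
ERROR:  Slony-I: Table student is replicated and cannot be modified on a subscriber node - role=0
-- the same insert on the new master succeeds and is replicated to the old master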
9.4 Introduction to WAL
9.4.1 What is WAL and Why is it required
In PostgreSQL, all changes made by every transaction are first saved in a log file, and only then
is the result of the transaction sent to the initiating client. Data files are not changed on every
transaction. This is a standard mechanism to prevent data loss in situations like an OS
crash, hardware failure, a PostgreSQL crash, etc. The mechanism is called Write Ahead
Logging and the log file is called the Write Ahead Log (WAL).
Each change that a transaction performs (INSERT, UPDATE, DELETE, COMMIT) is
written to the log as a WAL record. WAL records are first written into an in-memory WAL
buffer; on transaction commit the records are written into a WAL segment file on disk.
The log sequence number (LSN) of a WAL record represents the location where it is saved in
the log file, and is used as the unique id of the WAL record. Logically, the transaction log is a
file whose size is 2^64 bytes. An LSN is therefore a 64-bit number, represented as two 32-bit
hexadecimal numbers separated by a /. For example:
select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/2BDBBD0
(1 row)
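Because LSNs are ordered positions in the transaction log, the distance between two of them can
be computed in bytes, which is how replication lag is commonly measured. A minimal sketch,
assuming a streaming replication standby is attached (see section 9.6):

select application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as lag_bytes
from pg_stat_replication;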
In the event of a system crash the database can recover committed transactions from the WAL.
During recovery PostgreSQL starts from the last REDO point, i.e. the last checkpoint. A
checkpoint is a point in the transaction log at which all data files have been updated to reflect
the information in the log. The process of flushing all changes up to that point from the shared
buffers to the actual data files is called checkpointing.
Let's consider a case where the database crashes after two transactions that perform one insert
each, and the WAL is used for recovery.
1. Assume a CHECKPOINT is issued which stores the location of the latest REDO point in the
current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk.
This guarantees that WAL records before the REDO point are no longer needed for recovery,
since all data has been flushed to the disk pages.
2. First INSERT statement is issued. The table’s page is loaded from disk to the buffer pool.
3. A tuple is inserted into the loaded page.
4. WAL record of this insert is saved into the WAL buffer at location LSN_1.
5. Update page LSN, which identifies WAL record for last change to this page, from LSN_0 to
LSN_1.
6. First COMMIT statement is issued.
7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records
in the WAL buffer up to this record's LSN are flushed to the WAL segment file.
8. For the second INSERT and COMMIT, steps 2 to 7 are repeated.
[Diagram: step-by-step contents of the shared buffer pool, WAL buffer, WAL segment and data files as the client runs CHECKPOINT, then BEGIN; INSERT INTO TAB VALUES ('A'); COMMIT; and then BEGIN; INSERT INTO TAB VALUES ('B'); COMMIT;. The page LSN advances from LSN_0 to LSN_1 to LSN_2, the commit records are flushed to the WAL segment, and the data files still reflect the REDO point recorded by the checkpoint.]
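The insert and flush positions can also be watched directly from psql. A minimal sketch,
assuming a table tab as in the diagram above:

BEGIN;
INSERT INTO tab VALUES ('A');
-- the insert position advances as soon as the WAL record enters the WAL buffer
SELECT pg_current_wal_insert_lsn();
COMMIT;
-- the flush position: everything up to here is safely in the WAL segment file
SELECT pg_current_wal_flush_lsn();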
In the event of an operating system crash, all of the data in the shared buffer pool is lost;
however, all modifications of the page have been written into the WAL segment files as history
data. The following steps show how the database cluster can recover back to the state
immediately before the crash using WAL records. There is no need to do anything special,
since PostgreSQL automatically enters recovery mode after restarting.
1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL
segment file.
2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool.
3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since
LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the
page's LSN is updated to LSN_1.
The remaining WAL records are replayed in a similar manner.
[Diagram: recovery replaying the WAL segment. The table page is loaded with LSN_0 from the data files; the insert of 'A' (WAL record at LSN_1) is applied, raising the page LSN to LSN_1; then the insert of 'B' (LSN_2) raises it to LSN_2.]
9.4.2 Transaction Log and WAL Segment Files
In PostgreSQL the transaction log is a virtual file addressed with 8-byte (64-bit) positions,
i.e. with a capacity of 2^64 bytes. Physically, the log is divided into 16-megabyte files,
each of which is called a WAL segment.
A WAL segment file name is a 24-digit hexadecimal number built from three 8-digit fields:
the timeline ID, the WAL segment number divided by 256, and the WAL segment number
modulo 256. Assuming that the current timeline ID is 0x00000001, the WAL segment file
names run as follows:
000000010000000000000001
000000010000000000000002
000000010000000000000003
……….
0000000100000000000000FF
000000010000000100000000
000000010000000100000001
…………
00000001FFFFFFFF000000FE
00000001FFFFFFFF000000FF
For Example:
select pg_walfile_name('0/2BDBBD0');
pg_walfile_name
--------------------------
000000010000000000000002
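Combining this with pg_current_wal_lsn() maps the server's current insert position to its segment file:

select pg_walfile_name(pg_current_wal_lsn());  -- name of the WAL segment currently being written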
9.4.3 WAL Writer
The WAL writer is a background process that periodically checks the WAL buffer and
writes any as-yet-unwritten WAL records to the WAL segments. It avoids bursts of IO
activity by spreading the writes over time in small amounts. The configuration
parameter wal_writer_delay controls how often WAL writer flushes the WAL, with
default value of 200 ms.
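A quick way to inspect and adjust it without a restart (wal_writer_delay only requires a configuration reload):

SHOW wal_writer_delay;                        -- 200ms by default
ALTER SYSTEM SET wal_writer_delay = '100ms';  -- persisted in postgresql.auto.conf
SELECT pg_reload_conf();                      -- apply the new value without a restart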
9.4.4 WAL Segment File Management
WAL segment files are stored in the pg_wal sub-directory. PostgreSQL switches to a new
WAL segment file under the following conditions:
1. WAL segment has been filled up.
2. The function pg_switch_wal has been issued.
3. archive_mode is enabled and the time set to archive_timeout has been exceeded.
Switched WAL files can either be removed or recycled, i.e. renamed and reused in the
future. The number of WAL files that the server retains at any point in time depends on
server configuration as well as server activity.
Whenever a checkpoint starts, PostgreSQL estimates the number of WAL segment files
required for the coming checkpoint cycle. The estimate is based on the number of files
consumed in previous checkpoint cycles. They are counted from the segment that
contains the prior REDO point, and the value is kept between min_wal_size (by default
80 MB, i.e. 5 files) and max_wal_size (by default 1 GB, i.e. 64 files). When a checkpoint
starts, the necessary files are held and recycled, while the unnecessary ones are removed.
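Both limits, and the segments currently retained in pg_wal, can be checked from SQL (a quick sketch; pg_ls_waldir is available from version 10 onward):

SHOW min_wal_size;
SHOW max_wal_size;
SELECT name, size FROM pg_ls_waldir() ORDER BY name;  -- one row per WAL segment file in pg_wal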
A specific example is shown in the diagram below. Assuming that there are six files
before the checkpoint starts, WAL_3 contains the prior REDO point (from version 11
onward, the REDO point itself), and PostgreSQL estimates that five files are needed. In
this case, WAL_1 will be renamed to WAL_7 for recycling and WAL_2 will be removed.
[Figure: segments WAL_1 through WAL_6 before the checkpoint, with the REDO point in WAL_3 and five segments estimated as needed by the server; afterwards WAL_1 has been renamed to WAL_7 for reuse and the unneeded WAL_2 has been removed.]
9.4.5 WAL Example
Step 1:SELECT datname, oid FROM pg_database WHERE datname = 'postgres';
datname | oid
----------+-------
postgres | 15709
(1 row)
Note the database OID i.e. 15709
Step 2: SELECT oid,* from pg_tablespace;
oid | spcname | spcowner | spcacl | spcoptions
------+------------+----------+--------+------------
1663 | pg_default | 10 | |
1664 | pg_global | 10 | |
(2 rows)
Note the table space OID i.e. 1663
Step 3: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C420B8
(1 row)
Note the LSN i.e. 0/1C420B8
Step 4: CREATE TABLE abc(a VARCHAR(10));
Step 5: SELECT pg_relation_filepath('abc');
pg_relation_filepath
----------------------
base/15709/16384
(1 row)
Note the relation file name base/15709/16384
Step 6: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8
using the start LSN noted in step 3.
Note that the WAL contains the instruction to create physical file
15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr      len(rec/tot)  tx    lsn         prev        desc
XLOG 30/ 30 0 0/01C420B8 0/01C42080 NEXTOID 24576
Storage 42/ 42 0 0/01C420D8 0/01C420B8 CREATE base/15709/16384
Heap 203/203 1216 0/01C42108 0/01C420D8 INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0
Btree 64/ 64 1216 0/01C421D8 0/01C42108 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree 64/ 64 1216 0/01C42218 0/01C421D8 INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5
Heap 80/ 80 1216 0/01C42258 0/01C42218 INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9
Btree 72/ 72 1216 0/01C422A8 0/01C42258 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree 72/ 72 1216 0/01C422F0 0/01C422A8 INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61
Heap 203/203 1216 0/01C42338 0/01C422F0 INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1
Btree 64/64 1216 0/01C42408 0/01C42338 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree 72/ 72 1216 0/01C42448 0/01C42408 INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1
Heap 80/ 80 1216 0/01C42490 0/01C42448 INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9
Btree 72/ 72 1216 0/01C424E0 0/01C42490 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree 72/ 72 1216 0/01C42528 0/01C424E0 INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57
Heap 199/199 1216 0/01C42570 0/01C42528 INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0
Btree 64/ 64 1216 0/01C42638 0/01C42570 INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2
Btree 64/ 64 1216 0/01C42678 0/01C42638 INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1
Btree 64/ 64 1216 0/01C426B8 0/01C42678 INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5
Heap 171/171 1216 0/01C426F8 0/01C426B8 INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16
Btree 64/ 64 1216 0/01C427A8 0/01C426F8 INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C427E8 0/01C427A8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42828 0/01C427E8 INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16
Btree 72/ 72 1216 0/01C428D8 0/01C42828 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42920 0/01C428D8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42960 0/01C42920 INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16
Btree 72/ 72 1216 0/01C42A10 0/01C42960 INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42A58 0/01C42A10 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42A98 0/01C42A58 INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42B48 0/01C42A98 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42B90 0/01C42B48 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42BD0 0/01C42B90 INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42C80 0/01C42BD0 INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42CC8 0/01C42C80 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42D08 0/01C42CC8 INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42DB8 0/01C42D08 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42E00 0/01C42DB8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42E40 0/01C42E00 INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32
Btree 72/ 72 1216 0/01C42EF0 0/01C42E40 INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42F38 0/01C42EF0 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 80/ 80 1216 0/01C42F78 0/01C42F38 INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11
Btree 72/ 72 1216 0/01C42FC8 0/01C42F78 INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44
Btree 72/ 72 1216 0/01C43010 0/01C42FC8 INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46
Standby 42/ 42 1216 0/01C43058 0/01C43010 LOCK xid 1216 db 15709 rel 16384
Txn 405/405 1216 0/01C43088 0/01C43058 COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608
relcache 16384
Standby 50/ 50 0 0/01C43220 0/01C43088 RUNNING_XACTS nextXid 1217 latestCompletedXid 1216
oldestRunningXid 1217
Step 7: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C43258
(1 row)
Step 8: INSERT INTO abc VALUES('pkn');
Step 9: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258
and use start LSN from step 7.
1663 → pg_default tablespace → noted in step 2
15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr         len(rec/tot)  tx    lsn         prev        desc
Heap 59/59 1217 0/01C43258 0/01C43220 INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0
Transaction 34/34 1217 0/01C43298 0/01C43258 COMMIT 2019-03-04 07:43:45.887511 EST
Standby 54/54 0 0/01C432C0 0/01C43298 RUNNING_XACTS nextXid 1218 latestCompletedXid 1216
oldestRunningXid 1217; 1 xacts: 1217
Step 10: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C432F8
(1 row)
Step 11: INSERT INTO abc VALUES('ujy');
Step 12: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8
using the start LSN noted in step 10.
rmgr         len(rec/tot)  tx    lsn         prev        desc
Heap 59/59 1218 0/01C432F8 0/01C432C0 INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0
Transaction 34/34 1218 0/01C43338 0/01C432F8 COMMIT 2019-03-04 07:44:25.449151 EST
Standby 50/50 0 0/01C43360 0/01C43338 RUNNING_XACTS nextXid 1219 latestCompletedXid 1218
oldestRunningXid 1219
Step 13: Check the actual tuples in the WAL segment files.
---------+---------------------------------------------------+----------------+
Offset | Hex Bytes | ASCII chars |
---------+---------------------------------------------------+----------------+
00000060 | 3b 00 00 00 c3 04 00 00 28 00 40 02 00 00 00 00 |;.......(.@.....|
00000070 | 00 0a 00 00 ec 28 75 6e 00 20 0a 00 7f 06 00 00 |.....(un. ......|
00000080 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000090 | 02 08 18 00 09 70 6b 6e 03 00 00 00 00 00 00 00 |.....pkn........|
000000a0 | 22 00 00 00 c3 04 00 00 60 00 40 02 00 00 00 00 |".......`.@.....|
000000b0 | 00 01 00 00 dd 4c 87 04 ff 08 e4 73 44 e7 41 26 |.....L.....sD.A&|
000000c0 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
000000d0 | a0 00 40 02 00 00 00 00 10 08 00 00 9e 01 36 88 |..@...........6.|
000000e0 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c4 04 |................|
000000f0 | 00 00 c4 04 00 00 c3 04 00 00 00 00 00 00 00 00 |................|
00000100 | 3b 00 00 00 c4 04 00 00 c8 00 40 02 00 00 00 00 |;.........@.....|
00000110 | 00 0a 00 00 33 df b4 71 00 20 0a 00 7f 06 00 00 |....3..q. ......|
00000120 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000130 | 02 08 18 00 09 75 6a 79 04 00 00 00 00 00 00 00 |.....ujy........|
00000140 | 22 00 00 00 c4 04 00 00 00 01 40 02 00 00 00 00 |".........@.....|
00000150 | 00 01 00 00 96 2e 96 a6 ff 08 d8 f3 79 ed 41 26 |............y.A&|
00000160 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
00000170 | 40 01 40 02 00 00 00 00 10 08 00 00 eb 6b 95 36 |@.@..........k.6|
00000180 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c5 04 |................|
00000190 | 00 00 c5 04 00 00 c4 04 00 00 00 00 00 00 00 00 |................|
000001a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
9.4.6 Overview of Replication Options based on WAL
Continuous WAL Archiving
Copying WAL files, as they are generated, into a location other than the pg_wal sub-
directory for the purpose of archiving them is called WAL archiving. To archive, a script
provided by the user is invoked by PostgreSQL each time a WAL file is completed. The
script can use the scp command to copy the file to one or more locations; the location
can be an NFS mount. Once archived, the WAL segment files can be used to recover the
database to any specified point in time.
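A minimal archive_command for a locally mounted archive follows the same pattern (a sketch, assuming /mnt/server/archivedir exists and is writable by the postgres user; the scp-based variant used later in this document works the same way):

# %p is the path of the segment to archive, %f is its file name
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'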
Log Shipping Based Replication - File Level
The process of copying WAL files to another PostgreSQL server, for the purpose of
creating a standby server that replays them, is called log shipping. This server is
configured to be in recovery mode, and its sole purpose is to apply new WAL files as
they arrive. The second server thus becomes a warm backup of the primary PostgreSQL
server, also termed a standby. The standby can also be configured to serve read-only
queries, in which case it is called a hot standby.
Log Shipping Based Replication - Block Level
Streaming replication improves on the log shipping process. Instead of waiting for a
WAL file switch, records are sent as they are generated, which reduces replication
delay. The second improvement is that the standby server connects to the primary server
over the network using a replication protocol, so the primary server can send WAL
records directly over this connection without having to rely on scripts provided by the
end user.
How long should the primary retain WAL segment files?
Without any streaming replication clients, the server can discard or recycle a WAL
segment file once the archive script reports success, provided the file is no longer
required for crash recovery.
In the presence of standby clients, though, there is a problem: the server needs to keep
WAL files around for as long as the slowest standby needs them. If a standby that was
taken down for a while comes back online and asks the primary for a WAL file that the
primary no longer has, replication fails with an error similar to:
ERROR: requested WAL segment 00000001000000010000002D has already been removed
The primary should therefore keep track of how far behind each standby is, and not
delete or recycle WAL files that any standby still needs. This feature is provided through
replication slots.
Each replication slot has a name which is used to identify the slot. Each slot is
associated with:
(a) The oldest WAL segment file required by the consumer of the slot. WAL segment
files later than this are not deleted/recycled during checkpoints.
(b) The oldest transaction ID required to be retained by the consumer of the slot. Rows
needed by any transactions later than this are not deleted by vacuum.
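How much WAL a slot is currently holding back can be measured in SQL (a quick sketch; pg_wal_lsn_diff returns the difference in bytes):

SELECT slot_name, slot_type, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;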
9.5 Log Shipping Based - File Level
9.5.1 Setup
The setup consists of two CentOS 7 machines on which PostgreSQL version 10.7 is
installed. Both systems are loosely coupled, sharing only the WAL archive.
9.5.2 Configuring Replication using Log Shipping
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Create a folder on the standby that will archive WALs received from the primary
sudo mkdir /opt/PostgreSQL/10/from_primary
sudo chown postgres:postgres /opt/PostgreSQL/10/from_primary
In /etc/passwd change home directory of user postgres to
/opt/PostgreSQL/10/from_primary
Step 3: Change home directory of postgres user on Primary
sudo mkdir /opt/PostgreSQL/10/home
sudo chown postgres:postgres /opt/PostgreSQL/10/home/
[Figure: Primary and Standby PostgreSQL servers sharing a WAL archive; on the primary, archive_command copies WAL files from pg_wal to the archive, and on the standby, restore_command copies WAL files from the archive into pg_wal.]
In /etc/passwd change home directory of user postgres to /opt/PostgreSQL/10/home/
Step 4: Configure password-less ssh & scp between Primary and Standby
Login as postgres user on Primary
su - postgres
Password:
Last login: Fri Feb 22 05:54:11 EST 2019 on pts/0
Generate a public/private key pair on Primary
-bash-4.2$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/PostgreSQL/10/home/.ssh/id_rsa):
Created directory '/opt/PostgreSQL/10/home/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.
Your public key has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:jqjjYf8OcKp4tgtfPcLWG6liAot660/4CLrIRq01BqI
postgres@localhost.localdomain
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| |
|.. |
|o + . S |
|E. @ + + |
|*.O X B . |
|OOBX + + |
|X@XB=o+ |
+----[SHA256]-----+
Copy the key to standby and add it to authorized_keys
-bash-4.2$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@172.16.214.165
/bin/ssh-copy-id: INFO: Source of key(s) to be installed:
"/opt/PostgreSQL/10/home/.ssh/id_rsa.pub"
The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established.
ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84.
ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6.
Are you sure you want to continue connecting (yes/no)?
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out
any that are already installed
The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established.
ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84.
ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are
prompted now it is to install the new keys
postgres@172.16.214.165's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'postgres@172.16.214.165'"
and check to make sure that only the key(s) you wanted were added.
Test passwordless SSH
-bash-4.2$ ssh postgres@172.16.214.165
Last login: Fri Feb 22 05:53:39 2019
-bash-4.2$ exit
logout
Connection to 172.16.214.165 closed.
-bash-4.2$
Test passwordless SCP
From Primary try this
su - postgres
-bash-4.2$ scp 1.txt postgres@172.16.214.165:/opt/PostgreSQL/10/from_primary
1.txt
100% 3446 3.3MB/s 00:00
Check on standby
su - postgres
-bash-4.2$ pwd
/opt/PostgreSQL/10/from_primary
-bash-4.2$ ls -l
total 4
-rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt
Step 5: Update the postgresql.conf file on primary
wal_level = replica
archive_mode = on
archive_command = 'if ssh postgres@172.16.214.165 test ! -f
"/opt/PostgreSQL/10/from_primary/%f" ; then scp %p
postgres@172.16.214.165:/opt/PostgreSQL/10/from_primary/; fi'
The archive_command will be executed every time a new WAL file is generated. This
archive command uses two placeholders:
%p : the complete path of the WAL file, including its name
%f : the name of the WAL file
The command first tests that the WAL file is not already present on the standby and, if it
is not, copies the WAL file to the archive folder.
Step 6: Create database & tables on primary
./createdb test_db
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
INSERT INTO student VALUES(1, 'Edward', 'Main Campus');
INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel');
INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel');
INSERT INTO teacher VALUES(1, 'Gary', 'Physics');
INSERT INTO teacher VALUES(2, 'Karen', 'Maths');
INSERT INTO teacher VALUES(3, 'Carol', 'History');
Step 7: Take base backup using the command
./pg_basebackup --pgdata=/opt/PostgreSQL/10/for_standby/ --format=p
--write-recovery-conf --checkpoint=fast --label=for_test --progress
--verbose --host=localhost --port=5432 --username=postgres
Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
32578/32578 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
--pgdata : Target folder for the base backup
--format : plain
Step 8: Modify the recovery.conf in the base backup
standby_mode = 'on'
restore_command = 'cp "/opt/PostgreSQL/10/from_primary/%f" "%p"'
The restore_command is invoked by the standby server periodically. Our restore
command copies the newly arrived WAL file from the archive to the pg_wal folder of the
standby server.
Step 9: Transfer the base backup to the standby server
sudo mkdir /opt/PostgreSQL/10/bb_data/
sudo mv /tmp/for_standby.tar.gz /opt/PostgreSQL/10/bb_data/
sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/
sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/for_standby.tar.gz
sudo chmod 700 /opt/PostgreSQL/10/bb_data/
sudo chmod 700 /opt/PostgreSQL/10/bb_data/for_standby
Step 10: Unzip base backup
su - postgres
cd bb_data/
-bash-4.2$ ls -l
total 3840
-rw-r--r--. 1 postgres postgres 3930538 Feb 23 01:40 for_standby.tar.gz
-bash-4.2$ tar -xvf for_standby.tar.gz
Step 11: Start the stand by server
-bash-4.2$ ./postgres -D ../bb_data/for_standby/ -p 5432
Step 12: Test standby server
./psql -p 5432 test_db -U postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
test_db=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+-------+-------------
public | student | table | postgres | 16 kB |
public | teacher | table | postgres | 16 kB |
(2 rows)
test_db=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
test_db=# select * from teacher;
tid | tname | tsubject
-----+-------+----------
1 | Gary | Physics
2 | Karen | Maths
3 | Carol | History
(3 rows)
test_db=# insert into student values(4, 'any');
ERROR: cannot execute INSERT in a read-only transaction
Step 13: Restart primary and create a few new tables in the test database
CREATE TABLE test_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id;
SELECT 100000
CREATE TABLE another_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id;
SELECT 100000
Step 14: Force a WAL file switch
test_db=# select pg_switch_wal();
pg_switch_wal
---------------
0/3C58940
(1 row)
Step 15: Check WAL file on standby
-bash-4.2$ pwd
/opt/PostgreSQL/10/from_primary
-bash-4.2$ ls -l
total 16388
-rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003
-rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt
-bash-4.2$ pwd
/opt/PostgreSQL/10/bb_data/for_standby/pg_wal
-bash-4.2$ ls -l
total 32768
-rw-------. 1 postgres postgres 16777216 Feb 22 22:05 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003
drwx------. 2 postgres postgres 43 Feb 23 02:08 archive_status
Step 16: Check the tables on the standby
test_db=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------+-------+----------+---------+-------------
public | another_tab | table | postgres | 3568 kB |
public | student | table | postgres | 16 kB |
public | teacher | table | postgres | 16 kB |
public | test_tab | table | postgres | 3568 kB |
(4 rows)
9.5.3 Steps to perform Failover
Step 1: Simulate a primary server problem
Stop the server
Step 2: Promote the standby
-bash-4.2$ ./pg_ctl promote -D ../bb_data/for_standby/
waiting for server to promote.... done
server promoted
Step 3: Check standby
[abbas@localhost bin]$ ./psql -p 5432 test_db -U postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
test_db=# insert into student values(4, 'any');
INSERT 0 1
test_db=#
9.6 Log Shipping Based - Block Level
9.6.1 Physical Streaming Replication
In streaming replication the standby server connects to the primary server and receives
WAL records using a replication protocol. This provides two advantages:
1. The standby server does not need to wait for a WAL file to fill up, so replication lag
is reduced.
2. The dependency on a user-provided script and on intermediate shared storage
between the servers is removed.
9.6.2 WAL Sender & WAL Receiver
A process called the WAL receiver, running on the standby server, connects to the
primary server over TCP/IP using the connection details provided in the primary_conninfo
parameter of recovery.conf. On the primary server, another process called the WAL
sender is in charge of sending the WAL records to the standby server as they are
generated. The WAL receiver writes the received records into the standby's own WAL
segment files, as if they had been generated by locally connected client activity. As WAL
records reach the segment files, the standby server continuously replays them so that the
standby stays up to date with the primary.
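Once streaming is established, progress can be watched on both ends (a quick check; in version 10 the receiver view exposes received_lsn):

-- On the primary:
SELECT pid, application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;
-- On the standby:
SELECT status, received_lsn, last_msg_receipt_time FROM pg_stat_wal_receiver;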
[Figure: the WAL sender on the primary streams WAL records over the network to the WAL receiver on the standby; both servers end up holding the same records (W1 W2 W3 W4) in their WAL.]
9.6.3 WAL Streaming Protocol Details
The exchange between the WAL receiver (standby) and the WAL sender (primary) proceeds as follows:

1. Startup request from the standby, asking for the server's authentication scheme and carrying these parameters:
   user             postgres
   database         replication
   replication      true          <- instructs the server to start a WAL sender for this client
   application_name walreceiver
2. The server is expecting the password in MD5 format and requests it:
   52 00 00 00 0c 00 00 00 05 04 43 16 6a
   (authentication request | length | md5 password | salt generated by server)
3. Password response from the standby:
   70 00 00 00 0b md5b094d71396249f3ca84a23b86d4ee7b9
   (password response | length | MD5 password terminated by null)
   The MD5 password is computed as "md5" || md5(md5(password || username) || salt).
4. Authentication reply: 52 00 00 00 08 00 00 00 00
   (authentication reply | length | user authenticated), followed by status parameters
   ('S' | length 4 bytes | param name | param value).
5. Simple query: IDENTIFY_SYSTEM. The response carries systemid, timeline and logpos, e.g.
   systemid            | timeline | logpos
   --------------------+----------+----------
   6661510093306984809 | 1        | 0/3000140
   The WAL receiver verifies that the systemid in the response is the same as in the base backup.
6. Simple query: START_REPLICATION SLOT "node_a_slot" 0/3000000 TIMELINE 1.
   The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual |
   copy data has 0 columns) and starts to stream WAL data as CopyData messages
   ('d' | length 4 bytes | WAL data).
9.6.4 Setup
The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL
version 10.7 is installed.
9.6.5 Configuring PostgreSQL Replication using WAL Streaming
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: On primary allow replication connections & connections from the same
network. Modify pg_hba.conf.
local all all md5
host all all 172.16.214.167/24 md5
host all all ::1/128 md5
local replication all md5
host replication all 172.16.214.167/24 md5
host replication all ::1/128 md5
Step 3: On primary edit postgresql.conf to modify the following parameters
max_wal_senders = 10
wal_level = replica
max_replication_slots = 10
synchronous_commit = on
synchronous_standby_names = '*'
listen_addresses = '*'
Step 4: Start the primary server
./postgres -D ../pr_data -p 5432
Step 5: Take a base backup to bootstrap the standby server
./pg_basebackup
--pgdata=/tmp/sb_data/
--format=p
--write-recovery-conf
--checkpoint=fast
--label=mffb
--progress
--verbose
--host=172.16.214.167
--port=5432
--username=postgres
Step 6: Check the base backup label file
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-02-24 05:25:30 EST
LABEL: mffb
Step 7: In the base backup, add the following line in the recovery.conf
primary_slot_name = 'node_a_slot'
Step 8: Check the /tmp/sb_data/recovery.conf file
standby_mode = 'on'
primary_conninfo = 'user=enterprisedb
password=abc123
host=172.16.214.167
port=5432
sslmode=prefer
sslcompression=1
krbsrvname=postgres
target_session_attrs=any'
primary_slot_name = 'node_a_slot'
Step 9: Connect to the primary server and issue this command
edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
slot_name | xlog_position
-------------+---------------
node_a_slot |
(1 row)
edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
slot_name | slot_type | active
-------------+-----------+--------
node_a_slot | physical | f
(1 row)
Step 10: Transfer the base backup to the standby server
scp /tmp/sb_data.tar.gz abbas@172.16.214.166:/tmp
sudo mv /tmp/sb_data /opt/PostgreSQL/10/
sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chmod 700 /opt/PostgreSQL/10/sb_data/
Step 11: Start the standby server
./postgres -D ../sb_data/ -p 5432
The primary will show this in its log
LOG: standby "walreceiver" is now a synchronous standby with priority 1
The standby will show
LOG: database system was interrupted; last known up at 2018-10-24 15:49:55
LOG: entering standby mode
LOG: redo starts at 0/3000028
LOG: consistent recovery state reached at 0/30000F8
LOG: started streaming WAL from primary at 0/4000000 on timeline 1
Step 12: Connect to primary server and issue some simple commands
-bash-4.2$ ./edb-psql -p 9666 edb
Password:
psql.bin (9.6.10.17)
Type "help" for help.
create table abc(a int, b varchar(250));
insert into abc values(1,'One');
insert into abc values(2,'Two');
insert into abc values(3,'Three');
Step 13: Check data on slave
./psql -p 5432 -U postgres postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
postgres=# select * from abc;
a | b
---+-------
1 | One
2 | Two
3 | Three
(3 rows)
9.6.6 Steps to perform Failover
Step 1: Crash the primary server
Step 2: Promote the stand by server
./pg_ctl promote -D ../sb_data/
server promoting
Step 3: Connect to the promoted stand by server and insert a row
-bash-4.2$ ./edb-psql -p 9777 edb
Password:
psql.bin (9.6.10.17)
Type "help" for help.
edb=# insert into abc values(4,'Four');
9.7 Logical Decoding Based
9.7.1 What is Logical Replication
Physical streaming replication, as described in section 9.6, creates a byte-by-byte read-only
replica of the primary server. The replica contains all databases, tables, roles,
tablespaces etc. With streaming replication we get all or nothing. What if we want a
replica of only a single table? This is where logical replication comes into play.
Logical replication can replay DML operations happening on a subset of tables in a
primary server on a standby server by:
*) Logically decoding WAL records
*) Streaming them over to the standby server
*) Applying them to the tables in the standby server in the correct transactional order
9.7.2 Comparison of Physical and Logical Replication

Feature                                                           | Physical | Logical
------------------------------------------------------------------+----------+--------
Replica will be read only                                         | Yes      | No
Replica will contain everything                                   | Yes      | No
Replica can contain a subset of data in the primary               | No       | Yes
Triggers will fire on DML operations                              | No       | Yes
Will work across different PostgreSQL versions                    | No       | Yes
Will work across different operating systems                      | No       | Yes
Table in the standby can have extra columns, indexes or security  | No       | Yes
DML operations are possible on tables in standby                  | No       | Yes
Will work even if table has no primary key                        | Yes      | No
DDL commands are replicated to standby                            | Yes      | No
Sequence data is replicated to standby                            | Yes      | No
TRUNCATE command is replicated to the standby                     | Yes      | No
Large objects are replicated to the standby                       | Yes      | No
Constraint validation is performed on standby                     | No       | Yes
Standby needs base backup of the primary server                   | Yes      | No
DML operations can be filtered before sending to standby          | No       | Yes
Tables have to be created with the same name on standby manually  | No       | Yes
9.7.3 Publication & Subscription
Logical replication defines two entities: a publisher and a subscriber. A publisher is a
node that defines a group of tables, called a publication, to which a subscriber can
subscribe by creating a subscription, in order to receive the changes to that particular
group of tables.
[Figure: on the publisher, the WAL sender runs WAL records (W1 W2 W3 W4) through a logical decoding plugin, which decodes and filters the changes for the tables in the publication; the subscriber applies the decoded records to the corresponding tables in its subscription.]
9.7.4 Logical Decoding Plugin
In order to transform the WAL internal representation into a format that can be used by a
client, a plugin can be installed into PostgreSQL. The plugin implements well-defined
callback functions, which the logical decoding framework calls at the appropriate times
to let the plugin perform the format conversion. A plugin can, for example, convert WAL
records into SQL statements such as INSERT INTO tab VALUES(1,2) or UPDATE tab SET b = 10.
9.7.5 Logical replication slots
Logical replication slots are meant to be consumed by logical replication clients. Physical
replication slots work at the cluster level and are used to stream cluster-wide changes to
a standby; logical replication slots, on the other hand, stream a sequence of changes from
a single database. Each logical slot needs a decoding plugin that transforms the WAL
records into the format required by the consumer.
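The changes accumulated in a logical slot can also be consumed directly from SQL, without pg_recvlogical (a minimal sketch; the slot name sql_slot is arbitrary and wal_level = logical is assumed):

SELECT pg_create_logical_replication_slot('sql_slot', 'test_decoding');
-- ... run some DML in this database, then:
SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('sql_slot', NULL, NULL);  -- inspect without consuming
SELECT lsn, xid, data FROM pg_logical_slot_get_changes('sql_slot', NULL, NULL);   -- consume and advance the slot
SELECT pg_drop_replication_slot('sql_slot');                                      -- clean up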
9.7.6 test_decoding and pg_recvlogical
test_decoding is an example decoding plugin that is provided with PostgreSQL, and
pg_recvlogical is an example utility that can be used to receive changes from a logical
replication slot. Let's see them both in action:
Step 1: Make the following changes in postgresql.conf on the primary server
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on
Step 2: Make the following changes in pg_hba.conf on the primary server
host all all 172.16.214.167/24 trust
host replication all 172.16.214.167/24 trust
Step 3: Create a database on the primary server
./createdb -p 7654 mydb -U postgres
Step 4: Connect the client with the primary server
./psql -p 7654 mydb -U postgres
psql.bin (10.7)
Type "help" for help.
mydb=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/16998C0
(1 row)
mydb=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
slot_name | lsn
-----------+-----------
my_slot | 0/1699930
(1 row)
mydb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
my_slot | test_decoding | logical | 16384 | mydb | f |
active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
f | | | 556 | 0/16998F8 | 0/1699930
This replication slot is asking:
1) VACUUM should not remove catalog tuples deleted by any transaction later than 556.
2) The consumer of this replication slot needs all WAL segments including and after 0/16998F8.
3) The consumer of this logical replication slot has confirmed receiving data up to 0/1699930.
Most of the time a slot will require older WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn). The flush position is just a marker saved by the
consumer; the WAL actually required is always determined by restart_lsn. If this is the
first slot being created in the cluster, then restart_lsn will be the current WAL LSN at the
time the slot was created.
Step 5: On stand by start pg_recvlogical utility
./pg_recvlogical
--slot=my_slot
--verbose
-d mydb
-h 172.16.214.167
-p 7654
-U postgres
--start
-f -
pg_recvlogical: starting log streaming at 0/0 (slot my_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
Step 6: On Primary create a table
create table test(a varchar(10));
Step 7: Check the output of pg_recvlogical
BEGIN 556
COMMIT 556
pg_recvlogical: confirming write up to 0/16B0580, flush to 0/16B0580 (slot my_slot)
Step 8: Insert a few rows in the table on primary
mydb=# insert into test values('qaz');
mydb=# insert into test values('wsx');
mydb=# insert into test values('edc');
Step 9: Check the output of pg_recvlogical
BEGIN 557
table public.test: INSERT: a[character varying]:'qaz'
COMMIT 557
pg_recvlogical: confirming write up to 0/16B0628, flush to 0/16B0628 (slot my_slot)
pg_recvlogical: confirming write up to 0/16B0660, flush to 0/16B0660 (slot my_slot)
BEGIN 558
table public.test: INSERT: a[character varying]:'wsx'
COMMIT 558
pg_recvlogical: confirming write up to 0/16B06D0, flush to 0/16B06D0 (slot my_slot)
BEGIN 559
table public.test: INSERT: a[character varying]:'edc'
COMMIT 559
pg_recvlogical: confirming write up to 0/16B0778, flush to 0/16B0778 (slot my_slot)
Step 10: Update rows in the table on primary
update test set a = 'tgb';
Step 11: Check the output of pg_recvlogical
BEGIN 560
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
COMMIT 560
pg_recvlogical: confirming write up to 0/16B08B8, flush to 0/16B08B8 (slot my_slot)
Step 12: Delete rows in the table on primary
delete from test;
Step 13: Check the output of pg_recvlogical
BEGIN 561
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
COMMIT 561
pg_recvlogical: confirming write up to 0/16B0990, flush to 0/16B0990 (slot my_slot)
Step 14: Check the slot on primary, note that restart_lsn has been advanced
mydb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
my_slot | test_decoding | logical | 16384 | mydb | f |
active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
f | | | 566 | 0/16B0DB8 | 0/16B0DF0
9.7.7 Setup
The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL
version 10.7 is installed.
9.7.8 Configuring PostgreSQL Replication using Logical Decoding
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: On primary allow replication connections & connections from the same
network. Modify pg_hba.conf.
local all all trust
host all all 172.16.214.167/24 trust
host all all ::1/128 trust
local replication all trust
host replication all 172.16.214.167/24 trust
host replication all ::1/128 trust
Step 3: On publisher edit postgresql.conf to modify the following parameters
max_wal_senders = 10
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on
Step 4: Start the publisher server
./postgres -D /tmp/data/ -p 5432
Step 5: Create a database on the publisher server
./createdb -p 5432 -U postgres src_db
Step 6: Connect to the publisher server and create a table with some rows
create table t1 (id integer primary key, val text);
create user replicant with replication;
grant select on t1 to replicant;
insert into t1 (id, val) values (10, 'ten'),
(20, 'twenty'),
(30, 'thirty');
Step 7: Create the publication on the publisher server
create publication pub1 for table t1;
Step 8: Start the subscriber
./postgres -D /tmp/data/ -p 5432
Step 9: Create the database on the subscriber
./createdb -p 5432 -U postgres dst_db
Step 10: Connect to the subscriber server and create the table with an additional
column
create table t1 (id integer primary key, val text, val2 text);
Step 11: Create the subscription
create subscription sub1
connection 'host=172.16.214.167
port=5432
dbname=src_db
user=replicant'
publication pub1;
Step 12: Check the data in the subscribed table
dst_db=# select * from t1;
id | val | val2
----+--------+------
10 | ten |
20 | twenty |
30 | thirty |
(3 rows)
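Both sides of the subscription can be monitored afterwards (a quick sketch; these views exist in version 10):

-- On the publisher:
SELECT * FROM pg_publication_tables WHERE pubname = 'pub1';
SELECT slot_name, plugin, active FROM pg_replication_slots;
-- On the subscriber:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;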
9.7.9 Logical Replication Protocol Details
The subscriber opens three connections to the publisher.

Connection 1 - the apply worker creates the replication slot:
1. Startup request asking for the server's authentication scheme, with these parameters:
   user             replicant
   database         src_db
   replication      database      <- the connection goes into logical replication mode
   application_name sub1
2. Authentication reply: 52 00 00 00 08 00 00 00 00
   (authentication reply | length | user authenticated), followed by status parameters
   ('S' | length 4 bytes | param name | param value).
3. The subscriber looks up the tables in the publication:
   SELECT DISTINCT t.schemaname, t.tablename
   FROM pg_catalog.pg_publication_tables t
   WHERE t.pubname IN ('pub1')
   schemaname | tablename
   -----------+----------
   public     | t1
4. CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
   slot_name | consistent_point | snapshot_name | output_plugin
   ----------+------------------+---------------+--------------
   sub1      | 0/16B9EA8        |               | pgoutput
5. Disconnect.

Connection 2 - the apply worker starts streaming (same startup and authentication as above):
1. Simple query: IDENTIFY_SYSTEM
   systemid            | timeline | xlogpos   | dbname
   --------------------+----------+-----------+--------
   6664876364497978284 | 1        | 0/16B9EA8 | src_db
2. START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '1', publication_names '"pub1"')
3. The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual |
   copy data has 0 columns) and streams decoded WAL data as CopyData messages
   ('d' | length 4 bytes | copy data).

Connection 3 - a table sync worker copies the initial data (application_name sub1_16393_sync_16385):
1. CREATE_REPLICATION_SLOT "sub1_16393_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
   The command completes and a transaction is started:
   BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
   slot_name             | consistent_point | snapshot_name | output_plugin
   ----------------------+------------------+---------------+--------------
   sub1_16393_sync_16385 | 0/16B9EE0        |               | pgoutput
2. The worker looks up the relation and its replica identity:
   SELECT c.oid, c.relreplident FROM pg_catalog.pg_class c
   INNER JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid)
   WHERE n.nspname = 'public' AND c.relname = 't1' AND c.relkind = 'r'
   oid   | relreplident
   ------+------------------
   16385 | d (primary key)
3. It then fetches the column list:
   SELECT a.attname, a.atttypid, a.atttypmod, a.attnum = ANY(i.indkey)
   FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_index i
   ON (i.indexrelid = pg_get_replica_identity_index(16385))
   WHERE a.attnum > 0::pg_catalog.int2 AND NOT a.attisdropped
   AND a.attrelid = 16385 ORDER BY a.attnum
   attname | atttypid | atttypmod | ?column?
   --------+----------+-----------+---------
   id      | 23       | -1        | t
   val     | 25       | -1        | f
4. COPY public.t1 TO STDOUT streams the existing rows (10 ten, 20 twenty, 30 thirty),
   the transaction is committed, and the worker disconnects.
9.8 Statement Based
9.8.1 Introduction to pgpool-II
pgpool-II is a middleware system that sits between PostgreSQL servers and clients to
provide the following features:
• Connection Pooling
• Replication & Load Balancing
• Automated Failover
We are going to focus on the replication feature provided by pgpool-II.
When used to replicate data, pgpool receives an INSERT command from the client and
sends the command, enclosed in a BEGIN-COMMIT block, to all the PostgreSQL servers
under it.
[Figure: the client sends INSERT INTO my_tab VALUES(1, 'One') to pgpool-II, which forwards the statement to each PostgreSQL server underneath, wrapped in a BEGIN ... COMMIT block.]
9.8.2 Setup
The setup consists of two CentOS 7 machines on which PostgreSQL 10.7 is installed. On
one of the machines pgpool-II version 3.6.15 (subaruboshi) is also installed.
9.8.3 Configuring PostgreSQL replication using pgpool-II
Step 1: Modify the postgresql.conf files of both the PostgreSQL instances
listen_addresses = '*'
logging_collector = off
log_connections = on
log_disconnections = on
log_statement = 'all'
Step 2: Modify the pg_hba.conf files of both the PostgreSQL instances
host all all 172.16.214.173/24 trust
Step 3: Modify the pgpool.conf
cd /opt/edb/pgpool3.6/etc
cp pgpool.conf.sample pgpool.conf && vim pgpool.conf
listen_addresses = '*'
backend_hostname0 = '172.16.214.173'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data0'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.16.214.172'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
replication_mode = on
fail_over_on_backend_error = on
Step 4: Generate md5 for the password
/opt/edb/pgpool3.6/bin/pg_md5 abc123
e99a18c428cb38d5f260853678922e03
Step 5: Modify the pcp.conf
cp pcp.conf.sample pcp.conf && vim pcp.conf
postgres:e99a18c428cb38d5f260853678922e03
Step 6: Start both the servers
./postgres -D ../data
./postgres -D ../data
Step 7: Start pgpool
./pgpool -n
-f /opt/edb/pgpool3.6/etc/pgpool.conf
-F /opt/edb/pgpool3.6/etc/pcp.conf
Step 8: Create database
./createdb -p 9999 test_pgp -U postgres
Note that we are connecting through pgpool
Step 9: Check database server logs
For First Server (172.16.214.173)
[104386] LOG: connection received: host=172.16.214.173 port=57524
[104386] LOG: connection authorized: user=postgres database=postgres
[104386] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false)
[104386] LOG: statement: CREATE DATABASE test_pgp;
[104386] LOG: statement: DISCARD ALL
[104386] LOG: disconnection: session time: 0:00:00.787 user=postgres database=postgres
host=172.16.214.173 port=57524
For Second Server (172.16.214.172)
[12363] LOG: connection received: host=172.16.214.173 port=42138
[12363] LOG: connection authorized: user=postgres database=postgres
[12363] LOG: statement: CREATE DATABASE test_pgp;
[12363] LOG: statement: DISCARD ALL
[12363] LOG: disconnection: session time: 0:00:00.704 user=postgres database=postgres
host=172.16.214.173 port=42138
Step 10: Create a new table
./psql -p 9999 test_pgp -U postgres
Note that we are connecting through pgpool
create table my_tab(a int primary key, b varchar(10));
Step 11: Check server log
For First Server (172.16.214.173)
[107539] LOG: connection received: host=172.16.214.173 port=57528
[107539] LOG: connection authorized: user=postgres database=test_pgp
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: connection received: host=172.16.214.173 port=42142
[12400] LOG: connection authorized: user=postgres database=test_pgp
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[12400] LOG: statement: COMMIT
Step 12: Insert rows in the table
insert into my_tab values(1,'One');
insert into my_tab values(2,'Two');
insert into my_tab values(3,'Three');
Step 13: Check server log
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement:
SELECT count(*) from
( SELECT has_function_privilege
( 'postgres', 'pg_catalog.to_regclass(cstring)','execute' )
WHERE EXISTS
( SELECT * FROM pg_catalog.pg_proc AS p WHERE p.proname = 'to_regclass' )
) AS s
[107539] LOG: statement:
SELECT count(*) FROM pg_catalog.pg_attrdef AS d,
pg_catalog.pg_class AS c
WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND
c.oid = pg_catalog.to_regclass('"my_tab"')
[107539] LOG: statement:
SELECT attname, d.adsrc as default_value,
coalesce
(
(
d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%' OR
d.adsrc LIKE '%CURRENT_TIMESTAMP%' OR d.adsrc LIKE '%CURRENT_TIME%' OR
d.adsrc LIKE '%CURRENT_DATE%' OR d.adsrc LIKE '%LOCALTIME%' OR
d.adsrc LIKE '%LOCALTIMESTAMP%'
) AND
(
a.atttypid = 'timestamp'::regtype::oid OR
a.atttypid = 'timestamp with time zone'::regtype::oid OR
a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR
a.atttypid = 'time with time zone'::regtype::oid
) ,
false
)
FROM pg_catalog.pg_class c,
pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_attrdef d ON
(a.attrelid = d.adrelid AND a.attnum = d.adnum)
WHERE c.oid = a.attrelid AND a.attnum >= 1 AND
a.attisdropped = 'f' AND c.oid = to_regclass('"my_tab"')
ORDER BY a.attnum
[107539] LOG: statement: insert into my_tab values(1,'One');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(2,'Two');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(3,'Three');
[107539] LOG: statement: COMMIT

The metadata queries above ask: what are the attribute names, what are their default values if
any, and does any column have a default value of now()? For my_tab the result is:
attname | default_value | coalesce
--------+---------------+----------
a       |               | f
b       |               | f
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(1,'One');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(2,'Two');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(3,'Three');
[12400] LOG: statement: COMMIT
Step 14: Try an update statement
update my_tab set b = 'threee' where b like 'Three';
Step 15: Check server log
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[12400] LOG: statement: COMMIT
Step 16: Select data from the table
test_pgp=# select * from my_tab;
a | b
---+--------
1 | One
2 | Two
3 | threee
(3 rows)
Step 17: Check server log and observe load balancing
For First Server (172.16.214.173)
[107539] LOG: statement: select * from my_tab;
For Second Server (172.16.214.172)
(Nothing)
Step 18: Check node status
test_pgp=# show pool_nodes;
node_id | hostname | port | status | lb_weight |
---------+----------------+------+--------+-----------+
0 | 172.16.214.173 | 5432 | up | 0.500000 |
1 | 172.16.214.172 | 5432 | up | 0.500000 |
role | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
master | 5 | true | 0
slave | 0 | false | 0
(2 rows)
Step 19: Stop the master server i.e. 172.16.214.173
Step 20: Run the select query
test_pgp=# select * from my_tab;
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Step 21: Run the select query again
test_pgp=# select * from my_tab;
a | b
---+-------
1 | One
2 | Two
3 | threee
(3 rows)
Step 22: Check node status
test_pgp=# show pool_nodes;
node_id | hostname | port | status | lb_weight |
---------+----------------+------+--------+-----------+
0 | 172.16.214.173 | 5432 | down | 0.500000 |
1 | 172.16.214.172 | 5432 | up | 0.500000 |
role | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
slave | 5 | false | 0
master | 1 | true | 0
(2 rows)
Step 23: Create another table and insert a row in it
create table time_test(a timestamp);
insert into time_test values(now());
Step 24: Check the server log and observe how pgpool translated now() into a constant timestamp
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table time_test(a timestamp);
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT now()
[107539] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz"
('2019-03-14 04:36:22.324674-04'::text))
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table time_test(a timestamp);
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz"
('2019-03-14 04:36:22.324674-04'::text))
[12400] LOG: statement: COMMIT
9.9 Other possibilities
9.9.1 EDB xDB Replication Server
EDB xDB (cross database) Replication Server is an asynchronous replication system for
PostgreSQL based on a publish/subscribe model.
xDB Replication Server can be used to implement replication systems based on either of
two replication models:
• Single-master (master-to-slave) replication
• Multi-master replication
The following are the combinations of cross database replications that xDB Replication
Server supports for single-master replication:
Master Database | Slave Database
----------------+---------------
Oracle          | PostgreSQL
Oracle          | EDB Postgres
SQL Server      | PostgreSQL
SQL Server      | EDB Postgres
PostgreSQL      | SQL Server
PostgreSQL      | EDB Postgres
EDB Postgres    | SQL Server
EDB Postgres    | Oracle
EDB Postgres    | PostgreSQL
For multi-master replication, xDB Replication Server supports the following servers:
• PostgreSQL
• EDB Postgres
xDB Replication Server can use either a trigger-based method or a logical-decoding-based
method to perform replication.
Tomas Vondra
 
Ash architecture and advanced usage rmoug2014
Ash architecture and advanced usage rmoug2014Ash architecture and advanced usage rmoug2014
Ash architecture and advanced usage rmoug2014
John Beresniewicz
 
Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007
John Beresniewicz
 
Understanding oracle rac internals part 2 - slides
Understanding oracle rac internals   part 2 - slidesUnderstanding oracle rac internals   part 2 - slides
Understanding oracle rac internals part 2 - slides
Mohamed Farouk
 
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Vietnam Open Infrastructure User Group
 
Run Qt on Linux embedded systems using Yocto
Run Qt on Linux embedded systems using YoctoRun Qt on Linux embedded systems using Yocto
Run Qt on Linux embedded systems using Yocto
Marco Cavallini
 
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Edureka!
 
Wait! What’s going on inside my database?
Wait! What’s going on inside my database?Wait! What’s going on inside my database?
Wait! What’s going on inside my database?
Jeremy Schneider
 
Angular - a real world case study
Angular - a real world case studyAngular - a real world case study
Angular - a real world case study
dwcarter74
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
Jignesh Shah
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
PGConf APAC
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databases
Federico Campoli
 
Kubernetes dealing with storage and persistence
Kubernetes  dealing with storage and persistenceKubernetes  dealing with storage and persistence
Kubernetes dealing with storage and persistence
Janakiram MSV
 

Similar to Replication in PostgreSQL tutorial given in Postgres Conference 2019 (20)

Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
Michael Renner
 
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
Denish Patel
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
PG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptxPG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptx
ankitmodidba
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)
Denish Patel
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
FIRAT GULEC
 
PostgreSQL Replication Tutorial
PostgreSQL Replication TutorialPostgreSQL Replication Tutorial
PostgreSQL Replication Tutorial
Hans-Jürgen Schönig
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
EDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Demystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live scDemystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live sc
Emanuel Calvo
 
Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016
Petr Jelinek
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An Introduction
Smita Prasad
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
PostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama DarPostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama Dar
obdlioubysz
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
Michael Renner
 
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
Denish Patel
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
PG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptxPG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptx
ankitmodidba
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)
Denish Patel
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
FIRAT GULEC
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
EDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Demystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live scDemystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live sc
Emanuel Calvo
 
Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016
Petr Jelinek
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An Introduction
Smita Prasad
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
PostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama DarPostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama Dar
obdlioubysz
 
Ad

Recently uploaded (20)

The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry ReportThe State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
How to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptxHow to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdfBoosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Alkin Tezuysal
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data ResilienceFloods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Safe Software
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
Ben Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding WorldBen Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding World
AWS Chicago
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Secure Access with Azure Active Directory
Secure Access with Azure Active DirectorySecure Access with Azure Active Directory
Secure Access with Azure Active Directory
VICTOR MAESTRE RAMIREZ
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Impelsys Inc.
 
The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry ReportThe State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
How to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptxHow to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdfBoosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Alkin Tezuysal
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data ResilienceFloods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Safe Software
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
Ben Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding WorldBen Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding World
AWS Chicago
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Secure Access with Azure Active Directory
Secure Access with Azure Active DirectorySecure Access with Azure Active Directory
Secure Access with Azure Active Directory
VICTOR MAESTRE RAMIREZ
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Impelsys Inc.
 
Ad

Replication in PostgreSQL tutorial given in Postgres Conference 2019

  • 1. Replication in PostgreSQL - Deep Dive EnterpriseDB Table of Contents 1 Objectives........................................................................................................................................................3 2 Presenter...........................................................................................................................................................3 3 What is Replication..........................................................................................................................................4 4 Why use Replication........................................................................................................................................4 5 Models of Replication (Single Master & Multi Master)..................................................................................5 6 Classes of Replication (Unidirectional & Bidirectional).................................................................................5 7 Modes of Replication (Asynchronous & Synchronous)..................................................................................6 8 Types of Replication (Physical & Logical)......................................................................................................7 9 Methods Of Replication...................................................................................................................................8 9.1 Disk Based Replication.................................................................................................................................8 9.1.1 Introduction................................................................................................................................................8 9.1.2 Setup..........................................................................................................................................................8 9.1.3 Configuring PostgreSQL Replication using NAS.....................................................................................8 9.1.4 Steps to perform Failover........................................................................................................................12 9.2 File System Based.......................................................................................................................................13 9.2.1 Introduction to DRBD.............................................................................................................................13 9.2.2 Setup........................................................................................................................................................15 9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C......................................................18 9.2.4 Steps to perform Failover........................................................................................................................26 9.3 Trigger Based..............................................................................................................................................28 9.3.1 Introduction to Slony-I.............................................................................................................................28 9.3.2 Advantages and Disadvantages of Slony.................................................................................................29 9.3.3 
Setup........................................................................................................................................................30 9.3.4 Configuring PostgreSQL Replication using Slony-I...............................................................................30 9.3.5 Steps to perform controlled switchover...................................................................................................43 9.4 Introduction to WAL...................................................................................................................................44 9.4.1 What is WAL and Why is it required.......................................................................................................44 9.4.2 Transaction Log and WAL Segment Files...............................................................................................48 9.4.3 WAL Writer..............................................................................................................................................48 9.4.4 WAL Segment File Management.............................................................................................................49 9.4.5 WAL Example..........................................................................................................................................50 9.4.6 Overview of Replication Options based on WAL....................................................................................55 9.5 Log Shipping Based - File Level................................................................................................................57 9.5.1 Setup........................................................................................................................................................57 9.5.2 Configuring Replication using Log Shipping..........................................................................................57 9.5.3 Steps to perform Failover........................................................................................................................64 9.6 Log Shipping Based - Block Level.............................................................................................................65 9.6.1 Physical Streaming Replication...............................................................................................................65 9.6.2 WAL Sender & WAL Receiver................................................................................................................65 9.6.3 WAL Streaming Protocol Details.............................................................................................................66 9.6.4 Setup........................................................................................................................................................67 9.6.5 Configuring PostgreSQL Replication using WAL Streaming..................................................................67 9.6.6 Steps to perform Failover........................................................................................................................71 9.7 Logical Decoding Based.............................................................................................................................72 9.7.1 What is Logical Replication.....................................................................................................................72 1/96
  • 2. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7.2 Comparison of Physical and Logical Replication....................................................................................73 9.7.3 Publication & Subscription......................................................................................................................74 9.7.4 Logical Decoding Plugin.........................................................................................................................75 9.7.5 Logical replication slots...........................................................................................................................75 9.7.6 test_decoding and pg_recvlogical............................................................................................................75 9.7.7 Setup........................................................................................................................................................80 9.7.8 Configuring PostgreSQL Replication using Logical Decoding...............................................................80 9.7.9 Logical Replication Protocol Details.......................................................................................................83 9.8 Statement Based..........................................................................................................................................87 9.8.1 Introduction to pgpool-II.........................................................................................................................87 9.8.2 Setup........................................................................................................................................................88 9.8.3 Configuring PostgreSQL replication using pgpool-II..............................................................................88 9.9 Other possibilities.......................................................................................................................................96 2/96
  • 3. Replication in PostgreSQL - Deep Dive EnterpriseDB 1 Objectives A) Familiarize with Replication in PostgreSQL. B) Learn configuration and fail-over for each method of replication in PostgreSQL using a two node cluster. 2 Presenter My name is Abbas, I have a Masters in Computer Engineering. I have spent most of my career in product development. I work as a Senior Architect at EnterpriseDB. My work highlights are as follows: • Migration Portal for online schema migration from Oracle to PostgreSQL • xDB Replication Server • Schema Cloning with support for parallelism using Background Workers • Distributed Transactions (XA) Compliance for PostgreSQL using PgBouncer • Oracle Compatible Packages for IBM DB2 : UTL_ENCODE, UTL_TCP, UTL_SMTP, UTL_MAIL • HDFS_FDW, Mongo_FDW, MySQL FDW • Postgres-XC Email : [email protected] Linkedin : https://pk.linkedin.com/in/abbasbutt Blog : https://abbas-technical.blogspot.com 3/96
  • 4. Replication in PostgreSQL - Deep Dive EnterpriseDB 3 What is Replication Replication is the process of copying data from one database server to a another database server. The source database server is usually called Master Server, whereas the target database server is called Slave Server. 4 Why use Replication Replication of data can have many use cases. For Example: • Remove reporting queries load from the production OLTP system. This improves reporting queries time as well as transaction processing performance. • Fault tolerance : In the event of failure of the master database server, the slave database server can take over since it is already up to date with the master server. In this configuration the slave server can also be called standby server. This configuration can also be used for regular maintenance of the primary server. • Data migration : To upgrade database server hardware, or to deploy the same system for another customer. • Testing systems in parallel : In case we decide to port the application from one DBMS to another, the results from old and new systems on the same data must be compared to ensure whether the new system works as expected. 4/96 Master SlaveData
  • 5. Replication in PostgreSQL - Deep Dive EnterpriseDB 5 Models of Replication (Single Master & Multi Master) In Single-Master Replication (SMR) changes to table rows in a designated master database server are replicated to one or more slave database servers. The replicated tables in the slave database are not permitted to accept any changes (except from the master) and even if they do, changes are not replicated back to the master server. In Multi-Master Replication (MMR) changes to table rows in more than one designated masters are replicated to their counterpart tables in every other database. In this model often conflict resolution schemes are employed to avoid duplicate primary keys for example. MMR adds to the use cases of replication in the following manner: • Write availability and scalability • Multi-master replication allows you to employ a WAN connected network of master databases that can be geographically close to groups of clients, yet maintain data consistency across master databases. 6 Classes of Replication (Unidirectional & Bidirectional) Single-Master Replication (SMR) is also termed as unidirectional since replication data flows in one direction only from master to slave, whereas in Multi-Master Replication (MMR) replication data flows in both directions, it is therefore called bidirectional replication. 5/96 Master-I Master - IIData
  • 6. Replication in PostgreSQL - Deep Dive EnterpriseDB 7 Modes of Replication (Asynchronous & Synchronous) In synchronous mode of replication transactions on the master database are declared complete only when the changes have been replicated to all the slaves in addition to the master. All slaves have to be available all the time for the transactions to complete on the master. In Asynchronous mode the transactions on the master server are declared complete when the changes have been done on the master server. These changes are replicated to the slaves later in time. In this mode the slaves can remain out-of-sync for a certain duration which is called replication lag. 6/96 Master Slave - I Slave - II Time An insert to a replicated table Master Slave - I Slave - II Time An insert to a replicated table Replication Lag
  • 7. Replication in PostgreSQL - Deep Dive EnterpriseDB 8 Types of Replication (Physical & Logical) Before we discuss physical and logical replication replication, lets first discuss the context of the terms physical and logical here. Logical Operation Physical Operation 1 initdb Creates a base directory for the cluster 2 CREATE DATABASE Create a sub-directory in the base directory 3 CREATE TABLE Creates a file within the sub-directory of the database 4 INSERT Changes the file that was created for this particular table and writes new WAL Records in the current WAL segment For example: ramp=# create table sample_tbl(a int, b varchar(255)); CREATE TABLE ramp=# SELECT pg_relation_filepath('sample_tbl'); pg_relation_filepath ---------------------- base/34740/706736 (1 row) ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp'; datname | oid ---------+------- ramp | 34740 (1 row) ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl'; relname | oid ------------+-------- sample_tbl | 706736 (1 row) Physical replication deals with files and directories, it has no knowledge of what these files and directories represent. It is done at file system level or disk level. Logical replication on the other hand deals with databases, tables and DML operations. It is therefore possible in logical replication to replicate a certain set of tables only. It is done at database cluster level. 7/96
  • 8. Replication in PostgreSQL - Deep Dive EnterpriseDB 9 Methods Of Replication 9.1 Disk Based Replication 9.1.1 Introduction A network attached storage with at least two disks can provide transparent replication by using mirroring i.e. RAID-1. Mirroring provides replication by copying all data from one disk to the other as if the second disk was mirror image of the first. This configuration provides fault tolerance in case of a single disk failure. 9.1.2 Setup The setup consists of one Centos 7 machine with PostgreSQL 10.7 installed and a Western Digital My Cloud Home 4 TB NAS. 9.1.3 Configuring PostgreSQL Replication using NAS Step 1: Connect the NAS device to the Internet The device needs a DHCP server running on the network and needs Internet for first time configuration. Step 2: Make sure PostgreSQL machine is connected to the same network as your device Step 3: Make sure you are able to access the device through the web interface mycloud.com/hello Create an account with email and password. 8/96 PostgreSQL WD My Cloud Home 4 TB Internet
  • 9. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 4: Find the Mac address of your NAS device. The MAC address for my device is 00:00:c0:08:d7:01 Step 5: Find the IP address of the device On the PostgreSQL machine run the command arp -a and look for an entry like this ? (172.24.37.136) at 00:00:c0:08:d7:01 [ether] on ens160u3u1c2 The IP address of the NAS device is therefore 172.24.37.136 9/96
  • 10. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 6: Check the public share on the device smbclient -N -L 172.24.37.136 Sharename Type Comment --------- ---- ------- Public Disk IPC$ IPC IPC Service (MyCloudDevice) Reconnecting with SMB1 for workgroup listing. Server Comment --------- ------- Workgroup Master --------- ------- WORKGROUP BFAS91-WIN Step 7: Mount the public share of the device on a local folder on the PostgreSQL machine mkdir /home/abbas/mc2 Step 7.1: Create a local folder mkdir /home/abbas/mc2 Step 7.2: Edit the /etc/fstab file and add the following line in it, modes are important //172.24.37.136/public/ /home/abbas/mc2/ cifs credentials=/home/abbas/.smbcredentials, uid=abbas,gid=abbas,rw,dir_mode=0700,file_mode=0700 0 0 Step 7.3: Create the credentials file as follows vim ~/.smbcredentials username=abbas password=abc123 Step 7.4: Mount sudo mount -a Step 7.5: Check the mounted folder ls -l /home/abbas/mc2 total 4 -rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg 10/96
  • 11. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 8: Create a folder to initdb, note the permissions, that’s why modes are important in step 7.2 mkdir /home/abbas/mc2/data ls -l /home/abbas/mc2 total 4 drwx------. 2 abbas abbas 0 Mar 11 07:42 data -rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg Step 8: Initialize cluster ./initdb -D /home/abbas/mc2/data/ Step 9: Run the server ./postgres -D /home/abbas/mc2/data -p 7654 Step 10: Create a new table ./psql -p 7654 postgres create table test_tab(a int, b varchar(10)); SELECT pg_relation_filepath('test_tab'); pg_relation_filepath ---------------------- base/13212/16384 (1 row) 11/96
  • 12. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 11: Connect to the device using nautilus Step 12: Check the relation file and its path 9.1.4 Steps to perform Failover In a two disk NAS device that has RAID-1 build in to it, the user can simply remove the faulty disk and replace it with a new disk, the database server will never notice the absence of the second disk or its replacement. 12/96 Enter device address smb://172.24.37.136 Press connect button smb://172.24.37.136/public/data/base/13212
  • 13. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2 File System Based 9.2.1 Introduction to DRBD Distributed Replicated Block Device (DRBD) is a software module that provides disk or partition mirroring between network hosts. DRBD is a virtual block device driver implemented as a kernel module. It provides replication solution which is independent of the application that is generating the data to be replicated. PostgreSQL is configured to use data directory on the DRBD controlled partition. When PostgreSQL writes any data, DRBD module not only writes that data on the disk but also sends the same data on the network to the connected secondary. The DRBD module on the secondary receives the data from the network and writes it to the disk. In DRBD the most commonly used data synchronization mode is Single-Primary. In the single primary mode only one cluster node manipulates the data at any moment. DRBD can also support Dual-Primay mode. We are using Single Primary Mode with ext4 file system. DRBD Supports three replication protocols: Protocol A - Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has finished, and the replication packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss may occur. 13/96 PostgreSQL DRBD Write Primary Secondary DRBD sda2sda2 Replication
  • 14. Replication in PostgreSQL - Deep Dive EnterpriseDB Protocol B - Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over. Protocol C - Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Most commonly used replication protocol in DRBD setup is Protocol C. 14/96
  • 15. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.2 Setup The setup consists of two CentOS 7 machines connected via LAN installed with two partitions. While installing CentOS 7, choose "Installation Destination" option Deselect "Automatically configure partitioning" and Select "I will configure partitioning" After clicking Done, Manual Partitioning screen will appear Click the + button to add a mount point Mount Point / Desired Capacity 15 GiB File System ext4 15/96
  • 16. Replication in PostgreSQL - Deep Dive EnterpriseDB Click the + button to add a mount point For swap Enter Desired Capacity 4GiB Click the + button to add a mount point Mount Point /for_data Desired Capacity 12 GiB File System ext4 16/96
  • 17. Replication in PostgreSQL - Deep Dive EnterpriseDB Click Done Accept Changes 17/96
  • 18. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C All steps are for both primary and secondary node, unless mentioned otherwise. Step 1: Disable and stop firewall on both the nodes sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Change hostname sudo hostnamectl set-hostname primary sudo hostnamectl set-hostname secondary Step 3: Install Extra Packages for Enterprise Linux (EPEL) repository sudo yum install epel-release sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm Step 4: Install DRBD sudo yum install drbd90-utils kmod-drbd90 Step 5: Restart the System Step 6: Install the kernel module sudo modprobe drbd 18/96
  • 19. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 7: Create configuration file sudo vim /etc/drbd.d/pgconf.res resource pgconf { protocol C; on primary { device /dev/drbd0; disk /dev/sda2; address 172.16.214.151:7788; meta-disk internal; } on secondary { device /dev/drbd0; disk /dev/sda2; address 172.16.214.150:7788; meta-disk internal; } } Step 8: Unmount the disk df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup /dev/sda2 12G 41M 12G 1% /for_data tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 19/96
  • 20. Replication in PostgreSQL - Deep Dive EnterpriseDB sudo umount /dev/sda2 df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 Step 9: Delete file system from the disk, DRBD needs a disk without any file system sudo yum install util-linux sudo wipefs /dev/sda2 offset type ---------------------------------------------------------------- 0x438 ext4 [filesystem] UUID: 8def5959-4dc9-4605-ad61-bd3b597966a3 sudo wipefs -a /dev/sda2 /dev/sda2: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef Step 10: Create DRBD device meta data sudo drbdadm create-md pgconf md_offset 12883849216 al_offset 12883816448 bm_offset 12883423232 Found some data ==> This might destroy existing data! <== Do you want to proceed? [need to type 'yes' to confirm] yes initializing activity log initializing bitmap (384 KB) to all zero 20/96
  • 21. Replication in PostgreSQL - Deep Dive EnterpriseDB Writing meta data... New drbd meta data block successfully created. success Step 11: Associate the DRBD disk with the backing device on the both nodes sudo lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 30G 0 disk ├─sda1 8:1 0 15G 0 part / ├─sda2 8:2 0 12G 0 part └─sda3 8:3 0 3G 0 part [SWAP] sr0 11:0 1 1024M 0 rom NAME This is the device name. MAJ:MIN This column shows the major and minor device number. RM This column shows whether the device is removable or not. SIZE This is column give information on the size of the device. RO This indicates whether a device is read-only. TYPE This column shows information whether the block device is a disk or a partition(part) within a disk. In this example sda is a disk, sda1, sda2 & sda3 are partitions and sr0 is read only memory (rom) MOUNTPOINT This column indicates mount point on which the device is mounted. sudo drbdadm up pgconf sudo lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 30G 0 disk ├─sda1 8:1 0 15G 0 part / ├─sda2 8:2 0 12G 0 part │ └─drbd0 147:0 0 12G 1 disk └─sda3 8:3 0 3G 0 part [SWAP] sr0 11:0 1 1024M 0 rom 21/96
  • 22. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 12: Start drbd on both the nodes sudo systemctl start drbd sudo systemctl enable drbd Step 13: Start initial full synchronization on the primary node sudo drbdadm primary pgconf --force Step 14: Build ext4 file system on DRBD device on the primary node sudo mkfs -t ext4 /dev/drbd0 mke2fs 1.42.9 (28-Dec-2013) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 786432 inodes, 3145367 blocks 157268 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=2151677952 96 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done 22/96
  • 23. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 15: Mount the DRBD device on the primary node sudo mount /dev/drbd0 /for_data df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 /dev/drbd0 12G 41M 12G 1% /for_data Step 16: Check the connections between primary and secondary nodes sudo netstat -n | grep 7788 tcp 0 0 172.16.214.151:47609 172.16.214.150:7788 ESTABLISHED tcp 0 0 172.16.214.151:7788 172.16.214.150:40336 ESTABLISHED Step 17: Check DRBD processes ps -ef --forest | grep drbd root 11109 2 0 13:35 ? 00:00:00 _ [drbd-reissue] root 88248 2 0 21:37 ? 00:00:00 _ [drbd_w_pgconf] root 88250 2 0 21:37 ? 00:00:00 _ [drbd0_submit] root 88256 2 0 21:37 ? 00:00:02 _ [drbd_s_pgconf] root 88262 2 6 21:37 ? 00:01:29 _ [drbd_r_pgconf] root 88269 2 0 21:37 ? 00:00:00 _ [drbd_a_pgconf] root 88270 2 0 21:37 ? 00:00:00 _ [drbd_as_pgconf] 23/96
  • 24. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 18: Check the output of drbdmon tool drbdmon Step 19: Install PostgreSQL git clone git://git.postgresql.org/git/postgresql.git cd postgresql/ git checkout REL_11_STABLE ./configure --prefix=/usr/local/pg11 --enable-debug CFLAGS=-O0 make && make install 24/96
  • 25. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 20: Initialize the cluster ./initdb -D /for_data/data Note that we are using the DRBD device to store data, since that device is being replicated to the secondary. Step 21: Start the server ./postgres -D /for_data/data/ -p 6543 Step 22: Create a table and insert some rows ./psql -p 6543 postgres create table for_testing(id int primary key, value varchar(255)); insert into for_testing values(1, 'One'); insert into for_testing values(2, 'Two'); insert into for_testing values(3, 'Three'); Step 23: Simulate disk failure on the primary node ./pg_ctl stop /for_data/data/ sudo umount /for_data 25/96
  • 26. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.4 Steps to perform Failover Step 1: Install PostgreSQL on the secondary node Step 2: Check the data directory replicated by DRBD sudo mkdir /usr/local/pg11 sudo chown abbas:abbas /usr/local/pg11 sudo drbdadm primary pgconf sudo mount /dev/drbd0 /for_data/ ls -l /for_data/ total 20 drwx------ 19 abbas abbas 4096 Jan 3 05:08 data drwx------ 2 root root 16384 Jan 2 21:45 lost+found ls -l /for_data/data/ total 116 drwx------ 5 abbas abbas 4096 Jan 3 03:59 base drwx------ 2 abbas abbas 4096 Jan 3 04:00 global drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_commit_ts drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_dynshmem -rw------- 1 abbas abbas 4513 Jan 3 03:59 pg_hba.conf -rw------- 1 abbas abbas 1636 Jan 3 03:59 pg_ident.conf drwx------ 4 abbas abbas 4096 Jan 3 05:08 pg_logical drwx------ 4 abbas abbas 4096 Jan 3 03:59 pg_multixact drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_notify drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_replslot drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_serial drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_snapshots drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat_tmp drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_subtrans drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_tblspc drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_twophase -rw------- 1 abbas abbas 3 Jan 3 03:59 PG_VERSION drwx------ 3 abbas abbas 4096 Jan 3 03:59 pg_wal drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_xact -rw------- 1 abbas abbas 88 Jan 3 03:59 postgresql.auto.conf -rw------- 1 abbas abbas 23866 Jan 3 03:59 postgresql.conf -rw------- 1 abbas abbas 64 Jan 3 03:59 postmaster.opts 26/96
  • 27. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 3: Start the server on this data directory ./postgres -D /for_data/data/ -p 6543 Step 4: Check the table and the data in it ./psql -p 6543 postgres psql (11.1) Type "help" for help. postgres=# select * from for_testing; id | value ----+------- 1 | One 2 | Two 3 | Three (3 rows) 27/96
  • 28. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3 Trigger Based 9.3.1 Introduction to Slony-I Slony is a master to multiple slaves AFTER ROW trigger based asynchronous logical replication system for PostgreSQL. Slony supports cascading. Direct subscribers put load on master, indirect subscribers put load on direct subscribers. Slony uses the following terminology: Cluster : A named set of PostgreSQL instances between which replication takes place. Node : A named PostgreSQL instance that participates as master/slave in a replication cluster. Set : A set of tables that need to be replicated between two nodes. Origin & Subscriber : Each replication set has an origin (master) and a subscriber. Origin is where the modifications of the data take place and subscriber is where those changes get replicated to. Slon daemon : Slon daemon runs on each node in the cluster. It manages replication activity for that node. Slon processes replication events. Replication events are of two types: Configuration events : which occur when the configuration of the cluster is changed. Slon in this case would replicate the changed configuration to all the nodes. For example adding a table to a subscribed set. SYNC events : which occur when replicated tables are updated. Lets look in detail how does an insert in a table on origin node gets replicated to a slave node. The following diagram shows a two node cluster that is using Slony for replication between one master and one slave. Each slon daemon establishes connection with master as well as slave database. When the slony replication system is installed it performs the following steps: • Creates an AFTER INSERT UPDATE DELETE ROW trigger on the table to be replicated on the master node. • Creates an trigger to deny any writes to the replicated table on the slave. • Creates tables and functions required to support replication in a separate schema named after the cluster name . When the client inserts a row in the table on the master then following happens to do the replication: • The after row trigger inserts a log row in the table sl_log_1 or sl_log_2 table. • The slon daemon on the master inserts a row in sl_event and issues a NOTIFY. This 28/96
  • 29. Replication in PostgreSQL - Deep Dive EnterpriseDB generates a SYNC event. • The slon daemon on the slave listens for the notification and reads the sl_log_1 or sl_log_2 table from the remote database. • The slon daemon constructs the insert statement and executes it locally to replicate the row to the slave. 9.3.2 Advantages and Disadvantages of Slony Advantages: • Slony allows replicating a small subset of the tables in a database. • Slony works across different PostgreSQL major versions. • Slony provides the ability to create additional indexes on slaves. • Slony can be used to upgrade from an older PostgreSQL version to a newer one. • Barring the tables of the set, Slony allows slaves to be used for read/write activity. • The load of indirect slaves is not put on the master; only direct slaves add to the master's load. Disadvantages: • Slony cannot replicate large objects, DDL commands, users and roles. • Slony is asynchronous and cannot provide the ability to fail over with zero transaction loss. • Slony puts load on the master: the more slaves, the more load. • Slony mandates the use of a primary key on all the tables to be replicated. [Diagram: a two node Slony cluster; Origin and Slave PostgreSQL instances, each with a slon daemon holding a local connection to its own database and a remote connection to the other node.] 29/96
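To make the replication path concrete, here is a minimal sketch (not Slony's literal internal code) of the mapping the slon daemon performs, using the sl_log_2 row format that appears in Step 13 later in this section:
-- Row captured by the AFTER ROW trigger in _slony_example.sl_log_2 on the origin:
--   log_tablerelname = 'student', log_cmdtype = 'I',
--   log_cmdargs = {sid,4,sname,David,saddress,Kent}
-- Statement the slon daemon reconstructs and executes locally on the subscriber:
INSERT INTO public.student (sid, sname, saddress) VALUES (4, 'David', 'Kent');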
  • 30. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3.3 Setup The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 and Slony version 2.2.6 are installed. 9.3.4 Configuring PostgreSQL Replication using Slony-I Step 1: Disable and stop firewall on both the nodes sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Install PostgreSQL and Slony Download postgresql-10.7-1-linux-x64.run from the EnterpriseDB website and install all the components. Run StackBuilder and install Slony 2.2.6. Step 3: Configure trust authentication in both master and slave As the postgres user do the following cd /opt/PostgreSQL/10/bin/ ./pg_ctl stop -D ../data vim ../data/pg_hba.conf host all all 172.16.214.163/24 trust ./pg_ctl start -D ../data Step 4: Export environment variables export CLUSTERNAME=slony_example export MASTERDBNAME=for_slony export SLAVEDBNAME=for_slony export MASTERHOST=172.16.214.163 export SLAVEHOST=172.16.214.162 export MASTERPORT=5432 export SLAVEPORT=5432 30/96
  • 31. Replication in PostgreSQL - Deep Dive EnterpriseDB export REPLICATIONUSER=postgres export PATH=$PATH:/opt/PostgreSQL/10/bin/ Step 5: Make sure both servers are accessible from both machines ./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME psql.bin (10.7) Type "help" for help. postgres=# \q ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME psql.bin (10.7) Type "help" for help. postgres=# \q Step 6: Create database, tables and insert some values on master ./createdb -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME ./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); INSERT INTO student VALUES(1, 'Edward', 'Main Campus'); INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel'); INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel'); INSERT INTO teacher VALUES(1, 'Gary', 'Physics'); INSERT INTO teacher VALUES(2, 'Karen', 'Maths'); INSERT INTO teacher VALUES(3, 'Carol', 'History'); Step 7: Create database and tables on slave ./createdb -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); 31/96
  • 32. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 8: Create and execute the slony setup script, which performs the following steps (the assembled script is shown after this step) ./slony_setup.sh <stdin>:21: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support <stdin>:36: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support Step 8.1: Define the schema name that slony uses to create all slony objects, in our example it is _slony_example cluster name = $CLUSTERNAME; Step 8.2: Provide the connection info that slonik uses to connect to the master and slave node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; Step 8.3: Initialize the first node. Its id MUST be 1. This creates the schema _slony_example containing all replication system specific database objects. The main tables that store the change log are _slony_example.sl_log_1 & _slony_example.sl_log_2. The main function that adds change log entries to these tables is _slony_example.logtrigger, which calls the C function _Slony_I_logTrigger init cluster ( id=1, comment = 'Master Node'); Step 8.4: Create a table set that can be subscribed by slaves create set (id=1, origin=1, comment='some tables'); set add table (set id=1, origin=1, id=1, fully qualified name = 'public.student', comment='student table'); set add table (set id=1, origin=1, id=2, fully qualified name = 'public.teacher', comment='teacher table'); Step 8.5: Create a slave node store node (id=2, comment = 'Slave node', event node=1); Step 8.6: Provide the connection info nodes use to connect to each other and listen for events store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'); store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'); 32/96
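Assembled into a single slony_setup.sh, the fragments from Steps 8.1 to 8.6 would look like the following sketch; the heredoc wrapper mirrors the slony_sub.sh shown in Step 10:
#!/bin/sh
slonik <<_EOF_
cluster name = $CLUSTERNAME;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER';
init cluster ( id=1, comment = 'Master Node');
create set (id=1, origin=1, comment='some tables');
set add table (set id=1, origin=1, id=1, fully qualified name = 'public.student', comment='student table');
set add table (set id=1, origin=1, id=2, fully qualified name = 'public.teacher', comment='teacher table');
store node (id=2, comment = 'Slave node', event node=1);
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER');
store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER');
_EOF_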
  • 33. Replication in PostgreSQL - Deep Dive EnterpriseDB The setup script performs the following actions Action 1: It creates the following triggers on each of the tables in the set on the master for_slony=# \d+ student Table "public.student" Column | Type | ----------+------------------------+ sid | integer | sname | character varying(255) | saddress | character varying(255) | Indexes: "student_pkey" PRIMARY KEY, btree (sid) Triggers: _slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.logtrigger('_slony_example','1','k') _slony_example_truncatetrigger BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.log_truncate('1') Disabled user triggers: _slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.denyaccess('_slony_example') _slony_example_truncatedeny BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.deny_truncate() 33/96
  • 34. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 2: It creates the following triggers on each of the tables in the set on the slave for_slony=# \d+ student Table "public.student" Column | Type | ----------+------------------------+ sid | integer | sname | character varying(255) | saddress | character varying(255) | Indexes: "student_pkey" PRIMARY KEY, btree (sid) Triggers: _slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.denyaccess('_slony_example') _slony_example_truncatedeny BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.deny_truncate() Disabled user triggers: _slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.logtrigger('_slony_example', '1', 'k') _slony_example_truncatetrigger BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.log_truncate('1') 34/96
  • 35. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 3: It creates the _slony_example schema with the following objects on both nodes for_slony=# \dtvs _slony_example.* List of relations Schema | Name | Type | Owner ----------------+----------------------------+----------+---------- _slony_example | sl_nodelock_nl_conncnt_seq | sequence | postgres _slony_example | sl_log_status | sequence | postgres _slony_example | sl_action_seq | sequence | postgres _slony_example | sl_local_node_id | sequence | postgres _slony_example | sl_event_seq | sequence | postgres _slony_example | sl_log_script | table | postgres _slony_example | sl_registry | table | postgres _slony_example | sl_apply_stats | table | postgres _slony_example | sl_nodelock | table | postgres _slony_example | sl_setsync | table | postgres _slony_example | sl_table | table | postgres _slony_example | sl_sequence | table | postgres _slony_example | sl_node | table | postgres _slony_example | sl_listen | table | postgres _slony_example | sl_path | table | postgres _slony_example | sl_log_1 | table | postgres _slony_example | sl_log_2 | table | postgres _slony_example | sl_subscribe | table | postgres _slony_example | sl_event | table | postgres _slony_example | sl_confirm | table | postgres _slony_example | sl_seqlog | table | postgres _slony_example | sl_components | table | postgres _slony_example | sl_set | table | postgres _slony_example | sl_config_lock | table | postgres _slony_example | sl_event_lock | table | postgres _slony_example | sl_archive_counter | table | postgres _slony_example | sl_failover_targets | view | postgres _slony_example | sl_seqlastvalue | view | postgres _slony_example | sl_status | view | postgres (29 rows) 35/96
  • 36. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 4: Creates the following trigger functions List of triggers ----------------+------------------------------------+ Schema | Name | ----------------+------------------------------------+ _slony_example | logapply _slony_example | log_truncate _slony_example | deny_truncate _slony_example | logtrigger _slony_example | denyaccess _slony_example | lockedset Action 5: Creates around 150 functions in the _slony_example schema Note that the setup script does not run any daemon on the master or the slave, i.e. it does not start the replication process, and it does not copy any data from master to slave. Step 9: Start the slon daemon on both the master and slave slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER host=$MASTERHOST port=$MASTERPORT" slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST port=$SLAVEPORT" The slon daemons should emit messages of the sort INFO remoteWorkerThread_2: SYNC 5000000178 done in 0.003 seconds INFO remoteWorkerThread_2: SYNC 5000000179 done in 0.003 seconds NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1 INFO cleanupThread: 0.020 seconds for cleanupEvent() INFO remoteWorkerThread_2: SYNC 5000000180 done in 0.002 seconds INFO remoteWorkerThread_2: SYNC 5000000181 done in 0.004 seconds Connection problems result in errors like this WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied 36/96
  • 37. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 10: Start the subscription ./slony_sub.sh The following script instructs slony to subscribe the set whose id is 1 and whose provider (master) id is 1, for the receiver (slave) whose id is 2 #!/bin/sh slonik <<_EOF_ # ---- # This defines which namespace the replication system uses # ---- cluster name = $CLUSTERNAME; # ---- # Admin conninfo's are used by the slonik program to connect # to the node databases. So these are the PQconnectdb arguments # that connect from the administrator's workstation (where # slonik is executed). # ---- node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; # ---- # Node 2 subscribes set 1 # ---- subscribe set ( id = 1, provider = 1, receiver = 2, forward = no); _EOF_ 37/96
  • 38. Replication in PostgreSQL - Deep Dive EnterpriseDB slon will emit messages similar to the following, which copy initial data from master to slave CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700 CONFIG remoteWorkerThread_1: connected to provider DB CONFIG remoteWorkerThread_1: prepare to copy table "public"."student" CONFIG remoteWorkerThread_1: prepare to copy table "public"."teacher" CONFIG remoteWorkerThread_1: all tables for set 1 found on subscriber CONFIG remoteWorkerThread_1: copy table "public"."student" CONFIG remoteWorkerThread_1: Begin COPY of table "public"."student" NOTICE: truncate of "public"."student" succeeded CONFIG remoteWorkerThread_1: 62 bytes copied for table "public"."student" CONFIG remoteWorkerThread_1: 0.082 seconds to copy table "public"."student" CONFIG remoteWorkerThread_1: copy table "public"."teacher" CONFIG remoteWorkerThread_1: Begin COPY of table "public"."teacher" NOTICE: truncate of "public"."teacher" succeeded CONFIG remoteWorkerThread_1: 45 bytes copied for table "public"."teacher" CONFIG remoteWorkerThread_1: 0.031 seconds to copy table "public"."teacher" INFO remoteWorkerThread_1: copy_set SYNC found, use event seqno 5000000311. INFO remoteWorkerThread_1: 0.018 seconds to build initial setsync status INFO copy_set 1 done in 0.172 seconds CONFIG enableSubscription: sub_set=1 CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 CONFIG remoteWorkerThread_1: update provider configuration CONFIG remoteWorkerThread_1: added active set 1 to provider 1 CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700 INFO remoteWorkerThread_1: SYNC 5000000297 done in 0.082 seconds INFO remoteWorkerThread_1: SYNC 5000000301 done in 0.005 seconds INFO remoteWorkerThread_1: SYNC 5000000309 done in 0.038 seconds INFO remoteWorkerThread_1: SYNC 5000000310 done in 0.030 seconds INFO remoteWorkerThread_1: SYNC 5000000311 done in 0.005 seconds 38/96
  • 39. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 11: Check replicated data in slave ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME psql.bin (10.7) Type "help" for help. for_slony=# select * from teacher; tid | tname | tsubject -----+-------+---------- 1 | Gary | Physics 2 | Karen | Maths 3 | Carol | History (3 rows) for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) Step 12: Check insert operation on slave for_slony=# insert into student values(4, 'David', 'Kent'); ERROR: Slony-I: Table student is replicated and cannot be modified on a subscriber node - role=0 39/96
  • 40. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Try insert on master insert into student values(4, 'David', 'Kent'); Check the log entry: select * from _slony_example.sl_log_2; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 5446 | 1 | 1 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+---------------------------------- log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+---------------------------------- student | I | 0 | {sid,4,sname,David,saddress,Kent} ------------------+-------------+-----------------+---------------------------------- for_slony=# select ctid,xmin, xmax, cmin, * from student; ctid | xmin | xmax | cmin | sid | sname | saddress -------+------+------+------+-----+--------+-------------- (0,1) | 560 | 0 | 0 | 1 | Edward | Main Campus (0,2) | 561 | 0 | 0 | 2 | Linda | Girls Hostel (0,3) | 562 | 0 | 0 | 3 | Jason | Boys Hostel (0,4) | 5446 | 0 | 0 | 4 | David | Kent (4 rows) 40/96
  • 41. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 14: Try update on master for_slony=# update student set saddress = 'Whales' where sid = 4; UPDATE 1 select * from _slony_example.sl_log_1; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 6239 | 1 | 2 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+------------------------ log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+------------------------ student | U | 1 | {saddress,Whales,sid,4} ------------------+-------------+-----------------+------------------------ Step 15: Check result on slave for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel 4 | David | Whales (4 rows) 41/96
  • 42. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 16: Try delete on master for_slony=# delete from student where sid = 4; DELETE 1 select * from _slony_example.sl_log_1; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 6407 | 1 | 3 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+------------------------ log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+------------------------ student | D | 0 | {sid,4} ------------------+-------------+-----------------+------------------------ Step 17: Check result on slave for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) 42/96
  • 43. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3.5 Steps to perform controlled switchover A small slonik script can achieve a controlled switchover in which we completely switch the roles of the two nodes. The old master becomes a slave and the old slave becomes the new master. Please note that this is a planned activity and it has nothing to do with any type of failure. #!/bin/sh slonik <<_EOF_ cluster name = $CLUSTERNAME; node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; lock set (id = 1, origin = 1); wait for event (origin = 1, confirmed = 2, wait on=1); move set (id = 1, old origin = 1, new origin = 2); wait for event (origin = 1, confirmed = 2, wait on=1); _EOF_ After the command runs, the Slony trigger definitions on the tables in the set will have changed on the new master: the _slony_example_denyaccess & _slony_example_truncatedeny triggers get disabled and the _slony_example_logtrigger & _slony_example_truncatetrigger triggers get enabled on the new master. Changes to the tables in the set are therefore possible on the new master. 43/96
  • 44. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4 Introduction to WAL 9.4.1 What is WAL and Why is it required In PostgreSQL all changes made by every transaction are first saved in a log file and then the result of the transaction is sent to the initiating client. Data files are not changed on every transaction. This is a standard mechanism to prevent data loss in situations like an OS crash, hardware failure, a PostgreSQL crash etc. This mechanism is called Write Ahead Logging and the log file is called the Write Ahead Log (WAL). Each change that the transaction performs (INSERT, UPDATE, DELETE, COMMIT) is written in the log as a WAL record. WAL records are first written into an in-memory WAL buffer. On transaction commit the records are written into a WAL segment file on the disk. The log sequence number (LSN) of a WAL record represents the location/position where it is saved in the log file. The LSN is used as a unique id of the WAL record. Logically, the transaction log is a file whose size is 2^64 bytes. An LSN is therefore a 64-bit number, represented as two 32-bit hexadecimal numbers separated by a /. For example: select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/2BDBBD0 (1 row) 44/96
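Since LSNs are byte positions in this logical file, the distance between two of them is plain arithmetic; the built-in pg_wal_lsn_diff function (available from version 10) computes it:
-- 0x2BDBBD0 - 0x2BDB000 = 0xBD0 = 3024 bytes of WAL between the two positions
SELECT pg_wal_lsn_diff('0/2BDBBD0', '0/2BDB000');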
  • 45. Replication in PostgreSQL - Deep Dive EnterpriseDB In the event of a system crash the database can recover committed transactions from the WAL. While recovering, PostgreSQL starts recovery from the last REDO point or checkpoint. A checkpoint is a point in the transaction log at which all data files have been updated to reflect the information in the log. The process of flushing the changes described by the WAL records to the actual data files is called check-pointing. Let's consider a case where the database crashes after two transactions which perform one insert each, and WAL is used for recovery. 1. Assume a CHECKPOINT is issued, which stores the location of the latest REDO point in the current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk. This guarantees that WAL records before the REDO point are no longer needed for recovery, since all data has been flushed to the disk pages. 2. The first INSERT statement is issued. The table's page is loaded from disk into the buffer pool. 3. A tuple is inserted into the loaded page. 4. The WAL record of this insert is saved into the WAL buffer at location LSN_1. 5. The page LSN, which identifies the WAL record for the last change to this page, is updated from LSN_0 to LSN_1. 6. The first COMMIT statement is issued. 7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records in the WAL buffer up to this page's LSN are flushed to the WAL segment file. 8. For the second INSERT and commit, steps 2 to 7 are repeated. 45/96
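The location of the last checkpoint and its REDO point can be read from the control file via SQL; a quick check using the built-in pg_control_checkpoint() function (available from version 9.6):
-- redo_lsn is where crash recovery would begin replaying WAL
SELECT checkpoint_lsn, redo_lsn, redo_wal_file FROM pg_control_checkpoint();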
  • 46. Replication in PostgreSQL - Deep Dive EnterpriseDB [Diagram: the client sequence CHECKPOINT; BEGIN; INSERT INTO TAB VALUES ('A'); COMMIT; BEGIN; INSERT INTO TAB VALUES ('B'); COMMIT; traced across the shared buffer pool, the WAL buffer, the WAL segment and the data files. The CHECKPOINT records the REDO point in the WAL segment; each INSERT advances the page LSN of table TAB in the shared buffer pool from LSN_0 to LSN_1 and then LSN_2, and each COMMIT flushes the buffered WAL records (A at LSN_1 plus COMMIT, B at LSN_2 plus COMMIT) to the WAL segment, while the data files still hold the page at LSN_0.] 46/96
  • 47. Replication in PostgreSQL - Deep Dive EnterpriseDB In the event of an operating system crash, all data in the shared buffer pool will be lost; however, all modifications of the page have been written into the WAL segment files as history data. The following steps show how our database cluster can recover back to the state immediately before the crash using WAL records. There is no need to do anything special, since PostgreSQL automatically enters recovery mode after restarting. 1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL segment file. 2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool. 3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the page's LSN is updated to LSN_1. The remaining WAL records are replayed in a similar manner. [Diagram: starting from the on-disk page of TAB at LSN_0, replaying the WAL records A/LSN_1/COMMIT and B/LSN_2/COMMIT from the REDO point brings the page in the shared buffer pool from LSN_0 to LSN_1 (tuple A) and then LSN_2 (tuples A and B).] 47/96
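The page LSN used in this comparison can be inspected directly with the pageinspect contrib extension; a minimal sketch, assuming superuser access and the table name tab from the diagrams above:
CREATE EXTENSION pageinspect;
-- The lsn column of the page header is the page LSN compared against WAL record LSNs
SELECT lsn FROM page_header(get_raw_page('tab', 0));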
  • 48. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.2 Transaction Log and WAL Segment Files In PostgreSQL the transaction log is a virtual file with a capacity of 2^64 bytes, addressed by 8-byte LSNs. Physically the log is divided into 16-megabyte files, each of which is called a WAL segment. A WAL segment file name is a 24-digit number made of three 8-digit hexadecimal parts: the timeline ID, the segment number divided by 256, and the segment number modulo 256. Assuming that the current timeline ID is 0x00000001, the first WAL segment file names will be 00000001 00000000 00000000 00000001 00000000 00000001 00000001 00000000 00000002 ………. 00000001 00000001 00000000 00000001 00000001 00000001 00000001 00000001 00000002 ………… 00000001 FFFFFFFF FFFFFFFD 00000001 FFFFFFFF FFFFFFFE 00000001 FFFFFFFF FFFFFFFF For example, for LSN 0/2BDBBD0 the lower 32 bits are 0x02BDBBD0, which with 16 MB (0x01000000-byte) segments falls in segment 0x02BDBBD0 / 0x01000000 = 2, hence: select pg_walfile_name('0/2BDBBD0'); pg_walfile_name -------------------------- 000000010000000000000002 9.4.3 WAL Writer The WAL writer is a background process that checks the WAL buffer periodically and writes all unwritten WAL records into the WAL segments. The WAL writer avoids bursts of IO activity by spreading the writes over time in small amounts. The configuration 48/96
  • 49. Replication in PostgreSQL - Deep Dive EnterpriseDB parameter wal_writer_delay controls how often the WAL writer flushes the WAL, with a default value of 200 ms. 9.4.4 WAL Segment File Management WAL segment files are stored in the pg_wal sub-directory. PostgreSQL switches to a new WAL segment file under the following conditions: 1. The WAL segment has been filled up. 2. The function pg_switch_wal has been issued. 3. archive_mode is enabled and the time set in archive_timeout has been exceeded. Switched WAL files can either be removed or recycled, i.e. renamed and reused in the future. The number of WAL files that the server retains at any point in time depends on server configuration as well as server activity. Whenever a checkpoint starts, PostgreSQL estimates and prepares the number of WAL segment files required for this checkpoint cycle. The estimate is made with regard to the number of files consumed in previous checkpoint cycles. They are counted from the segment that contains the prior REDO point, and the value is to be between min_wal_size (by default 80 MB, i.e. 5 files) and max_wal_size (1 GB, i.e. 64 files). When a checkpoint starts, the necessary files are held and recycled, while the unnecessary ones are removed. A specific example is shown in the diagram below. Assuming that there are six files before the checkpoint starts, WAL_3 contains the prior REDO point (or the REDO point in version 11), and PostgreSQL estimates that five files are needed. In this case, WAL_1 will be renamed as WAL_7 for recycling and WAL_2 will be removed. [Diagram: segments WAL_1 through WAL_6 before the checkpoint, with the REDO point in WAL_3; after the checkpoint the unneeded WAL_2 is removed and WAL_1 is renamed WAL_7 for re-use, leaving the estimated number of segments needed by the server.] 49/96
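The settings that drive this retention estimate can be checked at run time; a small sketch:
SELECT name, setting, unit FROM pg_settings
WHERE name IN ('min_wal_size', 'max_wal_size', 'wal_writer_delay', 'archive_timeout');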
  • 50. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.5 WAL Example Step 1: SELECT datname, oid FROM pg_database WHERE datname = 'postgres'; datname | oid ----------+------- postgres | 15709 (1 row) Note the database OID i.e. 15709 Step 2: SELECT oid,* from pg_tablespace; oid | spcname | spcowner | spcacl | spcoptions ------+------------+----------+--------+------------ 1663 | pg_default | 10 | | 1664 | pg_global | 10 | | (2 rows) Note the tablespace OID i.e. 1663 Step 3: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C420B8 (1 row) Note the LSN i.e. 0/1C420B8 Step 4: CREATE TABLE abc(a VARCHAR(10)); Step 5: SELECT pg_relation_filepath('abc'); pg_relation_filepath ---------------------- base/15709/16384 (1 row) Note the relation file path base/15709/16384 50/96
  • 51. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 6: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8 and use the Start LSN noted in step 3. Note that the WAL contains the instruction to create physical file 15709 → database postgres → noted in step 1 16384 → table abc → noted in step 5 rmgr Len(rec /tot) tx lsn prev desc XLOG 30/ 30 0 0/01C420B8 0/01C42080 NEXTOID 24576 Storage 42/ 42 0 0/01C420D8 0/01C420B8 CREATE base/15709/16384 Heap 203/203 1216 0/01C42108 0/01C420D8 INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0 Btree 64/ 64 1216 0/01C421D8 0/01C42108 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 Btree 64/ 64 1216 0/01C42218 0/01C421D8 INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5 Heap 80/ 80 1216 0/01C42258 0/01C42218 INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9 Btree 72/ 72 1216 0/01C422A8 0/01C42258 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 Btree 72/ 72 1216 0/01C422F0 0/01C422A8 INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61 Heap 203/203 1216 0/01C42338 0/01C422F0 INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1 Btree 64/64 1216 0/01C42408 0/01C42338 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 Btree 72/ 72 1216 0/01C42448 0/01C42408 INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1 Heap 80/ 80 1216 0/01C42490 0/01C42448 INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9 Btree 72/ 72 1216 0/01C424E0 0/01C42490 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 Btree 72/ 72 1216 0/01C42528 0/01C424E0 INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57 Heap 199/199 1216 0/01C42570 0/01C42528 INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0 Btree 64/ 64 1216 0/01C42638 0/01C42570 INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2 Btree 64/ 64 1216 0/01C42678 0/01C42638 INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1 Btree 64/ 64 1216 0/01C426B8 0/01C42678 INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5 Heap 171/171 1216 0/01C426F8 0/01C426B8 INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16 Btree 64/ 64 1216 0/01C427A8 0/01C426F8 INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C427E8 0/01C427A8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42828 0/01C427E8 INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16 Btree 72/ 72 1216 0/01C428D8 0/01C42828 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42920 0/01C428D8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 51/96
  • 52. Replication in PostgreSQL - Deep Dive EnterpriseDB Heap 171/171 1216 0/01C42960 0/01C42920 INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16 Btree 72/ 72 1216 0/01C42A10 0/01C42960 INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42A58 0/01C42A10 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42A98 0/01C42A58 INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42B48 0/01C42A98 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42B90 0/01C42B48 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42BD0 0/01C42B90 INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42C80 0/01C42BD0 INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42CC8 0/01C42C80 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42D08 0/01C42CC8 INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42DB8 0/01C42D08 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42E00 0/01C42DB8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42E40 0/01C42E00 INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32 Btree 72/ 72 1216 0/01C42EF0 0/01C42E40 INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42F38 0/01C42EF0 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 80/ 80 1216 0/01C42F78 0/01C42F38 INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11 Btree 72/ 72 1216 0/01C42FC8 0/01C42F78 INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44 Btree 72/ 72 1216 0/01C43010 0/01C42FC8 INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46 Standby 42/ 42 1216 0/01C43058 0/01C43010 LOCK xid 1216 db 15709 rel 16384 Txn 405/405 1216 0/01C43088 0/01C43058 COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608 relcache 16384 Standby 50/ 50 0 0/01C43220 0/01C43088 RUNNING_XACTS nextXid 1217 latestCompletedXid 1216 oldestRunningXid 1217 Step 7: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C43258 (1 row) Step 8: INSERT INTO abc VALUES('pkn'); 52/96
  • 53. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 9: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258 and use start LSN from step 7. 1663 → pg_default tablespace → noted in step 2 15709 → database postgres → noted in step 1 16384 → table abc → noted in step 5 rmgr Len (rec/ tot) tx lsn prev desc Heap 59/59 1217 0/01C43258 0/01C43220 INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0 Transaction 34/34 1217 0/01C43298 0/01C43258 COMMIT 2019-03-04 07:43:45.887511 EST Standby 54/54 0 0/01C432C0 0/01C43298 RUNNING_XACTS nextXid 1218 latestCompletedXid 1216 oldestRunningXid 1217; 1 xacts: 1217 Step 10: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C432F8 (1 row) Step 11: INSERT INTO abc VALUES('ujy'); Step 12: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8 and use start LSN as noted in step 10. rmgr Len (rec/ tot) tx lsn prev desc Heap 59/59 1218 0/01C432F8 0/01C432C0 INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0 Transaction 34/34 1218 0/01C43338 0/01C432F8 COMMIT 2019-03-04 07:44:25.449151 EST Standby 50/50 0 0/01C43360 0/01C43338 RUNNING_XACTS nextXid 1219 latestCompletedXid 1218 oldestRunningXid 1219 53/96
  • 54. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Check the actual tuples in the WAL segment files. ---------+---------------------------------------------------+----------------+ Offset | Hex Bytes | ASCII chars | ---------+---------------------------------------------------+----------------+ 00000060 | 3b 00 00 00 c3 04 00 00 28 00 40 02 00 00 00 00 |;.......(.@.....| 00000070 | 00 0a 00 00 ec 28 75 6e 00 20 0a 00 7f 06 00 00 |.....(un. ......| 00000080 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........| 00000090 | 02 08 18 00 09 70 6b 6e 03 00 00 00 00 00 00 00 |.....pkn........| 000000a0 | 22 00 00 00 c3 04 00 00 60 00 40 02 00 00 00 00 |".......`.@.....| 000000b0 | 00 01 00 00 dd 4c 87 04 ff 08 e4 73 44 e7 41 26 |.....L.....sD.A&| 000000c0 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......| 000000d0 | a0 00 40 02 00 00 00 00 10 08 00 00 9e 01 36 88 |[email protected].| 000000e0 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c4 04 |................| 000000f0 | 00 00 c4 04 00 00 c3 04 00 00 00 00 00 00 00 00 |................| 00000100 | 3b 00 00 00 c4 04 00 00 c8 00 40 02 00 00 00 00 |;.........@.....| 00000110 | 00 0a 00 00 33 df b4 71 00 20 0a 00 7f 06 00 00 |....3..q. ......| 00000120 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........| 00000130 | 02 08 18 00 09 75 6a 79 04 00 00 00 00 00 00 00 |.....ujy........| 00000140 | 22 00 00 00 c4 04 00 00 00 01 40 02 00 00 00 00 |".........@.....| 00000150 | 00 01 00 00 96 2e 96 a6 ff 08 d8 f3 79 ed 41 26 |............y.A&| 00000160 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......| 00000170 | 40 01 40 02 00 00 00 00 10 08 00 00 eb 6b 95 36 |@[email protected]| 00000180 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c5 04 |................| 00000190 | 00 00 c5 04 00 00 c4 04 00 00 00 00 00 00 00 00 |................| 000001a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 54/96
  • 55. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.6 Overview of Replication Options based on WAL Continuous WAL Archiving Copying WAL files, as they are generated, into any location other than the pg_wal sub-directory for the purpose of archiving them is called WAL archiving. To archive, a script provided by the user is invoked by PostgreSQL each time a WAL file is generated. The script can use the scp command to copy the file to one or more locations. The location can be an NFS mount. Once archived, the WAL segment files can be used to recover the database to any specified point in time. Log Shipping Based Replication - File Level The process of copying log files to another PostgreSQL server for the purpose of creating another standby server by replaying WAL files is called Log Shipping. This server is configured to be in recovery mode. The sole purpose of this server is to apply new WAL files as they arrive. This second server then becomes a warm backup of the primary PostgreSQL server, also termed a standby. The standby can also be configured to be a read replica, where it can also serve read-only queries. This is called a hot standby. Log Shipping Based Replication - Block Level Streaming replication improves on the log shipping process. Instead of waiting for the WAL switch, the records are sent as and when they are generated, thus reducing replication delay. The second improvement is that the standby server connects to the primary server over the network using a replication protocol. The primary server can then send WAL records directly over this connection without having to rely on scripts provided by the end user. How long should the primary retain WAL segment files? Without any streaming replication clients, the server can discard/recycle a WAL segment file once the archive script reports success, provided the file is not required for crash recovery. In the presence of standby clients, though, there is a problem: the server needs to keep WAL files around for as long as the slowest standby needs them. If a standby that was taken down for a while comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails with an error similar to: ERROR: requested WAL segment 00000001000000010000002D has 55/96
  • 56. Replication in PostgreSQL - Deep Dive EnterpriseDB already been removed The primary should therefore keep track of how far behind the standby is, and not delete/recycle WAL files that any standby still needs. This feature is provided through replication slots. Each replication slot has a name which is used to identify the slot. Each slot is associated with: (a) The oldest WAL segment file required by the consumer of the slot. WAL segment files later than this are not deleted/recycled during checkpoints. (b) The oldest transaction ID required to be retained by the consumer of the slot. Rows needed by any transactions later than this are not deleted by vacuum. 56/96
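Both pieces of retained state are visible in the pg_replication_slots view; a quick monitoring sketch:
-- restart_lsn: oldest WAL position the slot's consumer still needs
-- xmin: oldest transaction id the slot forces vacuum to retain
SELECT slot_name, slot_type, active, restart_lsn, xmin FROM pg_replication_slots;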
  • 57. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.5 Log Shipping Based - File Level 9.5.1 Setup The setup consists of two CentOS 7 machines on which PostgreSQL version 10.7 is installed. Both systems are loosely coupled, sharing only the WAL archive. 9.5.2 Configuring Replication using Log Shipping Step 1: Disable and stop firewall on both the machines sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Create a folder on the standby that will archive WALs received from the primary sudo mkdir /opt/PostgreSQL/10/from_primary sudo chown postgres:postgres /opt/PostgreSQL/10/from_primary In /etc/passwd change the home directory of user postgres to /opt/PostgreSQL/10/from_primary Step 3: Change the home directory of the postgres user on Primary sudo mkdir /opt/PostgreSQL/10/home sudo chown postgres:postgres /opt/PostgreSQL/10/home/ [Diagram: Primary and Standby PostgreSQL servers sharing a WAL Archive; archive_command copies WAL files from pg_wal to the archive, restore_command copies WAL files from the archive to the standby's pg_wal.] 57/96
  • 58. Replication in PostgreSQL - Deep Dive EnterpriseDB In /etc/passwd change the home directory of user postgres to /opt/PostgreSQL/10/home/ Step 4: Configure passwordless ssh & scp between Primary and Standby Log in as the postgres user on Primary su - postgres Password: Last login: Fri Feb 22 05:54:11 EST 2019 on pts/0 Generate a public/private key pair on Primary -bash-4.2$ ssh-keygen Generating public/private rsa key pair. Enter file in which to save the key (/opt/PostgreSQL/10/home/.ssh/id_rsa): Created directory '/opt/PostgreSQL/10/home/.ssh'. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa. Your public key has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.pub. The key fingerprint is: SHA256:jqjjYf8OcKp4tgtfPcLWG6liAot660/4CLrIRq01BqI [email protected] The key's randomart image is: +---[RSA 2048]----+ | | | | | | |.. | |o + . S | |E. @ + + | |*.O X B . | |OOBX + + | |X@XB=o+ | +----[SHA256]-----+ 58/96
  • 59. Replication in PostgreSQL - Deep Dive EnterpriseDB Copy the key to standby and add it to authorized_keys -bash-4.2$ ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/opt/PostgreSQL/10/home/.ssh/id_rsa.pub" The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established. ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84. ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6. Are you sure you want to continue connecting (yes/no)? /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established. ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84. ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6. Are you sure you want to continue connecting (yes/no)? yes /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys [email protected]'s password: Number of key(s) added: 1 Now try logging into the machine, with: "ssh '[email protected]'" and check to make sure that only the key(s) you wanted were added. Test passwordless SSH -bash-4.2$ ssh [email protected] Last login: Fri Feb 22 05:53:39 2019 -bash-4.2$ exit logout Connection to 172.16.214.165 closed. -bash-4.2$ Test passwordless SCP. From Primary try this su - postgres -bash-4.2$ scp 1.txt [email protected]:/opt/PostgreSQL/10/from_primary 1.txt 100% 3446 3.3MB/s 00:00 59/96
  • 60. Replication in PostgreSQL - Deep Dive EnterpriseDB Check on standby su - postgres -bash-4.2$ pwd /opt/PostgreSQL/10/from_primary -bash-4.2$ ls -l total 4 -rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt Step 5: Update the postgresql.conf file on primary wal_level = replica archive_mode = on archive_command = 'if ssh [email protected] test ! -f "/opt/PostgreSQL/10/from_primary/%f" ; then scp %p [email protected]:/opt/PostgreSQL/10/from_primary/; fi' The archive_command will be executed every time a new WAL file is generated. This archive command uses two placeholders %p : The complete path of the WAL file along with its name %f : The name of the WAL file The command tests that the WAL file is not already present on the standby and, if it is not, copies it to the archive folder. Step 6: Create database & tables on primary ./createdb test_db CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); INSERT INTO student VALUES(1, 'Edward', 'Main Campus'); INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel'); INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel'); INSERT INTO teacher VALUES(1, 'Gary', 'Physics'); INSERT INTO teacher VALUES(2, 'Karen', 'Maths'); INSERT INTO teacher VALUES(3, 'Carol', 'History'); 60/96
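Once the server is generating WAL, the health of this archive_command can be verified from the built-in pg_stat_archiver view; a quick sketch:
SELECT archived_count, last_archived_wal, failed_count, last_failed_wal
FROM pg_stat_archiver;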
  • 61. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 7: Take a base backup using the command ./pg_basebackup --pgdata=/opt/PostgreSQL/10/for_standby/ --format=p --write-recovery-conf --checkpoint=fast --label=for_test --progress --verbose --host=localhost --port=5432 --username=postgres Password: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1 pg_basebackup: starting background WAL receiver 32578/32578 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 0/20000F8 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: base backup completed --pgdata : Target folder for the base backup --format : plain Step 8: Modify the recovery.conf in the base backup standby_mode = 'on' restore_command = 'cp "/opt/PostgreSQL/10/from_primary/%f" "%p"' The restore_command is invoked by the standby server periodically. Our restore command copies the newly arrived WAL file to the pg_wal folder of the standby server. Step 9: Transfer the base backup to the standby server sudo mkdir /opt/PostgreSQL/10/bb_data/ sudo mv /tmp/for_standby.tar.gz /opt/PostgreSQL/10/bb_data/ sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/ sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/for_standby.tar.gz sudo chmod 700 /opt/PostgreSQL/10/bb_data/ sudo chmod 700 /opt/PostgreSQL/10/bb_data/for_standby Step 10: Unzip base backup su - postgres cd bb_data/ -bash-4.2$ ls -l total 3840 61/96
  • 62. Replication in PostgreSQL - Deep Dive EnterpriseDB -rw-r--r--. 1 postgres postgres 3930538 Feb 23 01:40 for_standby.tar.gz -bash-4.2$ tar -xvf for_standby.tar.gz Step 11: Start the standby server -bash-4.2$ ./postgres -D ../bb_data/for_standby/ -p 5432 Step 12: Test standby server ./psql -p 5432 test_db -U postgres Password for user postgres: psql.bin (10.7) Type "help" for help. test_db=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+---------+-------+----------+-------+------------- public | student | table | postgres | 16 kB | public | teacher | table | postgres | 16 kB | (2 rows) test_db=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) test_db=# select * from teacher; tid | tname | tsubject -----+-------+---------- 1 | Gary | Physics 2 | Karen | Maths 3 | Carol | History (3 rows) test_db=# insert into student values(4, 'any'); ERROR: cannot execute INSERT in a read-only transaction 62/96
  • 63. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Restart primary and create a few new tables in the test database CREATE TABLE test_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id; SELECT 100000 CREATE TABLE another_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id; SELECT 100000 Step 14: Force a WAL file switch test_db=# select pg_switch_wal(); pg_switch_wal --------------- 0/3C58940 (1 row) Step 15: Check WAL file on standby -bash-4.2$ pwd /opt/PostgreSQL/10/from_primary -bash-4.2$ ls -l total 16388 -rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003 -rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt -bash-4.2$ pwd /opt/PostgreSQL/10/bb_data/for_standby/pg_wal -bash-4.2$ ls -l total 32768 -rw-------. 1 postgres postgres 16777216 Feb 22 22:05 000000010000000000000002 -rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003 drwx------. 2 postgres postgres 43 Feb 23 02:08 archive_status Step 16: Check the tables on the standby test_db=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+-------------+-------+----------+---------+------------- public | another_tab | table | postgres | 3568 kB | 63/96
  • 64. Replication in PostgreSQL - Deep Dive EnterpriseDB public | student | table | postgres | 16 kB | public | teacher | table | postgres | 16 kB | public | test_tab | table | postgres | 3568 kB | (4 rows) 9.5.3 Steps to perform Failover Step 1: Simulate a primary server problem Stop the server Step 2: Promote the standby -bash-4.2$ ./pg_ctl promote -D ../bb_data/for_standby/ waiting for server to promote.... done server promoted Step 3: Check standby [abbas@localhost bin]$ ./psql -p 5432 test_db -U postgres Password for user postgres: psql.bin (10.7) Type "help" for help. test_db=# insert into student values(4, 'any'); INSERT 0 1 test_db=# 64/96
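A quick way to confirm that the promotion took effect: before promotion the standby reports that it is in recovery, afterwards it does not. A small sketch:
test_db=# SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();
-- pg_is_in_recovery() returns f once the standby has been promoted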
  • 65. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6 Log Shipping Based - Block Level 9.6.1 Physical Streaming Replication In streaming replication the standby server connects to the primary server and receives WAL records using a replication protocol. This provides two advantages: 1. The standby server does not need to wait for the WAL file to fill up, so replication lag is reduced. 2. The dependency on a user-provided script and an intermediate shared storage between the servers is removed. 9.6.2 WAL Sender & WAL Receiver A process called the WAL receiver, running on the standby server, connects to the primary server over a TCP/IP connection, using the connection details provided in the primary_conninfo parameter of recovery.conf. On the primary server another process, called the WAL sender, is in charge of sending the WAL records to the standby server as and when they are generated. The WAL receiver saves the WAL records in the standby's own WAL as if they were generated by the activity of locally connected clients. Once the WAL records reach the WAL segment files, the standby server constantly keeps replaying the WAL so that standby and primary stay up to date. [Diagram: Primary and Standby PostgreSQL servers; the WAL sender on the primary streams WAL records (W1 W2 W3 W4) to the WAL receiver on the standby, which writes them into the standby's own WAL.] 65/96
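Both ends of this link expose monitoring views; a small sketch using the version 10 catalog columns:
-- On the primary: one row per connected WAL sender
SELECT pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn FROM pg_stat_replication;
-- On the standby: the state of the WAL receiver process
SELECT status, received_lsn, conninfo FROM pg_stat_wal_receiver;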
  • 66. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.3 WAL Streaming Protocol Details The exchange between the WAL receiver (standby server) and the WAL sender (primary server) proceeds as follows: 1. Startup request (receiver to sender): asks what the server's authentication scheme is, and carries these parameters: user = postgres, database = replication, replication = true (instructs the server to start a WAL sender process for this client), application_name = walreceiver. 2. Authentication request (sender to receiver): the server expects the password in MD5 format: 52 00 00 00 0c 00 00 00 05 04 43 16 6a ('R', length, MD5 code, password salt generated by the server). 3. Password response (receiver to sender): 70 00 00 00 0b md5b094d71396249f3ca84a23b86d4ee7b9 ('p', length, MD5 password terminated by null). The MD5 password is computed by md5(md5(password || username), salt). 4. Authentication reply (sender to receiver): 52 00 00 00 08 00 00 00 00 ('R', length, user authenticated), followed by status parameters: 'S' | length 4 bytes | param name | param value. 5. Simple query (receiver to sender): IDENTIFY_SYSTEM. The query response carries systemid, timeline, logpos and dbname (here 6661510093306984809, 1, 0/3000140); the WAL receiver verifies that the systemid in the response is the same as in the base backup. 6. Simple query (receiver to sender): START_REPLICATION SLOT "node_a_slot" 0/3000000 TIMELINE 1. The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual | copy data has 0 columns) and starts to stream WAL. 7. WAL data then flows as CopyData messages: 'd' | length 4 bytes | WAL data. 66/96
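This exchange can also be exercised by hand, since libpq accepts replication=true in a connection string and the walsender then accepts commands such as IDENTIFY_SYSTEM. A hedged sketch, reusing the host and user from this setup:
$ psql "dbname=postgres replication=true host=172.16.214.167 user=postgres" -c "IDENTIFY_SYSTEM;"
# returns one row with systemid, timeline, xlogpos and dbname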
  • 67. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.4 Setup The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 is installed. 9.6.5 Configuring PostgreSQL Replication using WAL Streaming Step 1: Disable and stop firewall on both the machines sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: On primary allow replication connections & connections from the same network. Modify pg_hba.conf. local all all md5 host all all 172.16.214.167/24 md5 host all all ::1/128 md5 local replication all md5 host replication all 172.16.214.167/24 md5 host replication all ::1/128 md5 Step 3: On primary edit postgresql.conf to modify the following parameters max_wal_senders = 10 wal_level = replica max_replication_slots = 10 synchronous_commit = on synchronous_standby_names = '*' listen_addresses = '*' 67/96
  • 68. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 4: Start the primary server ./postgres -D ../pr_data -p 5432 Step 5: Take a base backup to bootstrap the standby server ./pg_basebackup --pgdata=/tmp/sb_data/ --format=p --write-recovery-conf --checkpoint=fast --label=mffb --progress --verbose --host=172.16.214.167 --port=5432 --username=postgres Step 6: Check the base backup label file START WAL LOCATION: 0/2000028 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/2000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-02-24 05:25:30 EST LABEL: mffb Step 7: In the base backup, add the following line in the recovery.conf primary_slot_name = 'node_a_slot' Step 8: Check the /tmp/sb_data/recovery.conf file standby_mode = 'on' primary_conninfo = 'user=enterprisedb password=abc123 host=172.16.214.167 port=5432 68/96
  • 69. Replication in PostgreSQL - Deep Dive EnterpriseDB sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any' primary_slot_name = 'node_a_slot' Step 9: Connect to the primary server and issue this command edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot'); slot_name | xlog_position -------------+--------------- node_a_slot | (1 row) edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots; slot_name | slot_type | active -------------+-----------+-------- node_a_slot | physical | f (1 row) Step 10: Transfer the base backup to the standby server scp /tmp/sb_data.tar.gz [email protected]:/tmp sudo mv /tmp/sb_data /opt/PostgreSQL/10/ sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/ sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/ sudo chmod 700 /opt/PostgreSQL/10/sb_data/ Step 11: Start the standby server ./postgres -D ../sb_data/ -p 5432 The primary will show this in its log LOG: standby "walreceiver" is now a synchronous standby with priority 1 69/96
  • 70. Replication in PostgreSQL - Deep Dive EnterpriseDB The standby will show LOG: database system was interrupted; last known up at 2018-10-24 15:49:55 LOG: entering standby mode LOG: redo starts at 0/3000028 LOG: consistent recovery state reached at 0/30000F8 LOG: started streaming WAL from primary at 0/4000000 on timeline 1 Step 12: Connect to primary server and issue some simple commands -bash-4.2$ ./edb-psql -p 9666 edb Password: psql.bin (9.6.10.17) Type "help" for help. create table abc(a int, b varchar(250)); insert into abc values(1,'One'); insert into abc values(2,'Two'); insert into abc values(3,'Three'); Step 13: Check data on slave ./psql -p 5432 -U postgres postgres Password for user postgres: psql.bin (10.7) Type "help" for help. postgres=# select * from abc; a | b ---+------- 1 | One 2 | Two 3 | Three (3 rows) 70/96
  • 71. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.6 Steps to perform Failover Step 1: Crash the primary server Step 2: Promote the stand by server ./pg_ctl promote -D ../sb_data/ server promoting Step 3: Connect to the promoted stand by server and insert a row -bash-4.2$ ./edb-psql -p 9777 edb Password: psql.bin (9.6.10.17) Type "help" for help. edb=# insert into abc values(4,'Four'); 71/96
  • 72. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7 Logical Decoding Based 9.7.1 What is Logical Replication Physical streaming replication as described in section 9.6 creates a byte-by-byte read-only replica of the primary server. The replica contains all databases, tables, roles, tablespaces etc. With streaming replication we get all or nothing. What if we want a replica of only a single table? This is where logical replication comes into play. Logical replication can replay DML operations happening on a subset of tables in a primary server on a standby server by: *) Logically decoding WAL records *) Streaming them over to the standby server *) Applying them to the tables in the standby server in the correct transactional order 72/96
  • 73. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7.2 Comparison of Physical and Logical Replication
Feature                                                           | Physical Replication | Logical Replication
------------------------------------------------------------------+----------------------+--------------------
Replica will be read only                                         | Yes                  | No
Replica will contain everything                                   | Yes                  | No
Replica can contain a subset of data in the primary               | No                   | Yes
Triggers will fire on DML operations                              | No                   | Yes
Will work across different PostgreSQL versions                    | No                   | Yes
Will work across different Operating Systems                      | No                   | Yes
Table in the standby can have extra columns, indexes or security  | No                   | Yes
DML operations are possible on tables in standby                  | No                   | Yes
Will work even if table has no primary key                        | Yes                  | No
DDL commands are replicated to standby                            | Yes                  | No
Sequence data is replicated to standby                            | Yes                  | No
TRUNCATE command is replicated to the standby                     | Yes                  | No
Large objects are replicated to the standby                       | Yes                  | No
Constraint validation is performed on standby                     | No                   | Yes
Standby needs base backup of the primary server                   | Yes                  | No
DML operations can be filtered before sending to standby          | No                   | Yes
Tables have to be created with the same name on standby manually  | No                   | Yes
73/96
• 74.
9.7.3 Publication & Subscription

Logical replication defines two entities: a publisher and a subscriber. A publisher is a node that defines a certain group of tables, called a publication, to which a subscriber can subscribe, by creating a subscription, to receive the changes to that particular group of tables.

[Diagram: on the publisher, WAL records (W1 to W4) pass through a logical decoding plugin and a WAL sender; the decoded and filtered WAL records are streamed to the subscriber, which applies them to the tables covered by the subscription.]
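Once a publication exists, the publisher can be asked which tables it covers through the pg_publication_tables catalog view. A minimal sketch, using the pub1/t1 names from the walkthrough in section 9.7.8:

src_db=# SELECT * FROM pg_publication_tables WHERE pubname = 'pub1';
 pubname | schemaname | tablename
---------+------------+-----------
 pub1    | public     | t1
(1 row)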
• 75.
9.7.4 Logical Decoding Plugin

To transform the internal WAL representation into a format usable by a client, a plugin can be installed into PostgreSQL. The plugin implements well-defined callback functions, which the logical decoding framework calls at the appropriate time to let the plugin perform the format conversion. A plugin can, for example, convert WAL records to SQL statements:

[Diagram: WAL records W1 to W4 pass through the plugin and come out as SQL statements, e.g. INSERT INTO tab VALUES(1,2); UPDATE tab SET b = 10; ...]

9.7.5 Logical replication slots

Logical replication slots are consumed by logical replication clients. Physical replication slots work at the cluster level and are used to stream cluster-wide changes to the standby; logical replication slots, on the other hand, stream a sequence of changes from a single database. Each logical slot needs a decoding plugin that will be used to transform the WAL records into the format required by the consumer.

9.7.6 test_decoding and pg_recvlogical

test_decoding is an example decoding plugin that ships with PostgreSQL, and pg_recvlogical is an example utility that can be used to receive changes from a logical replication slot. Let's see them both in action:

Step 1: Make the following changes in postgresql.conf on the primary server

wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on

Step 2: Make the following changes in pg_hba.conf on the primary server

host all         all 172.16.214.167/24 trust
host replication all 172.16.214.167/24 trust
• 76.
Step 3: Create a database on the primary server

./createdb -p 7654 mydb -U postgres

Step 4: Connect the client to the primary server

./psql -p 7654 mydb -U postgres
psql.bin (10.7)
Type "help" for help.

mydb=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/16998C0
(1 row)

mydb=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
 slot_name |    lsn
-----------+-----------
 my_slot   | 0/1699930
(1 row)

mydb=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
 my_slot   | test_decoding | logical   |  16384 | mydb     | f         |

 active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
 f      |            |      |          556 | 0/16998F8   | 0/1699930

This replication slot is asking:
1) VACUUM must not remove catalog tuples deleted by any transaction later than 556.
2) The consumer of this replication slot needs all segments including and after 0/16998F8.
3) The consumer of this logical replication slot has confirmed receiving data up to 0/1699930.

Most of the time a slot will require older WAL (i.e. restart_lsn) than the confirmed position (i.e. confirmed_flush_lsn). The flush position is just a marker saved by the consumer;
• 77.
the WAL actually required is always determined by restart_lsn. If this is the first slot created in the cluster, restart_lsn will be the current WAL LSN at the time the slot was created.

[Diagram: of WAL segments WAL_1 to WAL_6, those before restart_lsn are not required; segments from restart_lsn onward are required, with confirmed_flush_lsn marking the consumer's confirmed position.]

Step 5: On the standby, start the pg_recvlogical utility

./pg_recvlogical --slot=my_slot --verbose -d mydb -h 172.16.214.167 -p 7654 -U postgres --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot my_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)

Step 6: On the primary, create a table

create table test(a varchar(10));

Step 7: Check the output of pg_recvlogical

BEGIN 556
COMMIT 556
pg_recvlogical: confirming write up to 0/16B0580, flush to 0/16B0580 (slot my_slot)

Step 8: Insert a few rows in the table on the primary

mydb=# insert into test values('qaz');
mydb=# insert into test values('wsx');
• 78.
mydb=# insert into test values('edc');

Step 9: Check the output of pg_recvlogical

BEGIN 557
table public.test: INSERT: a[character varying]:'qaz'
COMMIT 557
pg_recvlogical: confirming write up to 0/16B0628, flush to 0/16B0628 (slot my_slot)
pg_recvlogical: confirming write up to 0/16B0660, flush to 0/16B0660 (slot my_slot)
BEGIN 558
table public.test: INSERT: a[character varying]:'wsx'
COMMIT 558
pg_recvlogical: confirming write up to 0/16B06D0, flush to 0/16B06D0 (slot my_slot)
BEGIN 559
table public.test: INSERT: a[character varying]:'edc'
COMMIT 559
pg_recvlogical: confirming write up to 0/16B0778, flush to 0/16B0778 (slot my_slot)

Step 10: Update rows in the table on the primary

update test set a = 'tgb';

Step 11: Check the output of pg_recvlogical

BEGIN 560
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
COMMIT 560
pg_recvlogical: confirming write up to 0/16B08B8, flush to 0/16B08B8 (slot my_slot)

Step 12: Delete rows in the table on the primary

delete from test;

Step 13: Check the output of pg_recvlogical

BEGIN 561
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
COMMIT 561
pg_recvlogical: confirming write up to 0/16B0990, flush to 0/16B0990 (slot my_slot)
• 79.
Step 14: Check the slot on the primary; note that restart_lsn has advanced

mydb=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
 my_slot   | test_decoding | logical   |  16384 | mydb     | f         |

 active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
 f      |            |      |          566 | 0/16B0DB8   | 0/16B0DF0
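The same decoded stream can also be read from plain SQL, without pg_recvlogical: pg_logical_slot_peek_changes returns pending changes without consuming them, while pg_logical_slot_get_changes consumes them and advances confirmed_flush_lsn. A minimal sketch against the my_slot slot created above:

mydb=# SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
-- shows the BEGIN/INSERT/COMMIT lines in the same format as pg_recvlogical
mydb=# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
-- same output, but the slot position is advanced

When a slot is no longer needed it should be dropped, otherwise it pins WAL on the primary:

mydb=# SELECT pg_drop_replication_slot('my_slot');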
• 80.
9.7.7 Setup

The setup consists of two CentOS 7 machines connected via LAN, on which PostgreSQL version 10.7 is installed.

9.7.8 Configuring PostgreSQL Replication using Logical Decoding

Step 1: Disable and stop the firewall on both machines

sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld

Step 2: On the primary, allow replication connections and connections from the same network by modifying pg_hba.conf

local all         all                    trust
host  all         all 172.16.214.167/24  trust
host  all         all ::1/128            trust
local replication all                    trust
host  replication all 172.16.214.167/24  trust
host  replication all ::1/128            trust

Step 3: On the publisher, edit postgresql.conf to modify the following parameters

max_wal_senders = 10
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on

Step 4: Start the publisher server

./postgres -D /tmp/data/ -p 5432
• 81.
Step 5: Create a database on the publisher server

./createdb -p 5432 -U postgres src_db

Step 6: Connect to the publisher server and create a table with some rows

create table t1 (id integer primary key, val text);
create user replicant with replication;
grant select on t1 to replicant;
insert into t1 (id, val) values (10, 'ten'), (20, 'twenty'), (30, 'thirty');

Step 7: Create the publication on the publisher server

create publication pub1 for table t1;

Step 8: Start the subscriber

./postgres -D /tmp/data/ -p 5432

Step 9: Create the database on the subscriber

./createdb -p 5432 -U postgres dst_db

Step 10: Connect to the subscriber server and create the table with an additional column

create table t1 (id integer primary key, val text, val2 text);

Step 11: Create the subscription

create subscription sub1 connection 'host=172.16.214.167 port=5432 dbname=src_db user=replicant' publication pub1;
• 82.
Step 12: Check the data in the subscribed table

dst_db=# select * from t1;
 id |  val   | val2
----+--------+------
 10 | ten    |
 20 | twenty |
 30 | thirty |
(3 rows)
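To confirm the subscription stays live, each side exposes a status view. A quick check, assuming the names used above:

On the subscriber:
dst_db=# SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;

On the publisher, the subscriber's apply worker appears as an ordinary WAL sender:
src_db=# SELECT application_name, state FROM pg_stat_replication;
-- application_name will be sub1, state should be 'streaming'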
• 83.
9.7.9 Logical Replication Protocol Details

The subscriber's first connection to the publisher creates the replication slot:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") carrying these parameters:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00: authentication reply, length, user authenticated), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber:
   SELECT DISTINCT t.schemaname, t.tablename FROM pg_catalog.pg_publication_tables t WHERE t.pubname IN ('pub1')
    schemaname | tablename
   ------------+-----------
    public     | t1
4. Subscriber:
   CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
    slot_name | consistent_point | snapshot_name | output_plugin
   -----------+------------------+---------------+---------------
    sub1      | 0/16B9EA8        |               | pgoutput
5. Subscriber disconnects.
• 84.
The second connection starts the actual streaming:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") with the same parameters as before:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber sends the simple query IDENTIFY_SYSTEM:
         systemid        | timeline |  xlogpos  | dbname
   ---------------------+----------+-----------+--------
    6664876364497978284 |        1 | 0/16B9EA8 | src_db
4. Subscriber: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '1', publication_names '"pub1"')
5. Publisher responds with CopyBothResponse ('W' | length, 4 bytes | COPY format is textual | Copy Data has 0 columns).
6. Decoded WAL data then flows as CopyData messages ('d' | length, 4 bytes | copy data).
• 85.
A third, temporary connection performs the initial table synchronization:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") with:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1_16393_sync_16385
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ (command completed and transaction started).
4. Subscriber: CREATE_REPLICATION_SLOT "sub1_16393_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    slot_name             | consistent_point | snapshot_name | output_plugin
   -----------------------+------------------+---------------+---------------
    sub1_16393_sync_16385 | 0/16B9EE0        |               | pgoutput
5. Subscriber:
   SELECT c.oid, c.relreplident FROM pg_catalog.pg_class c INNER JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE n.nspname = 'public' AND c.relname = 't1' AND c.relkind = 'r'
     oid  | relreplident
   -------+------------------
    16385 | d (primary key)

(Continued on the next slide.)
• 86.
6. Subscriber:
   SELECT a.attname, a.atttypid, a.atttypmod, a.attnum = ANY(i.indkey) FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_index i ON (i.indexrelid = pg_get_replica_identity_index(16385)) WHERE a.attnum > 0::pg_catalog.int2 AND NOT a.attisdropped AND a.attrelid = 16385 ORDER BY a.attnum
    attname | atttypid | atttypmod | ?column?
   ---------+----------+-----------+----------
    id      |       23 |        -1 | t
    val     |       25 |        -1 | f
7. Subscriber: COPY public.t1 TO STDOUT
   10 ten
   20 twenty
   30 thirty
8. Subscriber: COMMIT (transaction complete), then disconnects.
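These replication commands can also be issued by hand: psql opens a logical replication connection when replication=database is passed in the connection string. A sketch, assuming the replicant user and src_db database from the setup:

psql "host=172.16.214.167 port=5432 user=replicant dbname=src_db replication=database"
src_db=> IDENTIFY_SYSTEM;
-- returns systemid, timeline, xlogpos and dbname, exactly as in the exchange above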
• 87.
9.8 Statement Based

9.8.1 Introduction to pgpool-II

pgpool-II is a middleware system that sits between PostgreSQL servers and clients to provide the following features:
• Connection Pooling
• Replication & Load Balancing
• Automated Failover

We are going to focus on the replication feature provided by pgpool-II. When used to replicate data, pgpool receives the INSERT command from the client and sends the command, enclosed in a BEGIN-COMMIT block, to all the PostgreSQL servers under it.

[Diagram: the client application sends INSERT INTO my_tab VALUES(1, 'One'); to pgpool-II, which forwards BEGIN; INSERT INTO my_tab VALUES(1, 'One'); COMMIT; to each PostgreSQL server.]
• 88.
9.8.2 Setup

The setup consists of two CentOS 7 machines on which PostgreSQL 10.7 is installed. On one of the machines, pgpool-II version 3.6.15 (subaruboshi) is also installed.

9.8.3 Configuring PostgreSQL replication using pgpool-II

Step 1: Modify the postgresql.conf files of both PostgreSQL instances

listen_addresses = '*'
logging_collector = off
log_connections = on
log_disconnections = on
log_statement = 'all'

Step 2: Modify the pg_hba.conf files of both PostgreSQL instances

host all all 172.16.214.173/24 trust

Step 3: Modify pgpool.conf

cd /opt/edb/pgpool3.6/etc
cp pgpool.conf.sample pgpool.conf && vim pgpool.conf

listen_addresses = '*'
backend_hostname0 = '172.16.214.173'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data0'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.16.214.172'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
replication_mode = on
fail_over_on_backend_error = on
• 89.
Step 4: Generate the md5 hash of the password

/opt/edb/pgpool3.6/bin/pg_md5 abc123
e99a18c428cb38d5f260853678922e03

Step 5: Modify pcp.conf

cp pcp.conf.sample pcp.conf && vim pcp.conf
postgres:e99a18c428cb38d5f260853678922e03

Step 6: Start both PostgreSQL servers (one on each machine)

./postgres -D ../data
./postgres -D ../data

Step 7: Start pgpool

./pgpool -n -f /opt/edb/pgpool3.6/etc/pgpool.conf -F /opt/edb/pgpool3.6/etc/pcp.conf

Step 8: Create a database (note that we are connecting through pgpool)

./createdb -p 9999 test_pgp -U postgres

Step 9: Check the database server logs

For the first server (172.16.214.173):

[104386] LOG: connection received: host=172.16.214.173 port=57524
[104386] LOG: connection authorized: user=postgres database=postgres
[104386] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false)
[104386] LOG: statement: CREATE DATABASE test_pgp;
[104386] LOG: statement: DISCARD ALL
[104386] LOG: disconnection: session time: 0:00:00.787 user=postgres database=postgres host=172.16.214.173 port=57524

For the second server (172.16.214.172):

[12363] LOG: connection received: host=172.16.214.173 port=42138
[12363] LOG: connection authorized: user=postgres database=postgres
[12363] LOG: statement: CREATE DATABASE test_pgp;
[12363] LOG: statement: DISCARD ALL
[12363] LOG: disconnection: session time: 0:00:00.704 user=postgres database=postgres host=172.16.214.173 port=42138
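At this point it is worth confirming that pgpool itself is answering queries; pgpool intercepts a family of SHOW commands on its own port. A quick check through port 9999, using the test_pgp database created above:

./psql -p 9999 test_pgp -U postgres
test_pgp=# show pool_version;
test_pgp=# show pool_nodes;
-- both backends should report status "up" at this point; pool_nodes is revisited in Step 18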
• 90.
Step 10: Create a new table (note that we are connecting through pgpool)

./psql -p 9999 test_pgp -U postgres

create table my_tab(a int primary key, b varchar(10));

Step 11: Check the server logs

For the first server (172.16.214.173):

[107539] LOG: connection received: host=172.16.214.173 port=57528
[107539] LOG: connection authorized: user=postgres database=test_pgp
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: connection received: host=172.16.214.173 port=42142
[12400] LOG: connection authorized: user=postgres database=test_pgp
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[12400] LOG: statement: COMMIT

Step 12: Insert rows in the table

insert into my_tab values(1,'One');
insert into my_tab values(2,'Two');
insert into my_tab values(3,'Three');

Step 13: Check the server log

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT count(*) from ( SELECT has_function_privilege ( 'postgres', 'pg_catalog.to_regclass(cstring)','execute' ) WHERE EXISTS ( SELECT * FROM pg_catalog.pg_proc AS p WHERE p.proname = 'to_regclass' )
• 91.
) AS s
[107539] LOG: statement: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pg_catalog.to_regclass('"my_tab"')
[107539] LOG: statement: SELECT attname, d.adsrc as default_value, coalesce ( ( d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%' OR d.adsrc LIKE '%CURRENT_TIMESTAMP%' OR d.adsrc LIKE '%CURRENT_TIME%' OR d.adsrc LIKE '%CURRENT_DATE%' OR d.adsrc LIKE '%LOCALTIME%' OR d.adsrc LIKE '%LOCALTIMESTAMP%' ) AND ( a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid ) , false ) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = to_regclass('"my_tab"') ORDER BY a.attnum

With these queries pgpool is asking: what are the attribute names? What are their default values, if any? Does any column have a default value of now()?

 attname | default_value | coalesce
---------+---------------+----------
 a       |               | f
 b       |               | f

[107539] LOG: statement: insert into my_tab values(1,'One');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(2,'Two');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(3,'Three');
• 92.
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(1,'One');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(2,'Two');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(3,'Three');
[12400] LOG: statement: COMMIT

Step 14: Try an update statement

UPDATE my_tab SET b = 'threee' WHERE b like 'Three';

Step 15: Check the server logs

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[12400] LOG: statement: COMMIT

Step 16: Select data from the table

test_pgp=# select * from my_tab;
 a |   b
---+--------
 1 | One
 2 | Two
 3 | threee
(3 rows)
• 93.
Step 17: Check the server logs and observe the load balancing

For the first server (172.16.214.173):

[107539] LOG: statement: select * from my_tab;

For the second server (172.16.214.172):

(nothing; the SELECT was load-balanced to the first server only)

Step 18: Check node status

test_pgp=# show pool_nodes;
 node_id |    hostname    | port | status | lb_weight |
---------+----------------+------+--------+-----------+
 0       | 172.16.214.173 | 5432 | up     | 0.500000  |
 1       | 172.16.214.172 | 5432 | up     | 0.500000  |

 role   | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
 master | 5          | true              | 0
 slave  | 0          | false             | 0
(2 rows)

Step 19: Stop the master server, i.e. 172.16.214.173

Step 20: Run the select query

test_pgp=# select * from my_tab;
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
• 94.
Step 21: Run the select query again

test_pgp=# select * from my_tab;
 a |   b
---+--------
 1 | One
 2 | Two
 3 | threee
(3 rows)

Step 22: Check node status

test_pgp=# show pool_nodes;
 node_id |    hostname    | port | status | lb_weight |
---------+----------------+------+--------+-----------+
 0       | 172.16.214.173 | 5432 | down   | 0.500000  |
 1       | 172.16.214.172 | 5432 | up     | 0.500000  |

 role   | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
 slave  | 5          | false             | 0
 master | 1          | true              | 0
(2 rows)

Step 23: Create another table and insert a row in it

create table time_test(a timestamp);
insert into time_test values(now());

Step 24: Check the server logs and observe how pgpool translated now() into a literal timestamp

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table time_test(a timestamp);
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT now()
• 95.
[107539] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz" ('2019-03-14 04:36:22.324674-04'::text))
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table time_test(a timestamp);
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz" ('2019-03-14 04:36:22.324674-04'::text))
[12400] LOG: statement: COMMIT
• 96.
9.9 Other possibilities

9.9.1 EDB xDB Replication Server

EDB xDB (cross database) Replication Server is an asynchronous replication system for PostgreSQL based on a publish/subscribe model. xDB Replication Server can be used to implement replication systems based on either of two replication models:
• Single-master (master-to-slave) replication
• Multi-master replication

The following combinations of cross-database replication are supported by xDB Replication Server for single-master replication:

 Master Database | Slave Database
-----------------+----------------
 Oracle          | PostgreSQL
 Oracle          | EDB Postgres
 SQL Server      | PostgreSQL
 SQL Server      | EDB Postgres
 PostgreSQL      | SQL Server
 PostgreSQL      | EDB Postgres
 EDB Postgres    | SQL Server
 EDB Postgres    | Oracle
 EDB Postgres    | PostgreSQL

For multi-master replication, xDB Replication Server supports the following servers:

 Master Database
-----------------
 PostgreSQL
 EDB Postgres

xDB Replication Server can use either a trigger-based method or a logical-decoding-based method to perform replication.