Replication in PostgreSQL - Deep Dive EnterpriseDB
Table of Contents
1 Objectives
2 Presenter
3 What is Replication
4 Why use Replication
5 Models of Replication (Single Master & Multi Master)
6 Classes of Replication (Unidirectional & Bidirectional)
7 Modes of Replication (Asynchronous & Synchronous)
8 Types of Replication (Physical & Logical)
9 Methods Of Replication
9.1 Disk Based Replication
9.1.1 Introduction
9.1.2 Setup
9.1.3 Configuring PostgreSQL Replication using NAS
9.1.4 Steps to perform Failover
9.2 File System Based
9.2.1 Introduction to DRBD
9.2.2 Setup
9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C
9.2.4 Steps to perform Failover
9.3 Trigger Based
9.3.1 Introduction to Slony-I
9.3.2 Advantages and Disadvantages of Slony
9.3.3 Setup
9.3.4 Configuring PostgreSQL Replication using Slony-I
9.3.5 Steps to perform controlled switchover
9.4 Introduction to WAL
9.4.1 What is WAL and Why is it required
9.4.2 Transaction Log and WAL Segment Files
9.4.3 WAL Writer
9.4.4 WAL Segment File Management
9.4.5 WAL Example
9.4.6 Overview of Replication Options based on WAL
9.5 Log Shipping Based - File Level
9.5.1 Setup
9.5.2 Configuring Replication using Log Shipping
9.5.3 Steps to perform Failover
9.6 Log Shipping Based - Block Level
9.6.1 Physical Streaming Replication
9.6.2 WAL Sender & WAL Receiver
9.6.3 WAL Streaming Protocol Details
9.6.4 Setup
9.6.5 Configuring PostgreSQL Replication using WAL Streaming
9.6.6 Steps to perform Failover
9.7 Logical Decoding Based
9.7.1 What is Logical Replication
9.7.2 Comparison of Physical and Logical Replication
9.7.3 Publication & Subscription
9.7.4 Logical Decoding Plugin
9.7.5 Logical replication slots
9.7.6 test_decoding and pg_recvlogical
9.7.7 Setup
9.7.8 Configuring PostgreSQL Replication using Logical Decoding
9.7.9 Logical Replication Protocol Details
9.8 Statement Based
9.8.1 Introduction to pgpool-II
9.8.2 Setup
9.8.3 Configuring PostgreSQL replication using pgpool-II
9.9 Other possibilities
1 Objectives
A) Become familiar with replication in PostgreSQL.
B) Learn configuration and failover for each method of replication in PostgreSQL using a
two-node cluster.
2 Presenter
My name is Abbas. I have a Master's in Computer Engineering and have spent most of my career
in product development. I work as a Senior Architect at EnterpriseDB. My work highlights are as follows:
• Migration Portal for online schema migration from Oracle to PostgreSQL
• xDB Replication Server
• Schema Cloning with support for parallelism using Background Workers
• Distributed Transactions (XA) Compliance for PostgreSQL using PgBouncer
• Oracle Compatible Packages for IBM DB2 : UTL_ENCODE, UTL_TCP, UTL_SMTP, UTL_MAIL
• HDFS_FDW, Mongo_FDW, MySQL FDW
• Postgres-XC
Email : abbas.butt@enterprisedb.com
Linkedin : https://pk.linkedin.com/in/abbasbutt
Blog : https://abbas-technical.blogspot.com
3 What is Replication
Replication is the process of copying data from one database server to another database
server. The source database server is usually called the master server, whereas the target
database server is called the slave server.
4 Why use Replication
Replication of data has many use cases. For example:
• Offloading reporting queries from the production OLTP system. This improves both
reporting query times and transaction processing performance.
• Fault tolerance: in the event of failure of the master database server, the slave database
server can take over, since it is already up to date with the master server. In this
configuration the slave server can also be called a standby server. This configuration can
also be used for regular maintenance of the primary server.
• Data migration: to upgrade database server hardware, or to deploy the same system for
another customer.
• Testing systems in parallel: in case we decide to port the application from one DBMS
to another, the results from the old and new systems on the same data must be compared to
ensure that the new system works as expected.
[Diagram: Data flowing from Master to Slave]
5 Models of Replication (Single Master & Multi Master)
In Single-Master Replication (SMR), changes to table rows in a designated master database
server are replicated to one or more slave database servers. The replicated tables in the slave
database are not permitted to accept any changes (except from the master), and even if they do,
the changes are not replicated back to the master server.
In Multi-Master Replication (MMR), changes to table rows in more than one designated
master are replicated to their counterpart tables in every other database. In this model conflict
resolution schemes are often employed, for example to avoid duplicate primary keys.
MMR adds to the use cases of replication in the following manner:
• Write availability and scalability.
• Multi-master replication allows you to employ a WAN-connected network of master
databases that can be geographically close to groups of clients, yet maintain data
consistency across master databases.
6 Classes of Replication (Unidirectional & Bidirectional)
Single-Master Replication (SMR) is also termed unidirectional, since replication data flows
in one direction only, from master to slave. In Multi-Master Replication (MMR), replication
data flows in both directions; it is therefore called bidirectional replication.
[Diagram: Data flowing in both directions between Master-I and Master-II]
7 Modes of Replication (Asynchronous & Synchronous)
In the synchronous mode of replication, transactions on the master database are declared
complete only when the changes have been replicated to all the slaves in addition to the master.
All slaves have to be available all the time for transactions to complete on the master.
In asynchronous mode, transactions on the master server are declared complete as soon as the
changes have been made on the master server. The changes are replicated to the slaves later in
time. In this mode the slaves can remain out of sync for a certain duration, which is called the
replication lag.
[Diagram: two timelines for an insert to a replicated table. In synchronous mode the master waits until Slave-I and Slave-II have applied the change; in asynchronous mode the master completes first and the slaves catch up after the replication lag.]
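PostgreSQL's built-in streaming replication (section 9.6) supports both modes. As a minimal
sketch, assuming a standby that has registered itself with application_name standby1, the mode
is chosen on the master in postgresql.conf:

# synchronous replication: commits wait until standby1 confirms the flush
synchronous_standby_names = 'standby1'
synchronous_commit = on
# an empty synchronous_standby_names means replication stays asynchronous

With synchronous_standby_names set, synchronous_commit can still be changed per session or
per transaction (SET LOCAL synchronous_commit = local), trading the synchronous guarantee
for latency where acceptable.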
8 Types of Replication (Physical & Logical)
Before we discuss physical and logical replication, let's first establish what the terms
physical and logical mean in this context.

    Logical Operation    Physical Operation
 1  initdb               Creates a base directory for the cluster
 2  CREATE DATABASE      Creates a sub-directory in the base directory
 3  CREATE TABLE         Creates a file within the sub-directory of the database
 4  INSERT               Changes the file that was created for this particular table
                         and writes new WAL records in the current WAL segment
For example:
ramp=# create table sample_tbl(a int, b varchar(255));
CREATE TABLE
ramp=# SELECT pg_relation_filepath('sample_tbl');
pg_relation_filepath
----------------------
base/34740/706736
(1 row)
ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp';
datname | oid
---------+-------
ramp | 34740
(1 row)
ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl';
relname | oid
------------+--------
sample_tbl | 706736
(1 row)
Physical replication deals with files and directories; it has no knowledge of what these files and
directories represent. It is done at the file system or disk level.
Logical replication, on the other hand, deals with databases, tables and DML operations. It is
therefore possible in logical replication to replicate only a certain set of tables. It is done at the
database cluster level.
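As an illustration, built-in logical replication (PostgreSQL 10 onwards, covered in section 9.7)
can replicate a chosen set of tables. A minimal sketch, with illustrative table names and
connection string, assuming wal_level = logical on the publisher:

-- on the publisher
CREATE PUBLICATION my_pub FOR TABLE student, teacher;
-- on the subscriber; the tables must already exist there
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=172.16.214.163 dbname=for_slony user=postgres'
    PUBLICATION my_pub;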
9 Methods Of Replication
9.1 Disk Based Replication
9.1.1 Introduction
A network attached storage (NAS) with at least two disks can provide transparent replication by
using mirroring, i.e. RAID-1. Mirroring provides replication by copying all data from one disk to
the other, as if the second disk were a mirror image of the first. This configuration provides fault
tolerance in case of a single disk failure.
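Where no NAS is available, the same RAID-1 idea can be applied in software. A minimal sketch
using mdadm, with illustrative device names:

# mirror two partitions into one block device (RAID-1)
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
# build a file system on the mirror and mount it for use as a data directory
sudo mkfs -t ext4 /dev/md0
sudo mount /dev/md0 /mnt/mirror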
9.1.2 Setup
The setup consists of one CentOS 7 machine with PostgreSQL 10.7 installed and a Western
Digital My Cloud Home 4 TB NAS.
9.1.3 Configuring PostgreSQL Replication using NAS
Step 1: Connect the NAS device to the Internet
The device needs a DHCP server running on the network and Internet access for
first-time configuration.
Step 2: Make sure PostgreSQL machine is connected to the same network as your device
Step 3: Make sure you are able to access the device through the web interface
mycloud.com/hello
Create an account with email and password.
[Diagram: the PostgreSQL machine and the WD My Cloud Home 4 TB NAS connected to the same network, with Internet access.]
Step 4: Find the MAC address of your NAS device.
The MAC address for my device is 00:00:c0:08:d7:01
Step 5: Find the IP address of the device
On the PostgreSQL machine run the command
arp -a
and look for an entry like this
? (172.24.37.136) at 00:00:c0:08:d7:01 [ether] on ens160u3u1c2
The IP address of the NAS device is therefore
172.24.37.136
Step 6: Check the public share on the device
smbclient -N -L 172.24.37.136
Sharename Type Comment
--------- ---- -------
Public Disk
IPC$ IPC IPC Service (MyCloudDevice)
Reconnecting with SMB1 for workgroup listing.
Server Comment
--------- -------
Workgroup Master
--------- -------
WORKGROUP BFAS91-WIN
Step 7: Mount the public share of the device on a local folder on the PostgreSQL machine
Step 7.1: Create a local folder
mkdir /home/abbas/mc2
Step 7.2: Edit the /etc/fstab file and add the following entry as a single line (the modes are important)
//172.24.37.136/public/ /home/abbas/mc2/ cifs credentials=/home/abbas/.smbcredentials,uid=abbas,gid=abbas,rw,dir_mode=0700,file_mode=0700 0 0
Step 7.3: Create the credentials file as follows
vim ~/.smbcredentials
username=abbas
password=abc123
Step 7.4: Mount
sudo mount -a
Step 7.5: Check the mounted folder
ls -l /home/abbas/mc2
total 4
-rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt
drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg
Step 8: Create a folder for initdb; note the permissions, which is why the modes in
step 7.2 are important
mkdir /home/abbas/mc2/data
ls -l /home/abbas/mc2
total 4
drwx------. 2 abbas abbas 0 Mar 11 07:42 data
-rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt
drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg
Step 9: Initialize the cluster
./initdb -D /home/abbas/mc2/data/
Step 10: Run the server
./postgres -D /home/abbas/mc2/data -p 7654
Step 11: Create a new table
./psql -p 7654 postgres
create table test_tab(a int, b varchar(10));
SELECT pg_relation_filepath('test_tab');
pg_relation_filepath
----------------------
base/13212/16384
(1 row)
Step 12: Connect to the device using Nautilus: enter the device address smb://172.24.37.136
and press the Connect button
Step 13: Check the relation file and its path under smb://172.24.37.136/public/data/base/13212
9.1.4 Steps to perform Failover
In a two-disk NAS device that has RAID-1 built into it, the user can simply remove the faulty
disk and replace it with a new one; the database server will never notice the absence of the
second disk or its replacement.
9.2 File System Based
9.2.1 Introduction to DRBD
Distributed Replicated Block Device (DRBD) is a software module that provides disk or
partition mirroring between network hosts. DRBD is a virtual block device driver implemented
as a kernel module. It provides a replication solution that is independent of the application
generating the data to be replicated.
PostgreSQL is configured to use a data directory on the DRBD-controlled partition. When
PostgreSQL writes any data, the DRBD module not only writes that data to the disk but also
sends the same data over the network to the connected secondary. The DRBD module on the
secondary receives the data from the network and writes it to its disk.
In DRBD the most commonly used data synchronization mode is Single-Primary. In
single-primary mode only one cluster node manipulates the data at any moment. DRBD can
also support Dual-Primary mode. We are using Single-Primary mode with the ext4 file system.
DRBD Supports three replication protocols:
Protocol A - Asynchronous replication protocol. Local write operations on the primary node
are considered completed as soon as the local disk write has finished, and the replication
packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss
may occur.
[Diagram: PostgreSQL issues writes to the DRBD module on the primary. DRBD writes the data to the local sda2 partition and replicates the same writes over the network to the DRBD module on the secondary, which writes them to its own sda2.]
Protocol B - Memory synchronous (semi-synchronous) replication protocol. Local write
operations on the primary node are considered completed as soon as the local disk write has
occurred, and the replication packet has reached the peer node. Normally, no writes are lost in
case of forced fail-over.
Protocol C - Synchronous replication protocol. Local write operations on the primary node are
considered completed only after both the local and the remote disk write have been confirmed.
As a result, loss of a single node is guaranteed not to lead to any data loss.
The most commonly used replication protocol in DRBD setups is Protocol C.
9.2.2 Setup
The setup consists of two CentOS 7 machines connected via LAN, each installed with a dedicated data partition in addition to the root and swap partitions.
While installing CentOS 7, choose "Installation Destination" option
Deselect "Automatically configure partitioning" and
Select "I will configure partitioning"
After clicking Done, Manual Partitioning screen will appear
Click the + button to add a mount point
Mount Point /
Desired Capacity 15 GiB
File System ext4
Click the + button to add a mount point
For swap Enter Desired Capacity 4GiB
Click the + button to add a mount point
Mount Point /for_data
Desired Capacity 12 GiB
File System ext4
Click Done
Accept Changes
9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C
All steps are performed on both the primary and the secondary node, unless mentioned otherwise.
Step 1: Disable and stop firewall on both the nodes
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Change hostname
sudo hostnamectl set-hostname primary
sudo hostnamectl set-hostname secondary
Step 3: Install Extra Packages for Enterprise Linux (EPEL) repository
sudo yum install epel-release
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
Step 4: Install DRBD
sudo yum install drbd90-utils kmod-drbd90
Step 5: Restart the System
Step 6: Install the kernel module
sudo modprobe drbd
Step 7: Create configuration file
sudo vim /etc/drbd.d/pgconf.res
resource pgconf
{
protocol C;
on primary
{
device /dev/drbd0;
disk /dev/sda2;
address 172.16.214.151:7788;
meta-disk internal;
}
on secondary
{
device /dev/drbd0;
disk /dev/sda2;
address 172.16.214.150:7788;
meta-disk internal;
}
}
Step 8: Unmount the disk
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
/dev/sda2 12G 41M 12G 1% /for_data
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
sudo umount /dev/sda2
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
Step 9: Delete the file system from the disk; DRBD needs a partition without any file system
sudo yum install util-linux
sudo wipefs /dev/sda2
offset type
----------------------------------------------------------------
0x438 ext4 [filesystem]
UUID: 8def5959-4dc9-4605-ad61-bd3b597966a3
sudo wipefs -a /dev/sda2
/dev/sda2: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef
Step 10: Create DRBD device meta data
sudo drbdadm create-md pgconf
md_offset 12883849216
al_offset 12883816448
bm_offset 12883423232
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (384 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
Step 11: Associate the DRBD disk with the backing device on both nodes
sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 15G 0 part /
├─sda2 8:2 0 12G 0 part
└─sda3 8:3 0 3G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
NAME        The device name.
MAJ:MIN     The major and minor device numbers.
RM          Whether the device is removable.
SIZE        The size of the device.
RO          Whether the device is read-only.
TYPE        Whether the block device is a disk or a partition (part) within a
            disk. In this example sda is a disk; sda1, sda2 & sda3 are
            partitions; and sr0 is a read-only device (rom).
MOUNTPOINT  The mount point on which the device is mounted.
sudo drbdadm up pgconf
sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 15G 0 part /
├─sda2 8:2 0 12G 0 part
│ └─drbd0 147:0 0 12G 1 disk
└─sda3 8:3 0 3G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
Step 12: Start drbd on both the nodes
sudo systemctl start drbd
sudo systemctl enable drbd
Step 13: Start initial full synchronization on the primary node
sudo drbdadm primary pgconf --force
Step 14: Build ext4 file system on DRBD device on the primary node
sudo mkfs -t ext4 /dev/drbd0
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
786432 inodes, 3145367 blocks
157268 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
96 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Step 15: Mount the DRBD device on the primary node
sudo mount /dev/drbd0 /for_data
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 15G 5.7G 8.2G 41% /
devtmpfs 1.4G 0 1.4G 0% /dev
tmpfs 1.4G 58M 1.3G 5% /dev/shm
tmpfs 1.4G 11M 1.4G 1% /run
tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup
tmpfs 278M 4.0K 278M 1% /run/user/42
tmpfs 278M 44K 278M 1% /run/user/1000
/dev/drbd0 12G 41M 12G 1% /for_data
Step 16: Check the connections between primary and secondary nodes
sudo netstat -n | grep 7788
tcp 0 0 172.16.214.151:47609 172.16.214.150:7788 ESTABLISHED
tcp 0 0 172.16.214.151:7788 172.16.214.150:40336 ESTABLISHED
Step 17: Check DRBD processes
ps -ef --forest | grep drbd
root 11109 2 0 13:35 ? 00:00:00 _ [drbd-reissue]
root 88248 2 0 21:37 ? 00:00:00 _ [drbd_w_pgconf]
root 88250 2 0 21:37 ? 00:00:00 _ [drbd0_submit]
root 88256 2 0 21:37 ? 00:00:02 _ [drbd_s_pgconf]
root 88262 2 6 21:37 ? 00:01:29 _ [drbd_r_pgconf]
root 88269 2 0 21:37 ? 00:00:00 _ [drbd_a_pgconf]
root 88270 2 0 21:37 ? 00:00:00 _ [drbd_as_pgconf]
Step 18: Check the output of the drbdmon tool
drbdmon
Step 19: Install PostgreSQL
git clone git://git.postgresql.org/git/postgresql.git
cd postgresql/
git checkout REL_11_STABLE
./configure --prefix=/usr/local/pg11 --enable-debug CFLAGS=-O0
make && make install
Step 20: Initialize the cluster
./initdb -D /for_data/data
Note that we are using the DRBD device to store data, since that device is being
replicated to the secondary.
Step 21: Start the server
./postgres -D /for_data/data/ -p 6543
Step 22: Create a table and insert some rows
./psql -p 6543 postgres
create table for_testing(id int primary key, value varchar(255));
insert into for_testing values(1, 'One');
insert into for_testing values(2, 'Two');
insert into for_testing values(3, 'Three');
Step 23: Simulate disk failure on the primary node
./pg_ctl stop -D /for_data/data/
sudo umount /for_data
9.2.4 Steps to perform Failover
Step 1: Install PostgreSQL on the secondary node
Step 2: Check the data directory replicated by DRBD
sudo mkdir /usr/local/pg11
sudo chown abbas:abbas /usr/local/pg11
sudo drbdadm primary pgconf
sudo mount /dev/drbd0 /for_data/
ls -l /for_data/
total 20
drwx------ 19 abbas abbas 4096 Jan 3 05:08 data
drwx------ 2 root root 16384 Jan 2 21:45 lost+found
ls -l /for_data/data/
total 116
drwx------ 5 abbas abbas 4096 Jan 3 03:59 base
drwx------ 2 abbas abbas 4096 Jan 3 04:00 global
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_commit_ts
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_dynshmem
-rw------- 1 abbas abbas 4513 Jan 3 03:59 pg_hba.conf
-rw------- 1 abbas abbas 1636 Jan 3 03:59 pg_ident.conf
drwx------ 4 abbas abbas 4096 Jan 3 05:08 pg_logical
drwx------ 4 abbas abbas 4096 Jan 3 03:59 pg_multixact
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_notify
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_replslot
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_serial
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_snapshots
drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat
drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat_tmp
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_subtrans
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_tblspc
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_twophase
-rw------- 1 abbas abbas 3 Jan 3 03:59 PG_VERSION
drwx------ 3 abbas abbas 4096 Jan 3 03:59 pg_wal
drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_xact
-rw------- 1 abbas abbas 88 Jan 3 03:59 postgresql.auto.conf
-rw------- 1 abbas abbas 23866 Jan 3 03:59 postgresql.conf
-rw------- 1 abbas abbas 64 Jan 3 03:59 postmaster.opts
Step 3: Start the server on this data directory
./postgres -D /for_data/data/ -p 6543
Step 4: Check the table and the data in it
./psql -p 6543 postgres
psql (11.1)
Type "help" for help.
postgres=# select * from for_testing;
id | value
----+-------
1 | One
2 | Two
3 | Three
(3 rows)
9.3 Trigger Based
9.3.1 Introduction to Slony-I
Slony is a master-to-multiple-slaves, trigger-based (AFTER ROW), asynchronous logical
replication system for PostgreSQL. Slony supports cascading: direct subscribers put load on
the master, while indirect subscribers put load on direct subscribers.
Slony uses the following terminology:
Cluster : A named set of PostgreSQL instances between which replication takes place.
Node : A named PostgreSQL instance that participates as master/slave in a replication cluster.
Set : A set of tables that need to be replicated between two nodes.
Origin & Subscriber : Each replication set has an origin (master) and a subscriber. Origin is
where the modifications of the data take place and subscriber is where those changes get
replicated to.
Slon daemon : a slon daemon runs on each node in the cluster and manages the replication
activity for that node. Slon processes replication events. Replication events are of two types:
Configuration events : these occur when the configuration of the cluster is changed, for
example when a table is added to a subscribed set. Slon in this case replicates the changed
configuration to all the nodes.
SYNC events : these occur when replicated tables are updated. Let's look in detail at how an
insert into a table on the origin node gets replicated to a slave node. The following diagram
shows a two-node cluster that is using Slony for replication between one master and one slave.
Each slon daemon establishes connections with both the master and the slave database.
When the Slony replication system is installed, it performs the following steps:
• Creates an AFTER INSERT OR UPDATE OR DELETE row trigger on the table to be
replicated on the master node.
• Creates a trigger to deny any writes to the replicated table on the slave.
• Creates the tables and functions required to support replication, in a separate schema
named after the cluster name.
When the client inserts a row into the table on the master, the following happens to perform
the replication:
• The after-row trigger inserts a log row in the sl_log_1 or sl_log_2 table.
• The slon daemon on the master inserts a row in sl_event and issues a NOTIFY. This
generates a SYNC event.
• The slon daemon on the slave listens for the notification and reads the sl_log_1 or
sl_log_2 table from the remote database.
• The slon daemon constructs the insert statement and executes it locally to replicate the
row to the slave.
9.3.2 Advantages and Disadvantages of Slony
Advantages:
• Slony allows replicating a small subset of the tables in a database.
• Slony works across different PostgreSQL major versions.
• Slony provides the ability to create additional indexes on slaves.
• Slony can be used to upgrade from an older PostgreSQL version to a newer one.
• Barring the tables of the set, Slony allows slaves to be used for read/write activity.
• The load of indirect slaves is not put on the master; only direct slaves load the master.
Disadvantages:
• Slony cannot replicate large objects, DDL commands, users and roles.
• Slony is asynchronous and cannot provide failover with zero transaction loss.
• Slony puts load on the master: the more the slaves, the more the load.
• Slony mandates the use of a primary key on all the tables to be replicated.
[Diagram: a two-node Slony cluster. A slon daemon runs beside each PostgreSQL instance, holding a local connection to its own database and a remote connection to the other node's database.]
9.3.3 Setup
The setup consists of two CentOS 7 machines connected via LAN, on which PostgreSQL
version 10.7 and Slony version 2.2.6 are installed.
9.3.4 Configuring PostgreSQL Replication using Slony-I
Step 1: Disable and stop firewall on both the nodes
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Install PostgreSQL and Slony
Download postgresql-10.7-1-linux-x64.run from EnterpriseDB website and install all the
components.
Run StackBuilder and install Slony 2.2.6.
Step 3: Configure trust authentication in both master and slave
As postgres user do the following
cd /opt/PostgreSQL/10/bin/
./pg_ctl stop -D ../data
vim ../data/pg_hba.conf
host all all 172.16.214.163/24 trust
./pg_ctl start -D ../data
Step 4: Export environment variables
export CLUSTERNAME=slony_example
export MASTERDBNAME=for_slony
export SLAVEDBNAME=for_slony
export MASTERHOST=172.16.214.163
export SLAVEHOST=172.16.214.162
export MASTERPORT=5432
export SLAVEPORT=5432
export REPLICATIONUSER=postgres
export PATH=$PATH:/opt/PostgreSQL/10/bin/
Step 5: Make sure both servers are accessible from both machines
./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
psql.bin (10.7)
Type "help" for help.
postgres=# \q
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
psql.bin (10.7)
Type "help" for help.
postgres=# \q
Step 6: Create database, tables and insert some values on master
./createdb -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
INSERT INTO student VALUES(1, 'Edward', 'Main Campus');
INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel');
INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel');
INSERT INTO teacher VALUES(1, 'Gary', 'Physics');
INSERT INTO teacher VALUES(2, 'Karen', 'Maths');
INSERT INTO teacher VALUES(3, 'Carol', 'History');
Step 7: Create database and tables on slave
./createdb -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
Step 8: Create and execute the Slony setup script, which performs the following steps
./slony_setup.sh
<stdin>:21: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support
<stdin>:36: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support
Step 8.1: Define the schema name that Slony uses to create all Slony objects; in our
example it is _slony_example
cluster name = $CLUSTERNAME;
Step 8.2: Provide connection info that is used by slonik to connect to master and slave
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT
user=$REPLICATIONUSER';
Step 8.3: Initialize the first node. Its id MUST be 1. This creates the schema
_slony_example containing all replication-system-specific database objects. The main
tables that store the change log are _slony_example.sl_log_1 & _slony_example.sl_log_2. The
main function that adds change-log entries to these tables is _slony_example.logtrigger, which
calls the C function _Slony_I_logTrigger
init cluster ( id=1, comment = 'Master Node');
Step 8.4: Create a table set that can be subscribed by slaves
create set (id=1, origin=1, comment='some tables');
set add table (set id=1, origin=1, id=1, fully qualified name =
'public.student', comment='student table');
set add table (set id=1, origin=1, id=2, fully qualified name =
'public.teacher', comment='teacher table');
Step 8.5: Create a slave node
store node (id=2, comment = 'Slave node', event node=1);
Step 8.6: Provide connection info so that the nodes can connect to each other and listen for events
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME
host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER');
store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME
host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER');
The setup script performs the following actions
Action 1: It creates the following triggers on each of the tables in the set on the master
for_slony=# \d+ student
Table "public.student"
Column | Type |
----------+------------------------+
sid | integer |
sname | character varying(255) |
saddress | character varying(255) |
Indexes:
"student_pkey" PRIMARY KEY, btree (sid)
Triggers:
_slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.logtrigger('_slony_example','1','k')
_slony_example_truncatetrigger BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.log_truncate('1')
Disabled user triggers:
_slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.denyaccess('_slony_example')
_slony_example_truncatedeny BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.deny_truncate()
Action 2: It creates the following triggers on each of the tables in the set on the slave
for_slony=# \d+ student
Table "public.student"
Column | Type |
----------+------------------------+
sid | integer |
sname | character varying(255) |
saddress | character varying(255) |
Indexes:
"student_pkey" PRIMARY KEY, btree (sid)
Triggers:
_slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.denyaccess('_slony_example')
_slony_example_truncatedeny BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.deny_truncate()
Disabled user triggers:
_slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student
FOR EACH ROW EXECUTE PROCEDURE
_slony_example.logtrigger('_slony_example', '1', 'k')
_slony_example_truncatetrigger BEFORE TRUNCATE ON student
FOR EACH STATEMENT EXECUTE PROCEDURE
_slony_example.log_truncate('1')
Action 3: It creates the _slony_example schema with the following objects on both nodes
for_slony=# \dtvs _slony_example.*
List of relations
Schema | Name | Type | Owner
----------------+----------------------------+----------+----------
_slony_example | sl_nodelock_nl_conncnt_seq | sequence | postgres
_slony_example | sl_log_status | sequence | postgres
_slony_example | sl_action_seq | sequence | postgres
_slony_example | sl_local_node_id | sequence | postgres
_slony_example | sl_event_seq | sequence | postgres
_slony_example | sl_log_script | table | postgres
_slony_example | sl_registry | table | postgres
_slony_example | sl_apply_stats | table | postgres
_slony_example | sl_nodelock | table | postgres
_slony_example | sl_setsync | table | postgres
_slony_example | sl_table | table | postgres
_slony_example | sl_sequence | table | postgres
_slony_example | sl_node | table | postgres
_slony_example | sl_listen | table | postgres
_slony_example | sl_path | table | postgres
_slony_example | sl_log_1 | table | postgres
_slony_example | sl_log_2 | table | postgres
_slony_example | sl_subscribe | table | postgres
_slony_example | sl_event | table | postgres
_slony_example | sl_confirm | table | postgres
_slony_example | sl_seqlog | table | postgres
_slony_example | sl_components | table | postgres
_slony_example | sl_set | table | postgres
_slony_example | sl_config_lock | table | postgres
_slony_example | sl_event_lock | table | postgres
_slony_example | sl_archive_counter | table | postgres
_slony_example | sl_failover_targets | view | postgres
_slony_example | sl_seqlastvalue | view | postgres
_slony_example | sl_status | view | postgres
(29 rows)
Action 4: Creates the following triggers
List of triggers
 Schema         | Name
----------------+---------------
 _slony_example | logapply
 _slony_example | log_truncate
 _slony_example | deny_truncate
 _slony_example | logtrigger
 _slony_example | denyaccess
 _slony_example | lockedset
Action 5: Creates around 150 functions in the _slony_example schema
Note that the setup script does not run any daemon on the master or slave, i.e. it does
not start the replication process, and it does not copy any data from master to slave.
Step 9: Start the slon daemon on both the master and slave
slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER
host=$MASTERHOST port=$MASTERPORT"
slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST
port=$SLAVEPORT"
slon daemon should emit messages of the sort
INFO remoteWorkerThread_2: SYNC 5000000178 done in 0.003 seconds
INFO remoteWorkerThread_2: SYNC 5000000179 done in 0.003 seconds
NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
INFO cleanupThread: 0.020 seconds for cleanupEvent()
INFO remoteWorkerThread_2: SYNC 5000000180 done in 0.002 seconds
INFO remoteWorkerThread_2: SYNC 5000000181 done in 0.004 seconds
Connection problems result in errors like this
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres")
failed - fe_sendauth: no password supplied
Step 10: Start the subscription
./slony_sub.sh
The following script instructs Slony to subscribe the set whose id is 1 and
whose provider (master) id is 1, for the receiver (slave) whose id is 2
#!/bin/sh
slonik <<_EOF_
# ----
# This defines which namespace the replication system uses
# ----
cluster name = $CLUSTERNAME;
# ----
# Admin conninfo’s are used by the slonik program to connect
# to the node databases. So these are the PQconnectdb arguments
# that connect from the administrators workstation (where
# slonik is executed).
# ----
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST
port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST
port=$SLAVEPORT user=$REPLICATIONUSER';
# ----
# Node 2 subscribes set 1
# ----
subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_
slon will emit messages similar to the following while it copies the initial data
from master to slave
CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700
CONFIG remoteWorkerThread_1: connected to provider DB
CONFIG remoteWorkerThread_1: prepare to copy table "public"."student"
CONFIG remoteWorkerThread_1: prepare to copy table "public"."teacher"
CONFIG remoteWorkerThread_1: all tables for set 1 found on subscriber
CONFIG remoteWorkerThread_1: copy table "public"."student"
CONFIG remoteWorkerThread_1: Begin COPY of table "public"."student"
NOTICE: truncate of "public"."student" succeeded
CONFIG remoteWorkerThread_1: 62 bytes copied for table "public"."student"
CONFIG remoteWorkerThread_1: 0.082 seconds to copy table "public"."student"
CONFIG remoteWorkerThread_1: copy table "public"."teacher"
CONFIG remoteWorkerThread_1: Begin COPY of table "public"."teacher"
NOTICE: truncate of "public"."teacher" succeeded
CONFIG remoteWorkerThread_1: 45 bytes copied for table "public"."teacher"
CONFIG remoteWorkerThread_1: 0.031 seconds to copy table "public"."teacher"
INFO remoteWorkerThread_1: copy_set SYNC found, use event seqno 5000000311.
INFO remoteWorkerThread_1: 0.018 seconds to build initial setsync status
INFO copy_set 1 done in 0.172 seconds
CONFIG enableSubscription: sub_set=1
CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
CONFIG remoteWorkerThread_1: update provider configuration
CONFIG remoteWorkerThread_1: added active set 1 to provider 1
CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700
INFO remoteWorkerThread_1: SYNC 5000000297 done in 0.082 seconds
INFO remoteWorkerThread_1: SYNC 5000000301 done in 0.005 seconds
INFO remoteWorkerThread_1: SYNC 5000000309 done in 0.038 seconds
INFO remoteWorkerThread_1: SYNC 5000000310 done in 0.030 seconds
INFO remoteWorkerThread_1: SYNC 5000000311 done in 0.005 seconds
Step 11: Check replicated data in slave
./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME
psql.bin (10.7)
Type "help" for help.
for_slony=# select * from teacher;
tid | tname | tsubject
-----+-------+----------
1 | Gary | Physics
2 | Karen | Maths
3 | Carol | History
(3 rows)
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
Step 12: Check insert operation on slave
for_slony=# insert into student values(4, 'David', 'Kent');
ERROR: Slony-I: Table student is replicated and cannot be modified on a
subscriber node - role=0
Step 13: Try insert on master
insert into student values(4, 'David', 'Kent');
Check the log entry:
select * from _slony_example.sl_log_2;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 5446 | 1 | 1 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+----------------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+----------------------------------
student | I | 0 | {sid,4,sname,David,saddress,Kent}
------------------+-------------+-----------------+----------------------------------
for_slony=# select ctid,xmin, xmax, cmin, * from student;
ctid | xmin | xmax | cmin | sid | sname | saddress
-------+------+------+------+-----+--------+--------------
(0,1) | 560 | 0 | 0 | 1 | Edward | Main Campus
(0,2) | 561 | 0 | 0 | 2 | Linda | Girls Hostel
(0,3) | 562 | 0 | 0 | 3 | Jason | Boys Hostel
(0,4) | 5446 | 0 | 0 | 4 | David | Kent
(4 rows)
Step 14: Try update on master
for_slony=# update student set saddress = 'Whales' where sid = 4;
UPDATE 1
select * from _slony_example.sl_log_1;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 6239 | 1 | 2 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+------------------------
student | U | 1 | {saddress,Whales,sid,4}
------------------+-------------+-----------------+------------------------
Step 15: Check result on slave
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
4 | David | Whales
(4 rows)
Step 16: Try delete on master
for_slony=# delete from student where sid = 4;
DELETE 1
select * from _slony_example.sl_log_1;
------------+----------+-------------+---------------+------------------+
log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname |
------------+----------+-------------+---------------+------------------+
1 | 6407 | 1 | 3 | public |
------------+----------+-------------+---------------+------------------+
------------------+-------------+-----------------+------------------------
log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs
------------------+-------------+-----------------+------------------------
student | D | 0 | {sid,4}
------------------+-------------+-----------------+------------------------
Step 17: Check result on slave
for_slony=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
9.3.5 Steps to perform controlled switchover
A small slonik script can achieve a controlled switchover in which we swap the roles of the two
nodes completely. The old master becomes the new slave and the old slave becomes the new
master. Please note that this is a planned activity; it has nothing to do with any type of
failure.
#!/bin/sh
slonik <<_EOF_
cluster name = $CLUSTERNAME;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER';
lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2, wait on=1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on=1);
_EOF_
After the command runs, the Slony trigger definitions on the tables in the set have changed
on the new master. The _slony_example_denyaccess & _slony_example_truncatedeny triggers
get disabled and _slony_example_logtrigger & _slony_example_truncatetrigger enabled on the
new master. Changes to the tables in the set are therefore possible on the new master.
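A quick way to verify the switchover is to repeat the write test from step 12 in the opposite
direction. Expected behaviour, based on the deny-access trigger shown earlier (the values are
illustrative):

-- on the old master, now a subscriber, writes are rejected
insert into student values(5, 'Mary', 'Main Campus');
ERROR:  Slony-I: Table student is replicated and cannot be modified on a subscriber node - role=0
-- the same insert on the new master succeeds and is replicated to the old master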
9.4 Introduction to WAL
9.4.1 What is WAL and Why is it required
In PostgreSQL, all changes made by every transaction are first saved in a log file, and only then
is the result of the transaction sent to the initiating client. Data files are not changed on every
transaction. This is a standard mechanism to prevent data loss in situations like an OS
crash, hardware failure, a PostgreSQL crash, etc. The mechanism is called Write Ahead
Logging and the log file is called the Write Ahead Log (WAL).
Each change that a transaction performs (INSERT, UPDATE, DELETE, COMMIT) is
written to the log as a WAL record. WAL records are first written into an in-memory WAL
buffer; on transaction commit the records are written into a WAL segment file on disk.
The log sequence number (LSN) of a WAL record represents the location where it is saved in
the log file, and is used as the unique id of the WAL record. Logically, the transaction log is a
file whose size is 2^64 bytes. An LSN is therefore a 64-bit number, represented as two 32-bit
hexadecimal numbers separated by a /. For example:
select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/2BDBBD0
(1 row)
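Because LSNs are ordered positions in the transaction log, the distance between two of them can
be computed in bytes, which is how replication lag is commonly measured. A minimal sketch,
assuming a streaming replication standby is attached (see section 9.6):

select application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as lag_bytes
from pg_stat_replication;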
In the event of a system crash the database can recover committed transactions from the WAL.
During recovery PostgreSQL starts from the last REDO point, i.e. the last checkpoint. A
checkpoint is a point in the transaction log at which all data files have been updated to reflect
the information in the log. The process of flushing all changes up to that point from the shared
buffers to the actual data files is called checkpointing.
Let's consider a case where the database crashes after two transactions that perform one insert
each, and the WAL is used for recovery.
1. Assume a CHECKPOINT is issued which stores the location of the latest REDO point in the
current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk.
This guarantees that WAL records before the REDO point are no longer needed for recovery,
since all data has been flushed to the disk pages.
2. First INSERT statement is issued. The table’s page is loaded from disk to the buffer pool.
3. A tuple is inserted into the loaded page.
4. WAL record of this insert is saved into the WAL buffer at location LSN_1.
5. Update page LSN, which identifies WAL record for last change to this page, from LSN_0 to
LSN_1.
6. First COMMIT statement is issued.
7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records
in the WAL buffer up to this record's LSN are flushed to the WAL segment file.
8. For the second INSERT and COMMIT, steps 2 to 7 are repeated.
[Diagram: step-by-step contents of the shared buffer pool, WAL buffer, WAL segment and data files as the client runs CHECKPOINT, then BEGIN; INSERT INTO TAB VALUES ('A'); COMMIT; and then BEGIN; INSERT INTO TAB VALUES ('B'); COMMIT;. The page LSN advances from LSN_0 to LSN_1 to LSN_2, the commit records are flushed to the WAL segment, and the data files still reflect the REDO point recorded by the checkpoint.]
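The insert and flush positions can also be watched directly from psql. A minimal sketch,
assuming a table tab as in the diagram above:

BEGIN;
INSERT INTO tab VALUES ('A');
-- the insert position advances as soon as the WAL record enters the WAL buffer
SELECT pg_current_wal_insert_lsn();
COMMIT;
-- the flush position: everything up to here is safely in the WAL segment file
SELECT pg_current_wal_flush_lsn();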
In the event of an operating system crash, all of the data in the shared buffer pool is lost;
however, all modifications of the page have been written into the WAL segment files as history
data. The following steps show how the database cluster can recover back to the state
immediately before the crash using WAL records. There is no need to do anything special,
since PostgreSQL automatically enters recovery mode after restarting.
1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL
segment file.
2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool.
3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since
LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the
page's LSN is updated to LSN_1.
The remaining WAL records are replayed in a similar manner.
[Diagram: recovery replaying the WAL segment. The table page is loaded with LSN_0 from the data files; the insert of 'A' (WAL record at LSN_1) is applied, raising the page LSN to LSN_1; then the insert of 'B' (LSN_2) raises it to LSN_2.]
9.4.2 Transaction Log and WAL Segment Files
In PostgreSQL the transaction log is a virtual file addressed with 8-byte (64-bit) positions,
i.e. with a capacity of 2^64 bytes. Physically, the log is divided into 16-megabyte files,
each of which is called a WAL segment.
A WAL segment file name is a 24-digit hexadecimal number built from three 8-digit fields:
the timeline ID, the WAL segment number divided by 256, and the WAL segment number
modulo 256. Assuming that the current timeline ID is 0x00000001, the WAL segment file
names run as follows:
000000010000000000000001
000000010000000000000002
000000010000000000000003
……….
0000000100000000000000FF
000000010000000100000000
000000010000000100000001
…………
00000001FFFFFFFF000000FE
00000001FFFFFFFF000000FF
For Example:
select pg_walfile_name('0/2BDBBD0');
pg_walfile_name
--------------------------
000000010000000000000002
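Combining this with pg_current_wal_lsn() maps the server's current insert position to its segment file:

select pg_walfile_name(pg_current_wal_lsn());  -- name of the WAL segment currently being written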
9.4.3 WAL Writer
The WAL writer is a background process that periodically checks the WAL buffer and
writes any as-yet-unwritten WAL records to the WAL segments. It avoids bursts of IO
activity by spreading the writes over time in small amounts. The configuration
parameter wal_writer_delay controls how often WAL writer flushes the WAL, with
default value of 200 ms.
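A quick way to inspect and adjust it without a restart (wal_writer_delay only requires a configuration reload):

SHOW wal_writer_delay;                        -- 200ms by default
ALTER SYSTEM SET wal_writer_delay = '100ms';  -- persisted in postgresql.auto.conf
SELECT pg_reload_conf();                      -- apply the new value without a restart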
9.4.4 WAL Segment File Management
WAL segment files are stored in the pg_wal sub-directory. PostgreSQL switches to a new
WAL segment file under the following conditions:
1. WAL segment has been filled up.
2. The function pg_switch_wal has been issued.
3. archive_mode is enabled and the time set to archive_timeout has been exceeded.
Switched WAL files can either be removed or recycled, i.e. renamed and reused in the
future. The number of WAL files that the server retains at any point in time depends on
server configuration as well as server activity.
Whenever a checkpoint starts, PostgreSQL estimates the number of WAL segment files
required for the coming checkpoint cycle. The estimate is based on the number of files
consumed in previous checkpoint cycles. They are counted from the segment that
contains the prior REDO point, and the value is kept between min_wal_size (by default
80 MB, i.e. 5 files) and max_wal_size (by default 1 GB, i.e. 64 files). When a checkpoint
starts, the necessary files are held and recycled, while the unnecessary ones are removed.
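Both limits, and the segments currently retained in pg_wal, can be checked from SQL (a quick sketch; pg_ls_waldir is available from version 10 onward):

SHOW min_wal_size;
SHOW max_wal_size;
SELECT name, size FROM pg_ls_waldir() ORDER BY name;  -- one row per WAL segment file in pg_wal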
A specific example is shown in the diagram below. Assuming that there are six files
before the checkpoint starts, WAL_3 contains the prior REDO point (from version 11
onward, the REDO point itself), and PostgreSQL estimates that five files are needed. In
this case, WAL_1 will be renamed to WAL_7 for recycling and WAL_2 will be removed.
[Figure: segments WAL_1 through WAL_6 before the checkpoint, with the REDO point in WAL_3 and five segments estimated as needed by the server; afterwards WAL_1 has been renamed to WAL_7 for reuse and the unneeded WAL_2 has been removed.]
9.4.5 WAL Example
Step 1:SELECT datname, oid FROM pg_database WHERE datname = 'postgres';
datname | oid
----------+-------
postgres | 15709
(1 row)
Note the database OID i.e. 15709
Step 2: SELECT oid,* from pg_tablespace;
oid | spcname | spcowner | spcacl | spcoptions
------+------------+----------+--------+------------
1663 | pg_default | 10 | |
1664 | pg_global | 10 | |
(2 rows)
Note the table space OID i.e. 1663
Step 3: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C420B8
(1 row)
Note the LSN i.e. 0/1C420B8
Step 4: CREATE TABLE abc(a VARCHAR(10));
Step 5: SELECT pg_relation_filepath('abc');
pg_relation_filepath
----------------------
base/15709/16384
(1 row)
Note the relation file name base/15709/16384
Step 6: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8
using the start LSN noted in step 3.
Note that the WAL contains the instruction to create physical file
15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr      len(rec/tot)  tx    lsn         prev        desc
XLOG 30/ 30 0 0/01C420B8 0/01C42080 NEXTOID 24576
Storage 42/ 42 0 0/01C420D8 0/01C420B8 CREATE base/15709/16384
Heap 203/203 1216 0/01C42108 0/01C420D8 INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0
Btree 64/ 64 1216 0/01C421D8 0/01C42108 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree 64/ 64 1216 0/01C42218 0/01C421D8 INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5
Heap 80/ 80 1216 0/01C42258 0/01C42218 INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9
Btree 72/ 72 1216 0/01C422A8 0/01C42258 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree 72/ 72 1216 0/01C422F0 0/01C422A8 INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61
Heap 203/203 1216 0/01C42338 0/01C422F0 INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1
Btree 64/64 1216 0/01C42408 0/01C42338 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree 72/ 72 1216 0/01C42448 0/01C42408 INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1
Heap 80/ 80 1216 0/01C42490 0/01C42448 INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9
Btree 72/ 72 1216 0/01C424E0 0/01C42490 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree 72/ 72 1216 0/01C42528 0/01C424E0 INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57
Heap 199/199 1216 0/01C42570 0/01C42528 INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0
Btree 64/ 64 1216 0/01C42638 0/01C42570 INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2
Btree 64/ 64 1216 0/01C42678 0/01C42638 INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1
Btree 64/ 64 1216 0/01C426B8 0/01C42678 INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5
Heap 171/171 1216 0/01C426F8 0/01C426B8 INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16
Btree 64/ 64 1216 0/01C427A8 0/01C426F8 INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C427E8 0/01C427A8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42828 0/01C427E8 INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16
Btree 72/ 72 1216 0/01C428D8 0/01C42828 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42920 0/01C428D8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42960 0/01C42920 INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16
Btree 72/ 72 1216 0/01C42A10 0/01C42960 INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42A58 0/01C42A10 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42A98 0/01C42A58 INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42B48 0/01C42A98 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42B90 0/01C42B48 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42BD0 0/01C42B90 INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42C80 0/01C42BD0 INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42CC8 0/01C42C80 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42D08 0/01C42CC8 INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17
Btree 72/ 72 1216 0/01C42DB8 0/01C42D08 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42E00 0/01C42DB8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 171/171 1216 0/01C42E40 0/01C42E00 INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32
Btree 72/ 72 1216 0/01C42EF0 0/01C42E40 INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25
Btree 64/ 64 1216 0/01C42F38 0/01C42EF0 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap 80/ 80 1216 0/01C42F78 0/01C42F38 INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11
Btree 72/ 72 1216 0/01C42FC8 0/01C42F78 INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44
Btree 72/ 72 1216 0/01C43010 0/01C42FC8 INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46
Standby 42/ 42 1216 0/01C43058 0/01C43010 LOCK xid 1216 db 15709 rel 16384
Txn 405/405 1216 0/01C43088 0/01C43058 COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608
relcache 16384
Standby 50/ 50 0 0/01C43220 0/01C43088 RUNNING_XACTS nextXid 1217 latestCompletedXid 1216
oldestRunningXid 1217
Step 7: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C43258
(1 row)
Step 8: INSERT INTO abc VALUES('pkn');
Step 9: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258
and use start LSN from step 7.
1663 → pg_default tablespace → noted in step 2
15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr         len(rec/tot)  tx    lsn         prev        desc
Heap 59/59 1217 0/01C43258 0/01C43220 INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0
Transaction 34/34 1217 0/01C43298 0/01C43258 COMMIT 2019-03-04 07:43:45.887511 EST
Standby 54/54 0 0/01C432C0 0/01C43298 RUNNING_XACTS nextXid 1218 latestCompletedXid 1216
oldestRunningXid 1217; 1 xacts: 1217
Step 10: SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C432F8
(1 row)
Step 11: INSERT INTO abc VALUES('ujy');
Step 12: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8
using the start LSN noted in step 10.
rmgr         len(rec/tot)  tx    lsn         prev        desc
Heap 59/59 1218 0/01C432F8 0/01C432C0 INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0
Transaction 34/34 1218 0/01C43338 0/01C432F8 COMMIT 2019-03-04 07:44:25.449151 EST
Standby 50/50 0 0/01C43360 0/01C43338 RUNNING_XACTS nextXid 1219 latestCompletedXid 1218
oldestRunningXid 1219
Step 13: Check the actual tuples in the WAL segment files.
---------+---------------------------------------------------+----------------+
Offset | Hex Bytes | ASCII chars |
---------+---------------------------------------------------+----------------+
00000060 | 3b 00 00 00 c3 04 00 00 28 00 40 02 00 00 00 00 |;.......(.@.....|
00000070 | 00 0a 00 00 ec 28 75 6e 00 20 0a 00 7f 06 00 00 |.....(un. ......|
00000080 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000090 | 02 08 18 00 09 70 6b 6e 03 00 00 00 00 00 00 00 |.....pkn........|
000000a0 | 22 00 00 00 c3 04 00 00 60 00 40 02 00 00 00 00 |".......`.@.....|
000000b0 | 00 01 00 00 dd 4c 87 04 ff 08 e4 73 44 e7 41 26 |.....L.....sD.A&|
000000c0 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
000000d0 | a0 00 40 02 00 00 00 00 10 08 00 00 9e 01 36 88 |..@...........6.|
000000e0 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c4 04 |................|
000000f0 | 00 00 c4 04 00 00 c3 04 00 00 00 00 00 00 00 00 |................|
00000100 | 3b 00 00 00 c4 04 00 00 c8 00 40 02 00 00 00 00 |;.........@.....|
00000110 | 00 0a 00 00 33 df b4 71 00 20 0a 00 7f 06 00 00 |....3..q. ......|
00000120 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000130 | 02 08 18 00 09 75 6a 79 04 00 00 00 00 00 00 00 |.....ujy........|
00000140 | 22 00 00 00 c4 04 00 00 00 01 40 02 00 00 00 00 |".........@.....|
00000150 | 00 01 00 00 96 2e 96 a6 ff 08 d8 f3 79 ed 41 26 |............y.A&|
00000160 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
00000170 | 40 01 40 02 00 00 00 00 10 08 00 00 eb 6b 95 36 |@.@..........k.6|
00000180 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c5 04 |................|
00000190 | 00 00 c5 04 00 00 c4 04 00 00 00 00 00 00 00 00 |................|
000001a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
9.4.6 Overview of Replication Options based on WAL
Continuous WAL Archiving
Copying WAL files, as they are generated, into a location other than the pg_wal sub-
directory for the purpose of archiving them is called WAL archiving. To archive, a script
provided by the user is invoked by PostgreSQL each time a WAL file is completed. The
script can use the scp command to copy the file to one or more locations; the location
can be an NFS mount. Once archived, the WAL segment files can be used to recover the
database to any specified point in time.
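A minimal archive_command for a locally mounted archive follows the same pattern (a sketch, assuming /mnt/server/archivedir exists and is writable by the postgres user; the scp-based variant used later in this document works the same way):

# %p is the path of the segment to archive, %f is its file name
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'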
Log Shipping Based Replication - File Level
The process of copying WAL files to another PostgreSQL server, for the purpose of
creating a standby server that replays them, is called log shipping. This server is
configured to be in recovery mode, and its sole purpose is to apply new WAL files as
they arrive. The second server thus becomes a warm backup of the primary PostgreSQL
server, also termed a standby. The standby can also be configured to serve read-only
queries, in which case it is called a hot standby.
Log Shipping Based Replication - Block Level
Streaming replication improves on the log shipping process. Instead of waiting for a
WAL file switch, records are sent as they are generated, which reduces replication
delay. The second improvement is that the standby server connects to the primary server
over the network using a replication protocol, so the primary server can send WAL
records directly over this connection without having to rely on scripts provided by the
end user.
How long should the primary retain WAL segment files?
Without any streaming replication clients, the server can discard or recycle a WAL
segment file once the archive script reports success, provided the file is no longer
required for crash recovery.
In the presence of standby clients, though, there is a problem: the server needs to keep
WAL files around for as long as the slowest standby needs them. If a standby that was
taken down for a while comes back online and asks the primary for a WAL file that the
primary no longer has, replication fails with an error similar to:
ERROR: requested WAL segment 00000001000000010000002D has already been removed
The primary should therefore keep track of how far behind each standby is, and not
delete or recycle WAL files that any standby still needs. This feature is provided through
replication slots.
Each replication slot has a name which is used to identify the slot. Each slot is
associated with:
(a) The oldest WAL segment file required by the consumer of the slot. WAL segment
files later than this are not deleted/recycled during checkpoints.
(b) The oldest transaction ID required to be retained by the consumer of the slot. Rows
needed by any transactions later than this are not deleted by vacuum.
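How much WAL a slot is currently holding back can be measured in SQL (a quick sketch; pg_wal_lsn_diff returns the difference in bytes):

SELECT slot_name, slot_type, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;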
9.5 Log Shipping Based - File Level
9.5.1 Setup
The setup consists of two CentOS 7 machines on which PostgreSQL version 10.7 is
installed. Both systems are loosely coupled, sharing only the WAL archive.
9.5.2 Configuring Replication using Log Shipping
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: Create a folder on the standby that will archive WALs received from the primary
sudo mkdir /opt/PostgreSQL/10/from_primary
sudo chown postgres:postgres /opt/PostgreSQL/10/from_primary
In /etc/passwd change home directory of user postgres to
/opt/PostgreSQL/10/from_primary
Step 3: Change home directory of postgres user on Primary
sudo mkdir /opt/PostgreSQL/10/home
sudo chown postgres:postgres /opt/PostgreSQL/10/home/
[Figure: Primary and Standby PostgreSQL servers sharing a WAL archive; on the primary, archive_command copies WAL files from pg_wal to the archive, and on the standby, restore_command copies WAL files from the archive into pg_wal.]
In /etc/passwd change home directory of user postgres to /opt/PostgreSQL/10/home/
Step 4: Configure password-less ssh & scp between Primary and Standby
Login as postgres user on Primary
su - postgres
Password:
Last login: Fri Feb 22 05:54:11 EST 2019 on pts/0
Generate a public/private key pair on Primary
-bash-4.2$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/PostgreSQL/10/home/.ssh/id_rsa):
Created directory '/opt/PostgreSQL/10/home/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.
Your public key has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:jqjjYf8OcKp4tgtfPcLWG6liAot660/4CLrIRq01BqI
postgres@localhost.localdomain
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| |
|.. |
|o + . S |
|E. @ + + |
|*.O X B . |
|OOBX + + |
|X@XB=o+ |
+----[SHA256]-----+
Copy the key to standby and add it to authorized_keys
-bash-4.2$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@172.16.214.165
/bin/ssh-copy-id: INFO: Source of key(s) to be installed:
"/opt/PostgreSQL/10/home/.ssh/id_rsa.pub"
The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established.
ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84.
ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6.
Are you sure you want to continue connecting (yes/no)?
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out
any that are already installed
The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established.
ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84.
ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are
prompted now it is to install the new keys
postgres@172.16.214.165's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'postgres@172.16.214.165'"
and check to make sure that only the key(s) you wanted were added.
Test passwordless SSH
-bash-4.2$ ssh postgres@172.16.214.165
Last login: Fri Feb 22 05:53:39 2019
-bash-4.2$ exit
logout
Connection to 172.16.214.165 closed.
-bash-4.2$
Test passwordless SCP
From Primary try this
su - postgres
-bash-4.2$ scp 1.txt postgres@172.16.214.165:/opt/PostgreSQL/10/from_primary
1.txt
100% 3446 3.3MB/s 00:00
Check on standby
su - postgres
-bash-4.2$ pwd
/opt/PostgreSQL/10/from_primary
-bash-4.2$ ls -l
total 4
-rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt
Step 5: Update the postgresql.conf file on primary
wal_level = replica
archive_mode = on
archive_command = 'if ssh postgres@172.16.214.165 test ! -f
"/opt/PostgreSQL/10/from_primary/%f" ; then scp %p
postgres@172.16.214.165:/opt/PostgreSQL/10/from_primary/; fi'
The archive_command will be executed every time a new WAL file is generated. This
archive command uses two placeholders:
%p : the complete path of the WAL file, including its name
%f : the name of the WAL file
The command first tests that the WAL file is not already present on the standby and, if it
is not, copies the WAL file to the archive folder.
Step 6: Create database & tables on primary
./createdb test_db
CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255));
CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255));
INSERT INTO student VALUES(1, 'Edward', 'Main Campus');
INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel');
INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel');
INSERT INTO teacher VALUES(1, 'Gary', 'Physics');
INSERT INTO teacher VALUES(2, 'Karen', 'Maths');
INSERT INTO teacher VALUES(3, 'Carol', 'History');
Step 7: Take base backup using the command
./pg_basebackup --pgdata=/opt/PostgreSQL/10/for_standby/ --format=p
--write-recovery-conf --checkpoint=fast --label=for_test --progress
--verbose --host=localhost --port=5432 --username=postgres
Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
32578/32578 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
--pgdata : Target folder for the base backup
--format : plain
Step 8: Modify the recovery.conf in the base backup
standby_mode = 'on'
restore_command = 'cp "/opt/PostgreSQL/10/from_primary/%f" "%p"'
The restore_command is invoked by the standby server periodically. Our restore
command copies the newly arrived WAL file from the archive to the pg_wal folder of the
standby server.
Step 9: Transfer the base backup to the standby server
sudo mkdir /opt/PostgreSQL/10/bb_data/
sudo mv /tmp/for_standby.tar.gz /opt/PostgreSQL/10/bb_data/
sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/
sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/for_standby.tar.gz
sudo chmod 700 /opt/PostgreSQL/10/bb_data/
sudo chmod 700 /opt/PostgreSQL/10/bb_data/for_standby
Step 10: Unzip base backup
su - postgres
cd bb_data/
-bash-4.2$ ls -l
total 3840
-rw-r--r--. 1 postgres postgres 3930538 Feb 23 01:40 for_standby.tar.gz
-bash-4.2$ tar -xvf for_standby.tar.gz
Step 11: Start the stand by server
-bash-4.2$ ./postgres -D ../bb_data/for_standby/ -p 5432
Step 12: Test standby server
./psql -p 5432 test_db -U postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
test_db=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+-------+-------------
public | student | table | postgres | 16 kB |
public | teacher | table | postgres | 16 kB |
(2 rows)
test_db=# select * from student;
sid | sname | saddress
-----+--------+--------------
1 | Edward | Main Campus
2 | Linda | Girls Hostel
3 | Jason | Boys Hostel
(3 rows)
test_db=# select * from teacher;
tid | tname | tsubject
-----+-------+----------
1 | Gary | Physics
2 | Karen | Maths
3 | Carol | History
(3 rows)
test_db=# insert into student values(4, 'any');
ERROR: cannot execute INSERT in a read-only transaction
Step 13: Restart primary and create a few new tables in the test database
CREATE TABLE test_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id;
SELECT 100000
CREATE TABLE another_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id;
SELECT 100000
Step 14: Force a WAL file switch
test_db=# select pg_switch_wal();
pg_switch_wal
---------------
0/3C58940
(1 row)
Step 15: Check WAL file on standby
-bash-4.2$ pwd
/opt/PostgreSQL/10/from_primary
-bash-4.2$ ls -l
total 16388
-rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003
-rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt
-bash-4.2$ pwd
/opt/PostgreSQL/10/bb_data/for_standby/pg_wal
-bash-4.2$ ls -l
total 32768
-rw-------. 1 postgres postgres 16777216 Feb 22 22:05 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003
drwx------. 2 postgres postgres 43 Feb 23 02:08 archive_status
Step 16: Check the tables on the standby
test_db=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------+-------+----------+---------+-------------
public | another_tab | table | postgres | 3568 kB |
public | student | table | postgres | 16 kB |
public | teacher | table | postgres | 16 kB |
public | test_tab | table | postgres | 3568 kB |
(4 rows)
9.5.3 Steps to perform Failover
Step 1: Simulate a primary server problem
Stop the server
Step 2: Promote the standby
-bash-4.2$ ./pg_ctl promote -D ../bb_data/for_standby/
waiting for server to promote.... done
server promoted
Step 3: Check standby
[abbas@localhost bin]$ ./psql -p 5432 test_db -U postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
test_db=# insert into student values(4, 'any');
INSERT 0 1
test_db=#
9.6 Log Shipping Based - Block Level
9.6.1 Physical Streaming Replication
In streaming replication the standby server connects to the primary server and receives
WAL records using a replication protocol. This provides two advantages:
1. The standby server does not need to wait for a WAL file to fill up, so replication lag
is reduced.
2. The dependency on a user-provided script and on intermediate shared storage
between the servers is removed.
9.6.2 WAL Sender & WAL Receiver
A process called the WAL receiver, running on the standby server, connects to the
primary server over TCP/IP using the connection details provided in the primary_conninfo
parameter of recovery.conf. On the primary server, another process called the WAL
sender is in charge of sending the WAL records to the standby server as they are
generated. The WAL receiver writes the received records into the standby's own WAL
segment files, as if they had been generated by locally connected client activity. As WAL
records reach the segment files, the standby server continuously replays them so that the
standby stays up to date with the primary.
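Once streaming is established, progress can be watched on both ends (a quick check; in version 10 the receiver view exposes received_lsn):

-- On the primary:
SELECT pid, application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;
-- On the standby:
SELECT status, received_lsn, last_msg_receipt_time FROM pg_stat_wal_receiver;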
[Figure: the WAL sender on the primary streams WAL records over the network to the WAL receiver on the standby; both servers end up holding the same records (W1 W2 W3 W4) in their WAL.]
9.6.3 WAL Streaming Protocol Details
The exchange between the WAL receiver (standby) and the WAL sender (primary) proceeds as follows:

1. Startup request from the standby, asking for the server's authentication scheme and carrying these parameters:
   user             postgres
   database         replication
   replication      true          <- instructs the server to start a WAL sender for this client
   application_name walreceiver
2. The server is expecting the password in MD5 format and requests it:
   52 00 00 00 0c 00 00 00 05 04 43 16 6a
   (authentication request | length | md5 password | salt generated by server)
3. Password response from the standby:
   70 00 00 00 0b md5b094d71396249f3ca84a23b86d4ee7b9
   (password response | length | MD5 password terminated by null)
   The MD5 password is computed as "md5" || md5(md5(password || username) || salt).
4. Authentication reply: 52 00 00 00 08 00 00 00 00
   (authentication reply | length | user authenticated), followed by status parameters
   ('S' | length 4 bytes | param name | param value).
5. Simple query: IDENTIFY_SYSTEM. The response carries systemid, timeline and logpos, e.g.
   systemid            | timeline | logpos
   --------------------+----------+----------
   6661510093306984809 | 1        | 0/3000140
   The WAL receiver verifies that the systemid in the response is the same as in the base backup.
6. Simple query: START_REPLICATION SLOT "node_a_slot" 0/3000000 TIMELINE 1.
   The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual |
   copy data has 0 columns) and starts to stream WAL data as CopyData messages
   ('d' | length 4 bytes | WAL data).
9.6.4 Setup
The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL
version 10.7 is installed.
9.6.5 Configuring PostgreSQL Replication using WAL Streaming
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: On primary allow replication connections & connections from the same
network. Modify pg_hba.conf.
local all all md5
host all all 172.16.214.167/24 md5
host all all ::1/128 md5
local replication all md5
host replication all 172.16.214.167/24 md5
host replication all ::1/128 md5
Step 3: On primary edit postgresql.conf to modify the following parameters
max_wal_senders = 10
wal_level = replica
max_replication_slots = 10
synchronous_commit = on
synchronous_standby_names = '*'
listen_addresses = '*'
Step 4: Start the primary server
./postgres -D ../pr_data -p 5432
Step 5: Take a base backup to bootstrap the standby server
./pg_basebackup
--pgdata=/tmp/sb_data/
--format=p
--write-recovery-conf
--checkpoint=fast
--label=mffb
--progress
--verbose
--host=172.16.214.167
--port=5432
--username=postgres
Step 6: Check the base backup label file
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-02-24 05:25:30 EST
LABEL: mffb
Step 7: In the base backup, add the following line in the recovery.conf
primary_slot_name = 'node_a_slot'
Step 8: Check the /tmp/sb_data/recovery.conf file
standby_mode = 'on'
primary_conninfo = 'user=enterprisedb
password=abc123
host=172.16.214.167
port=5432
sslmode=prefer
sslcompression=1
krbsrvname=postgres
target_session_attrs=any'
primary_slot_name = 'node_a_slot'
Step 9: Connect to the primary server and issue this command
edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
slot_name | xlog_position
-------------+---------------
node_a_slot |
(1 row)
edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
slot_name | slot_type | active
-------------+-----------+--------
node_a_slot | physical | f
(1 row)
Step 10: Transfer the base backup to the standby server
scp /tmp/sb_data.tar.gz abbas@172.16.214.166:/tmp
sudo mv /tmp/sb_data /opt/PostgreSQL/10/
sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chmod 700 /opt/PostgreSQL/10/sb_data/
Step 11: Start the standby server
./postgres -D ../sb_data/ -p 5432
The primary will show this in its log
LOG: standby "walreceiver" is now a synchronous standby with priority 1
The standby will show
LOG: database system was interrupted; last known up at 2018-10-24 15:49:55
LOG: entering standby mode
LOG: redo starts at 0/3000028
LOG: consistent recovery state reached at 0/30000F8
LOG: started streaming WAL from primary at 0/4000000 on timeline 1
Step 12: Connect to primary server and issue some simple commands
-bash-4.2$ ./edb-psql -p 9666 edb
Password:
psql.bin (9.6.10.17)
Type "help" for help.
create table abc(a int, b varchar(250));
insert into abc values(1,'One');
insert into abc values(2,'Two');
insert into abc values(3,'Three');
Step 13: Check data on slave
./psql -p 5432 -U postgres postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
postgres=# select * from abc;
a | b
---+-------
1 | One
2 | Two
3 | Three
(3 rows)
9.6.6 Steps to perform Failover
Step 1: Crash the primary server
Step 2: Promote the stand by server
./pg_ctl promote -D ../sb_data/
server promoting
Step 3: Connect to the promoted stand by server and insert a row
-bash-4.2$ ./edb-psql -p 9777 edb
Password:
psql.bin (9.6.10.17)
Type "help" for help.
edb=# insert into abc values(4,'Four');
9.7 Logical Decoding Based
9.7.1 What is Logical Replication
Physical streaming replication, as described in section 9.6, creates a byte-by-byte read-only
replica of the primary server. The replica contains all databases, tables, roles,
tablespaces etc. With streaming replication we get all or nothing. What if we want a
replica of only a single table? This is where logical replication comes into play.
Logical replication can replay DML operations happening on a subset of tables in a
primary server on a standby server by:
*) Logically decoding WAL records
*) Streaming them over to the standby server
*) Applying them to the tables in the standby server in the correct transactional order
9.7.2 Comparison of Physical and Logical Replication

Feature                                                           | Physical | Logical
------------------------------------------------------------------+----------+--------
Replica will be read only                                         | Yes      | No
Replica will contain everything                                   | Yes      | No
Replica can contain a subset of data in the primary               | No       | Yes
Triggers will fire on DML operations                              | No       | Yes
Will work across different PostgreSQL versions                    | No       | Yes
Will work across different operating systems                      | No       | Yes
Table in the standby can have extra columns, indexes or security  | No       | Yes
DML operations are possible on tables in standby                  | No       | Yes
Will work even if table has no primary key                        | Yes      | No
DDL commands are replicated to standby                            | Yes      | No
Sequence data is replicated to standby                            | Yes      | No
TRUNCATE command is replicated to the standby                     | Yes      | No
Large objects are replicated to the standby                       | Yes      | No
Constraint validation is performed on standby                     | No       | Yes
Standby needs base backup of the primary server                   | Yes      | No
DML operations can be filtered before sending to standby          | No       | Yes
Tables have to be created with the same name on standby manually  | No       | Yes
9.7.3 Publication & Subscription
Logical replication defines two entities: a publisher and a subscriber. A publisher is a
node that defines a group of tables, called a publication, to which a subscriber can
subscribe by creating a subscription, in order to receive the changes to that particular
group of tables.
[Figure: on the publisher, the WAL sender runs WAL records (W1 W2 W3 W4) through a logical decoding plugin, which decodes and filters the changes for the tables in the publication; the subscriber applies the decoded records to the corresponding tables in its subscription.]
9.7.4 Logical Decoding Plugin
In order to transform the WAL internal representation into a format that can be used by a
client, a plugin can be installed into PostgreSQL. The plugin implements well-defined
callback functions, which the logical decoding framework calls at the appropriate times
to let the plugin perform the format conversion. A plugin can, for example, convert WAL
records into SQL statements such as INSERT INTO tab VALUES(1,2) or UPDATE tab SET b = 10.
9.7.5 Logical replication slots
Logical replication slots are meant to be consumed by logical replication clients. Physical
replication slots work at the cluster level and are used to stream cluster-wide changes to
a standby; logical replication slots, on the other hand, stream a sequence of changes from
a single database. Each logical slot needs a decoding plugin that transforms the WAL
records into the format required by the consumer.
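The changes accumulated in a logical slot can also be consumed directly from SQL, without pg_recvlogical (a minimal sketch; the slot name sql_slot is arbitrary and wal_level = logical is assumed):

SELECT pg_create_logical_replication_slot('sql_slot', 'test_decoding');
-- ... run some DML in this database, then:
SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('sql_slot', NULL, NULL);  -- inspect without consuming
SELECT lsn, xid, data FROM pg_logical_slot_get_changes('sql_slot', NULL, NULL);   -- consume and advance the slot
SELECT pg_drop_replication_slot('sql_slot');                                      -- clean up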
9.7.6 test_decoding and pg_recvlogical
test_decoding is an example decoding plugin that is provided with PostgreSQL, and
pg_recvlogical is an example utility that can be used to receive changes from a logical
replication slot. Let's see them both in action:
Step 1: Make the following changes in postgresql.conf on the primary server
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on
Step 2: Make the following changes in pg_hba.conf on the primary server
host all all 172.16.214.167/24 trust
host replication all 172.16.214.167/24 trust
Step 3: Create a database on the primary server
./createdb -p 7654 mydb -U postgres
Step 4: Connect the client with the primary server
./psql -p 7654 mydb -U postgres
psql.bin (10.7)
Type "help" for help.
mydb=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/16998C0
(1 row)
mydb=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
slot_name | lsn
-----------+-----------
my_slot | 0/1699930
(1 row)
mydb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
my_slot | test_decoding | logical | 16384 | mydb | f |
active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
f | | | 556 | 0/16998F8 | 0/1699930
This replication slot is asking:
1) VACUUM should not remove catalog tuples deleted by any transaction later than 556.
2) The consumer of this replication slot needs all WAL segments including and after 0/16998F8.
3) The consumer of this logical replication slot has confirmed receiving data up to 0/1699930.
Most of the time a slot will require older WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn). The flush position is just a marker saved by the
consumer; the WAL actually required is always determined by restart_lsn. If this is the
first slot being created in the cluster, then restart_lsn will be the current WAL LSN at the
time the slot was created.
Step 5: On stand by start pg_recvlogical utility
./pg_recvlogical
--slot=my_slot
--verbose
-d mydb
-h 172.16.214.167
-p 7654
-U postgres
--start
-f -
pg_recvlogical: starting log streaming at 0/0 (slot my_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
Step 6: On Primary create a table
create table test(a varchar(10));
Step 7: Check the output of pg_recvlogical
BEGIN 556
COMMIT 556
pg_recvlogical: confirming write up to 0/16B0580, flush to 0/16B0580 (slot my_slot)
Step 8: Insert a few rows in the table on primary
mydb=# insert into test values('qaz');
mydb=# insert into test values('wsx');
mydb=# insert into test values('edc');
Step 9: Check the output of pg_recvlogical
BEGIN 557
table public.test: INSERT: a[character varying]:'qaz'
COMMIT 557
pg_recvlogical: confirming write up to 0/16B0628, flush to 0/16B0628 (slot my_slot)
pg_recvlogical: confirming write up to 0/16B0660, flush to 0/16B0660 (slot my_slot)
BEGIN 558
table public.test: INSERT: a[character varying]:'wsx'
COMMIT 558
pg_recvlogical: confirming write up to 0/16B06D0, flush to 0/16B06D0 (slot my_slot)
BEGIN 559
table public.test: INSERT: a[character varying]:'edc'
COMMIT 559
pg_recvlogical: confirming write up to 0/16B0778, flush to 0/16B0778 (slot my_slot)
Step 10: Update rows in the table on primary
update test set a = 'tgb';
Step 11: Check the output of pg_recvlogical
BEGIN 560
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
COMMIT 560
pg_recvlogical: confirming write up to 0/16B08B8, flush to 0/16B08B8 (slot my_slot)
Step 12: Delete rows in the table on primary
delete from test;
Step 13: Check the output of pg_recvlogical
BEGIN 561
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
COMMIT 561
pg_recvlogical: confirming write up to 0/16B0990, flush to 0/16B0990 (slot my_slot)
Step 14: Check the slot on primary, note that restart_lsn has been advanced
mydb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
my_slot | test_decoding | logical | 16384 | mydb | f |
active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
f | | | 566 | 0/16B0DB8 | 0/16B0DF0
9.7.7 Setup
The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL
version 10.7 is installed.
9.7.8 Configuring PostgreSQL Replication using Logical Decoding
Step 1: Disable and stop firewall on both the machines
sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
Step 2: On primary allow replication connections & connections from the same
network. Modify pg_hba.conf.
local all all trust
host all all 172.16.214.167/24 trust
host all all ::1/128 trust
local replication all trust
host replication all 172.16.214.167/24 trust
host replication all ::1/128 trust
Step 3: On publisher edit postgresql.conf to modify the following parameters
max_wal_senders = 10
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on
Step 4: Start the publisher server
./postgres -D /tmp/data/ -p 5432
Step 5: Create a database on the publisher server
./createdb -p 5432 -U postgres src_db
Step 6: Connect to the publisher server and create a table with some rows
create table t1 (id integer primary key, val text);
create user replicant with replication;
grant select on t1 to replicant;
insert into t1 (id, val) values (10, 'ten'),
(20, 'twenty'),
(30, 'thirty');
Step 7: Create the publication on the publisher server
create publication pub1 for table t1;
Step 8: Start the subscriber
./postgres -D /tmp/data/ -p 5432
Step 9: Create the database on the subscriber
./createdb -p 5432 -U postgres dst_db
Step 10: Connect to the subscriber server and create the table with an additional
column
create table t1 (id integer primary key, val text, val2 text);
Step 11: Create the subscription
create subscription sub1
connection 'host=172.16.214.167
port=5432
dbname=src_db
user=replicant'
publication pub1;
Step 12: Check the data in the subscribed table
dst_db=# select * from t1;
id | val | val2
----+--------+------
10 | ten |
20 | twenty |
30 | thirty |
(3 rows)
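Both sides of the subscription can be monitored afterwards (a quick sketch; these views exist in version 10):

-- On the publisher:
SELECT * FROM pg_publication_tables WHERE pubname = 'pub1';
SELECT slot_name, plugin, active FROM pg_replication_slots;
-- On the subscriber:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;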
9.7.9 Logical Replication Protocol Details
The subscriber opens three connections to the publisher.

Connection 1 - the apply worker creates the replication slot:
1. Startup request asking for the server's authentication scheme, with these parameters:
   user             replicant
   database         src_db
   replication      database      <- the connection goes into logical replication mode
   application_name sub1
2. Authentication reply: 52 00 00 00 08 00 00 00 00
   (authentication reply | length | user authenticated), followed by status parameters
   ('S' | length 4 bytes | param name | param value).
3. The subscriber looks up the tables in the publication:
   SELECT DISTINCT t.schemaname, t.tablename
   FROM pg_catalog.pg_publication_tables t
   WHERE t.pubname IN ('pub1')
   schemaname | tablename
   -----------+----------
   public     | t1
4. CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
   slot_name | consistent_point | snapshot_name | output_plugin
   ----------+------------------+---------------+--------------
   sub1      | 0/16B9EA8        |               | pgoutput
5. Disconnect.

Connection 2 - the apply worker starts streaming (same startup and authentication as above):
1. Simple query: IDENTIFY_SYSTEM
   systemid            | timeline | xlogpos   | dbname
   --------------------+----------+-----------+--------
   6664876364497978284 | 1        | 0/16B9EA8 | src_db
2. START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '1', publication_names '"pub1"')
3. The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual |
   copy data has 0 columns) and streams decoded WAL data as CopyData messages
   ('d' | length 4 bytes | copy data).

Connection 3 - a table sync worker copies the initial data (application_name sub1_16393_sync_16385):
1. CREATE_REPLICATION_SLOT "sub1_16393_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
   The command completes and a transaction is started:
   BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
   slot_name             | consistent_point | snapshot_name | output_plugin
   ----------------------+------------------+---------------+--------------
   sub1_16393_sync_16385 | 0/16B9EE0        |               | pgoutput
2. The worker looks up the relation and its replica identity:
   SELECT c.oid, c.relreplident FROM pg_catalog.pg_class c
   INNER JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid)
   WHERE n.nspname = 'public' AND c.relname = 't1' AND c.relkind = 'r'
   oid   | relreplident
   ------+------------------
   16385 | d (primary key)
3. It then fetches the column list:
   SELECT a.attname, a.atttypid, a.atttypmod, a.attnum = ANY(i.indkey)
   FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_index i
   ON (i.indexrelid = pg_get_replica_identity_index(16385))
   WHERE a.attnum > 0::pg_catalog.int2 AND NOT a.attisdropped
   AND a.attrelid = 16385 ORDER BY a.attnum
   attname | atttypid | atttypmod | ?column?
   --------+----------+-----------+---------
   id      | 23       | -1        | t
   val     | 25       | -1        | f
4. COPY public.t1 TO STDOUT streams the existing rows (10 ten, 20 twenty, 30 thirty),
   the transaction is committed, and the worker disconnects.
9.8 Statement Based
9.8.1 Introduction to pgpool-II
pgpool-II is a middleware system that sits between PostgreSQL servers and clients to
provide the following features:
• Connection Pooling
• Replication & Load Balancing
• Automated Failover
We are going to focus on the replication feature provided by pgpool-II.
When used to replicate data, pgpool receives an INSERT command from the client and
sends the command, enclosed in a BEGIN-COMMIT block, to all the PostgreSQL servers
under it.
[Figure: the client sends INSERT INTO my_tab VALUES(1, 'One') to pgpool-II, which forwards the statement to each PostgreSQL server underneath, wrapped in a BEGIN ... COMMIT block.]
9.8.2 Setup
The setup consists of two CentOS 7 machines on which PostgreSQL 10.7 is installed. On
one of the machines pgpool-II version 3.6.15 (subaruboshi) is also installed.
9.8.3 Configuring PostgreSQL replication using pgpool-II
Step 1: Modify the postgresql.conf files of both the PostgreSQL instances
listen_addresses = '*'
logging_collector = off
log_connections = on
log_disconnections = on
log_statement = 'all'
Step 2: Modify the pg_hba.conf files of both the PostgreSQL instances
host all all 172.16.214.173/24 trust
Step 3: Modify the pgpool.conf
cd /opt/edb/pgpool3.6/etc
cp pgpool.conf.sample pgpool.conf && vim pgpool.conf
listen_addresses = '*'
backend_hostname0 = '172.16.214.173'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data0'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.16.214.172'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
replication_mode = on
fail_over_on_backend_error = on
Step 4: Generate md5 for the password
/opt/edb/pgpool3.6/bin/pg_md5 abc123
e99a18c428cb38d5f260853678922e03
Step 5: Modify the pcp.conf
cp pcp.conf.sample pcp.conf && vim pcp.conf
postgres:e99a18c428cb38d5f260853678922e03
Step 6: Start both the servers
./postgres -D ../data
./postgres -D ../data
Step 7: Start pgpool
./pgpool -n
-f /opt/edb/pgpool3.6/etc/pgpool.conf
-F /opt/edb/pgpool3.6/etc/pcp.conf
Step 8: Create database
./createdb -p 9999 test_pgp -U postgres
Note that we are connecting through pgpool
Step 9: Check database server logs
For First Server (172.16.214.173)
[104386] LOG: connection received: host=172.16.214.173 port=57524
[104386] LOG: connection authorized: user=postgres database=postgres
[104386] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false)
[104386] LOG: statement: CREATE DATABASE test_pgp;
[104386] LOG: statement: DISCARD ALL
[104386] LOG: disconnection: session time: 0:00:00.787 user=postgres database=postgres
host=172.16.214.173 port=57524
For Second Server (172.16.214.172)
[12363] LOG: connection received: host=172.16.214.173 port=42138
[12363] LOG: connection authorized: user=postgres database=postgres
[12363] LOG: statement: CREATE DATABASE test_pgp;
[12363] LOG: statement: DISCARD ALL
[12363] LOG: disconnection: session time: 0:00:00.704 user=postgres database=postgres
host=172.16.214.173 port=42138
Step 10: Create a new table
./psql -p 9999 test_pgp -U postgres
Note that we are connecting through pgpool
create table my_tab(a int primary key, b varchar(10));
Step 11: Check server log
For First Server (172.16.214.173)
[107539] LOG: connection received: host=172.16.214.173 port=57528
[107539] LOG: connection authorized: user=postgres database=test_pgp
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: connection received: host=172.16.214.173 port=42142
[12400] LOG: connection authorized: user=postgres database=test_pgp
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[12400] LOG: statement: COMMIT
Step 12: Insert rows in the table
insert into my_tab values(1,'One');
insert into my_tab values(2,'Two');
insert into my_tab values(3,'Three');
Step 13: Check server log
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement:
SELECT count(*) from
( SELECT has_function_privilege
( 'postgres', 'pg_catalog.to_regclass(cstring)','execute' )
WHERE EXISTS
( SELECT * FROM pg_catalog.pg_proc AS p WHERE p.proname = 'to_regclass' )
) AS s
[107539] LOG: statement:
SELECT count(*) FROM pg_catalog.pg_attrdef AS d,
pg_catalog.pg_class AS c
WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND
c.oid = pg_catalog.to_regclass('"my_tab"')
[107539] LOG: statement:
SELECT attname, d.adsrc as default_value,
coalesce
(
(
d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%' OR
d.adsrc LIKE '%CURRENT_TIMESTAMP%' OR d.adsrc LIKE '%CURRENT_TIME%' OR
d.adsrc LIKE '%CURRENT_DATE%' OR d.adsrc LIKE '%LOCALTIME%' OR
d.adsrc LIKE '%LOCALTIMESTAMP%'
) AND
(
a.atttypid = 'timestamp'::regtype::oid OR
a.atttypid = 'timestamp with time zone'::regtype::oid OR
a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR
a.atttypid = 'time with time zone'::regtype::oid
) ,
false
)
FROM pg_catalog.pg_class c,
pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_attrdef d ON
(a.attrelid = d.adrelid AND a.attnum = d.adnum)
WHERE c.oid = a.attrelid AND a.attnum >= 1 AND
a.attisdropped = 'f' AND c.oid = to_regclass('"my_tab"')
ORDER BY a.attnum
[107539] LOG: statement: insert into my_tab values(1,'One');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(2,'Two');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(3,'Three');
[107539] LOG: statement: COMMIT

The metadata queries above ask: what are the attribute names, what are their default values if
any, and does any column have a default value of now()? For my_tab the result is:
attname | default_value | coalesce
--------+---------------+----------
a       |               | f
b       |               | f
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(1,'One');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(2,'Two');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(3,'Three');
[12400] LOG: statement: COMMIT
Step 14: Try an update statement
update my_tab set b = 'threee' where b like 'Three';
Step 15: Check server log
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[12400] LOG: statement: COMMIT
Step 16: Select data from the table
test_pgp=# select * from my_tab;
a | b
---+--------
1 | One
2 | Two
3 | threee
(3 rows)
Step 17: Check server log and observe load balancing
For First Server (172.16.214.173)
[107539] LOG: statement: select * from my_tab;
For Second Server (172.16.214.172)
(Nothing)
Step 18: Check node status
test_pgp=# show pool_nodes;
node_id | hostname | port | status | lb_weight |
---------+----------------+------+--------+-----------+
0 | 172.16.214.173 | 5432 | up | 0.500000 |
1 | 172.16.214.172 | 5432 | up | 0.500000 |
role | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
master | 5 | true | 0
slave | 0 | false | 0
(2 rows)
Step 19: Stop the master server i.e. 172.16.214.173
Step 20: Run the select query
test_pgp=# select * from my_tab;
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Step 21: Run the select query again
test_pgp=# select * from my_tab;
a | b
---+-------
1 | One
2 | Two
3 | threee
(3 rows)
Step 22: Check node status
test_pgp=# show pool_nodes;
node_id | hostname | port | status | lb_weight |
---------+----------------+------+--------+-----------+
0 | 172.16.214.173 | 5432 | down | 0.500000 |
1 | 172.16.214.172 | 5432 | up | 0.500000 |
role | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
slave | 5 | false | 0
master | 1 | true | 0
(2 rows)
Step 23: Create another table and insert a row in it
create table time_test(a timestamp);
insert into time_test values(now());
Step 24: Check the server log and observe how pgpool translated now() into a constant timestamp
For First Server (172.16.214.173)
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table time_test(a timestamp);
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT now()
[107539] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz"
('2019-03-14 04:36:22.324674-04'::text))
[107539] LOG: statement: COMMIT
For Second Server (172.16.214.172)
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table time_test(a timestamp);
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz"
('2019-03-14 04:36:22.324674-04'::text))
[12400] LOG: statement: COMMIT
9.9 Other possibilities
9.9.1 EDB xDB Replication Server
EDB xDB (cross database) Replication Server is an asynchronous replication system for
PostgreSQL based on a publish/subscribe model.
xDB Replication Server can be used to implement replication systems based on either of
two replication models:
• Single-master (master-to-slave) replication
• Multi-master replication
The following are the combinations of cross database replications that xDB Replication
Server supports for single-master replication:
Master Database | Slave Database
----------------+---------------
Oracle          | PostgreSQL
Oracle          | EDB Postgres
SQL Server      | PostgreSQL
SQL Server      | EDB Postgres
PostgreSQL      | SQL Server
PostgreSQL      | EDB Postgres
EDB Postgres    | SQL Server
EDB Postgres    | Oracle
EDB Postgres    | PostgreSQL
For multi-master replication, xDB Replication Server supports the following servers:
• PostgreSQL
• EDB Postgres
xDB Replication Server can use either a trigger-based method or a logical-decoding-based
method to perform replication.
Tomas Vondra
 
Ash architecture and advanced usage rmoug2014
Ash architecture and advanced usage rmoug2014Ash architecture and advanced usage rmoug2014
Ash architecture and advanced usage rmoug2014
John Beresniewicz
 
Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007
John Beresniewicz
 
Understanding oracle rac internals part 2 - slides
Understanding oracle rac internals   part 2 - slidesUnderstanding oracle rac internals   part 2 - slides
Understanding oracle rac internals part 2 - slides
Mohamed Farouk
 
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Room 3 - 1 - Nguyễn Xuân Trường Lâm - Zero touch on-premise storage infrastru...
Vietnam Open Infrastructure User Group
 
Run Qt on Linux embedded systems using Yocto
Run Qt on Linux embedded systems using YoctoRun Qt on Linux embedded systems using Yocto
Run Qt on Linux embedded systems using Yocto
Marco Cavallini
 
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Docker Commands With Examples | Docker Tutorial | DevOps Tutorial | Docker Tr...
Edureka!
 
Wait! What’s going on inside my database?
Wait! What’s going on inside my database?Wait! What’s going on inside my database?
Wait! What’s going on inside my database?
Jeremy Schneider
 
Angular - a real world case study
Angular - a real world case studyAngular - a real world case study
Angular - a real world case study
dwcarter74
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
Jignesh Shah
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
PGConf APAC
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databases
Federico Campoli
 
Kubernetes dealing with storage and persistence
Kubernetes  dealing with storage and persistenceKubernetes  dealing with storage and persistence
Kubernetes dealing with storage and persistence
Janakiram MSV
 

Similar to Replication in PostgreSQL tutorial given in Postgres Conference 2019 (20)

Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
Michael Renner
 
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
Denish Patel
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
PG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptxPG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptx
ankitmodidba
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)
Denish Patel
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
FIRAT GULEC
 
PostgreSQL Replication Tutorial
PostgreSQL Replication TutorialPostgreSQL Replication Tutorial
PostgreSQL Replication Tutorial
Hans-Jürgen Schönig
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
EDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Demystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live scDemystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live sc
Emanuel Calvo
 
Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016
Petr Jelinek
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An Introduction
Smita Prasad
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
PostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama DarPostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama Dar
obdlioubysz
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
Denish Patel
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
Michael Renner
 
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
PostgreSQL Sharding and HA: Theory and Practice (PGConf.ASIA 2017)
Aleksander Alekseev
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
Denish Patel
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
PG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptxPG_Phsycal_logical_study_Replication.pptx
PG_Phsycal_logical_study_Replication.pptx
ankitmodidba
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)Out of the Box Replication in Postgres 9.4(PgConfUS)
Out of the Box Replication in Postgres 9.4(PgConfUS)
Denish Patel
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
FIRAT GULEC
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
EDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
EDB
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Demystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live scDemystifying postgres logical replication percona live sc
Demystifying postgres logical replication percona live sc
Emanuel Calvo
 
Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016Logical Replication in PostgreSQL - FLOSSUK 2016
Logical Replication in PostgreSQL - FLOSSUK 2016
Petr Jelinek
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An Introduction
Smita Prasad
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
PostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama DarPostgreSQL Server Programming 2nd Edition Usama Dar
PostgreSQL Server Programming 2nd Edition Usama Dar
obdlioubysz
 
Ad

Recently uploaded (20)

The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry ReportThe State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
How to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptxHow to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdfBoosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Alkin Tezuysal
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data ResilienceFloods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Safe Software
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
Ben Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding WorldBen Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding World
AWS Chicago
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Secure Access with Azure Active Directory
Secure Access with Azure Active DirectorySecure Access with Azure Active Directory
Secure Access with Azure Active Directory
VICTOR MAESTRE RAMIREZ
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Impelsys Inc.
 
The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry ReportThe State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
How to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptxHow to Detect Outliers in IBM SPSS Statistics.pptx
How to Detect Outliers in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdfBoosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Boosting MySQL with Vector Search -THE VECTOR SEARCH CONFERENCE 2025 .pdf
Alkin Tezuysal
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME FlowProviding an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Cisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdfCisco ISE Performance, Scalability and Best Practices.pdf
Cisco ISE Performance, Scalability and Best Practices.pdf
superdpz
 
Trends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary MeekerTrends Artificial Intelligence - Mary Meeker
Trends Artificial Intelligence - Mary Meeker
Clive Dickens
 
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
Domino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use CasesDomino IQ – What to Expect, First Steps and Use Cases
Domino IQ – What to Expect, First Steps and Use Cases
panagenda
 
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data ResilienceFloods in Valencia: Two FME-Powered Stories of Data Resilience
Floods in Valencia: Two FME-Powered Stories of Data Resilience
Safe Software
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
Ben Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding WorldBen Blair - Operating Safely in a Vibe Coding World
Ben Blair - Operating Safely in a Vibe Coding World
AWS Chicago
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
Secure Access with Azure Active Directory
Secure Access with Azure Active DirectorySecure Access with Azure Active Directory
Secure Access with Azure Active Directory
VICTOR MAESTRE RAMIREZ
 
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und AnwendungsfälleDomino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
Domino IQ – Was Sie erwartet, erste Schritte und Anwendungsfälle
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Creating an Accessible Future-How AI-powered Accessibility Testing is Shaping...
Impelsys Inc.
 
Ad

Replication in PostgreSQL tutorial given in Postgres Conference 2019

  • 1. Replication in PostgreSQL - Deep Dive EnterpriseDB Table of Contents 1 Objectives........................................................................................................................................................3 2 Presenter...........................................................................................................................................................3 3 What is Replication..........................................................................................................................................4 4 Why use Replication........................................................................................................................................4 5 Models of Replication (Single Master & Multi Master)..................................................................................5 6 Classes of Replication (Unidirectional & Bidirectional).................................................................................5 7 Modes of Replication (Asynchronous & Synchronous)..................................................................................6 8 Types of Replication (Physical & Logical)......................................................................................................7 9 Methods Of Replication...................................................................................................................................8 9.1 Disk Based Replication.................................................................................................................................8 9.1.1 Introduction................................................................................................................................................8 9.1.2 Setup..........................................................................................................................................................8 9.1.3 Configuring PostgreSQL Replication using NAS.....................................................................................8 9.1.4 Steps to perform Failover........................................................................................................................12 9.2 File System Based.......................................................................................................................................13 9.2.1 Introduction to DRBD.............................................................................................................................13 9.2.2 Setup........................................................................................................................................................15 9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C......................................................18 9.2.4 Steps to perform Failover........................................................................................................................26 9.3 Trigger Based..............................................................................................................................................28 9.3.1 Introduction to Slony-I.............................................................................................................................28 9.3.2 Advantages and Disadvantages of Slony.................................................................................................29 9.3.3 
Setup........................................................................................................................................................30 9.3.4 Configuring PostgreSQL Replication using Slony-I...............................................................................30 9.3.5 Steps to perform controlled switchover...................................................................................................43 9.4 Introduction to WAL...................................................................................................................................44 9.4.1 What is WAL and Why is it required.......................................................................................................44 9.4.2 Transaction Log and WAL Segment Files...............................................................................................48 9.4.3 WAL Writer..............................................................................................................................................48 9.4.4 WAL Segment File Management.............................................................................................................49 9.4.5 WAL Example..........................................................................................................................................50 9.4.6 Overview of Replication Options based on WAL....................................................................................55 9.5 Log Shipping Based - File Level................................................................................................................57 9.5.1 Setup........................................................................................................................................................57 9.5.2 Configuring Replication using Log Shipping..........................................................................................57 9.5.3 Steps to perform Failover........................................................................................................................64 9.6 Log Shipping Based - Block Level.............................................................................................................65 9.6.1 Physical Streaming Replication...............................................................................................................65 9.6.2 WAL Sender & WAL Receiver................................................................................................................65 9.6.3 WAL Streaming Protocol Details.............................................................................................................66 9.6.4 Setup........................................................................................................................................................67 9.6.5 Configuring PostgreSQL Replication using WAL Streaming..................................................................67 9.6.6 Steps to perform Failover........................................................................................................................71 9.7 Logical Decoding Based.............................................................................................................................72 9.7.1 What is Logical Replication.....................................................................................................................72 1/96
  • 2. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7.2 Comparison of Physical and Logical Replication....................................................................................73 9.7.3 Publication & Subscription......................................................................................................................74 9.7.4 Logical Decoding Plugin.........................................................................................................................75 9.7.5 Logical replication slots...........................................................................................................................75 9.7.6 test_decoding and pg_recvlogical............................................................................................................75 9.7.7 Setup........................................................................................................................................................80 9.7.8 Configuring PostgreSQL Replication using Logical Decoding...............................................................80 9.7.9 Logical Replication Protocol Details.......................................................................................................83 9.8 Statement Based..........................................................................................................................................87 9.8.1 Introduction to pgpool-II.........................................................................................................................87 9.8.2 Setup........................................................................................................................................................88 9.8.3 Configuring PostgreSQL replication using pgpool-II..............................................................................88 9.9 Other possibilities.......................................................................................................................................96 2/96
  • 3. Replication in PostgreSQL - Deep Dive EnterpriseDB 1 Objectives A) Familiarize with Replication in PostgreSQL. B) Learn configuration and fail-over for each method of replication in PostgreSQL using a two node cluster. 2 Presenter My name is Abbas, I have a Masters in Computer Engineering. I have spent most of my career in product development. I work as a Senior Architect at EnterpriseDB. My work highlights are as follows: • Migration Portal for online schema migration from Oracle to PostgreSQL • xDB Replication Server • Schema Cloning with support for parallelism using Background Workers • Distributed Transactions (XA) Compliance for PostgreSQL using PgBouncer • Oracle Compatible Packages for IBM DB2 : UTL_ENCODE, UTL_TCP, UTL_SMTP, UTL_MAIL • HDFS_FDW, Mongo_FDW, MySQL FDW • Postgres-XC Email : [email protected] Linkedin : https://pk.linkedin.com/in/abbasbutt Blog : https://abbas-technical.blogspot.com 3/96
  • 4. Replication in PostgreSQL - Deep Dive EnterpriseDB 3 What is Replication Replication is the process of copying data from one database server to a another database server. The source database server is usually called Master Server, whereas the target database server is called Slave Server. 4 Why use Replication Replication of data can have many use cases. For Example: • Remove reporting queries load from the production OLTP system. This improves reporting queries time as well as transaction processing performance. • Fault tolerance : In the event of failure of the master database server, the slave database server can take over since it is already up to date with the master server. In this configuration the slave server can also be called standby server. This configuration can also be used for regular maintenance of the primary server. • Data migration : To upgrade database server hardware, or to deploy the same system for another customer. • Testing systems in parallel : In case we decide to port the application from one DBMS to another, the results from old and new systems on the same data must be compared to ensure whether the new system works as expected. 4/96 Master SlaveData
  • 5. Replication in PostgreSQL - Deep Dive EnterpriseDB 5 Models of Replication (Single Master & Multi Master) In Single-Master Replication (SMR) changes to table rows in a designated master database server are replicated to one or more slave database servers. The replicated tables in the slave database are not permitted to accept any changes (except from the master) and even if they do, changes are not replicated back to the master server. In Multi-Master Replication (MMR) changes to table rows in more than one designated masters are replicated to their counterpart tables in every other database. In this model often conflict resolution schemes are employed to avoid duplicate primary keys for example. MMR adds to the use cases of replication in the following manner: • Write availability and scalability • Multi-master replication allows you to employ a WAN connected network of master databases that can be geographically close to groups of clients, yet maintain data consistency across master databases. 6 Classes of Replication (Unidirectional & Bidirectional) Single-Master Replication (SMR) is also termed as unidirectional since replication data flows in one direction only from master to slave, whereas in Multi-Master Replication (MMR) replication data flows in both directions, it is therefore called bidirectional replication. 5/96 Master-I Master - IIData
  • 6. Replication in PostgreSQL - Deep Dive EnterpriseDB 7 Modes of Replication (Asynchronous & Synchronous) In synchronous mode of replication transactions on the master database are declared complete only when the changes have been replicated to all the slaves in addition to the master. All slaves have to be available all the time for the transactions to complete on the master. In Asynchronous mode the transactions on the master server are declared complete when the changes have been done on the master server. These changes are replicated to the slaves later in time. In this mode the slaves can remain out-of-sync for a certain duration which is called replication lag. 6/96 Master Slave - I Slave - II Time An insert to a replicated table Master Slave - I Slave - II Time An insert to a replicated table Replication Lag
  • 7. Replication in PostgreSQL - Deep Dive EnterpriseDB 8 Types of Replication (Physical & Logical) Before we discuss physical and logical replication replication, lets first discuss the context of the terms physical and logical here. Logical Operation Physical Operation 1 initdb Creates a base directory for the cluster 2 CREATE DATABASE Create a sub-directory in the base directory 3 CREATE TABLE Creates a file within the sub-directory of the database 4 INSERT Changes the file that was created for this particular table and writes new WAL Records in the current WAL segment For example: ramp=# create table sample_tbl(a int, b varchar(255)); CREATE TABLE ramp=# SELECT pg_relation_filepath('sample_tbl'); pg_relation_filepath ---------------------- base/34740/706736 (1 row) ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp'; datname | oid ---------+------- ramp | 34740 (1 row) ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl'; relname | oid ------------+-------- sample_tbl | 706736 (1 row) Physical replication deals with files and directories, it has no knowledge of what these files and directories represent. It is done at file system level or disk level. Logical replication on the other hand deals with databases, tables and DML operations. It is therefore possible in logical replication to replicate a certain set of tables only. It is done at database cluster level. 7/96
  • 8. Replication in PostgreSQL - Deep Dive EnterpriseDB 9 Methods Of Replication 9.1 Disk Based Replication 9.1.1 Introduction A network attached storage with at least two disks can provide transparent replication by using mirroring i.e. RAID-1. Mirroring provides replication by copying all data from one disk to the other as if the second disk was mirror image of the first. This configuration provides fault tolerance in case of a single disk failure. 9.1.2 Setup The setup consists of one Centos 7 machine with PostgreSQL 10.7 installed and a Western Digital My Cloud Home 4 TB NAS. 9.1.3 Configuring PostgreSQL Replication using NAS Step 1: Connect the NAS device to the Internet The device needs a DHCP server running on the network and needs Internet for first time configuration. Step 2: Make sure PostgreSQL machine is connected to the same network as your device Step 3: Make sure you are able to access the device through the web interface mycloud.com/hello Create an account with email and password. 8/96 PostgreSQL WD My Cloud Home 4 TB Internet
  • 9. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 4: Find the Mac address of your NAS device. The MAC address for my device is 00:00:c0:08:d7:01 Step 5: Find the IP address of the device On the PostgreSQL machine run the command arp -a and look for an entry like this ? (172.24.37.136) at 00:00:c0:08:d7:01 [ether] on ens160u3u1c2 The IP address of the NAS device is therefore 172.24.37.136 9/96
  • 10. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 6: Check the public share on the device smbclient -N -L 172.24.37.136 Sharename Type Comment --------- ---- ------- Public Disk IPC$ IPC IPC Service (MyCloudDevice) Reconnecting with SMB1 for workgroup listing. Server Comment --------- ------- Workgroup Master --------- ------- WORKGROUP BFAS91-WIN Step 7: Mount the public share of the device on a local folder on the PostgreSQL machine mkdir /home/abbas/mc2 Step 7.1: Create a local folder mkdir /home/abbas/mc2 Step 7.2: Edit the /etc/fstab file and add the following line in it, modes are important //172.24.37.136/public/ /home/abbas/mc2/ cifs credentials=/home/abbas/.smbcredentials, uid=abbas,gid=abbas,rw,dir_mode=0700,file_mode=0700 0 0 Step 7.3: Create the credentials file as follows vim ~/.smbcredentials username=abbas password=abc123 Step 7.4: Mount sudo mount -a Step 7.5: Check the mounted folder ls -l /home/abbas/mc2 total 4 -rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg 10/96
  • 11. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 8: Create a folder to initdb, note the permissions, that’s why modes are important in step 7.2 mkdir /home/abbas/mc2/data ls -l /home/abbas/mc2 total 4 drwx------. 2 abbas abbas 0 Mar 11 07:42 data -rwx------. 1 abbas abbas 1135 Mar 10 05:30 for_nas.txt drwx------. 2 abbas abbas 0 Mar 11 07:08 for_pg Step 8: Initialize cluster ./initdb -D /home/abbas/mc2/data/ Step 9: Run the server ./postgres -D /home/abbas/mc2/data -p 7654 Step 10: Create a new table ./psql -p 7654 postgres create table test_tab(a int, b varchar(10)); SELECT pg_relation_filepath('test_tab'); pg_relation_filepath ---------------------- base/13212/16384 (1 row) 11/96
  • 12. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 11: Connect to the device using nautilus Step 12: Check the relation file and its path 9.1.4 Steps to perform Failover In a two disk NAS device that has RAID-1 build in to it, the user can simply remove the faulty disk and replace it with a new disk, the database server will never notice the absence of the second disk or its replacement. 12/96 Enter device address smb://172.24.37.136 Press connect button smb://172.24.37.136/public/data/base/13212
  • 13. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2 File System Based 9.2.1 Introduction to DRBD Distributed Replicated Block Device (DRBD) is a software module that provides disk or partition mirroring between network hosts. DRBD is a virtual block device driver implemented as a kernel module. It provides replication solution which is independent of the application that is generating the data to be replicated. PostgreSQL is configured to use data directory on the DRBD controlled partition. When PostgreSQL writes any data, DRBD module not only writes that data on the disk but also sends the same data on the network to the connected secondary. The DRBD module on the secondary receives the data from the network and writes it to the disk. In DRBD the most commonly used data synchronization mode is Single-Primary. In the single primary mode only one cluster node manipulates the data at any moment. DRBD can also support Dual-Primay mode. We are using Single Primary Mode with ext4 file system. DRBD Supports three replication protocols: Protocol A - Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has finished, and the replication packet has been placed in the local TCP send buffer. In the event of forced fail-over, data loss may occur. 13/96 PostgreSQL DRBD Write Primary Secondary DRBD sda2sda2 Replication
  • 14. Replication in PostgreSQL - Deep Dive EnterpriseDB Protocol B - Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Normally, no writes are lost in case of forced fail-over. Protocol C - Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. Most commonly used replication protocol in DRBD setup is Protocol C. 14/96
  • 15. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.2 Setup The setup consists of two CentOS 7 machines connected via LAN installed with two partitions. While installing CentOS 7, choose "Installation Destination" option Deselect "Automatically configure partitioning" and Select "I will configure partitioning" After clicking Done, Manual Partitioning screen will appear Click the + button to add a mount point Mount Point / Desired Capacity 15 GiB File System ext4 15/96
  • 16. Replication in PostgreSQL - Deep Dive EnterpriseDB Click the + button to add a mount point For swap Enter Desired Capacity 4GiB Click the + button to add a mount point Mount Point /for_data Desired Capacity 12 GiB File System ext4 16/96
  • 17. Replication in PostgreSQL - Deep Dive EnterpriseDB Click Done Accept Changes 17/96
  • 18. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.3 Configuring PostgreSQL Replication using DRBD with Protocol C All steps are for both primary and secondary node, unless mentioned otherwise. Step 1: Disable and stop firewall on both the nodes sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Change hostname sudo hostnamectl set-hostname primary sudo hostnamectl set-hostname secondary Step 3: Install Extra Packages for Enterprise Linux (EPEL) repository sudo yum install epel-release sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm Step 4: Install DRBD sudo yum install drbd90-utils kmod-drbd90 Step 5: Restart the System Step 6: Install the kernel module sudo modprobe drbd 18/96
  • 19. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 7: Create configuration file sudo vim /etc/drbd.d/pgconf.res resource pgconf { protocol C; on primary { device /dev/drbd0; disk /dev/sda2; address 172.16.214.151:7788; meta-disk internal; } on secondary { device /dev/drbd0; disk /dev/sda2; address 172.16.214.150:7788; meta-disk internal; } } Step 8: Unmount the disk df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup /dev/sda2 12G 41M 12G 1% /for_data tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 19/96
  • 20. Replication in PostgreSQL - Deep Dive EnterpriseDB sudo umount /dev/sda2 df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 Step 9: Delete file system from the disk, DRBD needs a disk without any file system sudo yum install util-linux sudo wipefs /dev/sda2 offset type ---------------------------------------------------------------- 0x438 ext4 [filesystem] UUID: 8def5959-4dc9-4605-ad61-bd3b597966a3 sudo wipefs -a /dev/sda2 /dev/sda2: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef Step 10: Create DRBD device meta data sudo drbdadm create-md pgconf md_offset 12883849216 al_offset 12883816448 bm_offset 12883423232 Found some data ==> This might destroy existing data! <== Do you want to proceed? [need to type 'yes' to confirm] yes initializing activity log initializing bitmap (384 KB) to all zero 20/96
  • 21. Replication in PostgreSQL - Deep Dive EnterpriseDB Writing meta data... New drbd meta data block successfully created. success Step 11: Associate the DRBD disk with the backing device on the both nodes sudo lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 30G 0 disk ├─sda1 8:1 0 15G 0 part / ├─sda2 8:2 0 12G 0 part └─sda3 8:3 0 3G 0 part [SWAP] sr0 11:0 1 1024M 0 rom NAME This is the device name. MAJ:MIN This column shows the major and minor device number. RM This column shows whether the device is removable or not. SIZE This is column give information on the size of the device. RO This indicates whether a device is read-only. TYPE This column shows information whether the block device is a disk or a partition(part) within a disk. In this example sda is a disk, sda1, sda2 & sda3 are partitions and sr0 is read only memory (rom) MOUNTPOINT This column indicates mount point on which the device is mounted. sudo drbdadm up pgconf sudo lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 30G 0 disk ├─sda1 8:1 0 15G 0 part / ├─sda2 8:2 0 12G 0 part │ └─drbd0 147:0 0 12G 1 disk └─sda3 8:3 0 3G 0 part [SWAP] sr0 11:0 1 1024M 0 rom 21/96
  • 22. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 12: Start drbd on both the nodes sudo systemctl start drbd sudo systemctl enable drbd Step 13: Start initial full synchronization on the primary node sudo drbdadm primary pgconf --force Step 14: Build ext4 file system on DRBD device on the primary node sudo mkfs -t ext4 /dev/drbd0 mke2fs 1.42.9 (28-Dec-2013) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 786432 inodes, 3145367 blocks 157268 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=2151677952 96 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done 22/96
  • 23. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 15: Mount the DRBD device on the primary node sudo mount /dev/drbd0 /for_data df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 5.7G 8.2G 41% / devtmpfs 1.4G 0 1.4G 0% /dev tmpfs 1.4G 58M 1.3G 5% /dev/shm tmpfs 1.4G 11M 1.4G 1% /run tmpfs 1.4G 0 1.4G 0% /sys/fs/cgroup tmpfs 278M 4.0K 278M 1% /run/user/42 tmpfs 278M 44K 278M 1% /run/user/1000 /dev/drbd0 12G 41M 12G 1% /for_data Step 16: Check the connections between primary and secondary nodes sudo netstat -n | grep 7788 tcp 0 0 172.16.214.151:47609 172.16.214.150:7788 ESTABLISHED tcp 0 0 172.16.214.151:7788 172.16.214.150:40336 ESTABLISHED Step 17: Check DRBD processes ps -ef --forest | grep drbd root 11109 2 0 13:35 ? 00:00:00 _ [drbd-reissue] root 88248 2 0 21:37 ? 00:00:00 _ [drbd_w_pgconf] root 88250 2 0 21:37 ? 00:00:00 _ [drbd0_submit] root 88256 2 0 21:37 ? 00:00:02 _ [drbd_s_pgconf] root 88262 2 6 21:37 ? 00:01:29 _ [drbd_r_pgconf] root 88269 2 0 21:37 ? 00:00:00 _ [drbd_a_pgconf] root 88270 2 0 21:37 ? 00:00:00 _ [drbd_as_pgconf] 23/96
  • 24. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 18: Check the output of drbdmon tool drbdmon Step 19: Install PostgreSQL git clone git://git.postgresql.org/git/postgresql.git cd postgresql/ git checkout REL_11_STABLE ./configure --prefix=/usr/local/pg11 --enable-debug CFLAGS=-O0 make && make install 24/96
  • 25. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 20: Initialize the cluster ./initdb -D /for_data/data Note that we are using the DRBD device to store data, since that device is being replicated to the secondary. Step 21: Start the server ./postgres -D /for_data/data/ -p 6543 Step 22: Create a table and insert some rows ./psql -p 6543 postgres create table for_testing(id int primary key, value varchar(255)); insert into for_testing values(1, 'One'); insert into for_testing values(2, 'Two'); insert into for_testing values(3, 'Three'); Step 23: Simulate disk failure on the primary node ./pg_ctl stop /for_data/data/ sudo umount /for_data 25/96
  • 26. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.2.4 Steps to perform Failover Step 1: Install PostgreSQL on the secondary node Step 2: Check the data directory replicated by DRBD sudo mkdir /usr/local/pg11 sudo chown abbas:abbas /usr/local/pg11 sudo drbdadm primary pgconf sudo mount /dev/drbd0 /for_data/ ls -l /for_data/ total 20 drwx------ 19 abbas abbas 4096 Jan 3 05:08 data drwx------ 2 root root 16384 Jan 2 21:45 lost+found ls -l /for_data/data/ total 116 drwx------ 5 abbas abbas 4096 Jan 3 03:59 base drwx------ 2 abbas abbas 4096 Jan 3 04:00 global drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_commit_ts drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_dynshmem -rw------- 1 abbas abbas 4513 Jan 3 03:59 pg_hba.conf -rw------- 1 abbas abbas 1636 Jan 3 03:59 pg_ident.conf drwx------ 4 abbas abbas 4096 Jan 3 05:08 pg_logical drwx------ 4 abbas abbas 4096 Jan 3 03:59 pg_multixact drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_notify drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_replslot drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_serial drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_snapshots drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat drwx------ 2 abbas abbas 4096 Jan 3 05:08 pg_stat_tmp drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_subtrans drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_tblspc drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_twophase -rw------- 1 abbas abbas 3 Jan 3 03:59 PG_VERSION drwx------ 3 abbas abbas 4096 Jan 3 03:59 pg_wal drwx------ 2 abbas abbas 4096 Jan 3 03:59 pg_xact -rw------- 1 abbas abbas 88 Jan 3 03:59 postgresql.auto.conf -rw------- 1 abbas abbas 23866 Jan 3 03:59 postgresql.conf -rw------- 1 abbas abbas 64 Jan 3 03:59 postmaster.opts 26/96
  • 27. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 3: Start the server on this data directory ./postgres -D /for_data/data/ -p 6543 Step 4: Check the table and the data in it ./psql -p 6543 postgres psql (11.1) Type "help" for help. postgres=# select * from for_testing; id | value ----+------- 1 | One 2 | Two 3 | Three (3 rows) 27/96
  • 28. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3 Trigger Based 9.3.1 Introduction to Slony-I Slony is a master to multiple slaves AFTER ROW trigger based asynchronous logical replication system for PostgreSQL. Slony supports cascading. Direct subscribers put load on master, indirect subscribers put load on direct subscribers. Slony uses the following terminology: Cluster : A named set of PostgreSQL instances between which replication takes place. Node : A named PostgreSQL instance that participates as master/slave in a replication cluster. Set : A set of tables that need to be replicated between two nodes. Origin & Subscriber : Each replication set has an origin (master) and a subscriber. Origin is where the modifications of the data take place and subscriber is where those changes get replicated to. Slon daemon : Slon daemon runs on each node in the cluster. It manages replication activity for that node. Slon processes replication events. Replication events are of two types: Configuration events : which occur when the configuration of the cluster is changed. Slon in this case would replicate the changed configuration to all the nodes. For example adding a table to a subscribed set. SYNC events : which occur when replicated tables are updated. Lets look in detail how does an insert in a table on origin node gets replicated to a slave node. The following diagram shows a two node cluster that is using Slony for replication between one master and one slave. Each slon daemon establishes connection with master as well as slave database. When the slony replication system is installed it performs the following steps: • Creates an AFTER INSERT UPDATE DELETE ROW trigger on the table to be replicated on the master node. • Creates an trigger to deny any writes to the replicated table on the slave. • Creates tables and functions required to support replication in a separate schema named after the cluster name . When the client inserts a row in the table on the master then following happens to do the replication: • The after row trigger inserts a log row in the table sl_log_1 or sl_log_2 table. • The slon daemon on the master inserts a row in sl_event and issues a NOTIFY. This 28/96
  • 29. Replication in PostgreSQL - Deep Dive EnterpriseDB generates a SYNC event. • The slon daemon on the slave listens for the notification and reads the sl_log_1 or sl_log_2 table from the remote database. • The slon daemon constructs the insert statement and executes it locally to replicate the row to the slave. 9.3.2 Advantages and Disadvantages of Slony Advantages: • Slony allows replicating a small subset of the tables in a database. • Slony works across different PostgreSQL major versions. • Slony provides the ability to create additional indexes on slaves. • Slony can be used to upgrade from an older PostgreSQL version to a newer one. • Barring the tables of the set, Slony allows slaves to be used for read/write activity. • The load of indirect slaves is not put on the master; only direct slaves add to the master's load. Disadvantages: • Slony cannot replicate large objects, DDL commands, users and roles. • Slony is asynchronous and cannot provide the ability to fail over with zero transaction loss. • Slony puts load on the master: the more slaves, the more load. • Slony mandates the use of a primary key on all the tables to be replicated. [Diagram: a two node Slony cluster; Origin and Slave PostgreSQL instances, each with a slon daemon holding a local connection to its own database and a remote connection to the other node.] 29/96
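To make the replication path concrete, here is a minimal sketch (not Slony's literal internal code) of the mapping the slon daemon performs, using the sl_log_2 row format that appears in Step 13 later in this section:
-- Row captured by the AFTER ROW trigger in _slony_example.sl_log_2 on the origin:
--   log_tablerelname = 'student', log_cmdtype = 'I',
--   log_cmdargs = {sid,4,sname,David,saddress,Kent}
-- Statement the slon daemon reconstructs and executes locally on the subscriber:
INSERT INTO public.student (sid, sname, saddress) VALUES (4, 'David', 'Kent');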
  • 30. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3.3 Setup The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 and Slony version 2.2.6 are installed. 9.3.4 Configuring PostgreSQL Replication using Slony-I Step 1: Disable and stop firewall on both the nodes sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Install PostgreSQL and Slony Download postgresql-10.7-1-linux-x64.run from the EnterpriseDB website and install all the components. Run StackBuilder and install Slony 2.2.6. Step 3: Configure trust authentication in both master and slave As the postgres user do the following cd /opt/PostgreSQL/10/bin/ ./pg_ctl stop -D ../data vim ../data/pg_hba.conf host all all 172.16.214.163/24 trust ./pg_ctl start -D ../data Step 4: Export environment variables export CLUSTERNAME=slony_example export MASTERDBNAME=for_slony export SLAVEDBNAME=for_slony export MASTERHOST=172.16.214.163 export SLAVEHOST=172.16.214.162 export MASTERPORT=5432 export SLAVEPORT=5432 30/96
  • 31. Replication in PostgreSQL - Deep Dive EnterpriseDB export REPLICATIONUSER=postgres export PATH=$PATH:/opt/PostgreSQL/10/bin/ Step 5: Make sure both servers are accessible from both machines ./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME psql.bin (10.7) Type "help" for help. postgres=# \q ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME psql.bin (10.7) Type "help" for help. postgres=# \q Step 6: Create database, tables and insert some values on master ./createdb -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME ./psql -h $MASTERHOST -p $MASTERPORT -U $REPLICATIONUSER $MASTERDBNAME CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); INSERT INTO student VALUES(1, 'Edward', 'Main Campus'); INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel'); INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel'); INSERT INTO teacher VALUES(1, 'Gary', 'Physics'); INSERT INTO teacher VALUES(2, 'Karen', 'Maths'); INSERT INTO teacher VALUES(3, 'Carol', 'History'); Step 7: Create database and tables on slave ./createdb -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); 31/96
  • 32. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 8: Create and execute the slony setup script, which performs the following steps (the assembled script is shown after this step) ./slony_setup.sh <stdin>:21: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support <stdin>:36: Possible unsupported PostgreSQL version (100700) 10.7, defaulting to 8.4 support Step 8.1: Define the schema name that slony uses to create all slony objects, in our example it is _slony_example cluster name = $CLUSTERNAME; Step 8.2: Provide the connection info that slonik uses to connect to the master and slave node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; Step 8.3: Initialize the first node. Its id MUST be 1. This creates the schema _slony_example containing all replication system specific database objects. The main tables that store the change log are _slony_example.sl_log_1 & _slony_example.sl_log_2. The main function that adds change log entries to these tables is _slony_example.logtrigger, which calls the C function _Slony_I_logTrigger init cluster ( id=1, comment = 'Master Node'); Step 8.4: Create a table set that can be subscribed by slaves create set (id=1, origin=1, comment='some tables'); set add table (set id=1, origin=1, id=1, fully qualified name = 'public.student', comment='student table'); set add table (set id=1, origin=1, id=2, fully qualified name = 'public.teacher', comment='teacher table'); Step 8.5: Create a slave node store node (id=2, comment = 'Slave node', event node=1); Step 8.6: Provide the connection info nodes use to connect to each other and listen for events store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'); store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'); 32/96
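Assembled into a single slony_setup.sh, the fragments from Steps 8.1 to 8.6 would look like the following sketch; the heredoc wrapper mirrors the slony_sub.sh shown in Step 10:
#!/bin/sh
slonik <<_EOF_
cluster name = $CLUSTERNAME;
node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER';
node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER';
init cluster ( id=1, comment = 'Master Node');
create set (id=1, origin=1, comment='some tables');
set add table (set id=1, origin=1, id=1, fully qualified name = 'public.student', comment='student table');
set add table (set id=1, origin=1, id=2, fully qualified name = 'public.teacher', comment='teacher table');
store node (id=2, comment = 'Slave node', event node=1);
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER');
store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER');
_EOF_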
  • 33. Replication in PostgreSQL - Deep Dive EnterpriseDB The setup script performs the following actions Action 1: It creates the following triggers on each of the tables in the set on the master for_slony=# \d+ student Table "public.student" Column | Type | ----------+------------------------+ sid | integer | sname | character varying(255) | saddress | character varying(255) | Indexes: "student_pkey" PRIMARY KEY, btree (sid) Triggers: _slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.logtrigger('_slony_example','1','k') _slony_example_truncatetrigger BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.log_truncate('1') Disabled user triggers: _slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.denyaccess('_slony_example') _slony_example_truncatedeny BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.deny_truncate() 33/96
  • 34. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 2: It creates the following triggers on each of the tables in the set on the slave for_slony=# \d+ student Table "public.student" Column | Type | ----------+------------------------+ sid | integer | sname | character varying(255) | saddress | character varying(255) | Indexes: "student_pkey" PRIMARY KEY, btree (sid) Triggers: _slony_example_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.denyaccess('_slony_example') _slony_example_truncatedeny BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.deny_truncate() Disabled user triggers: _slony_example_logtrigger AFTER INSERT OR DELETE OR UPDATE ON student FOR EACH ROW EXECUTE PROCEDURE _slony_example.logtrigger('_slony_example', '1', 'k') _slony_example_truncatetrigger BEFORE TRUNCATE ON student FOR EACH STATEMENT EXECUTE PROCEDURE _slony_example.log_truncate('1') 34/96
  • 35. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 3: It creates the _slony_example schema with the following objects on both nodes for_slony=# \dtvs _slony_example.* List of relations Schema | Name | Type | Owner ----------------+----------------------------+----------+---------- _slony_example | sl_nodelock_nl_conncnt_seq | sequence | postgres _slony_example | sl_log_status | sequence | postgres _slony_example | sl_action_seq | sequence | postgres _slony_example | sl_local_node_id | sequence | postgres _slony_example | sl_event_seq | sequence | postgres _slony_example | sl_log_script | table | postgres _slony_example | sl_registry | table | postgres _slony_example | sl_apply_stats | table | postgres _slony_example | sl_nodelock | table | postgres _slony_example | sl_setsync | table | postgres _slony_example | sl_table | table | postgres _slony_example | sl_sequence | table | postgres _slony_example | sl_node | table | postgres _slony_example | sl_listen | table | postgres _slony_example | sl_path | table | postgres _slony_example | sl_log_1 | table | postgres _slony_example | sl_log_2 | table | postgres _slony_example | sl_subscribe | table | postgres _slony_example | sl_event | table | postgres _slony_example | sl_confirm | table | postgres _slony_example | sl_seqlog | table | postgres _slony_example | sl_components | table | postgres _slony_example | sl_set | table | postgres _slony_example | sl_config_lock | table | postgres _slony_example | sl_event_lock | table | postgres _slony_example | sl_archive_counter | table | postgres _slony_example | sl_failover_targets | view | postgres _slony_example | sl_seqlastvalue | view | postgres _slony_example | sl_status | view | postgres (29 rows) 35/96
  • 36. Replication in PostgreSQL - Deep Dive EnterpriseDB Action 4: Creates the following trigger functions List of triggers ----------------+------------------------------------+ Schema | Name | ----------------+------------------------------------+ _slony_example | logapply _slony_example | log_truncate _slony_example | deny_truncate _slony_example | logtrigger _slony_example | denyaccess _slony_example | lockedset Action 5: Creates around 150 functions in the _slony_example schema Note that the setup script does not run any daemon on the master or the slave, i.e. it does not start the replication process, and it does not copy any data from master to slave. Step 9: Start the slon daemon on both the master and slave slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER host=$MASTERHOST port=$MASTERPORT" slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST port=$SLAVEPORT" The slon daemons should emit messages of the sort INFO remoteWorkerThread_2: SYNC 5000000178 done in 0.003 seconds INFO remoteWorkerThread_2: SYNC 5000000179 done in 0.003 seconds NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1 INFO cleanupThread: 0.020 seconds for cleanupEvent() INFO remoteWorkerThread_2: SYNC 5000000180 done in 0.002 seconds INFO remoteWorkerThread_2: SYNC 5000000181 done in 0.004 seconds Connection problems result in errors like this WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied WARN remoteListenThread_2: DB connection failed - sleep 10 seconds ERROR slon_connectdb: PQconnectdb("dbname=for_slony host=w.x.y.z port=5432 user=postgres") failed - fe_sendauth: no password supplied 36/96
  • 37. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 10: Start the subscription ./slony_sub.sh The following script instructs slony to subscribe the set whose id is 1 and whose provider (master) id is 1, for the receiver (slave) whose id is 2 #!/bin/sh slonik <<_EOF_ # ---- # This defines which namespace the replication system uses # ---- cluster name = $CLUSTERNAME; # ---- # Admin conninfo's are used by the slonik program to connect # to the node databases. So these are the PQconnectdb arguments # that connect from the administrator's workstation (where # slonik is executed). # ---- node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; # ---- # Node 2 subscribes set 1 # ---- subscribe set ( id = 1, provider = 1, receiver = 2, forward = no); _EOF_ 37/96
  • 38. Replication in PostgreSQL - Deep Dive EnterpriseDB slon will emit messages similar to the following, which copy initial data from master to slave CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700 CONFIG remoteWorkerThread_1: connected to provider DB CONFIG remoteWorkerThread_1: prepare to copy table "public"."student" CONFIG remoteWorkerThread_1: prepare to copy table "public"."teacher" CONFIG remoteWorkerThread_1: all tables for set 1 found on subscriber CONFIG remoteWorkerThread_1: copy table "public"."student" CONFIG remoteWorkerThread_1: Begin COPY of table "public"."student" NOTICE: truncate of "public"."student" succeeded CONFIG remoteWorkerThread_1: 62 bytes copied for table "public"."student" CONFIG remoteWorkerThread_1: 0.082 seconds to copy table "public"."student" CONFIG remoteWorkerThread_1: copy table "public"."teacher" CONFIG remoteWorkerThread_1: Begin COPY of table "public"."teacher" NOTICE: truncate of "public"."teacher" succeeded CONFIG remoteWorkerThread_1: 45 bytes copied for table "public"."teacher" CONFIG remoteWorkerThread_1: 0.031 seconds to copy table "public"."teacher" INFO remoteWorkerThread_1: copy_set SYNC found, use event seqno 5000000311. INFO remoteWorkerThread_1: 0.018 seconds to build initial setsync status INFO copy_set 1 done in 0.172 seconds CONFIG enableSubscription: sub_set=1 CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 CONFIG remoteWorkerThread_1: update provider configuration CONFIG remoteWorkerThread_1: added active set 1 to provider 1 CONFIG version for "dbname=for_slony host=172.16.214.163 port=5432 user=postgres" is 100700 INFO remoteWorkerThread_1: SYNC 5000000297 done in 0.082 seconds INFO remoteWorkerThread_1: SYNC 5000000301 done in 0.005 seconds INFO remoteWorkerThread_1: SYNC 5000000309 done in 0.038 seconds INFO remoteWorkerThread_1: SYNC 5000000310 done in 0.030 seconds INFO remoteWorkerThread_1: SYNC 5000000311 done in 0.005 seconds 38/96
  • 39. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 11: Check replicated data in slave ./psql -h $SLAVEHOST -p $SLAVEPORT -U $REPLICATIONUSER $SLAVEDBNAME psql.bin (10.7) Type "help" for help. for_slony=# select * from teacher; tid | tname | tsubject -----+-------+---------- 1 | Gary | Physics 2 | Karen | Maths 3 | Carol | History (3 rows) for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) Step 12: Check insert operation on slave for_slony=# insert into student values(4, 'David', 'Kent'); ERROR: Slony-I: Table student is replicated and cannot be modified on a subscriber node - role=0 39/96
  • 40. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Try insert on master insert into student values(4, 'David', 'Kent'); Check the log entry: select * from _slony_example.sl_log_2; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 5446 | 1 | 1 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+---------------------------------- log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+---------------------------------- student | I | 0 | {sid,4,sname,David,saddress,Kent} ------------------+-------------+-----------------+---------------------------------- for_slony=# select ctid,xmin, xmax, cmin, * from student; ctid | xmin | xmax | cmin | sid | sname | saddress -------+------+------+------+-----+--------+-------------- (0,1) | 560 | 0 | 0 | 1 | Edward | Main Campus (0,2) | 561 | 0 | 0 | 2 | Linda | Girls Hostel (0,3) | 562 | 0 | 0 | 3 | Jason | Boys Hostel (0,4) | 5446 | 0 | 0 | 4 | David | Kent (4 rows) 40/96
  • 41. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 14: Try update on master for_slony=# update student set saddress = 'Whales' where sid = 4; UPDATE 1 select * from _slony_example.sl_log_1; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 6239 | 1 | 2 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+------------------------ log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+------------------------ student | U | 1 | {saddress,Whales,sid,4} ------------------+-------------+-----------------+------------------------ Step 15: Check result on slave for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel 4 | David | Whales (4 rows) 41/96
  • 42. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 16: Try delete on master for_slony=# delete from student where sid = 4; DELETE 1 select * from _slony_example.sl_log_1; ------------+----------+-------------+---------------+------------------+ log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname | ------------+----------+-------------+---------------+------------------+ 1 | 6407 | 1 | 3 | public | ------------+----------+-------------+---------------+------------------+ ------------------+-------------+-----------------+------------------------ log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs ------------------+-------------+-----------------+------------------------ student | D | 0 | {sid,4} ------------------+-------------+-----------------+------------------------ Step 17: Check result on slave for_slony=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) 42/96
  • 43. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.3.5 Steps to perform controlled switchover A small slonik script can achieve a controlled switchover in which we completely switch the roles of the two nodes. The old master becomes a slave and the old slave becomes the new master. Please note that this is a planned activity and it has nothing to do with any type of failure. #!/bin/sh slonik <<_EOF_ cluster name = $CLUSTERNAME; node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST port=$MASTERPORT user=$REPLICATIONUSER'; node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST port=$SLAVEPORT user=$REPLICATIONUSER'; lock set (id = 1, origin = 1); wait for event (origin = 1, confirmed = 2, wait on=1); move set (id = 1, old origin = 1, new origin = 2); wait for event (origin = 1, confirmed = 2, wait on=1); _EOF_ After the command runs, the Slony trigger definitions on the tables in the set will have changed on the new master: the _slony_example_denyaccess & _slony_example_truncatedeny triggers get disabled and the _slony_example_logtrigger & _slony_example_truncatetrigger triggers get enabled on the new master. Changes to the tables in the set are therefore possible on the new master. 43/96
  • 44. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4 Introduction to WAL 9.4.1 What is WAL and Why is it required In PostgreSQL all changes made by every transaction are first saved in a log file and then the result of the transaction is sent to the initiating client. Data files are not changed on every transaction. This is a standard mechanism to prevent data loss in situations like an OS crash, hardware failure, a PostgreSQL crash etc. This mechanism is called Write Ahead Logging and the log file is called the Write Ahead Log (WAL). Each change that the transaction performs (INSERT, UPDATE, DELETE, COMMIT) is written in the log as a WAL record. WAL records are first written into an in-memory WAL buffer. On transaction commit the records are written into a WAL segment file on the disk. The log sequence number (LSN) of a WAL record represents the location/position where it is saved in the log file. The LSN is used as a unique id of the WAL record. Logically, the transaction log is a file whose size is 2^64 bytes. An LSN is therefore a 64-bit number, represented as two 32-bit hexadecimal numbers separated by a /. For example: select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/2BDBBD0 (1 row) 44/96
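Since LSNs are byte positions in this logical file, the distance between two of them is plain arithmetic; the built-in pg_wal_lsn_diff function (available from version 10) computes it:
-- 0x2BDBBD0 - 0x2BDB000 = 0xBD0 = 3024 bytes of WAL between the two positions
SELECT pg_wal_lsn_diff('0/2BDBBD0', '0/2BDB000');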
  • 45. Replication in PostgreSQL - Deep Dive EnterpriseDB In the event of a system crash the database can recover committed transactions from the WAL. While recovering, PostgreSQL starts recovery from the last REDO point or checkpoint. A checkpoint is a point in the transaction log at which all data files have been updated to reflect the information in the log. The process of flushing the changes described by the WAL records to the actual data files is called check-pointing. Let's consider a case where the database crashes after two transactions which perform one insert each, and WAL is used for recovery. 1. Assume a CHECKPOINT is issued, which stores the location of the latest REDO point in the current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk. This guarantees that WAL records before the REDO point are no longer needed for recovery, since all data has been flushed to the disk pages. 2. The first INSERT statement is issued. The table's page is loaded from disk into the buffer pool. 3. A tuple is inserted into the loaded page. 4. The WAL record of this insert is saved into the WAL buffer at location LSN_1. 5. The page LSN, which identifies the WAL record for the last change to this page, is updated from LSN_0 to LSN_1. 6. The first COMMIT statement is issued. 7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records in the WAL buffer up to this page's LSN are flushed to the WAL segment file. 8. For the second INSERT and commit, steps 2 to 7 are repeated. 45/96
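The location of the last checkpoint and its REDO point can be read from the control file via SQL; a quick check using the built-in pg_control_checkpoint() function (available from version 9.6):
-- redo_lsn is where crash recovery would begin replaying WAL
SELECT checkpoint_lsn, redo_lsn, redo_wal_file FROM pg_control_checkpoint();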
  • 46. Replication in PostgreSQL - Deep Dive EnterpriseDB [Diagram: the client sequence CHECKPOINT; BEGIN; INSERT INTO TAB VALUES ('A'); COMMIT; BEGIN; INSERT INTO TAB VALUES ('B'); COMMIT; traced across the shared buffer pool, the WAL buffer, the WAL segment and the data files. The CHECKPOINT records the REDO point in the WAL segment; each INSERT advances the page LSN of table TAB in the shared buffer pool from LSN_0 to LSN_1 and then LSN_2, and each COMMIT flushes the buffered WAL records (A at LSN_1 plus COMMIT, B at LSN_2 plus COMMIT) to the WAL segment, while the data files still hold the page at LSN_0.] 46/96
  • 47. Replication in PostgreSQL - Deep Dive EnterpriseDB In the event of an operating system crash, all data in the shared buffer pool will be lost; however, all modifications of the page have been written into the WAL segment files as history data. The following steps show how our database cluster can recover back to the state immediately before the crash using WAL records. There is no need to do anything special, since PostgreSQL automatically enters recovery mode after restarting. 1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL segment file. 2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool. 3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the page's LSN is updated to LSN_1. The remaining WAL records are replayed in a similar manner. [Diagram: starting from the on-disk page of TAB at LSN_0, replaying the WAL records A/LSN_1/COMMIT and B/LSN_2/COMMIT from the REDO point brings the page in the shared buffer pool from LSN_0 to LSN_1 (tuple A) and then LSN_2 (tuples A and B).] 47/96
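The page LSN used in this comparison can be inspected directly with the pageinspect contrib extension; a minimal sketch, assuming superuser access and the table name tab from the diagrams above:
CREATE EXTENSION pageinspect;
-- The lsn column of the page header is the page LSN compared against WAL record LSNs
SELECT lsn FROM page_header(get_raw_page('tab', 0));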
  • 48. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.2 Transaction Log and WAL Segment Files In PostgreSQL the transaction log is a virtual file with a capacity of 2^64 bytes, addressed by 8-byte LSNs. Physically the log is divided into 16-megabyte files, each of which is called a WAL segment. A WAL segment file name is a 24-digit number made of three 8-digit hexadecimal parts: the timeline ID, the segment number divided by 256, and the segment number modulo 256. Assuming that the current timeline ID is 0x00000001, the first WAL segment file names will be 00000001 00000000 00000000 00000001 00000000 00000001 00000001 00000000 00000002 ………. 00000001 00000001 00000000 00000001 00000001 00000001 00000001 00000001 00000002 ………… 00000001 FFFFFFFF FFFFFFFD 00000001 FFFFFFFF FFFFFFFE 00000001 FFFFFFFF FFFFFFFF For example, for LSN 0/2BDBBD0 the lower 32 bits are 0x02BDBBD0, which with 16 MB (0x01000000-byte) segments falls in segment 0x02BDBBD0 / 0x01000000 = 2, hence: select pg_walfile_name('0/2BDBBD0'); pg_walfile_name -------------------------- 000000010000000000000002 9.4.3 WAL Writer The WAL writer is a background process that checks the WAL buffer periodically and writes all unwritten WAL records into the WAL segments. The WAL writer avoids bursts of IO activity by spreading the writes over time in small amounts. The configuration 48/96
  • 49. Replication in PostgreSQL - Deep Dive EnterpriseDB parameter wal_writer_delay controls how often the WAL writer flushes the WAL, with a default value of 200 ms. 9.4.4 WAL Segment File Management WAL segment files are stored in the pg_wal sub-directory. PostgreSQL switches to a new WAL segment file under the following conditions: 1. The WAL segment has been filled up. 2. The function pg_switch_wal has been issued. 3. archive_mode is enabled and the time set in archive_timeout has been exceeded. Switched WAL files can either be removed or recycled, i.e. renamed and reused in the future. The number of WAL files that the server retains at any point in time depends on server configuration as well as server activity. Whenever a checkpoint starts, PostgreSQL estimates and prepares the number of WAL segment files required for this checkpoint cycle. The estimate is made with regard to the number of files consumed in previous checkpoint cycles. They are counted from the segment that contains the prior REDO point, and the value is to be between min_wal_size (by default 80 MB, i.e. 5 files) and max_wal_size (1 GB, i.e. 64 files). When a checkpoint starts, the necessary files are held and recycled, while the unnecessary ones are removed. A specific example is shown in the diagram below. Assuming that there are six files before the checkpoint starts, WAL_3 contains the prior REDO point (or the REDO point in version 11), and PostgreSQL estimates that five files are needed. In this case, WAL_1 will be renamed as WAL_7 for recycling and WAL_2 will be removed. [Diagram: segments WAL_1 through WAL_6 before the checkpoint, with the REDO point in WAL_3; after the checkpoint the unneeded WAL_2 is removed and WAL_1 is renamed WAL_7 for re-use, leaving the estimated number of segments needed by the server.] 49/96
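The settings that drive this retention estimate can be checked at run time; a small sketch:
SELECT name, setting, unit FROM pg_settings
WHERE name IN ('min_wal_size', 'max_wal_size', 'wal_writer_delay', 'archive_timeout');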
  • 50. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.5 WAL Example Step 1: SELECT datname, oid FROM pg_database WHERE datname = 'postgres'; datname | oid ----------+------- postgres | 15709 (1 row) Note the database OID i.e. 15709 Step 2: SELECT oid,* from pg_tablespace; oid | spcname | spcowner | spcacl | spcoptions ------+------------+----------+--------+------------ 1663 | pg_default | 10 | | 1664 | pg_global | 10 | | (2 rows) Note the tablespace OID i.e. 1663 Step 3: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C420B8 (1 row) Note the LSN i.e. 0/1C420B8 Step 4: CREATE TABLE abc(a VARCHAR(10)); Step 5: SELECT pg_relation_filepath('abc'); pg_relation_filepath ---------------------- base/15709/16384 (1 row) Note the relation file path base/15709/16384 50/96
  • 51. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 6: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8 and use the Start LSN noted in step 3. Note that the WAL contains the instruction to create physical file 15709 → database postgres → noted in step 1 16384 → table abc → noted in step 5 rmgr Len(rec /tot) tx lsn prev desc XLOG 30/ 30 0 0/01C420B8 0/01C42080 NEXTOID 24576 Storage 42/ 42 0 0/01C420D8 0/01C420B8 CREATE base/15709/16384 Heap 203/203 1216 0/01C42108 0/01C420D8 INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0 Btree 64/ 64 1216 0/01C421D8 0/01C42108 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 Btree 64/ 64 1216 0/01C42218 0/01C421D8 INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5 Heap 80/ 80 1216 0/01C42258 0/01C42218 INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9 Btree 72/ 72 1216 0/01C422A8 0/01C42258 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 Btree 72/ 72 1216 0/01C422F0 0/01C422A8 INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61 Heap 203/203 1216 0/01C42338 0/01C422F0 INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1 Btree 64/64 1216 0/01C42408 0/01C42338 INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 Btree 72/ 72 1216 0/01C42448 0/01C42408 INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1 Heap 80/ 80 1216 0/01C42490 0/01C42448 INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9 Btree 72/ 72 1216 0/01C424E0 0/01C42490 INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 Btree 72/ 72 1216 0/01C42528 0/01C424E0 INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57 Heap 199/199 1216 0/01C42570 0/01C42528 INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0 Btree 64/ 64 1216 0/01C42638 0/01C42570 INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2 Btree 64/ 64 1216 0/01C42678 0/01C42638 INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1 Btree 64/ 64 1216 0/01C426B8 0/01C42678 INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5 Heap 171/171 1216 0/01C426F8 0/01C426B8 INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16 Btree 64/ 64 1216 0/01C427A8 0/01C426F8 INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C427E8 0/01C427A8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42828 0/01C427E8 INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16 Btree 72/ 72 1216 0/01C428D8 0/01C42828 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42920 0/01C428D8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 51/96
  • 52. Replication in PostgreSQL - Deep Dive EnterpriseDB Heap 171/171 1216 0/01C42960 0/01C42920 INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16 Btree 72/ 72 1216 0/01C42A10 0/01C42960 INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42A58 0/01C42A10 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42A98 0/01C42A58 INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42B48 0/01C42A98 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42B90 0/01C42B48 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42BD0 0/01C42B90 INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42C80 0/01C42BD0 INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42CC8 0/01C42C80 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42D08 0/01C42CC8 INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17 Btree 72/ 72 1216 0/01C42DB8 0/01C42D08 INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42E00 0/01C42DB8 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 171/171 1216 0/01C42E40 0/01C42E00 INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32 Btree 72/ 72 1216 0/01C42EF0 0/01C42E40 INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25 Btree 64/ 64 1216 0/01C42F38 0/01C42EF0 INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 Heap 80/ 80 1216 0/01C42F78 0/01C42F38 INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11 Btree 72/ 72 1216 0/01C42FC8 0/01C42F78 INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44 Btree 72/ 72 1216 0/01C43010 0/01C42FC8 INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46 Standby 42/ 42 1216 0/01C43058 0/01C43010 LOCK xid 1216 db 15709 rel 16384 Txn 405/405 1216 0/01C43088 0/01C43058 COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608 relcache 16384 Standby 50/ 50 0 0/01C43220 0/01C43088 RUNNING_XACTS nextXid 1217 latestCompletedXid 1216 oldestRunningXid 1217 Step 7: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C43258 (1 row) Step 8: INSERT INTO abc VALUES('pkn'); 52/96
  • 53. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 9: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258 and use start LSN from step 7. 1663 → pg_default tablespace → noted in step 2 15709 → database postgres → noted in step 1 16384 → table abc → noted in step 5 rmgr Len (rec/ tot) tx lsn prev desc Heap 59/59 1217 0/01C43258 0/01C43220 INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0 Transaction 34/34 1217 0/01C43298 0/01C43258 COMMIT 2019-03-04 07:43:45.887511 EST Standby 54/54 0 0/01C432C0 0/01C43298 RUNNING_XACTS nextXid 1218 latestCompletedXid 1216 oldestRunningXid 1217; 1 xacts: 1217 Step 10: SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/1C432F8 (1 row) Step 11: INSERT INTO abc VALUES('ujy'); Step 12: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8 and use start LSN as noted in step 10. rmgr Len (rec/ tot) tx lsn prev desc Heap 59/59 1218 0/01C432F8 0/01C432C0 INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0 Transaction 34/34 1218 0/01C43338 0/01C432F8 COMMIT 2019-03-04 07:44:25.449151 EST Standby 50/50 0 0/01C43360 0/01C43338 RUNNING_XACTS nextXid 1219 latestCompletedXid 1218 oldestRunningXid 1219 53/96
  • 54. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Check the actual tuples in the WAL segment files. ---------+---------------------------------------------------+----------------+ Offset | Hex Bytes | ASCII chars | ---------+---------------------------------------------------+----------------+ 00000060 | 3b 00 00 00 c3 04 00 00 28 00 40 02 00 00 00 00 |;.......(.@.....| 00000070 | 00 0a 00 00 ec 28 75 6e 00 20 0a 00 7f 06 00 00 |.....(un. ......| 00000080 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........| 00000090 | 02 08 18 00 09 70 6b 6e 03 00 00 00 00 00 00 00 |.....pkn........| 000000a0 | 22 00 00 00 c3 04 00 00 60 00 40 02 00 00 00 00 |".......`.@.....| 000000b0 | 00 01 00 00 dd 4c 87 04 ff 08 e4 73 44 e7 41 26 |.....L.....sD.A&| 000000c0 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......| 000000d0 | a0 00 40 02 00 00 00 00 10 08 00 00 9e 01 36 88 |[email protected].| 000000e0 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c4 04 |................| 000000f0 | 00 00 c4 04 00 00 c3 04 00 00 00 00 00 00 00 00 |................| 00000100 | 3b 00 00 00 c4 04 00 00 c8 00 40 02 00 00 00 00 |;.........@.....| 00000110 | 00 0a 00 00 33 df b4 71 00 20 0a 00 7f 06 00 00 |....3..q. ......| 00000120 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........| 00000130 | 02 08 18 00 09 75 6a 79 04 00 00 00 00 00 00 00 |.....ujy........| 00000140 | 22 00 00 00 c4 04 00 00 00 01 40 02 00 00 00 00 |".........@.....| 00000150 | 00 01 00 00 96 2e 96 a6 ff 08 d8 f3 79 ed 41 26 |............y.A&| 00000160 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......| 00000170 | 40 01 40 02 00 00 00 00 10 08 00 00 eb 6b 95 36 |@[email protected]| 00000180 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c5 04 |................| 00000190 | 00 00 c5 04 00 00 c4 04 00 00 00 00 00 00 00 00 |................| 000001a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 54/96
  • 55. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.4.6 Overview of Replication Options based on WAL Continuous WAL Archiving Copying WAL files, as they are generated, into any location other than the pg_wal sub-directory for the purpose of archiving them is called WAL archiving. To archive, a script provided by the user is invoked by PostgreSQL each time a WAL file is generated. The script can use the scp command to copy the file to one or more locations. The location can be an NFS mount. Once archived, the WAL segment files can be used to recover the database to any specified point in time. Log Shipping Based Replication - File Level The process of copying log files to another PostgreSQL server for the purpose of creating another standby server by replaying WAL files is called Log Shipping. This server is configured to be in recovery mode. The sole purpose of this server is to apply new WAL files as they arrive. This second server then becomes a warm backup of the primary PostgreSQL server, also termed a standby. The standby can also be configured to be a read replica, where it can also serve read-only queries. This is called a hot standby. Log Shipping Based Replication - Block Level Streaming replication improves on the log shipping process. Instead of waiting for the WAL switch, the records are sent as and when they are generated, thus reducing replication delay. The second improvement is that the standby server connects to the primary server over the network using a replication protocol. The primary server can then send WAL records directly over this connection without having to rely on scripts provided by the end user. How long should the primary retain WAL segment files? Without any streaming replication clients, the server can discard/recycle a WAL segment file once the archive script reports success, provided the file is not required for crash recovery. In the presence of standby clients, though, there is a problem: the server needs to keep WAL files around for as long as the slowest standby needs them. If a standby that was taken down for a while comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails with an error similar to: ERROR: requested WAL segment 00000001000000010000002D has 55/96
  • 56. Replication in PostgreSQL - Deep Dive EnterpriseDB already been removed The primary should therefore keep track of how far behind the standby is, and not delete/recycle WAL files that any standby still needs. This feature is provided through replication slots. Each replication slot has a name which is used to identify the slot. Each slot is associated with: (a) The oldest WAL segment file required by the consumer of the slot. WAL segment files later than this are not deleted/recycled during checkpoints. (b) The oldest transaction ID required to be retained by the consumer of the slot. Rows needed by any transactions later than this are not deleted by vacuum. 56/96
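Both pieces of retained state are visible in the pg_replication_slots view; a quick monitoring sketch:
-- restart_lsn: oldest WAL position the slot's consumer still needs
-- xmin: oldest transaction id the slot forces vacuum to retain
SELECT slot_name, slot_type, active, restart_lsn, xmin FROM pg_replication_slots;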
  • 57. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.5 Log Shipping Based - File Level 9.5.1 Setup The setup consists of two CentOS 7 machines on which PostgreSQL version 10.7 is installed. Both systems are loosely coupled, sharing only the WAL archive. 9.5.2 Configuring Replication using Log Shipping Step 1: Disable and stop firewall on both the machines sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: Create a folder on the standby that will archive WALs received from the primary sudo mkdir /opt/PostgreSQL/10/from_primary sudo chown postgres:postgres /opt/PostgreSQL/10/from_primary In /etc/passwd change the home directory of user postgres to /opt/PostgreSQL/10/from_primary Step 3: Change the home directory of the postgres user on Primary sudo mkdir /opt/PostgreSQL/10/home sudo chown postgres:postgres /opt/PostgreSQL/10/home/ [Diagram: Primary and Standby PostgreSQL servers sharing a WAL Archive; archive_command copies WAL files from pg_wal to the archive, restore_command copies WAL files from the archive to the standby's pg_wal.] 57/96
  • 58. Replication in PostgreSQL - Deep Dive EnterpriseDB In /etc/passwd change the home directory of user postgres to /opt/PostgreSQL/10/home/ Step 4: Configure passwordless ssh & scp between Primary and Standby Log in as the postgres user on Primary su - postgres Password: Last login: Fri Feb 22 05:54:11 EST 2019 on pts/0 Generate a public/private key pair on Primary -bash-4.2$ ssh-keygen Generating public/private rsa key pair. Enter file in which to save the key (/opt/PostgreSQL/10/home/.ssh/id_rsa): Created directory '/opt/PostgreSQL/10/home/.ssh'. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa. Your public key has been saved in /opt/PostgreSQL/10/home/.ssh/id_rsa.pub. The key fingerprint is: SHA256:jqjjYf8OcKp4tgtfPcLWG6liAot660/4CLrIRq01BqI [email protected] The key's randomart image is: +---[RSA 2048]----+ | | | | | | |.. | |o + . S | |E. @ + + | |*.O X B . | |OOBX + + | |X@XB=o+ | +----[SHA256]-----+ 58/96
  • 59. Replication in PostgreSQL - Deep Dive EnterpriseDB Copy the key to standby and add it to authorized_keys -bash-4.2$ ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/opt/PostgreSQL/10/home/.ssh/id_rsa.pub" The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established. ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84. ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6. Are you sure you want to continue connecting (yes/no)? /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed The authenticity of host '172.16.214.165 (172.16.214.165)' can't be established. ECDSA key fingerprint is SHA256:VsSASWJWx6v7CvSbH8hjnzX6AFBn0vNimsAj0Wcih84. ECDSA key fingerprint is MD5:ad:0c:42:f1:88:3f:f4:f9:8f:59:bf:e4:85:dc:15:b6. Are you sure you want to continue connecting (yes/no)? yes /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys [email protected]'s password: Number of key(s) added: 1 Now try logging into the machine, with: "ssh '[email protected]'" and check to make sure that only the key(s) you wanted were added. Test passwordless SSH -bash-4.2$ ssh [email protected] Last login: Fri Feb 22 05:53:39 2019 -bash-4.2$ exit logout Connection to 172.16.214.165 closed. -bash-4.2$ Test passwordless SCP. From Primary try this su - postgres -bash-4.2$ scp 1.txt [email protected]:/opt/PostgreSQL/10/from_primary 1.txt 100% 3446 3.3MB/s 00:00 59/96
  • 60. Replication in PostgreSQL - Deep Dive EnterpriseDB Check on standby su - postgres -bash-4.2$ pwd /opt/PostgreSQL/10/from_primary -bash-4.2$ ls -l total 4 -rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt Step 5: Update the postgresql.conf file on primary wal_level = replica archive_mode = on archive_command = 'if ssh [email protected] test ! -f "/opt/PostgreSQL/10/from_primary/%f" ; then scp %p [email protected]:/opt/PostgreSQL/10/from_primary/; fi' The archive_command will be executed every time a new WAL file is generated. This archive command uses two placeholders %p : The complete path of the WAL file along with its name %f : The name of the WAL file The command tests that the WAL file is not already present on the standby and, if it is not, copies it to the archive folder. Step 6: Create database & tables on primary ./createdb test_db CREATE TABLE student(sid INT PRIMARY KEY, sname VARCHAR(255), saddress VARCHAR(255)); CREATE TABLE teacher(tid INT PRIMARY KEY, tname VARCHAR(255), tsubject VARCHAR(255)); INSERT INTO student VALUES(1, 'Edward', 'Main Campus'); INSERT INTO student VALUES(2, 'Linda', 'Girls Hostel'); INSERT INTO student VALUES(3, 'Jason', 'Boys Hostel'); INSERT INTO teacher VALUES(1, 'Gary', 'Physics'); INSERT INTO teacher VALUES(2, 'Karen', 'Maths'); INSERT INTO teacher VALUES(3, 'Carol', 'History'); 60/96
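Once the server is generating WAL, the health of this archive_command can be verified from the built-in pg_stat_archiver view; a quick sketch:
SELECT archived_count, last_archived_wal, failed_count, last_failed_wal
FROM pg_stat_archiver;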
  • 61. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 7: Take a base backup using the command ./pg_basebackup --pgdata=/opt/PostgreSQL/10/for_standby/ --format=p --write-recovery-conf --checkpoint=fast --label=for_test --progress --verbose --host=localhost --port=5432 --username=postgres Password: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1 pg_basebackup: starting background WAL receiver 32578/32578 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 0/20000F8 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: base backup completed --pgdata : Target folder for the base backup --format : plain Step 8: Modify the recovery.conf in the base backup standby_mode = 'on' restore_command = 'cp "/opt/PostgreSQL/10/from_primary/%f" "%p"' The restore_command is invoked by the standby server periodically. Our restore command copies the newly arrived WAL file to the pg_wal folder of the standby server. Step 9: Transfer the base backup to the standby server sudo mkdir /opt/PostgreSQL/10/bb_data/ sudo mv /tmp/for_standby.tar.gz /opt/PostgreSQL/10/bb_data/ sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/ sudo chown postgres:postgres /opt/PostgreSQL/10/bb_data/for_standby.tar.gz sudo chmod 700 /opt/PostgreSQL/10/bb_data/ sudo chmod 700 /opt/PostgreSQL/10/bb_data/for_standby Step 10: Unzip base backup su - postgres cd bb_data/ -bash-4.2$ ls -l total 3840 61/96
  • 62. Replication in PostgreSQL - Deep Dive EnterpriseDB -rw-r--r--. 1 postgres postgres 3930538 Feb 23 01:40 for_standby.tar.gz -bash-4.2$ tar -xvf for_standby.tar.gz Step 11: Start the standby server -bash-4.2$ ./postgres -D ../bb_data/for_standby/ -p 5432 Step 12: Test standby server ./psql -p 5432 test_db -U postgres Password for user postgres: psql.bin (10.7) Type "help" for help. test_db=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+---------+-------+----------+-------+------------- public | student | table | postgres | 16 kB | public | teacher | table | postgres | 16 kB | (2 rows) test_db=# select * from student; sid | sname | saddress -----+--------+-------------- 1 | Edward | Main Campus 2 | Linda | Girls Hostel 3 | Jason | Boys Hostel (3 rows) test_db=# select * from teacher; tid | tname | tsubject -----+-------+---------- 1 | Gary | Physics 2 | Karen | Maths 3 | Carol | History (3 rows) test_db=# insert into student values(4, 'any'); ERROR: cannot execute INSERT in a read-only transaction 62/96
  • 63. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 13: Restart primary and create a few new tables in the test database CREATE TABLE test_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id; SELECT 100000 CREATE TABLE another_tab AS SELECT * FROM GENERATE_SERIES(1, 100000) AS id; SELECT 100000 Step 14: Force a WAL file switch test_db=# select pg_switch_wal(); pg_switch_wal --------------- 0/3C58940 (1 row) Step 15: Check WAL file on standby -bash-4.2$ pwd /opt/PostgreSQL/10/from_primary -bash-4.2$ ls -l total 16388 -rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003 -rw-r--r--. 1 postgres postgres 3446 Feb 22 08:17 1.txt -bash-4.2$ pwd /opt/PostgreSQL/10/bb_data/for_standby/pg_wal -bash-4.2$ ls -l total 32768 -rw-------. 1 postgres postgres 16777216 Feb 22 22:05 000000010000000000000002 -rw-------. 1 postgres postgres 16777216 Feb 23 02:08 000000010000000000000003 drwx------. 2 postgres postgres 43 Feb 23 02:08 archive_status Step 16: Check the tables on the standby test_db=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+-------------+-------+----------+---------+------------- public | another_tab | table | postgres | 3568 kB | 63/96
  • 64. Replication in PostgreSQL - Deep Dive EnterpriseDB public | student | table | postgres | 16 kB | public | teacher | table | postgres | 16 kB | public | test_tab | table | postgres | 3568 kB | (4 rows) 9.5.3 Steps to perform Failover Step 1: Simulate a primary server problem Stop the server Step 2: Promote the standby -bash-4.2$ ./pg_ctl promote -D ../bb_data/for_standby/ waiting for server to promote.... done server promoted Step 3: Check standby [abbas@localhost bin]$ ./psql -p 5432 test_db -U postgres Password for user postgres: psql.bin (10.7) Type "help" for help. test_db=# insert into student values(4, 'any'); INSERT 0 1 test_db=# 64/96
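A quick way to confirm that the promotion took effect: before promotion the standby reports that it is in recovery, afterwards it does not. A small sketch:
test_db=# SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();
-- pg_is_in_recovery() returns f once the standby has been promoted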
  • 65. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6 Log Shipping Based - Block Level 9.6.1 Physical Streaming Replication In streaming replication the standby server connects to the primary server and receives WAL records using a replication protocol. This provides two advantages: 1. The standby server does not need to wait for the WAL file to fill up, so replication lag is reduced. 2. The dependency on a user-provided script and an intermediate shared storage between the servers is removed. 9.6.2 WAL Sender & WAL Receiver A process called the WAL receiver, running on the standby server, connects to the primary server over a TCP/IP connection, using the connection details provided in the primary_conninfo parameter of recovery.conf. On the primary server another process, called the WAL sender, is in charge of sending the WAL records to the standby server as and when they are generated. The WAL receiver saves the WAL records in the standby's own WAL as if they were generated by the activity of locally connected clients. Once the WAL records reach the WAL segment files, the standby server constantly keeps replaying the WAL so that standby and primary stay up to date. [Diagram: Primary and Standby PostgreSQL servers; the WAL sender on the primary streams WAL records (W1 W2 W3 W4) to the WAL receiver on the standby, which writes them into the standby's own WAL.] 65/96
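Both ends of this link expose monitoring views; a small sketch using the version 10 catalog columns:
-- On the primary: one row per connected WAL sender
SELECT pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn FROM pg_stat_replication;
-- On the standby: the state of the WAL receiver process
SELECT status, received_lsn, conninfo FROM pg_stat_wal_receiver;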
  • 66. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.3 WAL Streaming Protocol Details The exchange between the WAL receiver (standby server) and the WAL sender (primary server) proceeds as follows: 1. Startup request (receiver to sender): asks what the server's authentication scheme is, and carries these parameters: user = postgres, database = replication, replication = true (instructs the server to start a WAL sender process for this client), application_name = walreceiver. 2. Authentication request (sender to receiver): the server expects the password in MD5 format: 52 00 00 00 0c 00 00 00 05 04 43 16 6a ('R', length, MD5 code, password salt generated by the server). 3. Password response (receiver to sender): 70 00 00 00 0b md5b094d71396249f3ca84a23b86d4ee7b9 ('p', length, MD5 password terminated by null). The MD5 password is computed by md5(md5(password || username), salt). 4. Authentication reply (sender to receiver): 52 00 00 00 08 00 00 00 00 ('R', length, user authenticated), followed by status parameters: 'S' | length 4 bytes | param name | param value. 5. Simple query (receiver to sender): IDENTIFY_SYSTEM. The query response carries systemid, timeline, logpos and dbname (here 6661510093306984809, 1, 0/3000140); the WAL receiver verifies that the systemid in the response is the same as in the base backup. 6. Simple query (receiver to sender): START_REPLICATION SLOT "node_a_slot" 0/3000000 TIMELINE 1. The server responds with CopyBothResponse ('W' | length 4 bytes | COPY format is textual | copy data has 0 columns) and starts to stream WAL. 7. WAL data then flows as CopyData messages: 'd' | length 4 bytes | WAL data. 66/96
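This exchange can also be exercised by hand, since libpq accepts replication=true in a connection string and the walsender then accepts commands such as IDENTIFY_SYSTEM. A hedged sketch, reusing the host and user from this setup:
$ psql "dbname=postgres replication=true host=172.16.214.167 user=postgres" -c "IDENTIFY_SYSTEM;"
# returns one row with systemid, timeline, xlogpos and dbname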
  • 67. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.4 Setup The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 is installed. 9.6.5 Configuring PostgreSQL Replication using WAL Streaming Step 1: Disable and stop firewall on both the machines sudo firewall-cmd --state sudo systemctl stop firewalld sudo systemctl disable firewalld sudo systemctl mask --now firewalld Step 2: On primary allow replication connections & connections from the same network. Modify pg_hba.conf. local all all md5 host all all 172.16.214.167/24 md5 host all all ::1/128 md5 local replication all md5 host replication all 172.16.214.167/24 md5 host replication all ::1/128 md5 Step 3: On primary edit postgresql.conf to modify the following parameters max_wal_senders = 10 wal_level = replica max_replication_slots = 10 synchronous_commit = on synchronous_standby_names = '*' listen_addresses = '*' 67/96
  • 68. Replication in PostgreSQL - Deep Dive EnterpriseDB Step 4: Start the primary server ./postgres -D ../pr_data -p 5432 Step 5: Take a base backup to bootstrap the standby server ./pg_basebackup --pgdata=/tmp/sb_data/ --format=p --write-recovery-conf --checkpoint=fast --label=mffb --progress --verbose --host=172.16.214.167 --port=5432 --username=postgres Step 6: Check the base backup label file START WAL LOCATION: 0/2000028 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/2000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-02-24 05:25:30 EST LABEL: mffb Step 7: In the base backup, add the following line in the recovery.conf primary_slot_name = 'node_a_slot' Step 8: Check the /tmp/sb_data/recovery.conf file standby_mode = 'on' primary_conninfo = 'user=enterprisedb password=abc123 host=172.16.214.167 port=5432 68/96
  • 69. Replication in PostgreSQL - Deep Dive EnterpriseDB sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any' primary_slot_name = 'node_a_slot' Step 9: Connect to the primary server and issue this command edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot'); slot_name | xlog_position -------------+--------------- node_a_slot | (1 row) edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots; slot_name | slot_type | active -------------+-----------+-------- node_a_slot | physical | f (1 row) Step 10: Transfer the base backup to the standby server scp /tmp/sb_data.tar.gz [email protected]:/tmp sudo mv /tmp/sb_data /opt/PostgreSQL/10/ sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/ sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/ sudo chmod 700 /opt/PostgreSQL/10/sb_data/ Step 11: Start the standby server ./postgres -D ../sb_data/ -p 5432 The primary will show this in its log LOG: standby "walreceiver" is now a synchronous standby with priority 1 69/96
  • 70. Replication in PostgreSQL - Deep Dive EnterpriseDB The standby will show LOG: database system was interrupted; last known up at 2018-10-24 15:49:55 LOG: entering standby mode LOG: redo starts at 0/3000028 LOG: consistent recovery state reached at 0/30000F8 LOG: started streaming WAL from primary at 0/4000000 on timeline 1 Step 12: Connect to primary server and issue some simple commands -bash-4.2$ ./edb-psql -p 9666 edb Password: psql.bin (9.6.10.17) Type "help" for help. create table abc(a int, b varchar(250)); insert into abc values(1,'One'); insert into abc values(2,'Two'); insert into abc values(3,'Three'); Step 13: Check data on slave ./psql -p 5432 -U postgres postgres Password for user postgres: psql.bin (10.7) Type "help" for help. postgres=# select * from abc; a | b ---+------- 1 | One 2 | Two 3 | Three (3 rows) 70/96
  • 71. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.6.6 Steps to perform Failover Step 1: Crash the primary server Step 2: Promote the stand by server ./pg_ctl promote -D ../sb_data/ server promoting Step 3: Connect to the promoted stand by server and insert a row -bash-4.2$ ./edb-psql -p 9777 edb Password: psql.bin (9.6.10.17) Type "help" for help. edb=# insert into abc values(4,'Four'); 71/96
  • 72. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7 Logical Decoding Based 9.7.1 What is Logical Replication Physical streaming replication as described in section 9.6 creates a byte-by-byte read-only replica of the primary server. The replica contains all databases, tables, roles, tablespaces etc. With streaming replication we get all or nothing. What if we want a replica of only a single table? This is where logical replication comes into play. Logical replication can replay DML operations happening on a subset of tables in a primary server on a standby server by: *) Logically decoding WAL records *) Streaming them over to the standby server *) Applying them to the tables in the standby server in the correct transactional order 72/96
  • 73. Replication in PostgreSQL - Deep Dive EnterpriseDB 9.7.2 Comparison of Physical and Logical Replication
Feature                                                           | Physical Replication | Logical Replication
------------------------------------------------------------------+----------------------+--------------------
Replica will be read only                                         | Yes                  | No
Replica will contain everything                                   | Yes                  | No
Replica can contain a subset of data in the primary               | No                   | Yes
Triggers will fire on DML operations                              | No                   | Yes
Will work across different PostgreSQL versions                    | No                   | Yes
Will work across different Operating Systems                      | No                   | Yes
Table in the standby can have extra columns, indexes or security  | No                   | Yes
DML operations are possible on tables in standby                  | No                   | Yes
Will work even if table has no primary key                        | Yes                  | No
DDL commands are replicated to standby                            | Yes                  | No
Sequence data is replicated to standby                            | Yes                  | No
TRUNCATE command is replicated to the standby                     | Yes                  | No
Large objects are replicated to the standby                       | Yes                  | No
Constraint validation is performed on standby                     | No                   | Yes
Standby needs base backup of the primary server                   | Yes                  | No
DML operations can be filtered before sending to standby          | No                   | Yes
Tables have to be created with the same name on standby manually  | No                   | Yes
73/96
• 74.
9.7.3 Publication & Subscription

Logical replication defines two entities: a publisher and a subscriber. A publisher is a node that defines a certain group of tables, called a publication, to which a subscriber can subscribe, by creating a subscription, to receive the changes to that particular group of tables.

[Diagram: on the publisher, WAL records (W1 to W4) pass through a logical decoding plugin and a WAL sender; the decoded and filtered WAL records are streamed to the subscriber, which applies them to the tables covered by the subscription.]
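Once a publication exists, the publisher can be asked which tables it covers through the pg_publication_tables catalog view. A minimal sketch, using the pub1/t1 names from the walkthrough in section 9.7.8:

src_db=# SELECT * FROM pg_publication_tables WHERE pubname = 'pub1';
 pubname | schemaname | tablename
---------+------------+-----------
 pub1    | public     | t1
(1 row)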
• 75.
9.7.4 Logical Decoding Plugin

To transform the internal WAL representation into a format usable by a client, a plugin can be installed into PostgreSQL. The plugin implements well-defined callback functions, which the logical decoding framework calls at the appropriate time to let the plugin perform the format conversion. A plugin can, for example, convert WAL records to SQL statements:

[Diagram: WAL records W1 to W4 pass through the plugin and come out as SQL statements, e.g. INSERT INTO tab VALUES(1,2); UPDATE tab SET b = 10; ...]

9.7.5 Logical replication slots

Logical replication slots are consumed by logical replication clients. Physical replication slots work at the cluster level and are used to stream cluster-wide changes to the standby; logical replication slots, on the other hand, stream a sequence of changes from a single database. Each logical slot needs a decoding plugin that will be used to transform the WAL records into the format required by the consumer.

9.7.6 test_decoding and pg_recvlogical

test_decoding is an example decoding plugin that ships with PostgreSQL, and pg_recvlogical is an example utility that can be used to receive changes from a logical replication slot. Let's see them both in action:

Step 1: Make the following changes in postgresql.conf on the primary server

wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on

Step 2: Make the following changes in pg_hba.conf on the primary server

host all         all 172.16.214.167/24 trust
host replication all 172.16.214.167/24 trust
• 76.
Step 3: Create a database on the primary server

./createdb -p 7654 mydb -U postgres

Step 4: Connect the client to the primary server

./psql -p 7654 mydb -U postgres
psql.bin (10.7)
Type "help" for help.

mydb=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/16998C0
(1 row)

mydb=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
 slot_name |    lsn
-----------+-----------
 my_slot   | 0/1699930
(1 row)

mydb=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
 my_slot   | test_decoding | logical   |  16384 | mydb     | f         |

 active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
 f      |            |      |          556 | 0/16998F8   | 0/1699930

This replication slot is asking:
1) VACUUM must not remove catalog tuples deleted by any transaction later than 556.
2) The consumer of this replication slot needs all segments including and after 0/16998F8.
3) The consumer of this logical replication slot has confirmed receiving data up to 0/1699930.

Most of the time a slot will require older WAL (i.e. restart_lsn) than the confirmed position (i.e. confirmed_flush_lsn). The flush position is just a marker saved by the consumer;
• 77.
the WAL actually required is always determined by restart_lsn. If this is the first slot created in the cluster, restart_lsn will be the current WAL LSN at the time the slot was created.

[Diagram: of WAL segments WAL_1 to WAL_6, those before restart_lsn are not required; segments from restart_lsn onward are required, with confirmed_flush_lsn marking the consumer's confirmed position.]

Step 5: On the standby, start the pg_recvlogical utility

./pg_recvlogical --slot=my_slot --verbose -d mydb -h 172.16.214.167 -p 7654 -U postgres --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot my_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)
pg_recvlogical: confirming write up to 0/1699930, flush to 0/1699930 (slot my_slot)

Step 6: On the primary, create a table

create table test(a varchar(10));

Step 7: Check the output of pg_recvlogical

BEGIN 556
COMMIT 556
pg_recvlogical: confirming write up to 0/16B0580, flush to 0/16B0580 (slot my_slot)

Step 8: Insert a few rows in the table on the primary

mydb=# insert into test values('qaz');
mydb=# insert into test values('wsx');
• 78.
mydb=# insert into test values('edc');

Step 9: Check the output of pg_recvlogical

BEGIN 557
table public.test: INSERT: a[character varying]:'qaz'
COMMIT 557
pg_recvlogical: confirming write up to 0/16B0628, flush to 0/16B0628 (slot my_slot)
pg_recvlogical: confirming write up to 0/16B0660, flush to 0/16B0660 (slot my_slot)
BEGIN 558
table public.test: INSERT: a[character varying]:'wsx'
COMMIT 558
pg_recvlogical: confirming write up to 0/16B06D0, flush to 0/16B06D0 (slot my_slot)
BEGIN 559
table public.test: INSERT: a[character varying]:'edc'
COMMIT 559
pg_recvlogical: confirming write up to 0/16B0778, flush to 0/16B0778 (slot my_slot)

Step 10: Update rows in the table on the primary

update test set a = 'tgb';

Step 11: Check the output of pg_recvlogical

BEGIN 560
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
table public.test: UPDATE: a[character varying]:'tgb'
COMMIT 560
pg_recvlogical: confirming write up to 0/16B08B8, flush to 0/16B08B8 (slot my_slot)

Step 12: Delete rows in the table on the primary

delete from test;

Step 13: Check the output of pg_recvlogical

BEGIN 561
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
table public.test: DELETE: (no-tuple-data)
COMMIT 561
pg_recvlogical: confirming write up to 0/16B0990, flush to 0/16B0990 (slot my_slot)
• 79.
Step 14: Check the slot on the primary; note that restart_lsn has advanced

mydb=# select * from pg_replication_slots;
 slot_name |    plugin     | slot_type | datoid | database | temporary |
-----------+---------------+-----------+--------+----------+-----------+
 my_slot   | test_decoding | logical   |  16384 | mydb     | f         |

 active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------+------------+------+--------------+-------------+---------------------
 f      |            |      |          566 | 0/16B0DB8   | 0/16B0DF0
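The same decoded stream can also be read from plain SQL, without pg_recvlogical: pg_logical_slot_peek_changes returns pending changes without consuming them, while pg_logical_slot_get_changes consumes them and advances confirmed_flush_lsn. A minimal sketch against the my_slot slot created above:

mydb=# SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
-- shows the BEGIN/INSERT/COMMIT lines in the same format as pg_recvlogical
mydb=# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
-- same output, but the slot position is advanced

When a slot is no longer needed it should be dropped, otherwise it pins WAL on the primary:

mydb=# SELECT pg_drop_replication_slot('my_slot');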
• 80.
9.7.7 Setup

The setup consists of two CentOS 7 machines connected via LAN, on which PostgreSQL version 10.7 is installed.

9.7.8 Configuring PostgreSQL Replication using Logical Decoding

Step 1: Disable and stop the firewall on both machines

sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld

Step 2: On the primary, allow replication connections and connections from the same network by modifying pg_hba.conf

local all         all                    trust
host  all         all 172.16.214.167/24  trust
host  all         all ::1/128            trust
local replication all                    trust
host  replication all 172.16.214.167/24  trust
host  replication all ::1/128            trust

Step 3: On the publisher, edit postgresql.conf to modify the following parameters

max_wal_senders = 10
wal_level = logical
max_replication_slots = 10
listen_addresses = '*'
log_connections = on
log_disconnections = on
log_statement = 'all'
log_replication_commands = on

Step 4: Start the publisher server

./postgres -D /tmp/data/ -p 5432
• 81.
Step 5: Create a database on the publisher server

./createdb -p 5432 -U postgres src_db

Step 6: Connect to the publisher server and create a table with some rows

create table t1 (id integer primary key, val text);
create user replicant with replication;
grant select on t1 to replicant;
insert into t1 (id, val) values (10, 'ten'), (20, 'twenty'), (30, 'thirty');

Step 7: Create the publication on the publisher server

create publication pub1 for table t1;

Step 8: Start the subscriber

./postgres -D /tmp/data/ -p 5432

Step 9: Create the database on the subscriber

./createdb -p 5432 -U postgres dst_db

Step 10: Connect to the subscriber server and create the table with an additional column

create table t1 (id integer primary key, val text, val2 text);

Step 11: Create the subscription

create subscription sub1 connection 'host=172.16.214.167 port=5432 dbname=src_db user=replicant' publication pub1;
• 82.
Step 12: Check the data in the subscribed table

dst_db=# select * from t1;
 id |  val   | val2
----+--------+------
 10 | ten    |
 20 | twenty |
 30 | thirty |
(3 rows)
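To confirm the subscription stays live, each side exposes a status view. A quick check, assuming the names used above:

On the subscriber:
dst_db=# SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;

On the publisher, the subscriber's apply worker appears as an ordinary WAL sender:
src_db=# SELECT application_name, state FROM pg_stat_replication;
-- application_name will be sub1, state should be 'streaming'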
• 83.
9.7.9 Logical Replication Protocol Details

The subscriber's first connection to the publisher creates the replication slot:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") carrying these parameters:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00: authentication reply, length, user authenticated), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber:
   SELECT DISTINCT t.schemaname, t.tablename FROM pg_catalog.pg_publication_tables t WHERE t.pubname IN ('pub1')
    schemaname | tablename
   ------------+-----------
    public     | t1
4. Subscriber:
   CREATE_REPLICATION_SLOT "sub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
    slot_name | consistent_point | snapshot_name | output_plugin
   -----------+------------------+---------------+---------------
    sub1      | 0/16B9EA8        |               | pgoutput
5. Subscriber disconnects.
• 84.
The second connection starts the actual streaming:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") with the same parameters as before:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber sends the simple query IDENTIFY_SYSTEM:
         systemid        | timeline |  xlogpos  | dbname
   ---------------------+----------+-----------+--------
    6664876364497978284 |        1 | 0/16B9EA8 | src_db
4. Subscriber: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '1', publication_names '"pub1"')
5. Publisher responds with CopyBothResponse ('W' | length, 4 bytes | COPY format is textual | Copy Data has 0 columns).
6. Decoded WAL data then flows as CopyData messages ('d' | length, 4 bytes | copy data).
• 85.
A third, temporary connection performs the initial table synchronization:

1. Subscriber → Publisher: start-up request ("What is the server's authentication scheme?") with:
   user             replicant
   database         src_db
   replication      database   ← the connection goes into logical replication mode
   application_name sub1_16393_sync_16385
2. Publisher → Subscriber: authentication reply (52 00 00 00 08 00 00 00 00), followed by status parameters ('S' | length, 4 bytes | param name | param value).
3. Subscriber: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ (command completed and transaction started).
4. Subscriber: CREATE_REPLICATION_SLOT "sub1_16393_sync_16385" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    slot_name             | consistent_point | snapshot_name | output_plugin
   -----------------------+------------------+---------------+---------------
    sub1_16393_sync_16385 | 0/16B9EE0        |               | pgoutput
5. Subscriber:
   SELECT c.oid, c.relreplident FROM pg_catalog.pg_class c INNER JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE n.nspname = 'public' AND c.relname = 't1' AND c.relkind = 'r'
     oid  | relreplident
   -------+------------------
    16385 | d (primary key)

(Continued on the next slide.)
• 86.
6. Subscriber:
   SELECT a.attname, a.atttypid, a.atttypmod, a.attnum = ANY(i.indkey) FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_index i ON (i.indexrelid = pg_get_replica_identity_index(16385)) WHERE a.attnum > 0::pg_catalog.int2 AND NOT a.attisdropped AND a.attrelid = 16385 ORDER BY a.attnum
    attname | atttypid | atttypmod | ?column?
   ---------+----------+-----------+----------
    id      |       23 |        -1 | t
    val     |       25 |        -1 | f
7. Subscriber: COPY public.t1 TO STDOUT
   10 ten
   20 twenty
   30 thirty
8. Subscriber: COMMIT (transaction complete), then disconnects.
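These replication commands can also be issued by hand: psql opens a logical replication connection when replication=database is passed in the connection string. A sketch, assuming the replicant user and src_db database from the setup:

psql "host=172.16.214.167 port=5432 user=replicant dbname=src_db replication=database"
src_db=> IDENTIFY_SYSTEM;
-- returns systemid, timeline, xlogpos and dbname, exactly as in the exchange above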
• 87.
9.8 Statement Based

9.8.1 Introduction to pgpool-II

pgpool-II is a middleware system that sits between PostgreSQL servers and clients to provide the following features:
• Connection Pooling
• Replication & Load Balancing
• Automated Failover

We are going to focus on the replication feature provided by pgpool-II. When used to replicate data, pgpool receives the INSERT command from the client and sends the command, enclosed in a BEGIN-COMMIT block, to all the PostgreSQL servers under it.

[Diagram: the client application sends INSERT INTO my_tab VALUES(1, 'One'); to pgpool-II, which forwards BEGIN; INSERT INTO my_tab VALUES(1, 'One'); COMMIT; to each PostgreSQL server.]
• 88.
9.8.2 Setup

The setup consists of two CentOS 7 machines on which PostgreSQL 10.7 is installed. On one of the machines, pgpool-II version 3.6.15 (subaruboshi) is also installed.

9.8.3 Configuring PostgreSQL replication using pgpool-II

Step 1: Modify the postgresql.conf files of both PostgreSQL instances

listen_addresses = '*'
logging_collector = off
log_connections = on
log_disconnections = on
log_statement = 'all'

Step 2: Modify the pg_hba.conf files of both PostgreSQL instances

host all all 172.16.214.173/24 trust

Step 3: Modify pgpool.conf

cd /opt/edb/pgpool3.6/etc
cp pgpool.conf.sample pgpool.conf && vim pgpool.conf

listen_addresses = '*'
backend_hostname0 = '172.16.214.173'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data0'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '172.16.214.172'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
replication_mode = on
fail_over_on_backend_error = on
• 89.
Step 4: Generate the md5 hash of the password

/opt/edb/pgpool3.6/bin/pg_md5 abc123
e99a18c428cb38d5f260853678922e03

Step 5: Modify pcp.conf

cp pcp.conf.sample pcp.conf && vim pcp.conf
postgres:e99a18c428cb38d5f260853678922e03

Step 6: Start both PostgreSQL servers (one on each machine)

./postgres -D ../data
./postgres -D ../data

Step 7: Start pgpool

./pgpool -n -f /opt/edb/pgpool3.6/etc/pgpool.conf -F /opt/edb/pgpool3.6/etc/pcp.conf

Step 8: Create a database (note that we are connecting through pgpool)

./createdb -p 9999 test_pgp -U postgres

Step 9: Check the database server logs

For the first server (172.16.214.173):

[104386] LOG: connection received: host=172.16.214.173 port=57524
[104386] LOG: connection authorized: user=postgres database=postgres
[104386] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false)
[104386] LOG: statement: CREATE DATABASE test_pgp;
[104386] LOG: statement: DISCARD ALL
[104386] LOG: disconnection: session time: 0:00:00.787 user=postgres database=postgres host=172.16.214.173 port=57524

For the second server (172.16.214.172):

[12363] LOG: connection received: host=172.16.214.173 port=42138
[12363] LOG: connection authorized: user=postgres database=postgres
[12363] LOG: statement: CREATE DATABASE test_pgp;
[12363] LOG: statement: DISCARD ALL
[12363] LOG: disconnection: session time: 0:00:00.704 user=postgres database=postgres host=172.16.214.173 port=42138
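At this point it is worth confirming that pgpool itself is answering queries; pgpool intercepts a family of SHOW commands on its own port. A quick check through port 9999, using the test_pgp database created above:

./psql -p 9999 test_pgp -U postgres
test_pgp=# show pool_version;
test_pgp=# show pool_nodes;
-- both backends should report status "up" at this point; pool_nodes is revisited in Step 18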
• 90.
Step 10: Create a new table (note that we are connecting through pgpool)

./psql -p 9999 test_pgp -U postgres

create table my_tab(a int primary key, b varchar(10));

Step 11: Check the server logs

For the first server (172.16.214.173):

[107539] LOG: connection received: host=172.16.214.173 port=57528
[107539] LOG: connection authorized: user=postgres database=test_pgp
[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: connection received: host=172.16.214.173 port=42142
[12400] LOG: connection authorized: user=postgres database=test_pgp
[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table my_tab(a int primary key, b varchar(10));
[12400] LOG: statement: COMMIT

Step 12: Insert rows in the table

insert into my_tab values(1,'One');
insert into my_tab values(2,'Two');
insert into my_tab values(3,'Three');

Step 13: Check the server log

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT count(*) from ( SELECT has_function_privilege ( 'postgres', 'pg_catalog.to_regclass(cstring)','execute' ) WHERE EXISTS ( SELECT * FROM pg_catalog.pg_proc AS p WHERE p.proname = 'to_regclass' )
• 91.
) AS s
[107539] LOG: statement: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pg_catalog.to_regclass('"my_tab"')
[107539] LOG: statement: SELECT attname, d.adsrc as default_value, coalesce ( ( d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%' OR d.adsrc LIKE '%CURRENT_TIMESTAMP%' OR d.adsrc LIKE '%CURRENT_TIME%' OR d.adsrc LIKE '%CURRENT_DATE%' OR d.adsrc LIKE '%LOCALTIME%' OR d.adsrc LIKE '%LOCALTIMESTAMP%' ) AND ( a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid ) , false ) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = to_regclass('"my_tab"') ORDER BY a.attnum

With these queries pgpool is asking: what are the attribute names? What are their default values, if any? Does any column have a default value of now()?

 attname | default_value | coalesce
---------+---------------+----------
 a       |               | f
 b       |               | f

[107539] LOG: statement: insert into my_tab values(1,'One');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(2,'Two');
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: insert into my_tab values(3,'Three');
• 92.
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(1,'One');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(2,'Two');
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: insert into my_tab values(3,'Three');
[12400] LOG: statement: COMMIT

Step 14: Try an update statement

UPDATE my_tab SET b = 'threee' WHERE b like 'Three';

Step 15: Check the server logs

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: update my_tab set b = 'threee' where b like 'Three';
[12400] LOG: statement: COMMIT

Step 16: Select data from the table

test_pgp=# select * from my_tab;
 a |   b
---+--------
 1 | One
 2 | Two
 3 | threee
(3 rows)
• 93.
Step 17: Check the server logs and observe the load balancing

For the first server (172.16.214.173):

[107539] LOG: statement: select * from my_tab;

For the second server (172.16.214.172):

(nothing; the SELECT was load-balanced to the first server only)

Step 18: Check node status

test_pgp=# show pool_nodes;
 node_id |    hostname    | port | status | lb_weight |
---------+----------------+------+--------+-----------+
 0       | 172.16.214.173 | 5432 | up     | 0.500000  |
 1       | 172.16.214.172 | 5432 | up     | 0.500000  |

 role   | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
 master | 5          | true              | 0
 slave  | 0          | false             | 0
(2 rows)

Step 19: Stop the master server, i.e. 172.16.214.173

Step 20: Run the select query

test_pgp=# select * from my_tab;
FATAL: unable to read data from DB node 0
DETAIL: EOF encountered with backend
server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
• 94.
Step 21: Run the select query again

test_pgp=# select * from my_tab;
 a |   b
---+--------
 1 | One
 2 | Two
 3 | threee
(3 rows)

Step 22: Check node status

test_pgp=# show pool_nodes;
 node_id |    hostname    | port | status | lb_weight |
---------+----------------+------+--------+-----------+
 0       | 172.16.214.173 | 5432 | down   | 0.500000  |
 1       | 172.16.214.172 | 5432 | up     | 0.500000  |

 role   | select_cnt | load_balance_node | replication_delay
--------+------------+-------------------+-------------------
 slave  | 5          | false             | 0
 master | 1          | true              | 0
(2 rows)

Step 23: Create another table and insert a row in it

create table time_test(a timestamp);
insert into time_test values(now());

Step 24: Check the server logs and observe how pgpool translated now() into a literal timestamp

For the first server (172.16.214.173):

[107539] LOG: statement: BEGIN
[107539] LOG: statement: create table time_test(a timestamp);
[107539] LOG: statement: COMMIT
[107539] LOG: statement: BEGIN
[107539] LOG: statement: SELECT now()
• 95.
[107539] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz" ('2019-03-14 04:36:22.324674-04'::text))
[107539] LOG: statement: COMMIT

For the second server (172.16.214.172):

[12400] LOG: statement: BEGIN
[12400] LOG: statement: create table time_test(a timestamp);
[12400] LOG: statement: COMMIT
[12400] LOG: statement: BEGIN
[12400] LOG: statement: INSERT INTO "time_test" VALUES ("pg_catalog"."timestamptz" ('2019-03-14 04:36:22.324674-04'::text))
[12400] LOG: statement: COMMIT
• 96.
9.9 Other possibilities

9.9.1 EDB xDB Replication Server

EDB xDB (cross database) Replication Server is an asynchronous replication system for PostgreSQL based on a publish/subscribe model. xDB Replication Server can be used to implement replication systems based on either of two replication models:
• Single-master (master-to-slave) replication
• Multi-master replication

The following combinations of cross-database replication are supported by xDB Replication Server for single-master replication:

 Master Database | Slave Database
-----------------+----------------
 Oracle          | PostgreSQL
 Oracle          | EDB Postgres
 SQL Server      | PostgreSQL
 SQL Server      | EDB Postgres
 PostgreSQL      | SQL Server
 PostgreSQL      | EDB Postgres
 EDB Postgres    | SQL Server
 EDB Postgres    | Oracle
 EDB Postgres    | PostgreSQL

For multi-master replication, xDB Replication Server supports the following servers:

 Master Database
-----------------
 PostgreSQL
 EDB Postgres

xDB Replication Server can use either a trigger-based method or a logical-decoding-based method to perform replication.