PostgreSQL Replication Tutorial

PostgreSQL: Understanding replication
Hans-J¨urgen Sch¨onig
www.postgresql-support.de

Welcome to PostgreSQL replication

What you will learn
How PostgreSQL writes data
What the transaction log does
How to set up streaming replication
How to handle Point-In-Time-Recovery
Managing conﬂicts
Monitoring replication
More advanced techniques

How PostgreSQL writes data

Writing a row of data
Understanding how PostgreSQL writes data is key to
understanding replication
Vital to understand PITR
A lot of potential to tune the system

Write the log ﬁrst (1)
It is not possible to send data to a data table directly.
What if the system crashes during a write?
A data ﬁle could end up with broken data at potentially
unknown positions
Corruption is not an option

Write the log ﬁrst (2)
Data goes to the xlog (= WAL) ﬁrst
WAL is short for “Write Ahead Log”
IMPORTANT: The xlog DOES NOT contain SQL
It contains BINARY changes

The xlog
The xlog consists of a set of 16 MB ﬁles
The xlog consists of various types of records (heap changes,
btree changes, etc.)
It has to be ﬂushed to disk on commit to achieve durability

Expert tip: Debugging the xlog
Change WAL DEBUG in src/include/pg conﬁg manual.h
Recompile PostgreSQL
NOTE: This is not for normal use but just for training purposes

Enabling wal debug
test=# SET wal_debug TO on;
SET
test=# SET client_min_messages TO debug;
SET

Observing changes
Every change will go to the screen now
It helps to understand how PostgreSQL works
Apart from debugging: The practical purpose is limited

Making changes
Data goes to the xlog first
Then data is put into shared buffers
At some later point data is written to the data files
This does not happen instantly leaving a lot of room for
optimization and tuning

A consistent view of the data
Data is not sent to those data files instantly.
Still: End users will have a consistent view of the data
When a query comes in, it checks the I/O cache (=
shared buffers) and asks the data files only in case of a cache
miss.
xlog is about the physical not about the logical level

Sustaining writes
We cannot write to the xlog forever without recycling it.
The xlog is recycled during a so called “checkpoint”.
Before the xlog can be recycled, data must be stored safely in
those data ﬁles
Checkpoints have a huge impact on performance

Checkpoint parameters:
checkpoint timeout = 15min
max wal size = 5GB
min wal size = 800MB
checkpoint completion target = 0.5
checkpoint warning = 30s

Checkpointing to frequently
Checkpointing is expensive
PostgreSQL warns about too frequent checkpoints
This is what checkpoint warning is good for

min wal size and max wal size (1)
This is a replacement for checkpoint segments
Now the xlog size is auto-tuned
The new conﬁguration was introduced in PostgreSQL 9.5

Instead of having a single knob (checkpoint segments) that both
triggers checkpoints, and determines how many checkpoints to
recycle, they are now separate concerns. There is still an internal
variable called CheckpointSegments, which triggers checkpoints.
But it no longer determines how many segments to recycle at a
checkpoint. That is now auto-tuned by keeping a moving average
of the distance between checkpoints (in bytes), and trying to keep
that many segments in reserve.

The advantage of this is that you can set max wal size very high,
but the system won’t actually consume that much space if there
isn’t any need for it. The min wal size sets a ﬂoor for that; you
can eﬀectively disable the auto-tuning behavior by setting
min wal size equal to max wal size.

How does it impact replication
The xlog has all the changes needed and can therefore be
used for replication.
Copying data ﬁles is not enough to achieve a consistent view
of the data
It has some implications related to base backups

Setting up streaming replication

The basic process
S: Install PostgreSQL on the slave (no initdb)
M: Adapt postgresql.conf
M: Adapt pg hba.conf
M: Restart PostgreSQL
S: Pull a base backup
S: Start the slave

Changing postgresql.conf
wal level: Ensure that there is enough xlog generated by the
master (recovering a server needs more xlog than just simple
crash-safety)
max wal senders: When a slave is streaming, connects to the
master and fetches xlog. A base backup will also need 1 / 2
wal senders
hot standby: This is not needed because it is ignored on the
master but it saves some work on the slave later on

Changing pg hba.conf
Rules for replication have to be added.
Note that “all” databases does not include replication
A separate rule has to be added, which explicitly states
“replication” in the second column
Replication rules work just like any other pg hba.conf rule
Remember: The ﬁrst line matching rules

Restarting PostgreSQL
To activate those settings in postgresql.conf the master has to
be restarted.
If only pg hba.conf is changed, a simple SIGHUP (pg ctl
reload) is enough.

Using pg basebackup (1)
pg basebackup will fetch a copy of the data from the master
While pg basebackup is running, the master is fully
operational (no downtime needed)
pg basebackup connects through a database connection and
copies all data ﬁles as they are
In most cases this does not create a consistent backup
The xlog is needed to “repair” the base backup (this is exactly
what happens during xlog replay anyway)

Using pg basebackup (2)
pg_basebackup -h master.com -D /slave
--xlog-method=stream --checkpoint=fast -R

xlog-method: Self-contained backups
By default a base backup is not self-contained.
The database does not start up without additional xlog.
This is ﬁne for Point-In-Time-Recovery because there is an
archive around.
For streaming it can be a problem.
–xlog-method=stream opens a second connection to fetch
xlog during the base backup

checkpoint=fast: Instant backups
By default pg basebackup starts as soon as the master
checkpoints.
This can take a while.
–checkpoint=fast makes the master check instantly.
In case of a small backup an instant checkpoint speeds things
up.

-R: Generating a config file
For a simple streaming setup all PostgreSQL has to know is
already passed to pg basebackup (host, port, etc.).
-R automatically generates a recovery.conf file, which is quite
ok in most cases.

Backup throttling
–max-rate=RATE: maximum transfer rate to transfer data
directory (in kB/s, or use suﬃx “k” or “M”)
If your master is weak a pg basebackup running at full speed
can lead to high response times and disk wait.
Slowing down the backup can help to make sure the master
stays responsive.

Adjusting recovery.conf
A basic setup needs:
primary conninfo: A connect string pointing to the master
server
standby mode = on: Tells the system to stream instantly
Additional conﬁguration parameters are available

Starting up the slave
Make sure the slave has connected to the master
Make sure it has reached a consistent state
Check for wal sender and wal receiver processes

Promoting a slave to master
Promoting a slave to a master is easy:
pg_ctl -D ... promote
After promotion recovery.conf will be renamed to
recovery.done

One word about security
So far replication has been done as superuser
This is not necessary
Creating a user, which can do just replication makes sense
CREATE ROLE foo ... REPLICATION ... NOSUPERUSER;

Monitoring replication

Simple checks
The most basic and most simplistic check is to check for
wal sender (on the master)
wal receiver (on the slave)
Without those processes the party is over

More detailed analysis
pg stat replication contains a lot of information
Make sure an entry for each slave is there
Check for replication lag

Checking for replication lag
A sustained lag is not a good idea.
The distance between the sender and the receiver can be
measured in bytes
SELECT client_addr,
pg_xlog_location_diff(pg_current_xlog_location(),
sent_location)
FROM pg_stat_replication;
In asynchronous replication the replication lag can vary
dramatically (for example during CREATE INDEX, etc.)

Creating large clusters

Handling more than 2 nodes
A simple 2 node cluster is easy.
In case of more than 2 servers, life is a bit harder.
If you have two slaves and the master fails: Who is going to
be the new master?
Unless you want to resync all your data, you should better
elect the server containing most of the data already
Comparing xlog positions is necessary

Timeline issues
When a slave is promoted the timeline ID is incremented
Master and slave have to be in the same timeline
In case of two servers it is important to connect one server to
the second one ﬁrst and do the promotion AFTERWARDS.
This ensures that the timeline switch is already replicated
from the new master to the surviving slave.

Cascading slaves
Slaves can be connected to slaves
Cascading can make sense to reduce bandwidth requirements
Cascading can take load from the master
Use pg basebackup to fetch data from a slave as if it was a
master

Conﬂicts

How conﬂicts happen
During replication conﬂicts can happen
Example: The master might want to remove a row still visible
to a reading transaction on the slave

What happens during a conﬂict
PostgreSQL will terminate a database connection after some
time
max standby archive delay = 30s
max standby streaming delay = 30s
Those settings deﬁne the maximum time the slave waits
during replay before replay is resumed.
In rare cases a connection might be aborted quite soon.

Reducing conﬂicts
Conﬂicts can be reduced nicely by setting
hot standby feedback.
The slave will send its oldest transaction ID to tell the master
that cleanup has to be deferred.

Making replication more reliable

What happens if a slave reboots?
If a slave is gone for too long, the master might recycle its
transaction log
The slave needs a full history of the xlog
Setting wal keep segments on the master helps to prevent the
master from recycling transaction log too early
I recommend to always use wal keep segments to make sure
that a slave can be started after a pg basebackup

Making use of replication slots
Replication slots have been added in PostgreSQL 9.4
There are two types of replication slots:
Physical replication slots (for streaming)
Logical replication slots (for logical decoding)

Conﬁguring replication slots
Change max replication slots and restart the master
Run . . .
test=# SELECT *
FROM pg_create_physical_replication_slot(’some_name’);
slot_name | xlog_position
-----------+---------------
some_name |
(1 row)

Tweaking the slave
Add this replication slot to primary slot name on the slave:
primary_slot_name = ’some_name’
The master will ensure that xlog is only recycled when it has
been consumed by the slave.

A word of caution
If a slave is removed make sure the replication slot is dropped.
Otherwise the master might run out of disk space.
NEVER use replication slots without monitoring the size of
the xlog on the sender.

Key advantages of replication slots
The diﬀerence between master and slave can be arbitrary.
During bulk load or CREATE INDEX this can be essential.
It can help to overcome the problems caused by slow networks.
It can help to avoid resyncs.

Moving to synchronous replication

Synchronous vs. asynchronous
Asynchronous replication: Commits on the slave can happen
long after the commit on the master.
Synchronous replication: A transaction has to be written to a
second server.
Synchronous replication potentially adds some network
latency to the scenery

The application name
During normal operations the application name setting can be
used to assign a name to a database connection.
In case of synchronous replication this variable is used to
determine synchronous candidates.

Conﬁguring synchronous replication:
Master:
add names to synchronous standby names
Slave:
add an application name to your connect string in
primary conninfo

Fail safety
Synchronous replication needs 2 active servers
If no two servers are left, replication will wait until a second
server is available.
Use AT LEAST 3 servers for synchronous replication to avoid
risk.

Point-In-Time-Recovery

What it does
PITR can be used to reach (almost) any point after a base
backup.
It is more of a backup strategy than a replication thing.
Replication and PITR can be combined.

Conﬁguring for PITR
S: create an archive (ideally this is not on the master)
M: Change postgresql.conf
set wal level
set max wal senders (if pg basebackup is desired)
set archive mode to on
set a proper archive command to archive xlog
M: adapt pg hba.conf (if pg basebackup is desired)
M: restart the master

pg basebackup, etc.
Perform a pg basebackup as performed before
–xlog-method=stream and -R are not needed
In the archive a .backup file will be available after
pg basebackup
You can delete all xlog files older than the oldest base backup
you want to keep.
The .backup file will guide you

Restoring from a crash
Take a base backup.
Write a recovery.conf ﬁle:
restore command: Tell PostgreSQL where to ﬁnd xlog
recovery target time (optional): Use a timestamp to tell the
system how far to recover
Start the server
Make sure the system has reached consistent state

More conﬁg options

recovery min apply delay: Delayed replay
This settings allows you to tell the slave that a certain delay is
desired.
Example: A stock broker might want to provide you with 15
minute old data

pause at recovery target
Make sure that the recovery does not stop at a speciﬁed point
in time.
Make PostgreSQL wait when a certain point is reached.
This is essential in case you do not know precisely how far to
recover

recovery target name
Sometimes you want to recover to a certain point in time,
which has been speciﬁed before.
To specify a point in time run . . .
SELECT pg_create_restore_point(’some_name’);
Use this name in recovery.conf to recover to this very speciﬁc
point

Finally . . .

Contact us . . .
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt Austria
More than 15 years of PostgreSQL experience:
Training
Consulting
24x7 support

PostgreSQL Replication Tutorial

Recommended

More Related Content

What's hot (20)

Viewers also liked (20)

Similar to PostgreSQL Replication Tutorial (20)

Recently uploaded (20)

PostgreSQL Replication Tutorial