pgbackrest: failed to promote instance (ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory)
Goal
DLE config:
server:
  verificationToken: "secret_token"
  port: 2345
embeddedUI:
  enabled: true
  dockerImage: "postgresai/ce-ui:latest"
  host: ""
  port: 2346
global:
  engine: postgres
  debug: true
  database:
    username: postgres
    dbname: postgres
  telemetry:
    enabled: false
    url: "https://postgres.ai/api/general"
poolManager:
  mountDir: /var/lib/dblab
  dataSubDir: data
  clonesMountSubDir: clones
  socketSubDir: sockets
  observerSubDir: observer
  preSnapshotSuffix: "_pre"
  selectedPool: ""
databaseContainer: &db_container
  dockerImage: "postgresai/extended-postgres:13"
  containerConfig:
    "shm-size": 1gb
databaseConfigs: &db_configs
  configs:
    shared_buffers: 256MB
    shared_preload_libraries: "pg_stat_statements, auto_explain, logerrors"
    work_mem: "10MB"
provision:
  <<: *db_container
  portPool:
    from: 6000
    to: 6100
  useSudo: false
  keepUserPasswords: false
retrieval:
  jobs:
    - physicalRestore
    - physicalSnapshot
  spec:
    physicalRestore:
      options:
        <<: *db_container
        tool: pgbackrest
        sync:
          enabled: true
          healthCheck:
            interval: 5
            maxRetries: 200
          configs:
            shared_buffers: 256MB
          recovery:
        envs:
          PGBACKREST_LOG_LEVEL_CONSOLE: detail
          PGBACKREST_PROCESS_MAX: 2
          PGBACKREST_REPO: 1
          PGBACKREST_REPO1_TYPE: "s3"
          PGBACKREST_REPO1_PATH: "/pgbackrest"
          PGBACKREST_REPO1_S3_ENDPOINT: "s3.amazonaws.com"
          PGBACKREST_REPO1_S3_BUCKET: "vitabaks"
          PGBACKREST_REPO1_S3_REGION: "us-east-2"
          PGBACKREST_REPO1_S3_KEY: "AKIAYCGN7VKQGIDGM4CE"
          PGBACKREST_REPO1_S3_KEY_SECRET: "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"
        pgbackrest:
          stanza: my_stanza
          forceInit: false
    physicalSnapshot:
      options:
        skipStartSnapshot: false
        <<: *db_configs
        promotion:
          <<: *db_container
          enabled: true
          healthCheck:
            interval: 5
            maxRetries: 200
          queryPreprocessing:
            queryPath: ""
            maxParallelWorkers: 2
          configs:
            shared_buffers: 256MB
          recovery:
        preprocessingScript: ""
        scheduler:
          snapshot:
            timetable: "*/10 * * * *"
          retention:
            timetable: "*/13 * * * *"
            limit: 4
        envs:
          PGBACKREST_LOG_LEVEL_CONSOLE: detail
          PGBACKREST_PROCESS_MAX: 2
          PGBACKREST_REPO: 1
          PGBACKREST_REPO1_TYPE: "s3"
          PGBACKREST_REPO1_PATH: "/pgbackrest"
          PGBACKREST_REPO1_S3_ENDPOINT: "s3.amazonaws.com"
          PGBACKREST_REPO1_S3_BUCKET: "vitabaks"
          PGBACKREST_REPO1_S3_REGION: "us-east-2"
          PGBACKREST_REPO1_S3_KEY: "AKIAYCGN7VKQGIDGM4CE"
          PGBACKREST_REPO1_S3_KEY_SECRET: "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"
cloning:
  accessHost: "localhost"
  maxIdleMinutes: 120
Run DLE:
sudo zpool create -f \
-O compression=on \
-O atime=off \
-O recordsize=128k \
-O logbias=throughput \
-m /var/lib/dblab/dblab_pool \
dblab_pool \
/dev/nvme1n1
sudo docker run \
--name dblab_server \
--label dblab_control \
--privileged \
--publish 127.0.0.1:2345:2345 \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /var/lib/dblab:/var/lib/dblab/:rshared \
--volume ~/.dblab/engine/configs:/home/dblab/configs:ro \
--volume ~/.dblab/engine/meta:/home/dblab/meta \
--volume /sys/kernel/debug:/sys/kernel/debug:rw \
--volume /lib/modules:/lib/modules:ro \
--volume /proc:/host_proc:ro \
--env DOCKER_API_VERSION=1.39 \
--detach \
--restart on-failure \
postgresai/dblab-server:3.1.0-rc.0
DLE log:
2022/03/22 19:46:21 physical.go:479: [INFO] Promote the Postgres instance.
2022/03/22 19:46:21 configuration.go:120: [INFO] Configuration is already initialized
2022/03/22 19:46:21 configuration.go:490: [DEBUG] Appending configuration to /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.recovery.conf
2022/03/22 19:46:21 configuration.go:471: [DEBUG] Applying configuration: /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.promotion.conf
2022/03/22 19:46:21 tools.go:133: [DEBUG] Virtualization system: docker
2022/03/22 19:46:21 tools.go:143: [DEBUG] [{bind /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data false 0xc000098fc0 <nil> <nil>} {bind /home/ubuntu/.dblab/engine/configs /home/dblab/configs true 0xc000098fd8 <nil> <nil>} {bind /home/ubuntu/.dblab/engine/meta /home/dblab/meta false 0xc000098ff0 <nil> <nil>} {bind /sys/kernel/debug /sys/kernel/debug false 0xc000099008 <nil> <nil>} {bind /lib/modules /lib/modules true 0xc000099020 <nil> <nil>} {bind /proc /host_proc true 0xc000099038 <nil> <nil>} {bind /var/run/docker.sock /var/run/docker.sock false 0xc000099050 <nil> <nil>}]
2022/03/22 19:46:21 tools.go:401: [INFO] Docker image "postgresai/extended-postgres:13" already exists locally
2022/03/22 19:46:21 physical.go:563: [INFO] Running container: dblab_promote_c8cic40rn5fmbqu0olo0. ID: 550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 19:46:22 physical.go:339: [INFO] Starting PostgreSQL and waiting for readiness
2022/03/22 19:46:22 physical.go:340: [INFO] View logs using the command: docker logs --since 1m -f dblab_sync_c8cic40rn5fmbqu0olo0
2022/03/22 19:46:22 tools.go:285: [INFO] Check container readiness: ded70afbd880c488b2346db170301353e57a2e8a00d5e6660dab8344c6acdfed
2022/03/22 19:46:22 physical.go:654: [DEBUG] /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data
2022/03/22 19:46:22 physical.go:672: [DEBUG] Look up into file: /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/pg_wal/archive_status
2022/03/22 19:46:22 physical.go:684: [DEBUG] no found dataStateAt in WAL files
2022/03/22 19:46:22 physical.go:582: [INFO] Starting PostgreSQL and waiting for readiness
2022/03/22 19:46:22 physical.go:583: [INFO] View logs using the command: docker logs --since 1m -f dblab_promote_c8cic40rn5fmbqu0olo0
2022/03/22 19:46:22 tools.go:285: [INFO] Check container readiness: 550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 19:46:28 physical.go:346: [INFO] Sync instance has been running
2022/03/22 19:48:00 logging.go:16: [INFO] -> GET /healthz
2022/03/22 19:48:01 logging.go:16: [INFO] -> GET /status
2022/03/22 19:48:01 logging.go:16: [INFO] -> GET /snapshots
2022/03/22 19:48:01 mode_local.go:315: [INFO] no snapshots for pool dblab_pool
2022/03/22 19:48:01 util.go:31: [DEBUG] Response: []
2022/03/22 19:48:01 util.go:31: [DEBUG] Response: &{0xc0000aa800 {v3.1.0-rc.0-20220318-0945 2022-03-22 19:44:19 +0000 UTC 0xc00055aad8} [{dblab_pool zfs 0001-01-01 00:00:00 +0000 UTC refreshing [] {zfs 9881571328 9033936896 847634432 1928912896 178688 510976 2.27}}] {0 0 []} {physical refreshing 2022-03-22 19:44:19 +0000 UTC <nil> map[]} {postgresai/extended-postgres:13 map[shm-size:1gb]}}
2022/03/22 19:58:47 logging.go:16: [INFO] -> GET /healthz
2022/03/22 19:58:47 logging.go:16: [INFO] -> GET /status
2022/03/22 19:58:47 logging.go:16: [INFO] -> GET /snapshots
2022/03/22 19:58:47 mode_local.go:315: [INFO] no snapshots for pool dblab_pool
2022/03/22 19:58:47 util.go:31: [DEBUG] Response: []
2022/03/22 19:58:47 util.go:31: [DEBUG] Response: &{0xc0000aa860 {v3.1.0-rc.0-20220318-0945 2022-03-22 19:44:19 +0000 UTC 0xc0004fc9c8} [{dblab_pool zfs 0001-01-01 00:00:00 +0000 UTC refreshing [] {zfs 9881637376 9033775616 847861760 1929191424 178688 711168 2.27}}] {0 0 []} {physical refreshing 2022-03-22 19:44:19 +0000 UTC <nil> map[]} {postgresai/extended-postgres:13 map[shm-size:1gb]}}
2022/03/22 20:03:42 tools.go:343: [INFO] Container logs:
2022/03/22 20:03:42 tools.go:355: [INFO] Postgres logs: 2022-03-22 19:59:10.218 UTC,,,2030,"[local]",623a2a8e.7ee,1,"",2022-03-22 19:59:10 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:10.221 UTC,"postgres","postgres",2030,"[local]",623a2a8e.7ee,2,"",2022-03-22 19:59:10 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:15.395 UTC,,,2042,"[local]",623a2a93.7fa,1,"",2022-03-22 19:59:15 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:15.395 UTC,"postgres","postgres",2042,"[local]",623a2a93.7fa,2,"",2022-03-22 19:59:15 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:20.640 UTC,,,2056,"[local]",623a2a98.808,1,"",2022-03-22 19:59:20 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:20.640 UTC,"postgres","postgres",2056,"[local]",623a2a98.808,2,"",2022-03-22 19:59:20 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:25.931 UTC,,,2069,"[local]",623a2a9d.815,1,"",2022-03-22 19:59:25 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:25.932 UTC,"postgres","postgres",2069,"[local]",623a2a9d.815,2,"",2022-03-22 19:59:25 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:31.173 UTC,,,2083,"[local]",623a2aa3.823,1,"",2022-03-22 19:59:31 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:31.174 UTC,"postgres","postgres",2083,"[local]",623a2aa3.823,2,"",2022-03-22 19:59:31 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:36.619 UTC,,,2095,"[local]",623a2aa8.82f,1,"",2022-03-22 19:59:36 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:36.626 UTC,"postgres","postgres",2095,"[local]",623a2aa8.82f,2,"",2022-03-22 19:59:36 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:41.824 UTC,,,2108,"[local]",623a2aad.83c,1,"",2022-03-22 19:59:41 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:41.824 UTC,"postgres","postgres",2108,"[local]",623a2aad.83c,2,"",2022-03-22 19:59:41 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:47.030 UTC,,,2122,"[local]",623a2ab3.84a,1,"",2022-03-22 19:59:47 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:47.031 UTC,"postgres","postgres",2122,"[local]",623a2ab3.84a,2,"",2022-03-22 19:59:47 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:52.163 UTC,,,2134,"[local]",623a2ab8.856,1,"",2022-03-22 19:59:52 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:52.163 UTC,"postgres","postgres",2134,"[local]",623a2ab8.856,2,"",2022-03-22 19:59:52 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:57.427 UTC,,,2148,"[local]",623a2abd.864,1,"",2022-03-22 19:59:57 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:57.427 UTC,"postgres","postgres",2148,"[local]",623a2abd.864,2,"",2022-03-22 19:59:57 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022/03/22 20:03:42 tools.go:371: [INFO] Removing container ID: 550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 20:03:47 tools.go:377: [INFO] Container "550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1" has been stopped
2022/03/22 20:03:48 tools.go:388: [INFO] Container "550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1" has been removed
2022/03/22 20:03:48 runners.go:106: [DEBUG] Run(Local): "zfs destroy -R dblab_pool/clone_pre_20220322194620"
2022/03/22 20:03:48 runners.go:151: [DEBUG] Run(Local): output ""
2022/03/22 20:03:48 runners.go:106: [DEBUG] Run(Local): "zfs destroy -R dblab_pool@snapshot_20220322194620_pre"
2022/03/22 20:03:48 runners.go:151: [DEBUG] Run(Local): output ""
2022/03/22 20:03:48 physical.go:441: [INFO] Snapshot scheduler has been started
2022/03/22 20:03:48 main.go:179: [ERROR] Failed to run the data retrieval service: failed to promote instance: failed to readiness check: container health check failed
2022/03/22 20:03:48 main.go:180: [INFO] If you have problems or questions, please contact Postgres.ai: https://postgres.ai/contact
PostgreSQL log from the promote instance:
2022-03-22 19:59:03.450 P00 INFO: archive-get command end: aborted with exception [073]
2022-03-22 19:59:03.459 P00 INFO: archive-get command begin 2.37: [00000002.history, pg_wal/RECOVERYHISTORY] --exec-id=2006-392d8520 --log-level-console=detail --pg1-path=/var/lib/dblab/dblab_pool/data --process-max=2 --repo=1 --repo1-path=/pgbackrest --repo1-s3-bucket=vitabaks --repo1-s3-endpoint=s3.amazonaws.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-2 --repo1-type=s3 --stanza=my_stanza
ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory
2022-03-22 19:59:03.461 P00 INFO: archive-get command end: aborted with exception [073]
2022-03-22 19:59:08.448 P00 INFO: archive-get command begin 2.37: [0000000100000007000000BE, pg_wal/RECOVERYXLOG] --exec-id=2019-78eb190f --log-level-console=detail --pg1-path=/var/lib/dblab/dblab_pool/data --process-max=2 --repo=1 --repo1-path=/pgbackrest --repo1-s3-bucket=vitabaks --repo1-s3-endpoint=s3.amazonaws.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-2 --repo1-type=s3 --stanza=my_stanza
ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory
We observe an error in the PostgreSQL log because restore_command specifies an incorrect path to the promote instance's data directory: the path is simply inherited from the sync instance. The promote container bind-mounts only the clone's data directory (see the mount list in the DLE log above), so the inherited path does not exist inside it:
/var/lib/dblab/dblab_pool/data
instead of
/var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data
ubuntu@ip-172-31-11-169:~$ sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.recovery.conf
restore_command = 'pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza archive-get %f %p'
recovery_target = 'immediate'
recovery_target_action = 'promote'
ubuntu@ip-172-31-11-169:~$ sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
# Recovery settings generated by pgBackRest restore on 2022-03-22 19:46:15
restore_command = 'pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza archive-get %f "%p"'
ubuntu@ip-172-31-11-169:~$ sudo docker exec -it dblab_promote_c8cic40rn5fmbqu0olo0 bash
root@550eb61b0fe3:/# echo $PGDATA
/var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data
Issue: #265 (closed)
pgBackRest writes the restore_command parameter into postgresql.auto.conf (or recovery.conf for PostgreSQL older than 12). Since the promote instance uses a different Postgres data directory, we need to override restore_command, using the ${PGDATA} variable for the --pg1-path option.
From the DLE log:
2022/03/22 19:44:20 physical.go:218: [INFO] Running restore command: sudo -Eu postgres pgbackrest --type=standby --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza restore
TODO / How to implement
I suggest using the ${PGDATA} variable in the --pg1-path and --recovery-option options of the restore command.
Example:
sudo -Eu postgres pgbackrest --type=standby --pg1-path=${PGDATA} --stanza=my_stanza restore --recovery-option=restore_command='pgbackrest --pg1-path=${PGDATA} --stanza=my_stanza archive-get %f %p'
Resulting postgresql.auto.conf:
sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322205238/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
# Recovery settings generated by pgBackRest restore on 2022-03-22 20:52:32
restore_command = 'pgbackrest --pg1-path=${PGDATA} --stanza=my_stanza archive-get %f %p'
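This relies on restore_command being executed through a shell: ${PGDATA} is expanded at run time from the server process environment (the docker exec output above suggests PGDATA is set inside the promote container). A minimal sketch of the expansion, using the clone path from this run as an assumed value:

export PGDATA=/var/lib/dblab/dblab_pool/clones/clone_pre_20220322205238/data
sh -c 'echo pgbackrest --pg1-path=${PGDATA} --stanza=my_stanza archive-get %f %p'
# prints: pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/clones/clone_pre_20220322205238/data --stanza=my_stanza archive-get %f %p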
Alternatively, DLE could control the PostgreSQL restore_command parameter for the promote instance itself; in that case there is no need to specify the --recovery-option parameter in the pgbackrest restore command.
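For illustration, the recovery configuration that DLE generates for the promote container (postgresql.dblab.recovery.conf, shown above) could then contain the clone path directly; a hypothetical example for the clone from this run:

restore_command = 'pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data --stanza=my_stanza archive-get %f %p'
recovery_target = 'immediate'
recovery_target_action = 'promote'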
Acceptance criteria
restore_command specifies the correct path to the promote instance's data directory, or the ${PGDATA} variable is used.
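A possible way to verify (a sketch only; the promote container name and clone path depend on the run):

sudo docker exec -it dblab_promote_<container_id> bash -c 'echo $PGDATA; grep restore_command "$PGDATA/postgresql.auto.conf"'
# the --pg1-path value in restore_command should match $PGDATA (or contain the literal ${PGDATA} variable)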