Skip to content

pgbackrest: failed to promote instance (ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory)

Goal

DLE config:

server:
  verificationToken: "secret_token"
  port: 2345
embeddedUI:
  enabled: true
  dockerImage: "postgresai/ce-ui:latest"
  host: ""
  port: 2346
global:
  engine: postgres
  debug: true
  database:
    username: postgres
    dbname: postgres
  telemetry:
    enabled: false
    url: "https://postgres.ai/api/general"
poolManager:
  mountDir: /var/lib/dblab
  dataSubDir: data
  clonesMountSubDir: clones
  socketSubDir: sockets
  observerSubDir: observer
  preSnapshotSuffix: "_pre"
  selectedPool: ""
databaseContainer: &db_container
  dockerImage: "postgresai/extended-postgres:13"
  containerConfig:
    "shm-size": 1gb
databaseConfigs: &db_configs
  configs:
    shared_buffers: 256MB
    shared_preload_libraries: "pg_stat_statements, auto_explain, logerrors"
    work_mem: "10MB"
provision:
  <<: *db_container
  portPool:
    from: 6000
    to: 6100
  useSudo: false
  keepUserPasswords: false
retrieval:
  jobs:
    - physicalRestore
    - physicalSnapshot
  spec:
    physicalRestore:
      options:
        <<: *db_container
        tool: pgbackrest
        sync:
          enabled: true
          healthCheck:
            interval: 5
            maxRetries: 200
          configs:
            shared_buffers: 256MB
          recovery:
        envs:
          PGBACKREST_LOG_LEVEL_CONSOLE: detail
          PGBACKREST_PROCESS_MAX: 2
          PGBACKREST_REPO: 1
          PGBACKREST_REPO1_TYPE: "s3"
          PGBACKREST_REPO1_PATH: "/pgbackrest"
          PGBACKREST_REPO1_S3_ENDPOINT: "s3.amazonaws.com"
          PGBACKREST_REPO1_S3_BUCKET: "vitabaks"
          PGBACKREST_REPO1_S3_REGION: "us-east-2"
          PGBACKREST_REPO1_S3_KEY: "AKIAYCGN7VKQGIDGM4CE"
          PGBACKREST_REPO1_S3_KEY_SECRET: "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"
        pgbackrest:
          stanza: my_stanza
          forceInit: false
    physicalSnapshot:
      options:
        skipStartSnapshot: false
        <<: *db_configs
        promotion:
          <<: *db_container
          enabled: true
          healthCheck:
            interval: 5
            maxRetries: 200
          queryPreprocessing:
            queryPath: ""
            maxParallelWorkers: 2
          configs:
            shared_buffers: 256MB
          recovery:
        preprocessingScript: ""
        scheduler:
          snapshot:
            timetable: "*/10 * * * *"
          retention:
            timetable: "*/13 * * * *"
            limit: 4
        envs:
          PGBACKREST_LOG_LEVEL_CONSOLE: detail
          PGBACKREST_PROCESS_MAX: 2
          PGBACKREST_REPO: 1
          PGBACKREST_REPO1_TYPE: "s3"
          PGBACKREST_REPO1_PATH: "/pgbackrest"
          PGBACKREST_REPO1_S3_ENDPOINT: "s3.amazonaws.com"
          PGBACKREST_REPO1_S3_BUCKET: "vitabaks"
          PGBACKREST_REPO1_S3_REGION: "us-east-2"
          PGBACKREST_REPO1_S3_KEY: "AKIAYCGN7VKQGIDGM4CE"
          PGBACKREST_REPO1_S3_KEY_SECRET: "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"
cloning:
  accessHost: "localhost"
  maxIdleMinutes: 120

Run DLE:

sudo zpool create -f \
  -O compression=on \
  -O atime=off \
  -O recordsize=128k \
  -O logbias=throughput \
  -m /var/lib/dblab/dblab_pool \
  dblab_pool \
  /dev/nvme1n1

sudo docker run \
  --name dblab_server \
  --label dblab_control \
  --privileged \
  --publish 127.0.0.1:2345:2345 \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /var/lib/dblab:/var/lib/dblab/:rshared \
  --volume ~/.dblab/engine/configs:/home/dblab/configs:ro \
  --volume ~/.dblab/engine/meta:/home/dblab/meta \
  --volume /sys/kernel/debug:/sys/kernel/debug:rw \
  --volume /lib/modules:/lib/modules:ro \
  --volume /proc:/host_proc:ro \
  --env DOCKER_API_VERSION=1.39 \
  --detach \
  --restart on-failure \
  postgresai/dblab-server:3.1.0-rc.0

DLE log:

2022/03/22 19:46:21 physical.go:479: [INFO]   Promote the Postgres instance.
2022/03/22 19:46:21 configuration.go:120: [INFO]   Configuration is already initialized
2022/03/22 19:46:21 configuration.go:490: [DEBUG]  Appending configuration to  /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.recovery.conf
2022/03/22 19:46:21 configuration.go:471: [DEBUG]  Applying configuration:  /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.promotion.conf
2022/03/22 19:46:21 tools.go:133: [DEBUG]  Virtualization system:  docker
2022/03/22 19:46:21 tools.go:143: [DEBUG]  [{bind /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data false  0xc000098fc0 <nil> <nil>} {bind /home/ubuntu/.dblab/engine/configs /home/dblab/configs true  0xc000098fd8 <nil> <nil>} {bind /home/ubuntu/.dblab/engine/meta /home/dblab/meta false  0xc000098ff0 <nil> <nil>} {bind /sys/kernel/debug /sys/kernel/debug false  0xc000099008 <nil> <nil>} {bind /lib/modules /lib/modules true  0xc000099020 <nil> <nil>} {bind /proc /host_proc true  0xc000099038 <nil> <nil>} {bind /var/run/docker.sock /var/run/docker.sock false  0xc000099050 <nil> <nil>}]
2022/03/22 19:46:21 tools.go:401: [INFO]   Docker image "postgresai/extended-postgres:13" already exists locally
2022/03/22 19:46:21 physical.go:563: [INFO]   Running container: dblab_promote_c8cic40rn5fmbqu0olo0. ID: 550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 19:46:22 physical.go:339: [INFO]   Starting PostgreSQL and waiting for readiness
2022/03/22 19:46:22 physical.go:340: [INFO]   View logs using the command: docker logs --since 1m -f dblab_sync_c8cic40rn5fmbqu0olo0
2022/03/22 19:46:22 tools.go:285: [INFO]   Check container readiness:  ded70afbd880c488b2346db170301353e57a2e8a00d5e6660dab8344c6acdfed
2022/03/22 19:46:22 physical.go:654: [DEBUG]  /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data
2022/03/22 19:46:22 physical.go:672: [DEBUG]  Look up into file:  /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/pg_wal/archive_status
2022/03/22 19:46:22 physical.go:684: [DEBUG]  no found dataStateAt in WAL files
2022/03/22 19:46:22 physical.go:582: [INFO]   Starting PostgreSQL and waiting for readiness
2022/03/22 19:46:22 physical.go:583: [INFO]   View logs using the command: docker logs --since 1m -f dblab_promote_c8cic40rn5fmbqu0olo0
2022/03/22 19:46:22 tools.go:285: [INFO]   Check container readiness:  550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 19:46:28 physical.go:346: [INFO]   Sync instance has been running
2022/03/22 19:48:00 logging.go:16: [INFO]   ->  GET /healthz
2022/03/22 19:48:01 logging.go:16: [INFO]   ->  GET /status
2022/03/22 19:48:01 logging.go:16: [INFO]   ->  GET /snapshots
2022/03/22 19:48:01 mode_local.go:315: [INFO]   no snapshots for pool dblab_pool
2022/03/22 19:48:01 util.go:31: [DEBUG]  Response: []
2022/03/22 19:48:01 util.go:31: [DEBUG]  Response: &{0xc0000aa800 {v3.1.0-rc.0-20220318-0945 2022-03-22 19:44:19 +0000 UTC 0xc00055aad8} [{dblab_pool zfs 0001-01-01 00:00:00 +0000 UTC refreshing [] {zfs 9881571328 9033936896 847634432 1928912896 178688 510976 2.27}}] {0 0 []} {physical refreshing 2022-03-22 19:44:19 +0000 UTC <nil> map[]} {postgresai/extended-postgres:13 map[shm-size:1gb]}}
2022/03/22 19:58:47 logging.go:16: [INFO]   ->  GET /healthz
2022/03/22 19:58:47 logging.go:16: [INFO]   ->  GET /status
2022/03/22 19:58:47 logging.go:16: [INFO]   ->  GET /snapshots
2022/03/22 19:58:47 mode_local.go:315: [INFO]   no snapshots for pool dblab_pool
2022/03/22 19:58:47 util.go:31: [DEBUG]  Response: []
2022/03/22 19:58:47 util.go:31: [DEBUG]  Response: &{0xc0000aa860 {v3.1.0-rc.0-20220318-0945 2022-03-22 19:44:19 +0000 UTC 0xc0004fc9c8} [{dblab_pool zfs 0001-01-01 00:00:00 +0000 UTC refreshing [] {zfs 9881637376 9033775616 847861760 1929191424 178688 711168 2.27}}] {0 0 []} {physical refreshing 2022-03-22 19:44:19 +0000 UTC <nil> map[]} {postgresai/extended-postgres:13 map[shm-size:1gb]}}
2022/03/22 20:03:42 tools.go:343: [INFO]   Container logs:
 
2022/03/22 20:03:42 tools.go:355: [INFO]   Postgres logs:  2022-03-22 19:59:10.218 UTC,,,2030,"[local]",623a2a8e.7ee,1,"",2022-03-22 19:59:10 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:10.221 UTC,"postgres","postgres",2030,"[local]",623a2a8e.7ee,2,"",2022-03-22 19:59:10 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:15.395 UTC,,,2042,"[local]",623a2a93.7fa,1,"",2022-03-22 19:59:15 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:15.395 UTC,"postgres","postgres",2042,"[local]",623a2a93.7fa,2,"",2022-03-22 19:59:15 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:20.640 UTC,,,2056,"[local]",623a2a98.808,1,"",2022-03-22 19:59:20 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:20.640 UTC,"postgres","postgres",2056,"[local]",623a2a98.808,2,"",2022-03-22 19:59:20 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:25.931 UTC,,,2069,"[local]",623a2a9d.815,1,"",2022-03-22 19:59:25 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:25.932 UTC,"postgres","postgres",2069,"[local]",623a2a9d.815,2,"",2022-03-22 19:59:25 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:31.173 UTC,,,2083,"[local]",623a2aa3.823,1,"",2022-03-22 19:59:31 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:31.174 UTC,"postgres","postgres",2083,"[local]",623a2aa3.823,2,"",2022-03-22 19:59:31 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:36.619 UTC,,,2095,"[local]",623a2aa8.82f,1,"",2022-03-22 19:59:36 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:36.626 UTC,"postgres","postgres",2095,"[local]",623a2aa8.82f,2,"",2022-03-22 19:59:36 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:41.824 UTC,,,2108,"[local]",623a2aad.83c,1,"",2022-03-22 19:59:41 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:41.824 UTC,"postgres","postgres",2108,"[local]",623a2aad.83c,2,"",2022-03-22 19:59:41 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:47.030 UTC,,,2122,"[local]",623a2ab3.84a,1,"",2022-03-22 19:59:47 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:47.031 UTC,"postgres","postgres",2122,"[local]",623a2ab3.84a,2,"",2022-03-22 19:59:47 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:52.163 UTC,,,2134,"[local]",623a2ab8.856,1,"",2022-03-22 19:59:52 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:52.163 UTC,"postgres","postgres",2134,"[local]",623a2ab8.856,2,"",2022-03-22 19:59:52 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022-03-22 19:59:57.427 UTC,,,2148,"[local]",623a2abd.864,1,"",2022-03-22 19:59:57 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized"
2022-03-22 19:59:57.427 UTC,"postgres","postgres",2148,"[local]",623a2abd.864,2,"",2022-03-22 19:59:57 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,"","client backend"
2022/03/22 20:03:42 tools.go:371: [INFO]   Removing container ID: 550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1
2022/03/22 20:03:47 tools.go:377: [INFO]   Container "550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1" has been stopped
2022/03/22 20:03:48 tools.go:388: [INFO]   Container "550eb61b0fe3e365c173f8ca8c58f6df80107c713e7a83174c682a5cf11badc1" has been removed
2022/03/22 20:03:48 runners.go:106: [DEBUG]  Run(Local): "zfs destroy -R dblab_pool/clone_pre_20220322194620"
2022/03/22 20:03:48 runners.go:151: [DEBUG]  Run(Local): output ""
2022/03/22 20:03:48 runners.go:106: [DEBUG]  Run(Local): "zfs destroy -R dblab_pool@snapshot_20220322194620_pre"
2022/03/22 20:03:48 runners.go:151: [DEBUG]  Run(Local): output ""
2022/03/22 20:03:48 physical.go:441: [INFO]   Snapshot scheduler has been started
2022/03/22 20:03:48 main.go:179: [ERROR]  Failed to run the data retrieval service: failed to promote instance: failed to readiness check: container health check failed
2022/03/22 20:03:48 main.go:180: [INFO]   If you have problems or questions, please contact Postgres.ai: https://postgres.ai/contact

Promote instance PostgreSQL log:

2022-03-22 19:59:03.450 P00   INFO: archive-get command end: aborted with exception [073]
2022-03-22 19:59:03.459 P00   INFO: archive-get command begin 2.37: [00000002.history, pg_wal/RECOVERYHISTORY] --exec-id=2006-392d8520 --log-level-console=detail --pg1-path=/var/lib/dblab/dblab_pool/data --process-max=2 --repo=1 --repo1-path=/pgbackrest --repo1-s3-bucket=vitabaks --repo1-s3-endpoint=s3.amazonaws.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-2 --repo1-type=s3 --stanza=my_stanza
ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory
2022-03-22 19:59:03.461 P00   INFO: archive-get command end: aborted with exception [073]
2022-03-22 19:59:08.448 P00   INFO: archive-get command begin 2.37: [0000000100000007000000BE, pg_wal/RECOVERYXLOG] --exec-id=2019-78eb190f --log-level-console=detail --pg1-path=/var/lib/dblab/dblab_pool/data --process-max=2 --repo=1 --repo1-path=/pgbackrest --repo1-s3-bucket=vitabaks --repo1-s3-endpoint=s3.amazonaws.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=us-east-2 --repo1-type=s3 --stanza=my_stanza
ERROR: [073]: unable to chdir() to '/var/lib/dblab/dblab_pool/data': [2] No such file or directory

We are observed an error in the postgresql log because the restore_command specifies an incorrect path to the promote instance data directory, it is simply inherited from the sync instance.

/var/lib/dblab/dblab_pool/data instead of /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data

ubuntu@ip-172-31-11-169:~$ sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.dblab.recovery.conf

restore_command = 'pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza archive-get %f %p'
recovery_target = 'immediate'
recovery_target_action = 'promote'

ubuntu@ip-172-31-11-169:~$ sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.

# Recovery settings generated by pgBackRest restore on 2022-03-22 19:46:15
restore_command = 'pgbackrest --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza archive-get %f "%p"'

ubuntu@ip-172-31-11-169:~$ sudo docker exec -it dblab_promote_c8cic40rn5fmbqu0olo0 bash
root@550eb61b0fe3:/# echo $PGDATA
/var/lib/dblab/dblab_pool/clones/clone_pre_20220322194620/data

Issue: #265 (closed)

pgbackrest adds the restore_command parameter in the postgresql.auto.conf (or recovery.conf for PostgreSQL older than 12), and since the Postgres data directory path for Promote Instance is different, we override the restore_command specifying the variable ${PGDATA} for the pg1-path.

from DLE log:

2022/03/22 19:44:20 physical.go:218: [INFO]   Running restore command:  sudo -Eu postgres pgbackrest --type=standby --pg1-path=/var/lib/dblab/dblab_pool/data --stanza=my_stanza restore

TODO / How to implement

I suggest adding variable ${PGDATA} to the --pg1-path and --recovery-option options to the restore command.

Example:

sudo -Eu postgres pgbackrest --type=standby --pg1-path=${PGDATA} --stanza=my_stanza restore --recovery-option=restore_command='pgbackrest --pg1-path=${PGDATA} --stanza=my_stanza archive-get %f %p'

postgresql.auto.conf

sudo cat /var/lib/dblab/dblab_pool/clones/clone_pre_20220322205238/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.

# Recovery settings generated by pgBackRest restore on 2022-03-22 20:52:32
restore_command = 'pgbackrest --pg1-path=${PGDATA} --stanza=my_stanza archive-get %f %p'

Or, DLE can control the PostgreSQL restore_command parameter for promotion instance, in which case we do not need to specify the --recovery-option parameter in the pgbackrest restore command.

Acceptance criteria

restore_command specifies the correct path to the data directory for the promote instance, or the $PGDATA variable is used

Edited by Vitaliy Kukharik