From: Tom Lane
Date: Wed, 28 Oct 2020 20:31:40 +0000 (-0400)
Subject: Doc: clean up verify_heapam() documentation.
X-Git-Tag: REL_14_BETA1~1404
X-Git-Url: http://git.postgresql.org/gitweb/?a=commitdiff_plain;h=4c49d8fc15eeb1dc69b0ddb2d986a1884a5d7f5f;p=postgresql.git

Doc: clean up verify_heapam() documentation.

I started with the intention of just suppressing a PDF build warning
by removing the example output, but ended up doing more: correcting
factual errors in the function's signature, moving a bunch of
generalized handwaving into the "Using amcheck Effectively" section
which seemed a better place for it, and improving wording and markup
a little bit.

Discussion: https://postgr.es/m/732904.1603728748@sss.pgh.pa.us
---

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 25e4bb2bfec..99fad708bf7 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -83,7 +83,7 @@ AND c.relpersistence != 't'
 -- Function may throw an error when this is omitted:
 AND c.relkind = 'i' AND i.indisready AND i.indisvalid
 ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check |             relname             | relpages
+ bt_index_check |             relname             | relpages
 ----------------+---------------------------------+----------
                 | pg_depend_reference_index       |       43
                 | pg_depend_depender_index        |       40
@@ -208,14 +208,14 @@ SET client_min_messages = DEBUG1;
 verify_heapam(relation regclass,
               on_error_stop boolean,
               check_toast boolean,
-              skip cstring,
+              skip text,
               startblock bigint,
               endblock bigint,
               blkno OUT bigint,
               offnum OUT integer,
               attnum OUT integer,
               msg OUT text)
-      returns record
+      returns setof record
@@ -223,89 +223,17 @@ SET client_min_messages = DEBUG1;
 Checks a table for structural corruption, where pages in the relation
 contain data that is invalidly formatted, and for logical corruption,
 where pages are structurally valid but inconsistent with the rest of the
- database cluster.
Example usage:
-
-test=# select * from verify_heapam('mytable', check_toast := true);
- blkno | offnum | attnum | msg
--------+--------+--------+--------------------------------------------------------------------------------------------------
-    17 |     12 |        | xmin 4294967295 precedes relation freeze threshold 17:1134217582
-   960 |      4 |        | data begins at offset 152 beyond the tuple length 58
-   960 |      4 |        | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
-   960 |      5 |        | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
-   960 |      6 |        | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
-   960 |      7 |        | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
-  1147 |      2 |        | number of attributes 2047 exceeds maximum expected for table 3
-  1147 |     10 |        | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
-  1147 |     15 |        | number of attributes 67 exceeds maximum expected for table 3
-  1147 |     16 |      1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
-  1147 |     18 |      2 | final toast chunk number 0 differs from expected value 6
-  1147 |     19 |      2 | toasted value for attribute 2 missing from toast table
-  1147 |     21 |        | tuple is marked as only locked, but also claims key columns were updated
-  1147 |     22 |        | multitransaction ID 1775655 is from before relation cutoff 2355572
-(14 rows)
-
- As this example shows, the Tuple ID (TID) of the corrupt tuple is given
- in the (blkno, offnum) columns, and
- for corruptions specific to a particular attribute in the tuple, the
- attnum field shows which one.
-
-
- Structural corruption can happen due to faulty storage hardware, or
- relation files being overwritten or modified by unrelated software.
- This kind of corruption can also be detected with
- data page
- checksums.
-
-
- Relation pages which are correctly formatted, internally consistent, and
- correct relative to their own internal checksums may still contain
- logical corruption. As such, this kind of corruption cannot be detected
- with checksums. Examples include toasted
- values in the main table which lack a corresponding entry in the toast
- table, and tuples in the main table with a Transaction ID that is older
- than the oldest valid Transaction ID in the database or cluster.
-
-
- Multiple causes of logical corruption have been observed in production
- systems, including bugs in the PostgreSQL
- server software, faulty and ill-conceived backup and restore tools, and
- user error.
-
-
- Corrupt relations are most concerning in live production environments,
- precisely the same environments where high risk activities are least
- welcome. For this reason, verify_heapam has been
- designed to diagnose corruption without undue risk. It cannot guard
- against all causes of backend crashes, as even executing the calling
- query could be unsafe on a badly corrupted system. Access to catalog tables are performed and could
- be problematic if the catalogs themselves are corrupted.
-
-
- The design principle adhered to in verify_heapam is
- that, if the rest of the system and server hardware are correct, under
- default options, verify_heapam will not crash the
- server due merely to structural or logical corruption in the target
- table.
-
-
- The check_toast attempts to reconcile the target
- table against entries in its corresponding toast table. This option is
- disabled by default and is known to be slow.
- If the target relation's corresponding toast table or toast index is
- corrupt, reconciling the target table against toast values could
- conceivably crash the server, although in many cases this would
- just produce an error.
+ database cluster.
 The following optional arguments are recognized:
- on_error_stop
+ on_error_stop
- If true, corruption checking stops at the end of the first block on
+ If true, corruption checking stops at the end of the first block in
 which any corruptions are found.
@@ -314,23 +242,29 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- check_toast
+ check_toast
- If true, toasted values are checked gainst the corresponding
+ If true, toasted values are checked against the target relation's
 TOAST table.
+
+ This option is known to be slow. Also, if the toast table or its
+ index is corrupt, checking it against toast values could conceivably
+ crash the server, although in many cases this would just produce an
+ error.
+
 Defaults to false.
- skip
+ skip
 If not none, corruption checking skips blocks that
- are marked as all-visible or all-frozen, as given.
+ are marked as all-visible or all-frozen, as specified.
 Valid options are all-visible, all-frozen and none.
@@ -340,7 +274,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- startblock
+ startblock
 If specified, corruption checking begins at the specified block,
@@ -349,12 +283,12 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 target table.
- By default, does not skip any blocks.
+ By default, checking begins at the first block.
- endblock
+ endblock
 If specified, corruption checking ends at the specified block,
@@ -363,7 +297,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 table.
- By default, does not skip any blocks.
+ By default, all blocks are checked.
@@ -374,7 +308,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- blkno
+ blkno
 The number of the block containing the corrupt page.
@@ -382,7 +316,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- offnum
+ offnum
 The OffsetNumber of the corrupt tuple.
@@ -390,7 +324,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- attnum
+ attnum
 The attribute number of the corrupt column in the tuple, if the
@@ -399,10 +333,10 @@ test=# select * from verify_heapam('mytable', check_toast := true);
- msg
+ msg
- A human readable message describing the corruption in the page.
+ A message describing the problem detected.
@@ -460,7 +394,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 amcheck can be effective at detecting various types of
 failure modes that data page
- checksums will always fail to catch. These include:
+ checksums will fail to catch. These include:
@@ -557,6 +491,45 @@ test=# select * from verify_heapam('mytable', check_toast := true);
+
+
+
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ data page
+ checksums.
+
+
+
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with checksums. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+
+
+
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the PostgreSQL
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+
+
+
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, verify_heapam has been
+ designed to diagnose corruption without undue risk.
It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to catalog tables are performed and could
+ be problematic if the catalogs themselves are corrupted.
+
+
+
 In general, amcheck can only prove the presence of corruption; it
 cannot prove its absence.