BookStack Code Mirror - website/blob - content/docs/admin/content-storage.md

   1 +++
   2 title = "Content Storage Format"
   3 date = "2024-03-03"
   4 type = "admin-doc"
   5 +++
   6
   7 When building up a documentation store, or assessing platform options, it can be useful to understand how data is stored
   8 so you know how portable & accessible your content is.
   9 This page aims to clearly lay out how content is stored within BookStack and what our general project aims
  10 are when it comes to data storage, content formats, and how these may lead design & development decisions.
  11
  12 {{<toc>}}
  13
  14 ---
  15
  16 ### Our Goals & Ideals
  17
  18 When it comes to core content provided into BookStack we want to avoid "locking" users into our platform.
  19 We believe data should be portable and stored in common standard formats, so that's what we aim for.
  20 For core page content we aim to stick to relatively simple HTML, and to follow CommonMark for Markdown
  21 where used (see ["page content" section](#page-content) below for more details on this).
  22
  23 While we don't officially support export to (or import from) other specific platforms as part
  24 of the core project, we aim to provide options & guidance so this can be achieved
  25 where desired.
  26
  27 When assessing features & changes to BookStack, keeping content to simple standards in the interest
  28 of portability takes a very high level of priority in decisions.
  29 We have pushed back against many editor & content-format feature requests, in the interest of
  30 not complicating the content structure.
  31
  32 ---
  33
  34 ### Storage Formats
  35
  36 BookStack is primarily a database-driven system. The vast majority of data and metadata can be found
  37 in tables of the attached database. For anything not specifically mentioned below, it's likely in the database.
  38 If your aim is BookStack instance backup/restore of any kind, then
  39 you should focus on performing a database dump (in addition to backing up files) as per our
  40 [backup/restore guidance here](/docs/admin/backup-restore/).
  41
  42 #### Page Content
  43
  44 Page content is primarily stored as HTML, within the `html` column of the `pages` database table.
  45 We aim (as per our goals) to keep the range of HTML formats limited to common basic HTML with little
  46 depth/structure complexity where possible. There are a few custom classes used (for alignment & callout blocks)
  47 but we now try to avoid the addition & use of new custom classes. Some formatting options
  48 may use inline HTML styles (text color, for example).
  49
  50 For pages written in Markdown, the original input Markdown will be stored within the `markdown` column
  51 of the `pages` table. Within BookStack's official functions, we generally focus on standardising
  52 Markdown support to [CommonMark](https://commonmark.org/), with the extra additions of Markdown tables, task-lists, and HTML.
  53
  54 Page content may reference other items within BookStack, and other external resources.
  55 In official functions, we aim to standardise on always using full absolute URLs, rather than any relative
  56 or custom dynamic references. This ensures such links/references work regardless of usage context, and
  57 having a full absolute URL provides a base URL/host that can be easily searched upon & detected
  58 where required.
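
As a rough illustration of why absolute URLs are easy to detect, the sketch below scans page HTML for `href`/`src` values that start with an instance's base URL. The host `docs.example.com`, the sample paths, and the helper names are hypothetical and used only for demonstration; this is not part of BookStack itself.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect href/src attribute values that start with a given base URL."""

    def __init__(self, base_url):
        super().__init__()
        # Normalise so the base URL matches with or without a trailing slash.
        self.base_url = base_url.rstrip("/") + "/"
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.startswith(self.base_url):
                self.references.append((tag, value))


def find_internal_references(page_html, base_url):
    """Return (tag, url) pairs for references that point back at base_url."""
    collector = ReferenceCollector(base_url)
    collector.feed(page_html)
    collector.close()
    return collector.references


# Hypothetical sample content; the URLs are made up for the example.
sample = (
    '<p><a href="https://docs.example.com/books/ops/page/backups">Backups</a></p>'
    '<p><img src="https://docs.example.com/uploads/images/diagram.png"></p>'
    '<p><a href="https://commonmark.org/">CommonMark</a></p>'
)
refs = find_internal_references(sample, "https://docs.example.com")
```

Here `refs` holds the two references on the instance's own host, while the external CommonMark link is ignored. The same list could drive link rewriting during a migration.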
  59
  60 There is one custom dynamic feature when it comes to page content, that being our
  61 [dynamic include tag](/docs/user/reusing-page-content/) system, which provides on-load
  62 inclusion of other content onto a page. Includes are not stored ready-parsed; they are
  63 handled at page display time since permissions can affect the result.
  64 Other than this, we avoid adding extra dynamic/"smart"/"magic" features to core page content.
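
Since include tags are stored unparsed, a migration may need to find them in raw page content. The sketch below assumes the tag shape `{{@<page id>}}` with an optional `#<element id>` suffix. That shape is an assumption not stated on this page, so check it against the linked include tag documentation before relying on it. The function name is hypothetical.

```python
import re

# Assumed tag shape: {{@<page id>}} or {{@<page id>#<element id>}}.
INCLUDE_TAG = re.compile(r"\{\{@\s*(\d+)(?:#([^}\s]+))?\s*\}\}")


def find_include_tags(page_html):
    """Return (page_id, element_id or None) for each include tag found."""
    return [
        (int(match.group(1)), match.group(2))
        for match in INCLUDE_TAG.finditer(page_html)
    ]


# Hypothetical stored page content containing two include tags.
sample = "<p>Intro</p><p>{{@12}}</p><p>{{@34#bkmrk-setup}}</p>"
tags = find_include_tags(sample)
```

Each result names the page (and optionally the element) whose content would need to be resolved separately, since the stored HTML only holds the tag itself.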
  65
  66 #### Images & Drawings
  67
  68 Images are stored as standard image files, typically on the local filesystem, although that
  69 can depend on the [configured storage method](/docs/admin/upload-config/).
  70 When uploading, image file names may be altered/generated by BookStack.
  71 Upon upload, BookStack will store the originally provided image file data but will also
  72 create & store resized images, mainly to reduce file size for more efficient
  73 loading & display. These system-resized images are stored within directories that have names starting with
  74 `scaled-` or `thumbs-`, with these directories being in the same location as the original image files.
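
When copying images out, the directory naming described above makes it possible to skip the system-resized copies. A minimal sketch, assuming local filesystem storage, is shown below. The directory and file names in the demo are hypothetical, and the function is not part of BookStack.

```python
import os
import tempfile

RESIZED_PREFIXES = ("scaled-", "thumbs-")


def list_original_images(image_root):
    """Yield file paths under image_root, skipping system-resized directories."""
    for current_dir, dir_names, file_names in os.walk(image_root):
        # Prune in place so os.walk never descends into resized directories.
        dir_names[:] = sorted(
            d for d in dir_names if not d.startswith(RESIZED_PREFIXES)
        )
        for file_name in sorted(file_names):
            yield os.path.join(current_dir, file_name)


# Demo against a throwaway directory tree (all names are made up).
with tempfile.TemporaryDirectory() as root:
    month_dir = os.path.join(root, "2024-03")
    for sub_dir in ("", "scaled-1680-", "thumbs-150-150"):
        target_dir = os.path.join(month_dir, sub_dir)
        os.makedirs(target_dir, exist_ok=True)
        with open(os.path.join(target_dir, "photo.png"), "wb") as handle:
            handle.write(b"")
    originals = [os.path.relpath(p, root) for p in list_original_images(root)]
```

Only the originally uploaded `photo.png` ends up in `originals`. The resized copies are skipped because they sit in directories that match the prefixes.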
  75
  76 Drawings are treated much the same as images.
  77 When a drawing is saved in the integrated [diagrams.net](https://diagrams.net) (previously draw.io)
  78 editor, it's exported and saved within BookStack as a standard PNG image.
  79 These PNG image files are embedded with the original drawing data, so they can be reloaded back into
  80 diagrams.net to be fully editable again. You can drag/import these PNG files into any
  81 diagrams.net/draw.io instance for full re-editing capabilities.
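
To check whether a PNG carries embedded data, its text chunks can be listed. The sketch below walks the standard PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC) and returns any `tEXt`/`zTXt` entries. This page does not state which chunk type or keyword diagrams.net uses, so the sketch simply reports whatever it finds. The function name is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_text_chunks(png_bytes):
    """Return {keyword: text} for tEXt and zTXt chunks in PNG file data."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[offset:offset + 8])
        data = png_bytes[offset + 8:offset + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"zTXt":
            keyword, _, rest = data.partition(b"\x00")
            # rest[0] is the compression method byte; the remainder is zlib data.
            text = zlib.decompress(rest[1:])
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        # Skip the length field, type field, chunk data, and CRC.
        offset += 12 + length
    return found
```

Usage would be along the lines of `list_png_text_chunks(open("drawing.png", "rb").read())`, where `drawing.png` is a placeholder file name. CRC values are not verified, which keeps the sketch short.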
  82
  83 #### Attachments
  84
  85 Attachments are stored as files, typically on the local filesystem, although that
  86 can depend on the [configured storage method](/docs/admin/upload-config/).
  87 Filenames, including the file extension, will be altered so it may be hard to identify
  88 attachments on the filesystem by name. If you need to do this, you can
  89 use the `attachments` table of the database as a reference. The `path` column
  90 represents attachment file locations, relative to the top-level storage location.
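
The lookup described above can be sketched as a mapping from on-disk locations back to readable names. This assumes rows have already been read from the `attachments` table as `(name, path)` pairs. The `name` column, the sample values, and the storage root shown are assumptions for illustration and should be confirmed against your own database and configuration.

```python
import os


def map_attachment_files(rows, storage_root):
    """Map each attachment's on-disk location to its display name.

    rows: iterable of (name, path) pairs read from the attachments table.
    storage_root: top-level storage location (instance-specific).
    """
    mapping = {}
    for name, path in rows:
        if not path:
            continue
        # Strip leading slashes so the join cannot discard storage_root.
        mapping[os.path.join(storage_root, path.lstrip("/"))] = name
    return mapping


# Hypothetical rows and storage root, for demonstration only.
rows = [
    ("Quarterly Report.pdf", "uploads/files/2024-03/aBc123-report.pdf"),
    ("Empty entry", ""),
]
files = map_attachment_files(rows, "/srv/bookstack/storage")
```

With the mapping in hand, files can be copied out under their display names rather than their altered on-disk names.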
  91
  92 ---
  93
  94 ### Egress Options
  95
  96 *Note: This is not relevant for BookStack-only backup/restore operations, see [our guidance here](/docs/admin/backup-restore/) for that.*
  97
  98 To get data out of BookStack (in bulk) there are two main ways:
  99
 100 - [The BookStack REST API](/docs/admin/hacking-bookstack/#bookstack-api)
 101 - Fetch/export directly from the database.
 102
 103 The REST API presents a nice scriptable, primarily JSON-based, interface. Various example scripts
 104 can be found in [our api-scripts repo](https://github.com/BookStackApp/api-scripts).
 105 The API covers all core content types, including their raw underlying data.
 106 The API does provide access to export formats, but most of these may perform some transformation
 107 or be lossy in operation. The one exception may be the (contained) HTML export option since
 108 this will attempt to embed image content into the page which could make the content
 109 more portable, and potentially help avoid having to manage images separately.
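
As a starting point for scripting against the API, the sketch below builds an authenticated request using only the Python standard library. The `pages` endpoint, the `Token <id>:<secret>` header format, and the `count`/`offset` parameters are assumptions to verify against your instance's own API documentation. The host and token values are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_api_request(base_url, endpoint, token_id, token_secret, params=None):
    """Build (but do not send) an authenticated GET request for the REST API."""
    url = base_url.rstrip("/") + "/api/" + endpoint.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token_id}:{token_secret}"},
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder host and credentials; nothing is sent until fetch_json is called.
request = build_api_request(
    "https://docs.example.com",
    "pages",
    "my-token-id",
    "my-token-secret",
    {"count": 100, "offset": 0},
)
```

Calling `fetch_json(request)` against a real instance would return the decoded listing. Building the request separately keeps it easy to inspect before any network call is made.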
 110
 111 Otherwise, directly interfacing with, or exporting from, the database is always an option.
 112 We attempt to have sensible table and column naming, with a simple overall database structure,
 113 so navigating around to extract/format data as required shouldn't be too much trouble if you're confident
 114 with database systems.
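
For a sense of what direct extraction might look like, the sketch below reads page content from a `pages` table and prefers the stored Markdown where present, falling back to HTML. It uses an in-memory SQLite database purely as a runnable stand-in. A real instance would use its own database driver and connection instead, and the `id` column is an assumption beyond the `html` and `markdown` columns described earlier.

```python
import sqlite3


def read_page_sources(connection):
    """Return (page id, format, content), preferring Markdown over HTML."""
    rows = connection.execute("SELECT id, html, markdown FROM pages ORDER BY id")
    results = []
    for page_id, html, markdown in rows:
        if markdown:
            results.append((page_id, "markdown", markdown))
        else:
            results.append((page_id, "html", html))
    return results


# In-memory stand-in for the real database, for illustration only.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, html TEXT, markdown TEXT)"
)
connection.executemany(
    "INSERT INTO pages (id, html, markdown) VALUES (?, ?, ?)",
    [(1, "<p>Hello</p>", ""), (2, "<h1>Notes</h1>", "# Notes")],
)
sources = read_page_sources(connection)
```

Page 1 was written in the HTML editor so only its HTML is returned, while page 2 yields its original Markdown input.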
 115
 116 For either of these, some potential pain points for egress (migration away) from BookStack could be:
 117
 118 - Handling image/media/attachment content, and mapping/linking their use from content.
 119 - Supporting any additional metadata you want to migrate.
 120 - Handling any complex/custom elements of the content format.
 121
 122 The specifics & overall complexity can ultimately depend on what you need to migrate/transfer to, in addition to the skills & tools you have available.