--- /dev/null
++++
+title = "Content Storage Format"
+date = "2024-03-03"
+type = "admin-doc"
++++
+
+When building up a documentation store, or assessing platform options, it can be useful to understand how data is stored
+so you know how portable & accessible your content is.
+This page aims to clearly lay out how content is stored within BookStack and what our general project aims
+are when it comes to data storage, content formats, and how these may guide design & development decisions.
+
+{{<toc>}}
+
+---
+
+### Our Goals & Ideals
+
+When it comes to core content provided into BookStack we want to avoid "locking" users into our platform.
+We believe data should be portable and stored in common standard formats, so that's what we aim for.
+For core page content we aim to stick to relatively simple HTML, targeting CommonMark for Markdown
+where used (see ["page content" section](#page-content) below for more details on this).
+
+While we don't officially support export to (or import from) other specific platforms as part
+of the core project, we aim to provide options & guidance so this can be achieved
+where desired.
+
+When assessing features & changes to BookStack, keeping content to simple standards in the interest
+of portability takes very high priority in our decisions.
+We have pushed back against many editor & content-format feature requests, in the interest of
+not complicating the content structure.
+
+---
+
+### Storage Formats
+
+BookStack is primarily a database-driven system. The vast majority of data and metadata can be found
+in tables of the attached database. For anything not specifically mentioned below, it's likely in the database.
+If your aim is BookStack instance backup/restore of any kind, then
+you should focus on performing a database dump (in addition to backing up files) as per our
+[backup/restore guidance here](/docs/admin/backup-restore/).
+
+#### Page Content
+
+Page content is primarily stored as HTML, within the `html` column of the `pages` database table.
+We aim (as per our goals) to keep the range of HTML formats limited to common basic HTML with little
+depth/structure complexity where possible. There are a few custom classes used (for alignment & callout blocks)
+but we now try to avoid the addition & use of new custom classes. Some formatting options
+may use inline HTML styles (text color, for example).
+
+For pages written in Markdown, the original input Markdown will be stored within the `markdown` column
+of the `pages` table. Within BookStack's official functions, we generally focus on standardising
+Markdown support to [CommonMark](https://commonmark.org/), with the extra additions of Markdown tables, task-lists, and HTML.
+
+Page content may reference other items within BookStack, and other external resources.
+In official functions, we aim to standardise on always using full absolute URLs, rather than any relative
+or custom dynamic references. This ensures such links/references work regardless of usage context, and
+having a full absolute URL provides a base URL/host that can be easily searched upon & detected
+where required.
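+
+As a rough illustration of why absolute URLs are easy to search for, the sketch below scans a page's stored HTML for
+`href`/`src` values that point at a given base URL. The function name `find_instance_urls` and the example host are
+hypothetical, and this is only one possible approach, not something BookStack itself provides.
+
+```python
+from html.parser import HTMLParser
+from urllib.parse import urlsplit
+
+
+class UrlCollector(HTMLParser):
+    """Collect href/src attribute values from an HTML fragment."""
+
+    def __init__(self):
+        super().__init__()
+        self.urls = []
+
+    def handle_starttag(self, tag, attrs):
+        for name, value in attrs:
+            if name in ("href", "src") and value:
+                self.urls.append(value)
+
+
+def find_instance_urls(page_html, base_url):
+    """Return URLs in page_html whose scheme and host match base_url."""
+    base = urlsplit(base_url)
+    collector = UrlCollector()
+    collector.feed(page_html)
+    collector.close()
+    matches = []
+    for url in collector.urls:
+        parts = urlsplit(url)
+        if (parts.scheme, parts.netloc) == (base.scheme, base.netloc):
+            matches.append(url)
+    return matches
+```
+
+Running this over the `html` column of each page would give a list of links and media references that need
+rewriting if the content moves to another host or platform.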
+
+There is one custom dynamic feature when it comes to page content: our
+[dynamic include tag](/docs/user/reusing-page-content/) system, which provides on-load
+inclusion of other content onto a page. Includes are not stored ready-parsed; they are
+handled at page display time since permissions can affect the result.
+Other than this, we avoid adding extra dynamic/"smart"/"magic" features to core page content.
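+
+Since include tags remain unresolved in stored content, it may help to locate them before migrating. The sketch
+below assumes the tag syntax is `{{@<page id>}}` with an optional `#<section id>` suffix; that syntax is an
+assumption not stated on this page, so check it against the linked include tag documentation before relying on it.
+The name `find_include_tags` is hypothetical.
+
+```python
+import re
+
+# Assumed tag shape: {{@12}} or {{@12#some-section-id}}
+INCLUDE_TAG = re.compile(r"\{\{@\s?(\d+)(?:#([^\s}]+))?\s*\}\}")
+
+
+def find_include_tags(page_html):
+    """Return (page_id, section_id_or_None) for each include tag found."""
+    return [
+        (int(match.group(1)), match.group(2))
+        for match in INCLUDE_TAG.finditer(page_html)
+    ]
+```
+
+Pages that return a non-empty list depend on other pages at display time, so their stored HTML alone will not
+contain the included content.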
+
+#### Images & Drawings
+
+Images are stored as standard image files, typically on the local filesystem but that
+can depend on the [configured storage method](/docs/admin/upload-config/).
+When uploading, image file names may be altered/generated by BookStack.
+Upon upload BookStack will store the originally provided image file data but also
+create & store resized images for convenience, mainly to reduce file size for more efficient
+loading & display. These system-resized images are stored within directories that have names starting with
+`scaled-` or `thumbs-`, with these directories being in the same location as the original image files.
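+
+When copying images out of BookStack you may only want the originals. As a minimal sketch, the helper below
+filters a list of relative image paths, dropping anything that sits under a directory whose name starts with
+`scaled-` or `thumbs-`. The function names and the example paths used to exercise it are hypothetical; only the
+two directory prefixes come from the description above.
+
+```python
+from pathlib import PurePosixPath
+
+GENERATED_PREFIXES = ("scaled-", "thumbs-")
+
+
+def is_generated_variant(relative_path):
+    """True if any parent directory name starts with a generated-image prefix."""
+    parent_dirs = PurePosixPath(relative_path).parts[:-1]
+    return any(part.startswith(GENERATED_PREFIXES) for part in parent_dirs)
+
+
+def original_images(relative_paths):
+    """Keep only paths that are not system-resized variants."""
+    return [path for path in relative_paths if not is_generated_variant(path)]
+```
+
+Only directory names are checked, so an original file that happens to be named something like `scaled-logo.png`
+is still kept.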
+
+Drawings are treated much the same as images.
+When a drawing is saved in the integrated [diagrams.net](https://diagrams.net) (previously draw.io)
+editor, it's exported and saved within BookStack as a standard PNG image.
+These PNG image files are embedded with the original drawing data, so they can be reloaded back into
+diagrams.net to be fully editable again. You can drag/import these PNG files into any
+diagrams.net/draw.io instance for full re-editing capabilities.
+
+#### Attachments
+
+Attachments are stored as files, typically on the local filesystem but that
+can depend on the [configured storage method](/docs/admin/upload-config/).
+Filenames, including the file extension, will be altered so it may be hard to identify
+attachments on the filesystem by name. If you need to do this, you can
+use the `attachments` table of the database as a reference. The `path` column
+represents attachment file locations, relative to the top-level storage location.
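+
+To illustrate the lookup described above, this sketch pairs rows exported from the `attachments` table with their
+expected on-disk locations. It assumes each row is a `(name, path)` pair, where `name` is a human-readable
+attachment name (an assumed column, check your schema) and `path` is the column mentioned above. The storage root
+passed in is a hypothetical value that depends on your install and storage configuration.
+
+```python
+from pathlib import Path
+
+
+def resolve_attachments(rows, storage_root):
+    """Pair each attachment name with its expected location under storage_root.
+
+    rows: iterable of (name, relative_path) tuples exported from the database.
+    """
+    root = Path(storage_root)
+    return [(name, root / relative_path) for name, relative_path in rows]
+```
+
+The result can then be used to copy each file out under its original name, restoring the extension that was
+altered on upload.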
+
+---
+
+### Egress Options
+
+*Note: This is not relevant for BookStack-only backup/restore operations, see [our guidance here](/docs/admin/backup-restore/) for that.*
+
+To get data out of BookStack (in bulk) there are two main ways:
+
+- [The BookStack REST API](/docs/admin/hacking-bookstack/#bookstack-api)
+- Fetch/export directly from the database.
+
+The REST API presents a nice scriptable, primarily JSON-based, interface. Various example scripts
+can be found in [our api-scripts repo](https://github.com/BookStackApp/api-scripts).
+The API covers all core content types, including their raw underlying data.
+The API does provide access to export formats, but most of these may perform some transformation
+or be lossy in operation. The one exception may be the (contained) HTML export option since
+this will attempt to embed image content into the page which could make the content
+more portable, and potentially help avoid having to manage images separately.
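+
+For a feel of what scripted API access can look like, here is a minimal sketch using only the Python standard
+library. The `/api/pages` path, the `Authorization: Token <id>:<secret>` header, the `count`/`offset` parameters
+and the `data` key in the response are all assumptions made for illustration; confirm them against the API
+documentation on your own instance. The function names are hypothetical.
+
+```python
+import json
+import urllib.request
+from urllib.parse import urlencode
+
+
+def build_request(base_url, path, token_id, token_secret, params=None):
+    """Build an authenticated GET request (assumed token header format)."""
+    url = base_url.rstrip("/") + path
+    if params:
+        url += "?" + urlencode(params)
+    headers = {"Authorization": f"Token {token_id}:{token_secret}"}
+    return urllib.request.Request(url, headers=headers)
+
+
+def fetch_json(request):
+    """Send the request and decode the JSON response body."""
+    with urllib.request.urlopen(request) as response:
+        return json.load(response)
+
+
+def iter_pages(base_url, token_id, token_secret, page_size=100):
+    """Yield page summaries in batches until the listing is exhausted."""
+    offset = 0
+    while True:
+        request = build_request(
+            base_url, "/api/pages", token_id, token_secret,
+            {"count": page_size, "offset": offset},
+        )
+        batch = fetch_json(request)["data"]  # assumed response shape
+        if not batch:
+            return
+        yield from batch
+        offset += len(batch)
+```
+
+The official api-scripts repo linked above is the better starting point for real use; this only shows the general
+shape of a paginated listing loop.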
+
+Otherwise, directly interfacing with, or exporting from, the database is always an option.
+We attempt to have sensible table and column naming, with a simple overall database structure,
+so navigating around to extract/format data as required shouldn't be too much trouble if you're confident
+with database systems.
+
+For either of these, some potential pain points for egress (migration away) from BookStack could be:
+
+- Handling image/media/attachment content, and mapping/linking their use from content.
+- Supporting any additional metadata you want to migrate.
+- Handling any complex/custom elements of the content format.
+
+The specifics & overall complexity can ultimately depend on what you need to migrate/transfer to, in addition to the skills & tools you have available.
\ No newline at end of file