Working with JSON Data in PostgreSQL vs. MongoDB

JSONB in PostgreSQL
Working with JSON in PostgreSQL vs. MongoDB
Dharshan Rangegowda
Founder, ScaleGrid.io | @dharshanrg

What is JSON?
● JSON stands for JavaScript Object Notation.
● Open standard format, specified in RFC 7159 (since superseded by RFC 8259).
● Most popular format for storing and exchanging documents.

Why does PostgreSQL need to care about JSON?
• Schema flexibility
  • Dealing with transient or changing columns
• Nested objects
  • Might not need to deserialize to query
• Handling objects from other systems
  • E.g., a Stripe transaction

PostgreSQL + JSON Timeline

PostgreSQL JSON Support
• Wave 1: PostgreSQL 9.2 (2012) added support for the JSON datatype
  • Text field with JSON validation
  • No index support
• Wave 2: PostgreSQL 9.4 (2014) added support for the JSONB datatype
  • Binary data structure to store JSON
  • Index support

PostgreSQL JSON Support
• Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard
  • JSONPath support
  • Powerful query and projection engine for JSON data
• Further improvements to JSONPath in PostgreSQL 13
• JSON roadmap

JSON vs. JSONB
• JSONB is what you should be using (in most cases)
• However, there are some scenarios where JSON is useful:
  • JSON preserves the original formatting (i.e., whitespace)
  • JSON preserves the ordering of the keys
  • JSON preserves duplicate keys
  • JSON is faster to ingest than JSONB
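The difference between the two types can be sketched in Python. This is only an approximation of what a JSONB column does to its input, under these assumptions: the last value wins for a duplicated key, keys are ordered by length and then byte-wise, and insignificant whitespace is dropped. The helper name `jsonb_like_normalize` is hypothetical and not part of PostgreSQL.

```python
import json


def jsonb_like_normalize(text: str) -> str:
    """Approximate how a JSONB column would store and print `text`.

    Assumptions (a simplification of PostgreSQL's behavior):
    - duplicate keys: the last value wins
    - key order: shorter keys first, then byte-wise order
    - insignificant whitespace is discarded
    """
    def normalize(value):
        if isinstance(value, dict):
            ordered = sorted(
                value.items(),
                key=lambda kv: (len(kv[0].encode("utf-8")), kv[0].encode("utf-8")),
            )
            return {key: normalize(child) for key, child in ordered}
        if isinstance(value, list):
            return [normalize(child) for child in value]
        return value

    # json.loads already keeps the last value of a duplicated key.
    return json.dumps(normalize(json.loads(text)))


raw = '{"title":  "A",   "id": 1, "id": 2}'
print(raw)                        # a JSON column returns the input text unchanged
print(jsonb_like_normalize(raw))  # a JSONB column returns a normalized form
```

If an application depends on key order, duplicate keys, or the exact input text, that is a sign the plain JSON type is the better fit.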

JSONB Anti Patterns
● What is the best way to use JSONB?
○ Do we even need columns any more?
○ Why not just use <int id, jsonb data>?
● JSONB has some high-level limitations you need to be aware of:
○ Statistics collection
○ Storage bloat
● Commonly occurring fields should be stored as columns.
○ Use JSONB for variable or intermittent columns.

JSONB Anti Patterns
● PostgreSQL collects statistics on column data distribution
○ Most common values (MCV)
○ Fraction of null values
○ Histogram of distribution
● No statistics are collected on the values inside a JSONB column
○ Query planner doesn’t have the stats to make smart decisions
○ Could make the wrong choice, e.g. nested loop join vs. hash join
● More details in the blog post "When To Avoid JSONB In A PostgreSQL Schema"

JSONB Anti Patterns
● Storage bloat
○ Keys are stored in the data (Similar to MongoDB mmapv1)
○ Use smaller key names to reduce footprint
○ Relies on TOAST compression
○ Sample table with 1M rows (11 GB of data)
■ PostgreSQL: 8.7 GB
■ MongoDB: Snappy 8 GB, Zlib 5.3 GB
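Because every document carries its own key names, the effect of shorter keys can be estimated with a few lines of Python. This is a rough sketch that measures compact JSON text, not the on-disk JSONB format or TOAST compression. The field values and the short key names (`p`, `cr`) are made up for illustration.

```python
import json


def text_size(docs) -> int:
    """Total size in bytes of the documents serialized as compact JSON text."""
    return sum(len(json.dumps(d, separators=(",", ":")).encode("utf-8")) for d in docs)


def make_docs(n, publisher_key, rating_key):
    """Build n small documents that differ only in the key names used."""
    return [{publisher_key: f"pub{i:06d}", rating_key: i % 5} for i in range(n)]


long_keys = make_docs(1000, "publisher", "criticrating")
short_keys = make_docs(1000, "p", "cr")

saved = text_size(long_keys) - text_size(short_keys)
print(text_size(long_keys), text_size(short_keys), saved)
```

The saving grows linearly with the row count, which is why key names matter on large tables. Actual savings depend on how well compression handles the repeated keys, so measure on real data.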

JSONB & TOAST
● If the size of a row exceeds TOAST_TUPLE_THRESHOLD (2 kB by default), large column values can be compressed and moved to out-of-line storage (TOAST)
● TOAST also provides compression (pglz)
○ Decent Compression
○ MongoDB WiredTiger snappy/zlib is potentially better
● To access the data, it needs to be de-TOASTed
○ Could result in performance overhead

JSONB Data Structures
Images courtesy: https://erthalion.info/2017/12/21/advanced-json-benchmarks/

BSON Data Structures

JSONB Operators
Operator     Description
->, ->>      Get JSON object field by key (-> returns JSON, ->> returns text)
@>, <@       Does the left JSONB value contain the right JSONB path/value entries at the top level (@>), or is it contained in the right value (<@)?
?, ?|, ?&    Does the string (or any/all of the strings in an array) exist as a top-level key within the JSON value?
@?, @@       JSONPath operators
Full list of operators can be found in the docs (JSONB operator table)

JSONB Functions
• PostgreSQL provides a wide variety of functions to create and process
JSON data
• Creation functions
• Processing functions

MongoDB Query language
• Query language based on JSON syntax
  • db.books.find( {} ), db.books.find( { publisher: "D" } )
• Array operators
  • db.books.find( { tags: ["red", "blank"] } )
• AND and OR operators
  • db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } )

MongoDB Query language
• Query nested documents
  • db.books.find( { "size.uom": "in" } )
• Query an array of objects
  • db.books.find( { "instock.qty": { $lte: 20 } } )
• Project fields to return from a query (the filter is the first argument, the projection is the second)
  • db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] }, { prints: 1 } )

JSONB Indexes
• JSONB provides a wide array of options to index your JSON data.
• We are going to dig into three types of indexes:
• GIN
• BTREE
• HASH

JSONB Indexes : GIN
• GIN stands for “Generalized Inverted Index”
• GIN supports two operator classes
  • jsonb_ops (the default)
    • ?, ?|, ?&, @>, @@, @?
    • [Index each key and value]
  • jsonb_path_ops
    • @>, @@, @?
    • [Index only the values]
Copyright © ScaleGrid.io

JSON sample data
demo=# select jsonb_pretty(data) from books where id = 1000021;
jsonb_pretty
------------------------------------
{ +
"tags": { +
"nk906585": { +
"ik844766": "iv364087"+
} +
}, +
"prints": [ +
{ +
"price": 100, +
"style": "hc" +
}, +
{ +
"price": 50, +
"style": "pb" +
} +
], +
"braille": false, +
"keywords": [ +
"abc", +
"kef", +
"keh" +
], +
"hardcover": true, +
"publisher": "nVxJVA8Bwx", +
"criticrating": 2 +
}

JSONB Indexes: GIN - ?
Find all books that have a ‘braille’ key (the ? operator tests for key existence, not for the value true). Let’s create the GIN index on the ‘data’ JSONB column:
CREATE INDEX datagin ON books USING gin (data);
demo=# select * from books where data ? 'braille';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", "criticrating": 4}
.....
demo=# explain analyze select * from books where data ? 'braille';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1)
Recheck Cond: (data ? 'braille'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1)
Index Cond: (data ? 'braille'::text)
Planning Time: 0.102 ms
Execution Time: 0.067 ms
(7 rows)

JSONB Indexes: GIN - ?|
What if we wanted to find books that were in braille or in hardcover?
demo=# explain analyze select * from books where data ?| array['braille','hardcover'];
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Recheck Cond: (data ?| '{braille,hardcover}'::text[])
Index Cond: (data ?| '{braille,hardcover}'::text[])
(7 rows)

JSONB Indexes: GIN
GIN index supports the “existence” operators only on “top level” keys. If the key is not at the top level, then
the index will not be used.
demo=# select * from books where data->'tags' ? 'nk455671';
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1)
Filter: ((data -> 'tags'::text) ? 'nk455671'::text)
Rows Removed by Filter: 1000017
(5 rows)

JSONB Indexes: GIN
The way to check for existence in nested docs is to use “Expression indexes”. Let’s create an index on
data->tags:
CREATE INDEX datatagsgin ON books USING gin ((data->'tags'));
demo=# select * from books where data->'tags' ? 'nk455671';
---------+-----------------+------------+--------+-----------------------------------------------------------------------------------------------------------
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Recheck Cond: ((data ->'tags'::text) ? 'nk455671'::text)
-> Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1)
Index Cond: ((data ->'tags'::text) ? 'nk455671'::text)
(7 rows)

JSONB Indexes: GIN - @>
The containment operator @> (the “path” operator) can be used for multi-level queries of your JSON data. Let’s first use it the same way we used the ? operator.
select * from books where data @> '{"braille":true}'::jsonb;
demo=# explain analyze select * from books where data @> '{"braille":true}'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Recheck Cond: (data @> '{"braille": true}'::jsonb)
Rows Removed by Index Recheck: 9
Index Cond: (data @> '{"braille": true}'::jsonb)
(8 rows)

The "path" operator can be used for multi level queries of your JSON data.
demo=# select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Recheck Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)
Index Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)

The JSON queries can be nested to many levels. You can also use the #> operator, but GIN does not support it.
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
criticrating": 4}
(1 row)
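The matching rule behind @> can be sketched in Python as a recursive check. This is a simplified model, not PostgreSQL's implementation. It assumes objects match when every key/value pair on the right is contained on the left, arrays match when every right-hand element is contained in some left-hand element, and scalars must be equal. The special case where an array contains a bare primitive is left out. The function name `contains` is hypothetical.

```python
def contains(left, right) -> bool:
    """Simplified model of `left @> right` for parsed JSON values."""
    if isinstance(right, dict):
        # Every key/value pair of the right object must be contained on the left.
        return isinstance(left, dict) and all(
            key in left and contains(left[key], value) for key, value in right.items()
        )
    if isinstance(right, list):
        # Order and duplicates are ignored: each wanted element must match some element.
        return isinstance(left, list) and all(
            any(contains(item, wanted) for item in left) for wanted in right
        )
    if isinstance(left, bool) or isinstance(right, bool):
        # Keep JSON booleans distinct from numbers (Python treats True == 1).
        return left is right
    return left == right


doc = {
    "tags": {"nk455671": {"ik937456": "iv506075"}},
    "braille": True,
    "keywords": ["abc", "kef", "keh"],
}
print(contains(doc, {"braille": True}))
print(contains(doc, {"tags": {"nk455671": {"ik937456": "iv506075"}}}))
print(contains(doc, {"keywords": ["abc", "keh"]}))
```

Because the right-hand side only needs to be a subset of the document, one @> query can test nested keys and values together.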

JSONB Indexes: GIN - jsonb_path_ops
GIN also supports the “jsonb_path_ops” operator class, which reduces the size of the GIN index.
From the docs:
“The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates
independent index items for each key and value in the data, while the latter creates index items only for
each value in the data.”
On my small dataset of 1M books, you can see that the pathops GIN index is smaller – you should test
with your dataset to understand the savings.
CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops);
public | dataginpathops | index | sgpostgres | books | 67 MB |
public | datatagsgin | index | sgpostgres | books | 84 MB |
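The quoted difference between the two operator classes can be illustrated with a small Python model. It is only a sketch: the real index stores hashes and differs in detail. The assumption is that jsonb_ops yields one item per key and one per scalar value, while jsonb_path_ops yields one item per scalar value combined with the keys leading to it. Both function names are hypothetical.

```python
def jsonb_ops_items(value, items=None):
    """One index item per key and one per scalar value (model of jsonb_ops)."""
    items = set() if items is None else items
    if isinstance(value, dict):
        for key, child in value.items():
            items.add(("key", key))
            jsonb_ops_items(child, items)
    elif isinstance(value, list):
        for child in value:
            jsonb_ops_items(child, items)
    else:
        items.add(("value", value))
    return items


def jsonb_path_ops_items(value, path=()):
    """One index item per scalar value together with its key path (model of jsonb_path_ops)."""
    if isinstance(value, dict):
        items = set()
        for key, child in value.items():
            items |= jsonb_path_ops_items(child, path + (key,))
        return items
    if isinstance(value, list):
        items = set()
        for child in value:
            items |= jsonb_path_ops_items(child, path)
        return items
    # The real index stores a hash of the path and value; the pair is kept here for readability.
    return {(path, value)}


doc = {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
print(len(jsonb_ops_items(doc)), len(jsonb_path_ops_items(doc)))
print(jsonb_path_ops_items({"tags": {}}))
```

In this model a query value with no scalars, such as `{"tags": {}}`, produces no path items at all. That suggests why such a search cannot be narrowed by a jsonb_path_ops index.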

Let’s rerun our query from before with the pathops index:
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
criticrating": 4}
(1 row)
demo=# explain select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158)
Recheck Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
-> Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0)
Index Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
(4 rows)

The “jsonb_path_ops” operator class supports only the @>, @? and @@ operators. It does not support the key-existence operators (?, ?| and ?&).
Smaller index but more limited scenarios.
The following queries can no longer leverage the GIN index:
select * from books where data ? 'tags'; => Sequential scan
select * from books where data @> '{"tags" :{}}'; => Sequential scan
select * from books where data @> '{"tags" :{"k7888":{}}}'; => Sequential scan

JSONB Indexes: B-tree
• B-tree indexes are the most common index type in relational databases.
• If you index an entire JSONB column with a B-tree index, the only useful
operators are the comparison operators:
• =, <, <=, >, >=
• Can be used only for whole object comparisons.
• Very limited use case.

• Use B-tree “Expression indexes”
• B-tree expression indexes can support the common comparison operators '=', '<', '>', '>=', '<=' (which GIN doesn't support).
• Retrieve all books with a data->criticrating > 4.
demo=# select * from books where data->'criticrating' > 4;
ERROR: operator does not exist: jsonb > integer
LINE 1: select * from books where data->'criticrating' > 4;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
# Let's cast the JSONB value to an integer
demo=# select * from books where (data->'criticrating')::int4 > 4;
# If you are using a version prior to PostgreSQL 11, you need to query as text and then cast
demo=# select * from books where (data->>'criticrating')::int4 > 4;

For expression indexes, the index needs to be an exact match with the query expression:
demo=# CREATE INDEX criticrating ON books USING BTREE (((data->'criticrating')::int4));
CREATE INDEX
demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -> 'criticrating'::text))::integer = 3)
(4 rows)
From above we can see that the BTREE index is being used as expected.

JSONB Indexes: HASH
• If you are only interested in the "=" operator, then Hash indexes become interesting.
• Hash indexes tend to be smaller than B-tree indexes.
CREATE INDEX publisherhash ON books USING HASH ((data->>'publisher'));
demo=# select * from books where data->>'publisher' = 'XlekfkLOtL'
demo-# ;
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data->>'publisher' = 'XlekfkLOtL';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: ((data ->> 'publisher'::text) = 'XlekfkLOtL'::text)
(4 rows)

JSONB Indexes: GIN - Trigram
• PostgreSQL supports string matching using Trigram indexes.
• Trigrams are basically words broken up into sequences of 3 letters.
• We can search for any arbitrary regex (not just left anchored).
CREATE EXTENSION pg_trgm;
CREATE INDEX publisher ON books USING GIN ((data->>'publisher') gin_trgm_ops);
demo=# select * from books where data->>'publisher' LIKE '%I0UB%';
----+-----------------+------------+--------+---------------------------------------------------------------------------------
4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm | 0 | {"tags": {"nk3": {"ik1": "iv1"}}, "publisher": "MI0UBqZJDt", "criticrating": 1}
(1 row)
demo=# explain analyze select * from books where data->>'publisher' LIKE '%I0UB%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Recheck Cond: ((data ->> 'publisher'::text) ~~ '%I0UB%'::text)
-> Bitmap Index Scan on publisher (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1)
Index Cond: ((data ->> 'publisher'::text) ~~ '%I0UB%'::text)
(7 rows)
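How a word is broken into trigrams can be sketched in Python. The padding used here (two spaces before the word, one after, lowercased) follows the pg_trgm convention. The sketch assumes a single alphanumeric word, and the helper names are hypothetical.

```python
def trigrams(word: str) -> set:
    """Trigrams of a single word, padded the way pg_trgm pads it."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def inner_trigrams(fragment: str) -> set:
    """Trigrams of a fragment matched with wildcards on both sides (no padding)."""
    lowered = fragment.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


print(sorted(trigrams("cat")))
# A row is a candidate for LIKE '%I0UB%' when it has every trigram of the fragment.
print(inner_trigrams("I0UB") <= trigrams("MI0UBqZJDt"))
```

The index only narrows the search to candidate rows. The pattern is still checked against each candidate, which is the Recheck Cond line in the plan.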

JSONB Indexes: GIN - Arrays
• GIN indexes are great for indexing arrays.
• Indexing and searching the keyword array.
CREATE INDEX keywords ON books USING GIN ((data->'keywords') jsonb_path_ops);
demo=# select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
--------------
1000003 | zEG406sLKQ2IU8O | viPdlu3DZm | 4 | {"tags": {"nk263020": {"ik203820": "iv817928"}}, "keywords": ["abc", "kef", "keh"], "publisher": "7NClevxuTM",
"criticrating": 2}
1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 | 4 | {"tags": {"nk780341": {"ik397357": "iv632731"}}, "keywords": ["abc", "kef", "keh"], "publisher": "fqaJuAdjP5",
"criticrating": 2}
(2 rows)
demo=# explain analyze select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Recheck Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb)
-> Bitmap Index Scan on keywords (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1)
Index Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb)
(7 rows)

SQL/JSON
• The SQL standard added support for JSON in SQL:2016 (SQL/JSON).
• SQL/JSON Data model
• JSONPath
• SQL/JSON functions
• With PG12 release, PostgreSQL has one of the best implementations of
SQL/JSON.

SQL/JSON 2016
● A sequence of SQL/JSON items, each item can be (recursively) any of:
○ SQL/JSON scalar — non-null value of SQL types: Unicode character string, numeric, Boolean
or datetime.
○ SQL/JSON null, value that is distinct from any value of any SQL type (not the same as NULL).
○ SQL/JSON arrays: ordered lists of zero or more SQL/JSON items (called SQL/JSON elements)
○ SQL/JSON objects — unordered collections of zero or more SQL/JSON members.
■ (key, SQL/JSON item)

JSONPath
Accessor   Description
.key       Returns an object member with the specified key
[*]        Wildcard array element accessor that returns all array elements
.*         Wildcard member accessor that returns the values of all members located at the top level of the current object
.**        Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level
JSONPath allows you to specify an expression (using a syntax similar to the property access notation in JavaScript) to query or project your JSON data.
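The accessors can be modeled in Python as steps that turn one sequence of items into another. This is a minimal sketch, not PostgreSQL's evaluator. It covers only .key, [*] and .*, leaves out .** and filters, and does not model the automatic array unwrapping of lax mode. The `evaluate` function and its list-of-steps input are assumptions made for illustration.

```python
def evaluate(doc, steps):
    """Apply JSONPath-like accessor steps to a parsed JSON document."""
    items = [doc]  # "$" is the root item
    for accessor in steps:
        result = []
        for item in items:
            if accessor == "[*]":
                if isinstance(item, list):
                    result.extend(item)            # all array elements
            elif accessor == ".*":
                if isinstance(item, dict):
                    result.extend(item.values())   # values of all top-level members
            else:
                key = accessor[1:]                 # ".key" -> "key"
                if isinstance(item, dict) and key in item:
                    result.append(item[key])
        items = result
    return items


doc = {
    "tags": {"nk906585": {"ik844766": "iv364087"}},
    "prints": [{"price": 100, "style": "hc"}, {"price": 50, "style": "pb"}],
    "publisher": "nVxJVA8Bwx",
}
print(evaluate(doc, [".prints", "[*]", ".price"]))  # models $.prints[*].price
print(evaluate(doc, [".tags", ".*"]))               # models $.tags.*
```

Each step maps a sequence of items to a new sequence, which is why a path can return zero, one, or many results.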

SQL/JSON Functions
● PostgreSQL 12 provides several functions that use JSONPath to query your JSON data
○ jsonb_path_exists - Checks whether JSON path returns any item for the
specified JSON value
○ jsonb_path_match - Returns the result of JSON path predicate check for
the specified JSON value.
○ jsonb_path_query - Gets all JSON items returned by JSON path for the
specified JSON value.

JSONPath
Finding books by publisher?
demo=# select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
-------------
1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {"tags": {"nk542369": {"ik55240": "iv305393"}}, "keywords": ["abc", "def", "geh"], "publisher": "ktjKEZ1tvq",
"criticrating": 0}
(1 row)
demo=# explain analyze select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Recheck Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
-> Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Index Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
(7 rows)

JSONPath
Add a JSONPath filter:
select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
Build complicated filter expressions:
select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)');
Index support for JSONPath is very limited: the GIN index can serve the @? and @@ operators, but not the jsonb_path_* functions.
demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1)
Filter: jsonb_path_exists(data, '$."publisher"?(@ == "ktjKEZ1tvq")'::jsonpath, '{}'::jsonb, false)
Rows Removed by Filter: 1000028
(5 rows)

JSONPath: Projection JSON
Select the last element of the array (using the JSONPath keyword last as the subscript)
demo=# select jsonb_path_query(data, '$.prints[last]') from books where id = 1000029;
jsonb_path_query
------------------------------
{"price": 50, "style": "pb"}
(1 row)
Select only the hardcover prints from the array
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc")') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
We can also chain the filters
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
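For readers more used to application code, the projections above correspond roughly to the following Python. It is only an analogy for what the path expressions select. The `path_query` helper is hypothetical, and the `prints` array is copied from the sample document.

```python
def path_query(items, *predicates):
    """Keep the items that satisfy every predicate, applied one after another."""
    for predicate in predicates:
        items = [item for item in items if predicate(item)]
    return items


prints = [{"price": 100, "style": "hc"}, {"price": 50, "style": "pb"}]

last_print = prints[-1]  # last element of the array
hardcover = path_query(prints, lambda p: p.get("style") == "hc")
hardcover_100 = path_query(
    prints,
    lambda p: p.get("style") == "hc",   # first filter
    lambda p: p.get("price") == 100,    # chained second filter
)
print(last_print, hardcover, hardcover_100)
```

Chaining filters narrows the result step by step, which gives the same result as combining the conditions with && in a single filter.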

Roadmap
● Improvements to the JSONPath implementation in PG13
● Future Roadmap

Questions?
