SlideShare a Scribd company logo
Postgres Demystified
                        Craig Kerstiens
                        @craigkerstiens
                        http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
Postgres Demystified
Postgres Demystified
Postgres Demystified
                   We’re
                   Hiring
Getting Setup
            Postgres.app
Agenda
     Brief History
Developing w/ Postgres
 Postgres Performance
       Querying
Postgres History
                  Post Ingress
 Postgres
             Around since 1989/1995
PostgresQL
             Community Driven/Owned
MVCC

Each query sees transactions committed before it
   Locks for writing don’t conflict with reading
Why Postgres
Why Postgres

“ its the emacs of databases”
Developing w/ Postgres
Basics
psql is your friend
Basics
psql is your friend
  #   dt
  #   d
  #   d tablename
  #   x
  #   e
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point        float
inet                        polygon numeric                circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp box
            macaddr
                          tsvector
Datatypes
                                                   UUID
                             date                          boolean
    timestamptz array                  interval integer
      smallint line         bigint XML
                                              enum char
                                  money
              serial bytea                        point        float
inet                         polygon numeric                circle
             cidr   varchar          tsquery
                                text                timetz
  path               time                timestamp box
            macaddr
                           tsvector
Datatypes
                                                   UUID
                             date                          boolean
    timestamptz array                  interval integer
      smallint line         bigint XML
                                              enum char
                                  money
              serial bytea                        point        float
inet                         polygon numeric                circle
             cidr   varchar          tsquery
                                text                timetz
  path               time                timestamp box
            macaddr
                           tsvector
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point         float
inet                        polygon numeric                 circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp
            macaddr                                    box
                          tsvector
Datatypes
CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],
	

 created_at timestamp
);
Datatypes
CREATE TABLE items (
    id serial NOT NULL,
    name varchar (255),
    tags varchar(255) [],
	

 created_at timestamp
);
Datatypes
INSERT INTO items
VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now());

INSERT INTO items
VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point        float
inet                        polygon numeric                circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp box
            macaddr
                          tsvector
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Extensions
 dblink       hstore               trigram
                       uuid-ossp           pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
NoSQL in your SQL
CREATE EXTENSION hstore;
CREATE TABLE users (
   id integer NOT NULL,
   email character varying(255),
   data hstore,
   created_at timestamp without time zone,
   last_login timestamp without time zone
);
hStore
INSERT INTO users
VALUES (
  1,
  'craig.kerstiens@gmail.com',
  'sex => "M", state => “California”',
  now(),
  now()
);
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;


                          V8 w/ PLV8

                                                            9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;


                          V8 w/ PLV8

                                                            9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;



create or replace function
                          V8 w/ PLV8
js(src text) returns text as $$
  return eval(
  "(function() { " + src + "})"
  )();
$$ LANGUAGE plv8;                                           9.2
JSON
SELECT
  '{"id":1,"email": "craig.kerstiens@gmail.com",}'::json;



create or replace function
                          V8 w/ PLV8
js(src text) returns text as $$
  return eval(                               Bad Idea
  "(function() { " + src + "})"
  )();
$$ LANGUAGE plv8;                                           9.2
Range Types



              9.2
Range Types
CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES
  (3, '[2012-09-24 13:00, 2012-09-24 13:50)');




                                                 9.2
Range Types
CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES
  (3, '[2012-09-24 13:00, 2012-09-24 13:50)');


ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
INSERT INTO talks VALUES
   (1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
ERROR: conflicting key value violates exclusion constraint
"talks_during_excl"
                                                       9.2
Full Text Search
Full Text Search
TSVECTOR - Text Data
TSQUERY - Search Predicates

Specialized Indexes and Operators
Datatypes
                                                  UUID
                            date                          boolean
    timestamptz array                 interval integer
      smallint line        bigint XML
                                             enum char
                                 money
             serial bytea                        point         float
inet                        polygon numeric                 circle
            cidr   varchar          tsquery
                               text                timetz
  path              time                timestamp
            macaddr                                    box
                          tsvector
PostGIS
PostGIS
1. New datatypes   i.e. (2d/3d boxes)
PostGIS
1. New datatypes   i.e. (2d/3d boxes)

2. New operators   i.e. SELECT foo && bar ...
PostGIS
1. New datatypes     i.e. (2d/3d boxes)

2. New operators     i.e. SELECT foo && bar ...

3. Understand relationships and distance
                     i.e. person within location, nearest distance
Postgres demystified
Performance
Sequential Scans
Sequential Scans

They’re Bad
Sequential Scans

They’re Bad (most of the time)
Indexes
Indexes

They’re Good
Indexes

They’re Good (most of the time)
Indexes
B-Tree
Generalized Inverted Index (GIN)
Generalized Search Tree (GIST)
K Nearest Neighbors (KNN)
Space Partitioned GIST (SP-GIST)
Indexes
B-Tree
  Default
  Usually want this
Indexes
Generalized Inverted Index (GIN)
  Use with multiple values in 1 column
  Array/hStore
Indexes
Generalized Search Tree (GIST)
  Full text search
  Shapes
Understanding Query Perf
           Given
    SELECT last_name
    FROM employees
    WHERE salary >= 50000;
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;                                             Max Time
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
Explain
# EXPLAIN SELECT last_name FROM employees WHERE salary >=
50000;                                             Max Time
     Startup Cost
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
  Filter: (salary >= 50000)
(3 rows)
                                                 Rows Returned
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)                           Rows Returned
Explain
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
     Startup Cost        QUERY PLAN
                                 Max Time
-------------------------------------------------------------------------
Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual
time=2.401..295.247 rows=1428 loops=1)
  Filter: (salary >= 50000)
Total runtime: 295.379
(3 rows)                           Rows Returned
# CREATE INDEX idx_emps ON employees (salary);
# CREATE INDEX idx_emps ON employees (salary);
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
  Index Cond: (salary >= 50000)
Total runtime: 1.771 ms
(3 rows)
# CREATE INDEX idx_emps ON employees (salary);
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE
salary >= 50000;
                         QUERY PLAN
-------------------------------------------------------------------------
Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1
width=6) (actual time = 0.047..1.603 rows=1428 loops=1)
  Index Cond: (salary >= 50000)
Total runtime: 1.771 ms
(3 rows)
Indexes Pro Tips
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
SELECT * WHERE foo LIKE ‘%bar% is BAD
Indexes Pro Tips
CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
SELECT * WHERE foo LIKE ‘%bar% is BAD
SELECT * WHERE Food LIKE ‘bar%’ is OKAY
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Postgres demystified
Cache Hit Rate
SELECT
       'index hit rate' as name,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio
     FROM pg_statio_user_indexes
     union all
     SELECT
      'cache hit rate' as name,
       case sum(idx_blks_hit)
         when 0 then 'NaN'::numeric
         else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),
'99.99')::numeric
       end as ratio
     FROM pg_statio_user_indexes;)
Index Hit Rate
SELECT
 relname,
 100 * idx_scan / (seq_scan + idx_scan),
 n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
Index Hit Rate
        relname      | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
 events              |                           0 |        669917
 app_infos_user_info |                           0 |        198218
 app_infos           |                          50 |        175640
 user_info           |                           3 |         46718
 rollouts            |                           0 |         34078
 favorites           |                           0 |          3059
 schema_migrations   |                           0 |             2
 authorizations      |                           0 |             0
 delayed_jobs        |                          23 |             0
pg_stats_statements



                      9.2
pg_stats_statements
$ select * from pg_stat_statements where query ~ 'from users where email';
userid                      │   16384
dbid                        │   16388
query                       │   select * from users where email = ?;
calls                       │   2
total_time                  │   0.000268
rows                        │   2
shared_blks_hit             │   16
shared_blks_read            │   0
shared_blks_dirtied         │   0
shared_blks_written         │   0
local_blks_hit              │   0
local_blks_read             │   0
local_blks_dirtied          │   0



                                                                             9.2
local_blks_written          │   0
temp_blks_read              │   0
temp_blks_written           │   0
time_read                   │   0
time_write                  │   0
pg_stats_statements



                      9.2
pg_stats_statements
SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
         FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
----------------------------------------------------------------------
query                 | UPDATE pgbench_branches SET bbalance
= bbalance + ? WHERE bid = ?;
calls                 | 3000
total_time | 9609.00100000002
rows                  | 2836
hit_percent | 99.9778970000200936
                                                               9.2
Postgres demystified
Querying
Window Functions
Example:
 Biggest spender by state
Window Functions
SELECT
 email,
 users.data->'state',
 sum(total(items)),
 rank() OVER
  (PARTITION BY users.data->'state'
  ORDER BY sum(total(items)) desc)
FROM
 users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;
Window Functions
SELECT
 email,
 users.data->'state',
 sum(total(items)),
 rank() OVER
  (PARTITION BY users.data->'state'
  ORDER BY sum(total(items)) desc)
FROM
 users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;
Extensions
 dblink      hstore                trigram
                      uuid-ossp            pgstattuple
      citext                       pgrowlocks
                    pgcrypto
  isn         ltree            fuzzystrmatch
       cube         earthdistance dict_int
             tablefunc
unaccent                      btree_gist      dict_xsyn
Fuzzystrmatch
Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');
Fuzzystrmatch
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');


SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');
Moving Data Around
copy (SELECT * FROM users) TO ‘~/users.csv’;

copy users FROM ‘~/users.csv’;
db_link
SELECT dblink_connect('myconn', 'dbname=postgres');
SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);

   a |      b
-------+------------
   1 | example
   2 | example2
Foreign Data Wrappers
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Foreign Data Wrappers
CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
	

 FOREIGN DATA WRAPPER redis_fdw
	

 OPTIONS (address '127.0.0.1', port '6379');

CREATE FOREIGN TABLE redis_db0 (key text, value text)
	

 SERVER redis_server
	

 OPTIONS (database '0');

CREATE USER MAPPING FOR PUBLIC
	

 SERVER redis_server
	

 OPTIONS (password 'secret');
Query Redis from Postgres
SELECT *
FROM redis_db0;

SELECT
 id,
 email,
 value as visits
FROM
 users,
 redis_db0
WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text)
   AND cast(value as int) > 40;
All are not equal
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Production Ready FDWs
oracle             mysql             odbc
         twitter          sybase redis           jdbc
files                couch             s3
        www                ldap
                                      informix
          mongodb
Readability
Readability
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
                              ’
ORDER BY 1;
Common Table Expressions
WITH top_5_products AS (
  SELECT products.*, count(*)
  FROM products, line_items
  WHERE products.id = line_items.product_id
  GROUP BY products.id
  ORDER BY count(*) DESC
  LIMIT 5
)

SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
 AND line_items.product_id = top_5_products.id
GROUP BY 1
                Don’t do this in production
                              ’
ORDER BY 1;
Brief History
Developing w/ Postgres
 Postgres Performance
       Querying
Extras
Extras
Listen/Notify
Extras
Listen/Notify
Per Transaction Synchronous Replication
Extras
Listen/Notify
Per Transaction Synchronous Replication
SELECT for UPDATE
Postgres - TLDR
         Datatypes                Fast Column Addition
    Conditional Indexes               Listen/Notify
     Transactional DDL              Table Inheritance
  Foreign Data Wrappers     Per Transaction sync replication
Concurrent Index Creation          Window functions
        Extensions                  NoSQL inside SQL
Common Table Expressions               Momentum
Questions?
                        Craig Kerstiens
                        @craigkerstiens
                        http://www.craigkerstiens.com
https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

More Related Content

Similar to Postgres demystified (20)

PDF
Going beyond Django ORM limitations with Postgres
Craig Kerstiens
 
PDF
Heroku Postgres SQL Tips, Tricks, Hacks
Salesforce Developers
 
PDF
Heroku Postgres Cloud Database Webinar
Salesforce Developers
 
PDF
Postgresql 9.3 overview
Aveic
 
PDF
Drizzles Approach To Improving Performance Of The Server
PerconaPerformance
 
PPTX
PostgreSQL.pptx
MAYURUGALE6
 
PDF
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Dipti Borkar
 
PDF
Transition from relational to NoSQL Philly DAMA Day
Dipti Borkar
 
PDF
Introduction to NoSQL and Couchbase
Dipti Borkar
 
PDF
On Beyond (PostgreSQL) Data Types
Jonathan Katz
 
PDF
Extensions on PostgreSQL
Alpaca
 
PDF
50 Ways To Love Your Project
PostgreSQL Experts, Inc.
 
PDF
Exalead managing terrabytes
Jérémie BORDIER
 
PDF
Introduction to Tokyo Products
Mikio Hirabayashi
 
PDF
Introduction to tokyo products
jacky wu
 
PPTX
Nosql
Brandon Byars
 
PDF
PostgreSQL Modules Tutorial - chkpass, hstore, fuzzystrmach, isn
Sagar Arlekar
 
PDF
CMF: a pain in the F @ PHPDay 05-14-2011
Alessandro Nadalin
 
PDF
Breaking the-database-type-barrier-replicating-across-different-dbms
Linas Virbalas
 
Going beyond Django ORM limitations with Postgres
Craig Kerstiens
 
Heroku Postgres SQL Tips, Tricks, Hacks
Salesforce Developers
 
Heroku Postgres Cloud Database Webinar
Salesforce Developers
 
Postgresql 9.3 overview
Aveic
 
Drizzles Approach To Improving Performance Of The Server
PerconaPerformance
 
PostgreSQL.pptx
MAYURUGALE6
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Dipti Borkar
 
Transition from relational to NoSQL Philly DAMA Day
Dipti Borkar
 
Introduction to NoSQL and Couchbase
Dipti Borkar
 
On Beyond (PostgreSQL) Data Types
Jonathan Katz
 
Extensions on PostgreSQL
Alpaca
 
50 Ways To Love Your Project
PostgreSQL Experts, Inc.
 
Exalead managing terrabytes
Jérémie BORDIER
 
Introduction to Tokyo Products
Mikio Hirabayashi
 
Introduction to tokyo products
jacky wu
 
PostgreSQL Modules Tutorial - chkpass, hstore, fuzzystrmach, isn
Sagar Arlekar
 
CMF: a pain in the F @ PHPDay 05-14-2011
Alessandro Nadalin
 
Breaking the-database-type-barrier-replicating-across-different-dbms
Linas Virbalas
 

Recently uploaded (20)

PDF
Proactive Server and System Monitoring with FME: Using HTTP and System Caller...
Safe Software
 
PDF
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
PDF
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
 
PDF
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
PDF
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
PDF
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
PDF
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
PDF
Bitkom eIDAS Summit | European Business Wallet: Use Cases, Macroeconomics, an...
Carsten Stoecker
 
PPTX
2025 HackRedCon Cyber Career Paths.pptx Scott Stanton
Scott Stanton
 
PDF
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
PDF
Unlocking FME Flow’s Potential: Architecture Design for Modern Enterprises
Safe Software
 
PDF
Supporting the NextGen 911 Digital Transformation with FME
Safe Software
 
PDF
Pipeline Industry IoT - Real Time Data Monitoring
Safe Software
 
PDF
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
PDF
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
 
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
Edge AI and Vision Alliance
 
PDF
Simplify Your FME Flow Setup: Fault-Tolerant Deployment Made Easy with Packer...
Safe Software
 
PDF
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
PPTX
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
PPTX
Reimaginando la Ciberdefensa: De Copilots a Redes de Agentes
Cristian Garcia G.
 
Proactive Server and System Monitoring with FME: Using HTTP and System Caller...
Safe Software
 
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
Enhancing Environmental Monitoring with Real-Time Data Integration: Leveragin...
Safe Software
 
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
Bitkom eIDAS Summit | European Business Wallet: Use Cases, Macroeconomics, an...
Carsten Stoecker
 
2025 HackRedCon Cyber Career Paths.pptx Scott Stanton
Scott Stanton
 
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
Unlocking FME Flow’s Potential: Architecture Design for Modern Enterprises
Safe Software
 
Supporting the NextGen 911 Digital Transformation with FME
Safe Software
 
Pipeline Industry IoT - Real Time Data Monitoring
Safe Software
 
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
 
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
Edge AI and Vision Alliance
 
Simplify Your FME Flow Setup: Fault-Tolerant Deployment Made Easy with Packer...
Safe Software
 
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
Reimaginando la Ciberdefensa: De Copilots a Redes de Agentes
Cristian Garcia G.
 
Ad

Postgres demystified

  • 1. Postgres Demystified Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
  • 4. Postgres Demystified We’re Hiring
  • 5. Getting Setup Postgres.app
  • 6. Agenda Brief History Developing w/ Postgres Postgres Performance Querying
  • 7. Postgres History Post Ingress Postgres Around since 1989/1995 PostgresQL Community Driven/Owned
  • 8. MVCC Each query sees transactions committed before it Locks for writing don’t conflict with reading
  • 10. Why Postgres “ its the emacs of databases”
  • 13. Basics psql is your friend # dt # d # d tablename # x # e
  • 14. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 15. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 16. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 17. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 18. Datatypes CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );
  • 19. Datatypes CREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp );
  • 20. Datatypes INSERT INTO items VALUES (1, 'Ruby Gem', '{“Programming”,”Jewelry”}', now()); INSERT INTO items VALUES (2, 'Django Pony', '{“Programming”,”Animal”}', now());
  • 21. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 22. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 23. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 24. NoSQL in your SQL CREATE EXTENSION hstore; CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone );
  • 25. hStore INSERT INTO users VALUES ( 1, '[email protected]', 'sex => "M", state => “California”', now(), now() );
  • 26. JSON SELECT '{"id":1,"email": "[email protected]",}'::json; V8 w/ PLV8 9.2
  • 27. JSON SELECT '{"id":1,"email": "[email protected]",}'::json; V8 w/ PLV8 9.2
  • 28. JSON SELECT '{"id":1,"email": "[email protected]",}'::json; create or replace function V8 w/ PLV8 js(src text) returns text as $$ return eval( "(function() { " + src + "})" )(); $$ LANGUAGE plv8; 9.2
  • 29. JSON SELECT '{"id":1,"email": "[email protected]",}'::json; create or replace function V8 w/ PLV8 js(src text) returns text as $$ return eval( Bad Idea "(function() { " + src + "})" )(); $$ LANGUAGE plv8; 9.2
  • 30. Range Types 9.2
  • 31. Range Types CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); 9.2
  • 32. Range Types CREATE TABLE talks (room int, during tsrange); INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)'); ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&); INSERT INTO talks VALUES (1108, '[2012-09-24 13:30, 2012-09-24 14:00)'); ERROR: conflicting key value violates exclusion constraint "talks_during_excl" 9.2
  • 34. Full Text Search TSVECTOR - Text Data TSQUERY - Search Predicates Specialized Indexes and Operators
  • 35. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point float inet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 37. PostGIS 1. New datatypes i.e. (2d/3d boxes)
  • 38. PostGIS 1. New datatypes i.e. (2d/3d boxes) 2. New operators i.e. SELECT foo && bar ...
  • 39. PostGIS 1. New datatypes i.e. (2d/3d boxes) 2. New operators i.e. SELECT foo && bar ... 3. Understand relationships and distance i.e. person within location, nearest distance
  • 44. Sequential Scans They’re Bad (most of the time)
  • 48. Indexes B-Tree Generalized Inverted Index (GIN) Generalized Search Tree (GIST) K Nearest Neighbors (KNN) Space Partitioned GIST (SP-GIST)
  • 49. Indexes B-Tree Default Usually want this
  • 50. Indexes Generalized Inverted Index (GIN) Use with multiple values in 1 column Array/hStore
  • 51. Indexes Generalized Search Tree (GIST) Full text search Shapes
  • 52. Understanding Query Perf Given SELECT last_name FROM employees WHERE salary >= 50000;
  • 53. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 54. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 55. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Max Time Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows)
  • 56. Explain # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000; Max Time Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000) (3 rows) Rows Returned
  • 57. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 58. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 59. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows)
  • 60. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Rows Returned
  • 61. Explain # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; Startup Cost QUERY PLAN Max Time ------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000) Total runtime: 295.379 (3 rows) Rows Returned
  • 62. # CREATE INDEX idx_emps ON employees (salary);
  • 63. # CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)
  • 64. # CREATE INDEX idx_emps ON employees (salary); # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000; QUERY PLAN ------------------------------------------------------------------------- Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000) Total runtime: 1.771 ms (3 rows)
  • 66. Indexes Pro Tips CREATE INDEX CONCURRENTLY
  • 67. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar
  • 68. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD
  • 69. Indexes Pro Tips CREATE INDEX CONCURRENTLY CREATE INDEX WHERE foo=bar SELECT * WHERE foo LIKE ‘%bar% is BAD SELECT * WHERE Food LIKE ‘bar%’ is OKAY
  • 70. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 72. Cache Hit Rate SELECT 'index hit rate' as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT 'cache hit rate' as name, case sum(idx_blks_hit) when 0 then 'NaN'::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric end as ratio FROM pg_statio_user_indexes;)
  • 73. Index Hit Rate SELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;
  • 74. Index Hit Rate relname | percent_of_times_index_used | rows_in_table ---------------------+-----------------------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0
  • 76. pg_stats_statements $ select * from pg_stat_statements where query ~ 'from users where email'; userid │ 16384 dbid │ 16388 query │ select * from users where email = ?; calls │ 2 total_time │ 0.000268 rows │ 2 shared_blks_hit │ 16 shared_blks_read │ 0 shared_blks_dirtied │ 0 shared_blks_written │ 0 local_blks_hit │ 0 local_blks_read │ 0 local_blks_dirtied │ 0 9.2 local_blks_written │ 0 temp_blks_read │ 0 temp_blks_written │ 0 time_read │ 0 time_write │ 0
  • 78. pg_stats_statements SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; ---------------------------------------------------------------------- query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?; calls | 3000 total_time | 9609.00100000002 rows | 2836 hit_percent | 99.9778970000200936 9.2
  • 82. Window Functions SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2;
  • 83. Window Functions SELECT email, users.data->'state', sum(total(items)), rank() OVER (PARTITION BY users.data->'state' ORDER BY sum(total(items)) desc) FROM users, purchases WHERE purchases.user_id = users.id GROUP BY 1, 2;
  • 84. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefunc unaccent btree_gist dict_xsyn
  • 87. Fuzzystrmatch SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will'); SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg'); SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');
  • 88. Moving Data Around copy (SELECT * FROM users) TO ‘~/users.csv’; copy users FROM ‘~/users.csv’;
  • 89. db_link SELECT dblink_connect('myconn', 'dbname=postgres'); SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text); a | b -------+------------ 1 | example 2 | example2
  • 90. Foreign Data Wrappers oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 91. Foreign Data Wrappers CREATE EXTENSION redis_fdw; CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address '127.0.0.1', port '6379'); CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database '0'); CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password 'secret');
  • 92. Query Redis from Postgres SELECT * FROM redis_db0; SELECT id, email, value as visits FROM users, redis_db0 WHERE ('user_' || cast(id as text)) = cast(redis_db0.key as text) AND cast(value as int) > 40;
  • 93. All are not equal oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 94. Production Ready FDWs oracle mysql odbc twitter sybase redis jdbc files couch s3 www ldap informix mongodb
  • 96. Readability WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 97. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 98. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ORDER BY 1;
  • 99. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 ’ ORDER BY 1;
  • 100. Common Table Expressions WITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5 ) SELECT users.email, count(*) FROM users, line_items, top_5_products WHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.id GROUP BY 1 Don’t do this in production ’ ORDER BY 1;
  • 101. Brief History Developing w/ Postgres Postgres Performance Querying
  • 102. Extras
  • 105. Extras Listen/Notify Per Transaction Synchronous Replication SELECT for UPDATE
  • 106. Postgres - TLDR Datatypes Fast Column Addition Conditional Indexes Listen/Notify Transactional DDL Table Inheritance Foreign Data Wrappers Per Transaction sync replication Concurrent Index Creation Window functions Extensions NoSQL inside SQL Common Table Expressions Momentum
  • 107. Questions? Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified