PostgreSQL: Advanced features in practice

JÁN SUCHAL
22.11.2011
@RUBYSLAVA
Why PostgreSQL?

 The world’s most advanced open source database.
 Features!
   Transactional DDL

   Cost-based query optimizer + Graphical explain

   Partial indexes

   Function indexes

   K-nearest search

   Views

   Recursive Queries

   Window Functions
Transactional DDL

class CreatePostsMigration < ActiveRecord::Migration
  def change
    create_table :posts do |t|
      t.string :name, null: false
      t.text :body, null: false
      t.references :author, null: false
      t.timestamps null: false
    end

    add_index :posts, :title, unique: true
  end
end

 Where is the problem?
 Answer:
   Column title does not exist!
   Without transactional DDL: table is created, index is not. Oops!
   Transactional DDL FTW! The whole migration rolls back, so no half-created table is left behind.
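
The rollback behaviour can be tried out without a PostgreSQL server. The sketch below is only a stand-in: it uses Python's bundled sqlite3 module, because SQLite also happens to run DDL inside transactions. The index name idx_posts_title and the helper names are made up for illustration; the bug (indexing title when the column is called name) is the one from the migration above.

```python
import sqlite3


def run_migration(conn):
    """Create a table and an index in one transaction; undo both if either fails."""
    try:
        conn.execute("BEGIN")
        conn.execute("CREATE TABLE posts (name TEXT NOT NULL, body TEXT NOT NULL)")
        # Same mistake as in the migration: the column is called name, not title.
        conn.execute("CREATE UNIQUE INDEX idx_posts_title ON posts (title)")
        conn.execute("COMMIT")
        return True
    except sqlite3.OperationalError:
        # The CREATE TABLE is part of the open transaction, so it is undone too.
        conn.execute("ROLLBACK")
        return False


def table_exists(conn, name):
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] == 1


# isolation_level=None turns off the module's implicit transactions,
# so the explicit BEGIN / COMMIT / ROLLBACK above are the only ones.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrated = run_migration(conn)
posts_left_behind = table_exists(conn, "posts")
```

After the failed run, posts_left_behind is False: the migration can simply be fixed and run again.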
Cost-based query optimizer

 What is the best plan to execute a given query?
 Cost = I/O + CPU operations needed
 Sequential vs. random seek
 Join order
 Join type (nested loop, hash join, merge join)
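
To make "Cost = I/O + CPU" and "sequential vs. random seek" concrete, here is a deliberately simplified toy model. The three constants are PostgreSQL's default values for the settings of the same name; the one-random-page-read-per-matching-row assumption and the function names are illustrative only, and the real planner's index cost model is considerably more detailed.

```python
SEQ_PAGE_COST = 1.0     # default seq_page_cost
RANDOM_PAGE_COST = 4.0  # default random_page_cost
CPU_TUPLE_COST = 0.01   # default cpu_tuple_cost


def seq_scan_cost(pages, rows):
    """Read every page sequentially and process every row."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def random_fetch_cost(matching_rows):
    """Toy assumption: each matching row costs one random page read."""
    return matching_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)


def cheaper_plan(pages, rows, matching_rows):
    """Pick the plan with the lower estimated cost."""
    if random_fetch_cost(matching_rows) < seq_scan_cost(pages, rows):
        return "index"
    return "seq"


# A table of 10,000 pages holding 1,000,000 rows.
few_matches = cheaper_plan(10_000, 1_000_000, matching_rows=100)
many_matches = cheaper_plan(10_000, 1_000_000, matching_rows=500_000)
```

With only 100 matching rows the random fetches win; with half the table matching, the sequential scan is estimated to be far cheaper.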
Graphical EXPLAIN

 pgAdmin (www.pgadmin.org)
Partial indexes

 Conditional indexes
 Problem: Async job/queue table, find failed jobs
   Create index on failed_at column

   99% of index is never used

 Solution:
CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
  WHERE failed_at IS NOT NULL;
   smaller index
   faster updates
Function Indexes

 Problem: Suffix search
   SELECT … WHERE code LIKE '%123'

 “Solution”:
   Add reverse_code column, populate, add triggers for updates,
    create index on reverse_code column
   Reverse queries: WHERE reverse_code LIKE '321%'

 PostgreSQL solution:
  CREATE INDEX idx_reversed ON projects
    (reverse((code)::text) text_pattern_ops);
  SELECT … WHERE reverse(code) LIKE reverse('%123')
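
Why reversing helps: a sorted index can only answer prefix lookups, and a suffix of code is a prefix of reverse(code). A minimal sketch of that idea, with a sorted Python list standing in for the B-tree (the sample codes and function names are hypothetical):

```python
from bisect import bisect_left


def build_reverse_index(codes):
    """Sorted list of (reversed code, code): a toy stand-in for an index on reverse(code)."""
    return sorted((code[::-1], code) for code in codes)


def suffix_search(index, suffix):
    """Find codes ending in `suffix` via a prefix range scan over the reversed keys."""
    prefix = suffix[::-1]
    # Tuples compare element-wise, so (prefix,) sorts before every entry
    # whose reversed key starts with prefix.
    start = bisect_left(index, (prefix,))
    matches = []
    for key, code in index[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(code)
    return matches


index = build_reverse_index(["A-100123", "B-555", "C-9123", "D-123", "E-1234"])
ending_in_123 = suffix_search(index, "123")
```

The lookup jumps straight to the first candidate and stops at the first non-match, instead of checking every code.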
K-nearest search

 Problem: Fuzzy string matching
   900K rows

 Solution: Ngram/Trigram search
   johno = {"  j"," jo","hno","joh","no ","ohn"}
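
The trigram decomposition and the resulting distance can be sketched in a few lines. This is an approximation written for illustration, not the extension's code: it assumes each lowercased word is padded with two leading spaces and one trailing space, and that distance is 1 minus shared trigrams over all distinct trigrams. Under those assumptions it gives 0.588235 for 'Michl Brla' against 'Michal Barla', the value listed in this slide's results.

```python
import re


def trigrams(text):
    """Set of 3-character substrings of each word, padded as '  word '."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def distance(a, b):
    """1 - (shared trigrams / all distinct trigrams); 0 means identical trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


query = "Michl Brla"
candidates = ["Pavel Michl", "Michal Bula", "Michal Barla"]
# Nearest first, like ORDER BY dist ASC.
ranked = sorted(candidates, key=lambda name: distance(name, query))
```

Here 'Michal Barla' is ranked first; the other two candidates tie at the same distance.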

 CREATE INDEX idx_trgm_name ON subjects
   USING gist (name gist_trgm_ops);

 SELECT name, name <-> 'Michl Brla' AS dist
   FROM subjects ORDER BY dist ASC LIMIT 10;  -- 312ms

 "Michal Barla" ; 0.588235
 "Michal Bula"  ; 0.647059
 "Michal Broz"  ; 0.647059
 "Pavel Michl"  ; 0.647059
 "Michal Brna"  ; 0.647059
Views

 Query constraints (WHERE, ORDER BY, LIMIT) are pushed down into the view's SELECTs

CREATE VIEW edges AS
  SELECT subject_id AS source_id,
    connected_subject_id AS target_id FROM raw_connections
  UNION ALL
  SELECT connected_subject_id AS source_id,
    subject_id AS target_id FROM raw_connections;

 SELECT * FROM edges WHERE source_id = 123;
 SELECT * FROM edges WHERE source_id < 500
   ORDER BY source_id LIMIT 10;
   No materialization, 2x indexed select + 1x append/merge
Recursive Queries

 Problem: Find paths between two nodes in graph

WITH RECURSIVE search_graph(source,target,distance,path) AS
(
  SELECT source_id, target_id, 1,
    ARRAY[source_id, target_id]
  FROM edges WHERE source_id = 552506
  UNION ALL
  SELECT sg.source, e.target_id, sg.distance + 1,
    path || ARRAY[e.target_id]
  FROM search_graph sg
    JOIN edges e ON sg.target = e.source_id
    WHERE NOT e.target_id = ANY(path) AND distance < 4
)
SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
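
For readers less used to recursive CTEs, the same search can be written as a plain loop: one pass of the loop corresponds to one round of the recursive term. This is a sketch with a tiny made-up graph; the node numbers and the function name are hypothetical. It keeps the two rules of the query: never revisit a node already on the path, and stop after 4 edges.

```python
def find_paths(connections, source, target, max_distance=4):
    """List cycle-free paths from source to target using at most max_distance edges."""
    neighbours = {}
    for a, b in connections:
        # Every raw connection can be walked in both directions, like the edges view.
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    frontier = [[source]]  # all paths found in the previous round
    found = []
    for _ in range(max_distance):
        next_frontier = []
        for path in frontier:
            for node in neighbours.get(path[-1], []):
                if node in path:
                    continue  # same role as NOT e.target_id = ANY(path)
                extended = path + [node]
                next_frontier.append(extended)
                if node == target:
                    found.append(extended)
        frontier = next_frontier
    return found


connections = [(1, 2), (2, 5), (1, 3), (3, 5), (3, 4), (4, 2)]
paths = find_paths(connections, source=1, target=5)
```

Paths come back shortest first, because each round extends every path by exactly one edge.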
Recursive queries

 Graph with ~1M edges (61ms)
 source; target; distance; path
 530556; 552506; 2; {530556,185423,552506}
   JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek

 530556; 552506; 2; {530556,183291,552506}
   JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek

 530556; 552506; 4; {530556,183291,552522,185423,552506}
   JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek
Window functions

 “Aggregate functions without grouping”
   avg, count, sum, rank, row_number, ntile…

 Problem: Find closest nodes to a given node
   Order by sum of path scores
   Path score = 0.9^<distance> / log(1 + <number of paths>)


SELECT source, target FROM (
 SELECT source, target, path, distance,
  0.9 ^ distance / log(1 +
   COUNT(*) OVER (PARTITION BY distance, target)
  ) AS score
 FROM ( … ) AS paths
) AS scored_paths
GROUP BY source, target ORDER BY SUM(score) DESC
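
What the query computes, spelled out step by step. In this sketch a Counter plays the part of COUNT(*) OVER (PARTITION BY distance, target): every path keeps its own row and simply learns how many paths share its partition. The sample paths and the function name are made up; math.log10 is used because PostgreSQL's one-argument log() is the base-10 logarithm.

```python
import math
from collections import Counter, defaultdict


def rank_targets(paths):
    """paths: (source, target, distance) tuples, one per path. Returns pairs, best first."""
    # Number of paths per (distance, target) partition.
    per_partition = Counter((distance, target) for _, target, distance in paths)

    totals = defaultdict(float)
    for source, target, distance in paths:
        n = per_partition[(distance, target)]
        score = 0.9 ** distance / math.log10(1 + n)
        totals[(source, target)] += score  # GROUP BY source, target ... SUM(score)

    return sorted(totals, key=totals.get, reverse=True)


sample_paths = [(1, "a", 1), (1, "b", 2), (1, "b", 2), (1, "c", 3)]
ranking = rank_targets(sample_paths)
```

In the sample, "b" is reached by two paths of length 2 and ends up ahead of "a", which has a single direct path.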
Window functions

 Example: Closest to Róbert Kaliňák
  "Bussines Park Bratislava a.s."
  "JARABINY a.s."
  "Ing. Robert Pintér"
  "Ing. Ján Počiatek"
  "Bratislava trade center a.s."
  …
 1M edges, 41ms
Additional resources

 www.postgresql.org
   Read the docs, seriously

 www.explainextended.com
   SQL guru blog

 explain.depesz.com
   First aid for slow queries

 www.wikivs.com/wiki/MySQL_vs_PostgreSQL
   MySQL vs. PostgreSQL comparison
Real World Explain

 www.postgresql.org

More Related Content

What's hot (20)

Postgres rules
Postgres rules
gisborne
 
Python tutorial
Python tutorial
Rajiv Risi
 
A tour of Python
A tour of Python
Aleksandar Veselinovic
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181
Mahmoud Samir Fayed
 
Unsung Heroes of PHP
Unsung Heroes of PHP
jsmith92
 
Postgresql 9.3 overview
Postgresql 9.3 overview
Aveic
 
Green dao
Green dao
彥彬 洪
 
Xm lparsers
Xm lparsers
Suman Lata
 
Ggplot2 v3
Ggplot2 v3
Josh Doyle
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
Barbara Fusinska
 
Graph Database Query Languages
Graph Database Query Languages
Jay Coskey
 
Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?
Lucas Witold Adamus
 
Scala best practices
Scala best practices
Alexander Zaidel
 
The Ring programming language version 1.5.3 book - Part 30 of 184
The Ring programming language version 1.5.3 book - Part 30 of 184
Mahmoud Samir Fayed
 
Pytables
Pytables
rocketcircus
 
JDK 8
JDK 8
Maciej Dragan
 
The Ring programming language version 1.8 book - Part 35 of 202
The Ring programming language version 1.8 book - Part 35 of 202
Mahmoud Samir Fayed
 
WorkingWithSlick2.1.0
WorkingWithSlick2.1.0
Knoldus Inc.
 
Programming Haskell Chapter8
Programming Haskell Chapter8
Kousuke Ruichi
 
The Ring programming language version 1.5.2 book - Part 33 of 181
The Ring programming language version 1.5.2 book - Part 33 of 181
Mahmoud Samir Fayed
 
Postgres rules
Postgres rules
gisborne
 
Python tutorial
Python tutorial
Rajiv Risi
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181
Mahmoud Samir Fayed
 
Unsung Heroes of PHP
Unsung Heroes of PHP
jsmith92
 
Postgresql 9.3 overview
Postgresql 9.3 overview
Aveic
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
Barbara Fusinska
 
Graph Database Query Languages
Graph Database Query Languages
Jay Coskey
 
Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?
Lucas Witold Adamus
 
The Ring programming language version 1.5.3 book - Part 30 of 184
The Ring programming language version 1.5.3 book - Part 30 of 184
Mahmoud Samir Fayed
 
The Ring programming language version 1.8 book - Part 35 of 202
The Ring programming language version 1.8 book - Part 35 of 202
Mahmoud Samir Fayed
 
WorkingWithSlick2.1.0
WorkingWithSlick2.1.0
Knoldus Inc.
 
Programming Haskell Chapter8
Programming Haskell Chapter8
Kousuke Ruichi
 
The Ring programming language version 1.5.2 book - Part 33 of 181
The Ring programming language version 1.5.2 book - Part 33 of 181
Mahmoud Samir Fayed
 

Viewers also liked (20)

Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
PostgreSQL Advanced Queries
PostgreSQL Advanced Queries
Nur Hidayat
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
Alexey Lesovsky
 
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Ontico
 
Streaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
Freedom! Employee Empowerment the $2000 Way
Freedom! Employee Empowerment the $2000 Way
Davidson Asset Management Ltd
 
Mission: Launch a Digital Workplace
Mission: Launch a Digital Workplace
BMC Software
 
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
EMERAUDE RH
 
PrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw Copy
Amber Hollingsworth
 
Goed jaar voor firma Kevin Pauwels
Goed jaar voor firma Kevin Pauwels
Thierry Debels
 
TRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI
 
Mesa job _fair_flyer
Mesa job _fair_flyer
Heath Anderson MEd
 
Los Desafíos de la educación a distancia
Los Desafíos de la educación a distancia
Claudio Rama
 
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
ROER4D
 
Game Studio Leadership: You Can Do It
Game Studio Leadership: You Can Do It
Jesse Schell
 
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
Audrey Jacob
 
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Catherine Cronin
 
Marketing Week Live 2017
Marketing Week Live 2017
Jeremy Waite
 
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
Kamiya Toshihiro
 
はじめての CircleCI
はじめての CircleCI
Yosuke Mizutani
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
PostgreSQL Advanced Queries
PostgreSQL Advanced Queries
Nur Hidayat
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
Alexey Lesovsky
 
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Ontico
 
Streaming replication in practice
Streaming replication in practice
Alexey Lesovsky
 
Mission: Launch a Digital Workplace
Mission: Launch a Digital Workplace
BMC Software
 
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
EMERAUDE RH
 
PrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw Copy
Amber Hollingsworth
 
Goed jaar voor firma Kevin Pauwels
Goed jaar voor firma Kevin Pauwels
Thierry Debels
 
TRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI
 
Los Desafíos de la educación a distancia
Los Desafíos de la educación a distancia
Claudio Rama
 
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
ROER4D
 
Game Studio Leadership: You Can Do It
Game Studio Leadership: You Can Do It
Jesse Schell
 
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
Audrey Jacob
 
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Catherine Cronin
 
Marketing Week Live 2017
Marketing Week Live 2017
Jeremy Waite
 
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
Kamiya Toshihiro
 
はじめての CircleCI
はじめての CircleCI
Yosuke Mizutani
 
Ad

Similar to PostgreSQL: Advanced features in practice (20)

GreenDao Introduction
GreenDao Introduction
Booch Lin
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
Neo4j
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
Jerome Eteve
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
Neo4j
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
Andrew Morgan
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
Mark Needham
 
MongoDB
MongoDB
Hemant Kumar Tiwary
 
MySQL Indexes
MySQL Indexes
Anton Zhukov
 
MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
MongoDB Aggregation Framework
MongoDB Aggregation Framework
Caserta
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden
 
Cassandra Data Modeling
Cassandra Data Modeling
Ben Knear
 
Tactical data engineering
Tactical data engineering
Julian Hyde
 
Less08 Schema
Less08 Schema
vivaankumar
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
Andreas Dewes
 
Lecture 3.pdf
Lecture 3.pdf
DanielGarca686549
 
Java Database Connectivity (JDBC) with Spring Framework is a powerful combina...
Java Database Connectivity (JDBC) with Spring Framework is a powerful combina...
demomki4
 
PPT on Data Science Using Python
PPT on Data Science Using Python
NishantKumar1179
 
GreenDao Introduction
GreenDao Introduction
Booch Lin
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
Neo4j
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
Jerome Eteve
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
Neo4j
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
Andrew Morgan
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
Mark Needham
 
MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
MongoDB Aggregation Framework
MongoDB Aggregation Framework
Caserta
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden
 
Cassandra Data Modeling
Cassandra Data Modeling
Ben Knear
 
Tactical data engineering
Tactical data engineering
Julian Hyde
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
Andreas Dewes
 
Java Database Connectivity (JDBC) with Spring Framework is a powerful combina...
Java Database Connectivity (JDBC) with Spring Framework is a powerful combina...
demomki4
 
PPT on Data Science Using Python
PPT on Data Science Using Python
NishantKumar1179
 
Ad

More from Jano Suchal (20)

Slovensko.Digital: Čo ďalej?
Slovensko.Digital: Čo ďalej?
Jano Suchal
 
Datanest 3.0
Datanest 3.0
Jano Suchal
 
Improving code quality
Improving code quality
Jano Suchal
 
Beyond search queries
Beyond search queries
Jano Suchal
 
Rank all the things!
Rank all the things!
Jano Suchal
 
Rank all the (geo) things!
Rank all the (geo) things!
Jano Suchal
 
Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?
Jano Suchal
 
Bonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet Workshop
Jano Suchal
 
Peter Mihalik: Puppet
Peter Mihalik: Puppet
Jano Suchal
 
Tomáš Čorej: Configuration management & CFEngine3
Tomáš Čorej: Configuration management & CFEngine3
Jano Suchal
 
Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?
Jano Suchal
 
SQL: Query optimization in practice
SQL: Query optimization in practice
Jano Suchal
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoring
Jano Suchal
 
Miroslav Šimulčík: Temporálne databázy
Miroslav Šimulčík: Temporálne databázy
Jano Suchal
 
Vojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenosti
Jano Suchal
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applications
Jano Suchal
 
Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?
Jano Suchal
 
Čo po GAMČI?
Čo po GAMČI?
Jano Suchal
 
Petr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.cz
Jano Suchal
 
Metaprogramovanie #1
Metaprogramovanie #1
Jano Suchal
 
Slovensko.Digital: Čo ďalej?
Slovensko.Digital: Čo ďalej?
Jano Suchal
 
Improving code quality
Improving code quality
Jano Suchal
 
Beyond search queries
Beyond search queries
Jano Suchal
 
Rank all the things!
Rank all the things!
Jano Suchal
 
Rank all the (geo) things!
Rank all the (geo) things!
Jano Suchal
 
Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?
Jano Suchal
 
Bonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet Workshop
Jano Suchal
 
Peter Mihalik: Puppet
Peter Mihalik: Puppet
Jano Suchal
 
Tomáš Čorej: Configuration management & CFEngine3
Tomáš Čorej: Configuration management & CFEngine3
Jano Suchal
 
Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?
Jano Suchal
 
SQL: Query optimization in practice
SQL: Query optimization in practice
Jano Suchal
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoring
Jano Suchal
 
Miroslav Šimulčík: Temporálne databázy
Miroslav Šimulčík: Temporálne databázy
Jano Suchal
 
Vojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenosti
Jano Suchal
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applications
Jano Suchal
 
Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?
Jano Suchal
 
Petr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.cz
Jano Suchal
 
Metaprogramovanie #1
Metaprogramovanie #1
Jano Suchal
 

Recently uploaded (20)

MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Down the Rabbit Hole – Solving 5 Training Roadblocks
Down the Rabbit Hole – Solving 5 Training Roadblocks
Rustici Software
 
FIDO Seminar: Authentication for a Billion Consumers - Amazon.pptx
FIDO Seminar: Authentication for a Billion Consumers - Amazon.pptx
FIDO Alliance
 
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
Edge AI and Vision Alliance
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
Crypto Super 500 - 14th Report - June2025.pdf
Crypto Super 500 - 14th Report - June2025.pdf
Stephen Perrenod
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Safe Software
 
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
AmirStern2
 
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
Safe Software
 
Analysis of the changes in the attitude of the news comments caused by knowin...
Analysis of the changes in the attitude of the news comments caused by knowin...
Matsushita Laboratory
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Safe Software
 
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
 
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Alliance
 
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
Safe Software
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
Down the Rabbit Hole – Solving 5 Training Roadblocks
Down the Rabbit Hole – Solving 5 Training Roadblocks
Rustici Software
 
FIDO Seminar: Authentication for a Billion Consumers - Amazon.pptx
FIDO Seminar: Authentication for a Billion Consumers - Amazon.pptx
FIDO Alliance
 
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
Edge AI and Vision Alliance
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
Crypto Super 500 - 14th Report - June2025.pdf
Crypto Super 500 - 14th Report - June2025.pdf
Stephen Perrenod
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Safe Software
 
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
AmirStern2
 
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
Safe Software
 
Analysis of the changes in the attitude of the news comments caused by knowin...
Analysis of the changes in the attitude of the news comments caused by knowin...
Matsushita Laboratory
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Safe Software
 
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
 
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Seminar: Evolving Landscape of Post-Quantum Cryptography.pptx
FIDO Alliance
 
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
Safe Software
 

PostgreSQL: Advanced features in practice

  • 1. PostgreSQL: Advanced features in practice JÁN SUCHAL 22.11.2011 @RUBYSLAVA
  • 2. Why PostgreSQL?  The world’s most advanced open source database.  Features!  Transactional DDL  Cost-based query optimizer + Graphical explain  Partial indexes  Function indexes  K-nearest search  Views  Recursive Queries  Window Functions
  • 3. Transactional DDL class CreatePostsMigration < ActiveRecord::Migration def change create_table :posts do |t| t.string :name, null: false t.text :body, null: false t.references :author, null: false t.timestamps null: false end add_index :posts, :title, unique: true end end  Where is the problem?
  • 4. Transactional DDL class CreatePostsMigration < ActiveRecord::Migration def change create_table :posts do |t| t.string :name, null: false Column title does not exist! t.text :body, null: false is created, index is not. Oops! Table t.references :author, null: false Transactional DDL FTW! t.timestamps null: false end add_index :posts, :title, unique: true end end  Where is the problem?
  • 5. Cost-based query optimizer  What is the best plan to execute a given query?  Cost = I/O + CPU operations needed  Sequential vs. random seek  Join order  Join type (nested loop, hash join, merge join)
  • 6. Graphical EXPLAIN  pgAdmin (www.pgadmin.org)
  • 8. Partial indexes
      Conditional indexes
      Problem: Async job/queue table, find failed jobs
        Create index on failed_at column
        99% of index is never used
      Solution:
        CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
          WHERE failed_at IS NOT NULL;
        smaller index
        faster updates
  • 11. Function Indexes
      Problem: Suffix search
        SELECT … WHERE code LIKE '%123'
      “Solution”:
        Add reverse_code column, populate, add triggers for updates, create index on reverse_code column,
        reverse queries WHERE reverse_code LIKE '321%'
      PostgreSQL solution:
        CREATE INDEX idx_reversed ON projects (reverse((code)::text) text_pattern_ops);
        SELECT … WHERE reverse(code) LIKE reverse('%123')
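The reversal trick works because a suffix match on a string is the same as a prefix match on its reverse, and prefix matches are what a B-tree index can serve. A minimal Ruby sketch of that equivalence follows; the sample codes are made up for illustration and are not from the talk.

```ruby
# Hypothetical sample data; the talk does not show the contents of projects.code.
codes = ["A-00123", "B-91230", "C-5123", "D-1234"]

# Suffix search, the form a plain B-tree index cannot help with:
#   WHERE code LIKE '%123'
by_suffix = codes.select { |code| code.end_with?("123") }

# The same search expressed as a prefix match on the reversed value,
# which is what the index on reverse(code) stores:
#   WHERE reverse(code) LIKE '321%'
by_reversed_prefix = codes.select { |code| code.reverse.start_with?("123".reverse) }

# reverse('%123') turns the suffix pattern into a prefix pattern.
pattern = "%123".reverse

puts by_suffix.inspect
puts by_reversed_prefix.inspect
puts pattern
```

Both selections return the same rows, which is why the query can be rewritten against the reversed expression without changing its result.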
  • 14. K-nearest search
      Problem: Fuzzy string matching
        900K rows
      Solution: Ngram/Trigram search
        johno = {"  j"," jo","hno","joh","no ","ohn"}
      CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);
      SELECT name, name <-> 'Michl Brla' AS dist
        FROM subjects ORDER BY dist ASC LIMIT 10; (312ms)
      "Michal Barla" ; 0.588235
      "Michal Bula" ; 0.647059
      "Michal Broz" ; 0.647059
      "Pavel Michl" ; 0.647059
      "Michal Brna" ; 0.647059
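The trigram idea can be tried outside the database. The Ruby below is a rough approximation of what pg_trgm computes, not its actual implementation. The helper names trigrams and trigram_distance are made up for this sketch, and the word-splitting rule (split on non-alphanumeric characters) is an assumption that holds for simple names like the ones on the slide.

```ruby
require "set"

# Approximates pg_trgm: lowercase the text, split it into words on
# non-alphanumeric characters, pad each word with two leading spaces and
# one trailing space, then collect every 3-character window.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.to_set
end

# Distance in the sense of the <-> operator: 1 - similarity, where
# similarity = shared trigrams / all distinct trigrams of both strings.
def trigram_distance(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  1.0 - (ta & tb).size.to_f / (ta | tb).size
end

p trigrams("johno").to_a.sort
puts trigram_distance("Michl Brla", "Michal Barla").round(6)  # 0.588235
puts trigram_distance("Michl Brla", "Michal Bula").round(6)   # 0.647059
```

For 'Michl Brla' against 'Michal Barla' the two sets share 7 of 17 distinct trigrams, so the distance is 10/17, which agrees with the 0.588235 in the result list.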
  • 16. Views
      Constraints propagated down to views
      CREATE VIEW edges AS
        SELECT subject_id AS source_id, connected_subject_id AS target_id
          FROM raw_connections
        UNION ALL
        SELECT connected_subject_id AS source_id, subject_id AS target_id
          FROM raw_connections;
      SELECT * FROM edges WHERE source_id = 123;
      SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10
      No materialization, 2x indexed select + 1x append/merge
  • 21. Recursive Queries
      Problem: Find paths between two nodes in graph
      WITH RECURSIVE search_graph(source,target,distance,path) AS (
        SELECT source_id, target_id, 1, ARRAY[source_id, target_id]
          FROM edges WHERE source_id = 552506
        UNION ALL
        SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id]
          FROM search_graph sg
          JOIN edges e ON sg.target = e.source_id
          WHERE NOT e.target_id = ANY(path) AND distance < 4
      )
      SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
  • 23. Recursive Queries
      Graph with ~1M edges (61ms)
      source; target; distance; path
      530556; 552506; 2; {530556,185423,552506}
        JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek
      530556; 552506; 2; {530556,183291,552506}
        JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek
      530556; 552506; 4; {530556,183291,552522,185423,552506}
        JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek
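The recursive query can be read as a level-by-level path expansion. The Ruby sketch below mirrors that logic on a tiny in-memory graph. The EDGES data and the search_graph method are hypothetical stand-ins for the edges view and the CTE, and the sketch makes no claim about how PostgreSQL executes the query internally.

```ruby
# Hypothetical toy graph as [source_id, target_id] rows, standing in for
# the edges view (each connection appears in both directions).
EDGES = [[1, 2], [2, 1], [2, 3], [3, 2], [1, 4], [4, 1], [4, 3], [3, 4]].freeze

# Mirrors the CTE: seed with the edges leaving `start`, then repeatedly
# extend each path by one edge, skipping nodes already on the path and
# only extending rows whose distance is still below max_distance.
def search_graph(edges, start, max_distance: 4)
  frontier = edges.select { |s, _| s == start }
                  .map { |s, t| { source: s, target: t, distance: 1, path: [s, t] } }
  results = []
  until frontier.empty?
    results.concat(frontier)
    frontier = frontier.flat_map do |row|
      next [] unless row[:distance] < max_distance
      edges.select { |s, t| s == row[:target] && !row[:path].include?(t) }
           .map { |_, t| row.merge(target: t, distance: row[:distance] + 1, path: row[:path] + [t]) }
    end
  end
  results
end

all_rows = search_graph(EDGES, 1)
# Equivalent of the outer WHERE target = ... filter.
paths = all_rows.select { |row| row[:target] == 3 }
paths.each { |row| puts row[:path].inspect }
```

The path array plays the same role as in the SQL: it both records the route and prevents cycles, which is what keeps the expansion finite on a graph with loops.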
  • 28. Window functions
      “Aggregate functions without grouping”
        avg, count, sum, rank, row_number, ntile…
      Problem: Find closest nodes to a given node
        Order by sum of path scores
        Path score = 0.9^<distance> / log(1 + <number of paths>)
      SELECT source, target FROM (
        SELECT source, target, path, distance,
          0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance, target)) AS score
        FROM ( … ) AS paths
      ) AS scored_paths
      GROUP BY source, target
      ORDER BY SUM(score) DESC
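To make the scoring concrete, here is a small Ruby sketch of the same two steps: a per-row count over a partition (the window function), followed by an ordinary grouped sum. The path rows are made-up sample data standing in for the elided inner query, and Math.log10 is used on the assumption that the query relies on PostgreSQL's one-argument log(), which is base 10.

```ruby
# Hypothetical path rows, standing in for the elided inner query `( … ) AS paths`.
paths = [
  { source: 1, target: 2, distance: 1 },
  { source: 1, target: 3, distance: 2 },
  { source: 1, target: 3, distance: 2 },
  { source: 1, target: 4, distance: 2 }
]

# COUNT(*) OVER (PARTITION BY distance, target): rows are not collapsed,
# each one is just annotated with the size of its partition.
partition_sizes = paths.group_by { |r| [r[:distance], r[:target]] }
                       .transform_values(&:size)

scored = paths.map do |r|
  n = partition_sizes[[r[:distance], r[:target]]]
  # Path score = 0.9^distance / log(1 + number of paths), log base 10.
  r.merge(score: 0.9**r[:distance] / Math.log10(1 + n))
end

# GROUP BY source, target ORDER BY SUM(score) DESC
ranking = scored.group_by { |r| [r[:source], r[:target]] }
                .map { |key, rows| [key, rows.sum { |r| r[:score] }] }
                .sort_by { |_, total| -total }

ranking.each { |(source, target), total| puts "#{source} -> #{target}: #{total.round(3)}" }
```

With this sample data node 3 ranks first: it is reached by two paths of length 2, and their summed score outweighs the single direct edge to node 2.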
  • 29. Window functions
      Example: Closest to Róbert Kaliňák
        "Bussines Park Bratislava a.s."
        "JARABINY a.s."
        "Ing. Robert Pintér"
        "Ing. Ján Počiatek"
        "Bratislava trade center a.s."
        …
      1M edges, 41ms
  • 30. Additional resources
      www.postgresql.org
        Read the docs, seriously
      www.explainextended.com
        SQL guru blog
      explain.depesz.com
        First aid for slow queries
      www.wikivs.com/wiki/MySQL_vs_PostgreSQL
        MySQL vs. PostgreSQL comparison
  • 31. Real World Explain  www.postgresql.org