Scaling PostgreSQL with GridSQL
Who Am I?
  • Jim Mlodgenski
  • Co-organizer of NYCPUG
  • Founder of Cirrus Technologies
  • Former Chief Architect of EnterpriseDB
Agenda
  • What is GridSQL?
  • Architecture
  • Query Flow
  • Scaling
  • Limitations
What is GridSQL?
  • A “shared-nothing”, distributed data architecture
  • Leverages the power of multiple commodity servers while appearing as a single database to the application
  • Essentially... an open source Greenplum, Netezza or Teradata
GridSQL Details
  • Designed for parallel querying
  • Not just “read-only”; can execute UPDATE and DELETE
  • Data Loader for parallel loading
  • Standard connectivity via PostgreSQL-compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)
What GridSQL is not
  • A replication solution like Slony or Bucardo
  • A high availability solution like Streaming Replication in PostgreSQL 9.0
  • A scalable transactional solution like Postgres-XC
  • An elastic, eventually consistent NoSQL database
Configuration
  • Can be configured for multiple logical “nodes” per physical server, to take advantage of multi-core processors
  • Tables may be either replicated or partitioned
  • Replicated tables for static lookup data or dimensions
  • Partitioned tables for large fact tables
Partitioning
  • Tables may simultaneously use GridSQL partitioning and constraint exclusion partitioning
  • Large queries scan a much smaller subset of data by using subtables
  • Since each subtable is also partitioned across nodes, the subtables are scanned in parallel
  • Queries execute much faster
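The combination of the two partitioning schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subtable names, the one-subtable-per-year layout, and the node count of 4 are hypothetical and do not come from GridSQL itself.

```python
from datetime import date

# Hypothetical layout: one subtable per year (constraint exclusion),
# and every subtable is also spread across 4 GridSQL nodes.
NODES = 4
SUBTABLES = {
    "orders_1995": (date(1995, 1, 1), date(1995, 12, 31)),
    "orders_1996": (date(1996, 1, 1), date(1996, 12, 31)),
    "orders_1997": (date(1997, 1, 1), date(1997, 12, 31)),
}


def prune_subtables(lo, hi):
    """Keep only the subtables whose date range overlaps [lo, hi]."""
    return [name for name, (start, end) in SUBTABLES.items()
            if start <= hi and end >= lo]


def scan_plan(lo, hi):
    """Each surviving subtable is scanned on every node; the scans can run in parallel."""
    return [(node, name) for name in prune_subtables(lo, hi) for node in range(NODES)]


plan = scan_plan(date(1996, 3, 1), date(1996, 9, 30))
print(plan)
```

With this layout a query restricted to 1996 touches one subtable out of three, and that subtable is scanned as four smaller pieces, one per node.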
Architecture
  • Loosely coupled, shared-nothing architecture
  • Data repositories: metadata database, GridSQL database
  • GridSQL processes: central coordinator, agents
Query Optimization
  • Cost-based optimizer
  • Takes into account row shipping (expensive)
  • Looks for joins with replicated tables, which can be done locally


  • 30. Looks for joins between tables on partitioned columns
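The two join rules above can be written down as a small decision function. This is a rough sketch, not GridSQL's optimizer: the `Table` class and `join_is_local` are hypothetical names, it assumes both partitioned tables use the same hash function over the same set of nodes, and the partitioning keys for `lineitem` and `customer` are assumed for illustration (only `orders` and its key appear in the slides).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means the table is replicated


def join_is_local(left, left_col, right, right_col):
    """Return True when an equi-join can run on each node without row shipping."""
    # A replicated table has a full copy on every node.
    if left.partition_key is None or right.partition_key is None:
        return True
    # Two partitioned tables: matching rows share a node only when
    # both sides are joined on their partitioning columns.
    return left.partition_key == left_col and right.partition_key == right_col


nation = Table("nation")                      # replicated lookup table
orders = Table("orders", "o_orderkey")        # from the CREATE TABLE slide
lineitem = Table("lineitem", "l_orderkey")    # assumed partitioning key
customer = Table("customer", "c_custkey")     # assumed partitioning key

print(join_is_local(orders, "o_orderkey", lineitem, "l_orderkey"))  # local join
print(join_is_local(customer, "c_custkey", orders, "o_custkey"))    # needs row shipping
print(join_is_local(customer, "c_nationkey", nation, "n_nationkey"))  # local join
```

Under these assumptions, the orders/lineitem join stays on the nodes, while the customer/orders join has to ship rows because `orders` is not partitioned on `o_custkey`.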
  • 31. Aggregation
    First set of aggregates is done in parallel at the nodes
  • 32. Like groups of intermediate results are shipped to the same target node
  • 33. Second aggregation is done in parallel
  • 34. Coordinator streams in node results, combining them on the fly and sending them to the client result set; it performs a merge sort if ORDER BY is present
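The final merge step at the coordinator can be illustrated with Python's standard `heapq.merge`, which lazily combines streams that are each already sorted. The function name and the sample rows are made up for the example; it assumes each node returns its groups already ordered by the ORDER BY columns.

```python
import heapq


def coordinator_merge(node_results):
    """Lazily merge per-node result streams that are each already sorted."""
    return heapq.merge(*node_results)


# After the second aggregation each group lives on exactly one node,
# so the coordinator only has to interleave the sorted streams.
node_a = [("A", "F"), ("N", "O")]
node_b = [("N", "F"), ("R", "F")]

merged = list(coordinator_merge([node_a, node_b]))
print(merged)
```

Because the merge is lazy, rows can be passed on to the client as soon as they are known to be next in order, rather than after all node results have been collected.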
  • 35. Two Phase Aggregation
    SUM
    SUM(stat1)
  • 37. SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
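A small worked example of the expression above, as a sketch: each node computes a partial SUM and COUNT, and the second phase (called SUM2 on the slide) adds the partials before dividing. The function names and the sample values are illustrative only, and the sketch assumes at least one row overall.

```python
def node_partial(values):
    """Phase 1, run on each node: local SUM(stat1) and COUNT(stat1)."""
    return sum(values), len(values)


def combine(partials):
    """Phase 2: SUM2 adds up the per-node partial sums and counts."""
    total_sum = sum(s for s, _ in partials)
    total_count = sum(c for _, c in partials)
    # AVG = SUM2(SUM(stat1)) / SUM2(COUNT(stat1)); assumes total_count > 0
    return total_sum, total_count, total_sum / total_count


# stat1 values as they might be spread over three nodes (made-up data)
node_rows = [[10, 20], [30], [40, 50, 60]]
partials = [node_partial(rows) for rows in node_rows]
total, count, avg = combine(partials)
print(total, count, avg)
```

Averaging the per-node averages instead (15, 30 and 50 here) would give a different and incorrect answer whenever nodes hold different numbers of rows, which is why the sums and counts are carried separately.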
  • 38. Creating Tables
    Tables can be partitioned or replicated
    CREATE TABLE region (
        r_regionkey INTEGER NOT NULL,
        r_name      CHAR(25) NOT NULL,
        r_comment   VARCHAR(152)
    ) REPLICATED;
  • 39. Creating Tables
    CREATE TABLE orders (
        o_orderkey      INTEGER NOT NULL,
        o_custkey       INTEGER NOT NULL,
        o_orderstatus   CHAR(1) NOT NULL,
        o_totalprice    DECIMAL(15,2) NOT NULL,
        o_orderdate     DATE NOT NULL,
        o_orderpriority CHAR(15) NOT NULL,
        o_clerk         CHAR(15) NOT NULL,
        o_shippriority  INTEGER NOT NULL,
        o_comment       VARCHAR(79) NOT NULL
    ) PARTITIONING KEY o_orderkey ON ALL;
  • 40. DBT3 : Query 1
    SELECT l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price,
           sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
           sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
           avg(l_quantity) as avg_qty,
           avg(l_extendedprice) as avg_price,
           avg(l_discount) as avg_disc,
           count(*) as count_order
    FROM lineitem
    WHERE l_shipdate <= date'1998-12-01' - interval '90 days'
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus;

    Results
     l_returnflag | l_linestatus | sum_qty  | sum_base_price | ... | count_order
    --------------+--------------+----------+----------------+ ... +-------------
     A            | F            | 37734104 |    56586654000 | ... |     1478493
     N            | F            |   991417 |     1487505700 | ... |       38854
     N            | O            | 74473520 |   111717540000 | ... |     2920374
     R            | F            | 37719752 |    56567792000 | ... |     1478870
    (4 rows)
  • 41. Query 1: Execution (no Agents)
  • 42. DBT3 : Query 7
    SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue
    FROM (SELECT n1.n_name as supp_nation,
                 n2.n_name as cust_nation,
                 extract(year from l_shipdate) as l_year,
                 l_extendedprice * (1 - l_discount) as volume
          FROM supplier, lineitem, orders, customer, nation n1, nation n2
          WHERE s_suppkey = l_suppkey
            AND o_orderkey = l_orderkey
            AND c_custkey = o_custkey
            AND s_nationkey = n1.n_nationkey
            AND c_nationkey = n2.n_nationkey
            AND ((n1.n_name = 'GERMANY' and n2.n_name = 'UNITED STATES')
              or (n1.n_name = 'UNITED STATES' and n2.n_name = 'GERMANY'))
            AND l_shipdate between date '1995-01-01' and date '1996-12-31'
         ) AS shipping
    GROUP BY supp_nation, cust_nation, l_year
    ORDER BY supp_nation, cust_nation, l_year;

    Results
     supp_nation   | cust_nation   | l_year | revenue
    ---------------+---------------+--------+--------------------
     GERMANY       | UNITED STATES |   1995 | 51883178.038909949
     GERMANY       | UNITED STATES |   1996 | 52528107.076993272
     UNITED STATES | GERMANY       |   1995 | 51546631.033109233
     UNITED STATES | GERMANY       |   1996 | 53108668.056805529
    (4 rows)
  • 43. Query 7: Execution (with Agents)
  • 45. Scalability
    A few DBT3 queries on Amazon EC2
    Using PostgreSQL 9.0
  • 46. Scalability: DBT3 Query 1 (same query text as on slide 40)
  • 47. Scalability: DBT3 Query 7 (same query text as on slide 42)
  • 48. Limitations: SQL Support
    Uses its own parser and optimizer, so:
    No Window Functions
  • 50. No Full Text Search
  • 52. Transaction Performance
    Single-row INSERT, UPDATE, or DELETE statements are slow compared to a single PostgreSQL instance
    The data must make an additional network trip to be committed
  • 53. All partitioned rows must be hashed to be mapped to the proper node
  • 54. All replicated rows must be committed to all nodes
    Use “gs-loader” for bulk loading for better performance
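The extra work per written row can be made concrete with a short sketch. The node names are hypothetical, and `zlib.crc32` is used only as a stand-in hash; it is not GridSQL's actual hash function.

```python
import zlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def target_nodes(key, replicated):
    """Return the nodes that must commit a single-row write."""
    if replicated:
        # Replicated tables: every node holds a copy, so every node commits.
        return list(NODES)
    # Partitioned tables: hash the partitioning key to pick exactly one node.
    # crc32 is a stand-in, not the hash GridSQL really uses.
    h = zlib.crc32(str(key).encode("utf-8"))
    return [NODES[h % len(NODES)]]


print(target_nodes(42, replicated=False))  # one node
print(target_nodes(42, replicated=True))   # all four nodes
```

Either way, the row passes through the coordinator before it reaches a PostgreSQL node, which is the additional network trip mentioned above.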
  • 55. High Availability
    No heartbeat or fail-over control in the coordinator
    High availability for each PostgreSQL node must be configured separately
  • 56. Streaming replication can be ideal for this
    Getting a consistent backup of the entire GridSQL database is difficult
    Must ensure that no transactions are occurring
  • 57. Back up each node separately
  • 58. Adding Nodes
    Requires downtime
    Data must be manually reloaded to partition the data to the new node
    With planning, the process can be fast with no mapping of data
    Run multiple PostgreSQL instances on each physical server and move the PostgreSQL instances to new hardware as needed
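Why the planned approach avoids reloading data can be shown with a sketch: if the number of logical nodes is fixed up front, the key-to-node mapping never changes, and only the placement of each PostgreSQL instance on hardware does. The node count, server names, and the `zlib.crc32` stand-in hash are all assumptions made for illustration.

```python
import zlib

LOGICAL_NODES = 8  # fixed up front; more instances than physical servers


def logical_node(key):
    """Map a partitioning key to a logical node (crc32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % LOGICAL_NODES


def host_for(key, placement):
    """Find the physical server currently hosting the key's logical node."""
    return placement[logical_node(key)]


# Start with two servers, each running four PostgreSQL instances.
placement_before = {n: "server-a" if n < 4 else "server-b" for n in range(LOGICAL_NODES)}

# Later, move some instances to new hardware; only the placement changes.
placement_after = dict(placement_before)
placement_after.update({2: "server-c", 3: "server-c", 6: "server-d", 7: "server-d"})

print(host_for(12345, placement_before), host_for(12345, placement_after))
```

Every row stays in the same PostgreSQL instance before and after the move, so no rows need to be rehashed; the instances themselves are relocated.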
  • 59. Interesting Side Note GridSQL scales well in a cloud environment
  • 60. The results are dependent on the cloud vendor
  • 61. Summary
    GridSQL can tremendously improve the performance of PostgreSQL queries
  • 62. GridSQL can scale linearly as more nodes are added
  • 63. GridSQL is open source so if the limitations are an issue,
  • 65. Download GridSQL at: http://sourceforge.net/projects/gridsql/
    Jim Mlodgenski
    Email: [email_address]
    Twitter: @jim_mlodgenski