Alexei Krasner
Nov 2015
PostgreSQL as an Alternative to MSSQL
What is PostgreSQL
▪ Powerful, open source object-relational database system.
▪ More than 15 years of active development and a strong reputation.
▪ Runs on all major operating systems (Linux, Unix, Mac OS, Windows…).
▪ Enterprise class database.
▪ Large and responsive community.
▪ Winner of the 2015 Database Trends and Applications Readers' Choice:
– The most advanced open source database.
– Best relational database.
Let's Start With Standards
▪ Fully ACID compliant.
▪ Includes most of the SQL:2008 data types along with storage of binary objects.
▪ Largely conforms to the ANSI SQL:2008 standard:
– Full support for subqueries (including sub-selects).
– Read-Committed and Serializable transaction isolation levels.
– Full support for Primary Keys, Foreign Keys, Joins, Views, Triggers, Stored Procedures, Constraints (check, unique and not null) and Cascading.
– Fully relational system catalog – multiple schemas per database.
▪ Native programming interfaces: Java, .NET, C/C++, Perl, Python, ODBC.
Continue With a Little Splurging
▪ Multi-Version Concurrency Control (MVCC).
▪ Asynchronous Replication, Load Balancing and Online/Hot Backups with Point in Time Recovery.
▪ Write Ahead Logging – fault tolerance.
▪ Performance:
– Sophisticated Query Planner/Optimizer.
– Compound, Unique, Partial and functional indexes.
▪ Supports:
– International character sets, multi-byte encodings, Unicode, locale awareness.
– Built-in Types – Geospatial, XML, JSON/JSONB, Ranges and Arrays!
– NoSQL – Key-Value store with incredible performance and Full Text Search.
▪ Highly customizable and extensible.
Before We Dive – Generalized Search Tree (GiST)
▪ Advanced indexing system – different sorting and searching algorithms:
– B-tree, B+-tree, R-tree, Partial Sum trees, ranked B+-trees etc.
– API for creating custom data types and extensible query methods for search.
▪ Decide WHAT to persist, HOW to persist and a way to SEARCH for it.
▪ Goes beyond the general search algorithms of standard B-trees and R-trees.
▪ Foundation for many public projects – OpenFTS and PostGIS.
Features Deep Dive
▪ MVCC
▪ Partitioning
▪ Useful Data Types
– Date and Time
– Interval
– Array
– Ranges
– JSON
– HSTORE
– XML
▪ PostGIS – Geographic
▪ Full Text Search
▪ Server Side Programming
▪ Backup and Restore
▪ High Availability, Load Balancing and Replication
– Sharding
▪ Big Data Readiness
Multi Version Concurrency Control - MVCC
▪ Reads should never block writes and vice versa.
▪ Each transaction sees a snapshot of data (version).
– Protection from viewing inconsistent data – transaction isolation.
▪ Avoidance of explicit locking solutions – minimizes lock contention.
▪ Table/row-level locking is still available – although proper MVCC usage will provide performance benefits.
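The snapshot idea on this slide can be illustrated with a toy model. The following is a minimal Python sketch and not PostgreSQL's actual implementation: the `VersionedStore` class, its commit counter and the `balance` key are hypothetical names, and visibility is reduced to "a version is visible if it was committed at or before the reader's snapshot".

```python
import itertools


class VersionedStore:
    """Toy multi-version store: writers append versions, readers use snapshots."""

    def __init__(self):
        self._clock = itertools.count(1)  # hypothetical commit counter
        self._versions = {}               # key -> list of (commit_ts, value)
        self._last_commit = 0

    def write(self, key, value):
        # A write never overwrites in place; it appends a new committed version.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        # A reader remembers the latest commit it is allowed to see.
        return self._last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()        # a reader takes its snapshot here
store.write("balance", 50)     # a later writer is not blocked by the reader
old_view = store.read("balance", snap)
new_view = store.read("balance", store.snapshot())
```

The reader holding `snap` keeps seeing the old value while a newer snapshot sees the update, which is the "reads do not block writes" behaviour in miniature.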
Partitioning – Table Inheritance
▪ Support of basic table partitioning via the table inheritance concept.
– Includes known partitioning benefits:
▪ Improved query performance under heavy load (when queries hit a single partition).
▪ Sequential scan of a partition instead of index usage.
▪ Bulk loads and deletes accomplished by adding or removing partitions.
▪ Infrequently used data can be migrated to cheaper/slower storage.
– Range Partitioning:
▪ Table partitioned into “ranges” defined by a key column or set of columns (e.g. dates).
– List Partitioning:
▪ Table partitioned by a list of discrete key values.
– Around a hundred partitions is an acceptable limit; thousands of partitions will severely harm performance.
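To make range partitioning concrete, here is a small Python sketch of how rows could be routed to partitions by a date key, and how a query touches only the partitions that overlap its range. It models the concept only; the `RangePartitionedTable` class and the quarterly boundaries are assumptions for illustration, not how PostgreSQL implements inheritance-based partitioning.

```python
import bisect
from datetime import date


class RangePartitionedTable:
    """Toy range partitioning: partition i holds keys in [bounds[i], bounds[i+1])."""

    def __init__(self, bounds):
        self.bounds = list(bounds)  # must be sorted ascending
        self.partitions = [[] for _ in range(len(self.bounds) - 1)]

    def _index_for(self, key):
        i = bisect.bisect_right(self.bounds, key) - 1
        if i < 0 or i >= len(self.partitions):
            raise ValueError(f"no partition for key {key!r}")
        return i

    def insert(self, key, row):
        self.partitions[self._index_for(key)].append((key, row))

    def scan(self, lower, upper):
        # Visit only the partitions whose range overlaps [lower, upper).
        touched, rows = [], []
        for i, part in enumerate(self.partitions):
            if self.bounds[i] < upper and lower < self.bounds[i + 1]:
                touched.append(i)
                rows.extend(r for k, r in part if lower <= k < upper)
        return touched, rows


# Hypothetical quarterly partitions for 2015.
table = RangePartitionedTable(
    [date(2015, 1, 1), date(2015, 4, 1), date(2015, 7, 1),
     date(2015, 10, 1), date(2016, 1, 1)]
)
table.insert(date(2015, 2, 14), "a")
table.insert(date(2015, 8, 3), "b")
table.insert(date(2015, 11, 20), "c")
touched, rows = table.scan(date(2015, 7, 1), date(2015, 10, 1))
```

A query for the third quarter scans one partition out of four; dropping old data amounts to discarding a whole partition rather than deleting row by row.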
Useful Data Types
▪ Date and Time – Date, Time, TimeStamp and TimeStamp with time zone.
– Converted to and from Unix time.
– Supports the INTERVAL type.
– Very convenient casting and conversion to text.
– Efficient searching and sorting (including time zone/offset).
▪ INTERVAL – representation of a period of time.
– Negative interval values are possible (e.g. a year ago).
– Intuitive arithmetic and persistence of time durations.
– Easy casting and converting to relevant types.
– Efficient searching and sorting on intervals.
Useful Data Types Cont.
▪ Array – supported as a first-class datatype (actual field in a table).
– Can contain any datatype (sub-arrays too).
– Arrays can be passed as parameters to functions.
– Usages – function results, aggregations, get/set an array of data in/from the application.
▪ Range – supported as a first-class datatype.
– Put a range on TIME, INT or NUMERIC as a single data value.
– Dedicated indexes can support queries utilizing ranges.
– Exposed methods to define custom ranges.
Useful Data Types Cont.
▪ JSON – full support along with a large dedicated set of utility functions.
– Known JSON/JSONB benefits – data transfer and integration standard.
– Transformation from/to types and tables.
– Retrieval and construction of JSON data.
– Parsing, casting and conversion.
▪ HSTORE – Fast key-value store as a datatype.
– NoSQL capabilities – flexibility of schema-less data store.
– Still ACID compliant.
– Interchange data between JSON and HSTORE.
Useful Data Types Cont.
▪ XML – Supported as a first-class datatype.
– Well-formedness checks and type-safe operations.
– Querying using XPath.
– Producing XML content, Predicates, Processing, Mapping tables to XML etc.
PostGIS
▪ Fully featured, reliable geospatial database project based on GiST (following ISO/OGC standards).
▪ SQL types and functions to manage vector geometries (spatial data).
▪ Capabilities:
– Support for three dimensional data.
– Support for geospatial formats (KML, GeoJSON).
– Processing and analytics functions for vector and raster data.
– Map “rastering” and geo queries.
– Geo searches and reverse geo searches.
▪ Hugely popular and respected extension module – comparable to ArcGIS.
Full Text Search
▪ Online indexing of data and relevance ranking for database searches.
▪ Good Enough:
– Stemming
– Ranking
– Multilingual
– Fuzzy searches (misspellings) and accents
Server Side Programming
▪ Super Extensible – functions, data types, procedural languages, operators, aggregates etc.
– Embedding functions and stored procedures using procedural languages: PL/pgSQL, PL/Tcl, PL/Perl, PL/Python.
▪ Triggers – tables, views and foreign tables.
▪ Event Triggers – database global trigger.
▪ Rule System – Query modification based on given rules.
Backup and Restore
▪ Extremely flexible dump utility – migration, replication and backups become more reliable, controllable and configurable.
– Compressed format or plain SQL (human readable).
– Single table or whole database cluster.
▪ Approaches:
– SQL Dump – file with generated SQL commands. On restore the backed up commands will be replayed.
– File system level backup – direct copy of PostgreSQL data files. Restore will include reattaching the data files.
– Continuous archiving – backing up Write Ahead Log (WAL) files. On restore the logged changes will be replayed.
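As a rough illustration of the SQL dump approach, the Python sketch below only builds command lines for `pg_dump`, `pg_restore` and `psql`; it does not run them. The helper functions, the database name `shop`, the table `orders` and the file names are hypothetical, while the flags used (`--format`, `--table`, `--file`, `--dbname`) are standard options of those tools.

```python
def pg_dump_command(dbname, outfile, compressed=True, table=None):
    """Build (but do not run) a pg_dump command line as an argument list."""
    # The custom format is compressed; the plain format is human-readable SQL.
    cmd = ["pg_dump", "--format=custom" if compressed else "--format=plain"]
    if table is not None:
        cmd.append(f"--table={table}")  # dump a single table only
    cmd += [f"--file={outfile}", dbname]
    return cmd


def restore_command(dbname, dumpfile, compressed=True):
    """Custom-format dumps are restored with pg_restore, plain SQL with psql."""
    if compressed:
        return ["pg_restore", f"--dbname={dbname}", dumpfile]
    return ["psql", f"--dbname={dbname}", f"--file={dumpfile}"]


# Hypothetical database, table and file names.
full_dump = pg_dump_command("shop", "shop.dump")
table_dump = pg_dump_command("shop", "orders.sql", compressed=False, table="orders")
restore_full = restore_command("shop", "shop.dump")
```

The resulting lists could be handed to `subprocess.run` on a machine with the PostgreSQL client tools installed. A whole-cluster dump would use `pg_dumpall` instead, which is not covered by this sketch.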
High Availability, Load Balancing and Replication
Feature | Shared Disk Failover | File System Replication | Transaction Log Shipping | Trigger-Based Master-Standby Replication | Statement-Based Replication Middleware | Asynchronous Multimaster Replication | Synchronous Multimaster Replication
Most Common Implementation | NAS | DRBD | Streaming Repl. | Slony | pgpool-II | Bucardo | -
Communication Method | shared disk | disk blocks | WAL | table rows | SQL | table rows | table rows and row locks
No special hardware required | - | X | X | X | X | X | X
Allows multiple master servers | - | - | - | - | X | X | X
No master server overhead | X | - | X | - | X | - | -
No waiting for multiple servers | X | - | with sync off | X | - | X | -
Master failure will never lose data | X | X | with sync on | - | X | - | X
Standby accept read-only queries | - | - | with hot | X | X | X | X
Per-table granularity | - | - | - | X | - | X | X
No conflict resolution necessary | X | X | X | X | - | - | X
Sharding and Replication
▪ Pure Sharding:
– pg_shard – popular sharding extension for PostgreSQL.
▪ Running on Linux!
– BDR/UDR Project – Bi-Directional Replication which adds multi-master replication to PostgreSQL.
▪ Running on Linux! A Windows port is not expected in the near future.
▪ Fork of the main PostgreSQL source.
– Postgres-XL – all-purpose, fully ACID, open source scale-out database solution.
▪ Running on Linux!
▪ Fork of the main PostgreSQL source.
Sharding and Replication Cont.
▪ Via Replication:
– Hot Standby – reducing read load on the Master by offloading it to slaves (horizontal scale).
– Streaming replication (or Bucardo, or another possible option) to slaves.
– Load balancing “write” queries to Master, “read” queries to slaves.
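The read/write split described on this slide can be sketched as a tiny router. This is a simplified Python illustration under stated assumptions: the `ReadWriteRouter` class and the server names are hypothetical, and classifying a statement by its first keyword is deliberately naive (a real setup would also have to handle cases such as `SELECT ... FOR UPDATE`, functions with side effects and replication lag).

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the master, reads rotate over the standbys."""

    def __init__(self, master, standbys):
        self.master = master
        # Round-robin iterator over standbys; None when there are no standbys.
        self._standbys = itertools.cycle(standbys) if standbys else None

    def route(self, sql):
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._standbys is not None:
            return next(self._standbys)
        return self.master


# Hypothetical server names.
router = ReadWriteRouter("master", ["standby1", "standby2"])
statements = ["SELECT 1", "INSERT INTO t VALUES (1)", "select 2", "SELECT 3"]
targets = [router.route(s) for s in statements]
```

In practice this job is usually delegated to middleware such as pgpool-II rather than written by hand.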
PostgreSQL and Big Data
▪ PostgreSQL was used for large data volumes and complex analytics a decade before Hadoop launched (as the only pure open source option).
▪ Today heavily used in mid-sized warehouses and data marts (1-10 TB).
▪ Code base for many big data systems:
– Netezza (IBM).
– Greenplum (Pivotal) – Open Source Massively Parallel Data Warehouse.
– PipelineDB – open source, run SQL queries continuously on streaming data.
– EnterpriseDB and CitusDB (commercial license) – fully scaled out Postgres.
– Redshift (Amazon).
▪ The PostgreSQL project continuously provides new features and better performance to support big data usage.
PostgreSQL and Big Data – Features
▪ Serious NoSQL database competitor.
– JSONB advanced features and an ongoing, massive development plan.
– Extensions that provide a NoSQL-like API.
▪ Faster Sorts – text and long numeric sorting improvements.
▪ TABLESAMPLE – returns a pseudo-random sample of rows to provide a glimpse of the data for further analysis.
▪ Cubes, Rollups and Grouping Sets – summarizing and exploring huge data sets in the OLAP way.
▪ BRIN indexes – compact and fast to maintain; suited to TB-size tables with incrementally increasing column values (like timestamps or integers).
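The idea behind a block range index can be shown with a short Python sketch. It is a conceptual model only, not PostgreSQL's BRIN implementation: the function names and the block size are assumptions. Each block of consecutive rows is summarized by its minimum and maximum value, and a range query reads only the blocks whose summary overlaps the requested range.

```python
def build_block_summaries(values, block_size):
    """Summarize each block of consecutive rows by its (start, min, max)."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((start, min(block), max(block)))
    return summaries


def candidate_blocks(summaries, lower, upper):
    """Return start offsets of blocks whose [min, max] overlaps [lower, upper]."""
    return [start for start, lo, hi in summaries if lo <= upper and hi >= lower]


# Naturally increasing values, e.g. timestamps appended in arrival order.
timestamps = list(range(1000, 1100))
summaries = build_block_summaries(timestamps, block_size=10)
hits = candidate_blocks(summaries, 1042, 1057)
```

Here 100 rows are described by 10 small summaries, and the query inspects 2 blocks instead of all 10. The approach works well only when values correlate with physical row order, which is why it suits timestamps and sequential integers.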
PostgreSQL and Big Data – Features Cont.
▪ Foreign Data Wrappers – linking external data (queried as if local) for hybrid solutions.
– Foreign schema import.
– JOIN pushdowns.
▪ Vacuum (garbage collection of deleted rows) – can now run in parallel in multi-process mode (maintaining several large tables at once).
▪ Scaling UP – Multicore scalability improvements.
Enterprise Wise
▪ Open Source
▪ Reliability
▪ Authentication
▪ Logging
▪ Documentation
▪ Support
▪ Maintenance
Open Source
▪ Available under the open source license – PostgreSQL
License.
▪ Using, modifying and distributing in any open/closed form.
▪ Extending and patching the relational database per project/client etc.
▪ Variety of modules, extensions and tools based on its open
source license.
Reliability
▪ PostgreSQL is relatively bug-free (compared to MSSQL).
▪ Very large community reporting, fixing and working around bugs.
▪ Constantly growing community.
Authentication
▪ Trust Authentication.
▪ Password Authentication.
▪ GSSAPI/SSPI Authentication – using Kerberos.
▪ Ident Authentication.
▪ Peer Authentication.
▪ LDAP Authentication.
▪ RADIUS Authentication.
▪ Certificate Authentication.
▪ Pluggable Authentication Modules.
Logging
▪ Logs in one place.
– Unlike MSSQL – error logs, event log, profiler log, agent log…
▪ Easily configurable logging level.
▪ Easily redirected to CSV files that can be loaded into tables.
▪ Easily redirected to the system log or Windows Event Log.
▪ Logs are human readable and of great value to sysadmins.
Documentation
▪ There is nothing more to add than a link: http://www.postgresql.org/docs/
Support
▪ Community based support – and it seems to be fast too.
▪ Numerous companies specialized in enterprise support: http://www.postgresql.org/support/professional_support/
▪ Enterprise database management companies like EnterpriseDB.
▪ Total Cost of Ownership is significantly lower even with enterprise support (based on reports, e.g. Gartner 2015).
vs. MySQL
▪ Fully ACID compliant!
▪ Subqueries and Joins.
▪ Better locking mechanism.
▪ JSON/JSONB support.
▪ NoSQL and Key-Value store.
▪ Advanced GIS abilities.
▪ Full Text Search abilities.
▪ Advanced and attractive data types.
▪ Far better and more useful extensibility patterns.
▪ Licensing issues.
vs. PostgreSQL
▪ Partitioning based on table inheritance (pros and cons).
▪ Can be overkill for simple read-heavy operations (improved in newer versions).
▪ Replication and Clustering (especially multi-master). Not “there” yet, but on the right track.
▪ Popularity – not as popular as MySQL (for example) but constantly gaining popularity, as opposed to MySQL.
▪ Expertise issues – different syntax and administration (compared to MSSQL).
THANK YOU
