Lessons PostgreSQL learned from commercial databases, and didn’t - PGConf APAC
This is the ppt used by Illay for his presentation at pgDay Asia 2016, "Lessons PostgreSQL learned from commercial
databases, and didn’t". The talk takes you through some of the things PostgreSQL has done really well and some things that PostgreSQL can learn from other databases.
PostgreSQL Enterprise Class Features and Capabilities - PGConf APAC
These are the slides used by Venkar from Fujitsu for his presentation at pgDay Asia 2016. He spoke about some of the enterprise-class features of the PostgreSQL database.
Why we love pgpool-II and why we hate it! - PGConf APAC
Pgpool is middleware that works between PostgreSQL clients and servers to provide connection pooling, replication, and load balancing. The presenter's company deployed pgpool in various architectures, including master-slave replication and load balancing configurations. They experienced some issues with pgpool, such as connection errors when using application pooling, lack of guaranteed connection reuse, and bugs. Tips are provided, such as ensuring synchronized server times and restricting health check users. Pgpool may not be the best choice when automatic node rejoining is needed or during network instability.
PostgreSQL as a NoSQL Document Store - The JSON/JSONB data type - Jumping Bean
Our presentation from PGDay Asia 2016 on the JSON/JSONB data type in Postgres and how you can have the best of both the SQL and NoSQL worlds in one. There is JavaScript in my SQL.
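As a hedged sketch of that best-of-both-worlds point (the table, column, and document contents are hypothetical), a JSONB column can sit beside relational columns, carry a GIN index, and be queried with containment operators:

```sql
-- Hypothetical table mixing relational columns with a JSONB document.
CREATE TABLE products (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    attrs jsonb              -- schema-free, NoSQL-style attributes
);

INSERT INTO products (name, attrs)
VALUES ('phone', '{"brand": "acme", "specs": {"ram_gb": 4}}');

-- A GIN index accelerates containment (@>) queries on the document.
CREATE INDEX products_attrs_idx ON products USING GIN (attrs);

-- Relational predicates and document operators in one query.
SELECT name, attrs #>> '{specs,ram_gb}' AS ram_gb
FROM products
WHERE attrs @> '{"brand": "acme"}';
```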
This document summarizes a presentation about Presto, an open source distributed SQL query engine. It discusses Presto's distributed and plug-in architecture, query planning process, and cluster configuration options. For architecture, it explains that Presto uses coordinators, workers, and connectors to distribute queries across data sources. For query planning, it shows how SQL queries are converted into logical and physical query plans with stages, tasks, and splits. For configuration, it reviews single-server, multi-worker, and multi-coordinator cluster topologies. It also provides an overview of Presto's recent updates.
The document summarizes the speaker's use of Presto for log analysis. Key points include:
- Presto was selected due to familiarity from others and ease of use compared to other options.
- Presto is used for batch queries with Hive and interactive queries. Results are accessed through Cognos using Prestogres.
- Managing Presto involves deployment with Ansible, configuration tuning, and monitoring with tools like GrowthForecast and jstat2gf.
- While Presto has been stable overall, the speaker notes some version upgrade issues but sees value in its frequent updates.
In the engineering world, we don’t always have the luxury of owning our data pipelines end to end. If only we could influence those outside components… Well, we tried, and this is our story - replete with failure, discovery, and the serenity of enlightenment. Join us on our journey as we learned more than we ever wanted to know about compression in different Apache projects, deployed our own ingestion pipeline in Apache Flume, and ultimately unified these in a robust framework built on Apache Apex handling 1 TB of data per day. We end with some reflections on the joys and tribulations of the open source realm and some key lessons for other large applications atop multiple Apache solutions.
This is the presentation used by Umair Shahid of 2ndQuadrant for his presentation at pgDay Asia 2016. It takes you through the usage of the TABLESAMPLE clause of SELECT queries, introduced in PostgreSQL v9.5.
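A minimal sketch of the clause (table name hypothetical): SYSTEM samples whole pages, BERNOULLI samples individual rows, and both take a percentage:

```sql
-- Roughly 1% of the table, chosen page by page (fast but page-clustered):
SELECT count(*) FROM big_table TABLESAMPLE SYSTEM (1);

-- Roughly 1% of rows, chosen row by row (slower, statistically cleaner):
SELECT avg(amount) FROM big_table TABLESAMPLE BERNOULLI (1);

-- REPEATABLE pins the seed so the same sample comes back each time:
SELECT * FROM big_table TABLESAMPLE SYSTEM (1) REPEATABLE (42);
```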
Your Guide to Streaming - The Engineer's Perspective - Ilya Ganelin
It feels like every week there's a new open-source streaming platform out there. Yet, if you only look at the descriptions, performance metrics, or even the architecture, they all start to look exactly the same! In short, nothing really differentiates itself - whether it be Storm, Flink, Apex, GearPump, Samza, Kafka Streams, Akka Streams, or any of the other myriad technologies. So if they all look the same, how do you really pick a streaming platform to solve the problem that YOU have? This talk is about how to really compare these platforms, and it turns out that they do have their key differences - they're just not the ones you usually think about. The way to compare these systems, if you're building something to last, a well-engineered system, is to look at how they handle durability and availability, how easy they are to install and use, and how they deal with failures.
1. The presenter discusses their use of Presto for analytics at their company, including joining data across different data sources and using window functions on MySQL data.
2. They explain how they integrate Presto with other tools like re:dash for visualization and Embulk for ETL workflows.
3. While Presto solves many of their problems, they still require some ETL and have encountered issues like large repository sizes and coordinator bottlenecks.
At Noon – The Social Learning Platform, we process close to 100M audio and sketch samples daily from more than 80K students to help measure the voice & sketch quality of our online classrooms. This talk explores the need for real-time analytics in EdTech and how we built a real-time analytics platform on Apache Druid & Apache Flink to provide real-time feedback on classroom quality & engagement metrics. We will also share some of the lessons we learnt along the way.
This document discusses benchmarking TPC-H queries in MongoDB compared to MySQL. It introduces MongoDB and describes setting up the TPC-H data by embedding all tables into a single MongoDB collection. Six sample queries are presented and run using Map-Reduce and the Aggregation Framework. Benchmark results show MongoDB performing worse than MySQL on all queries due to data conversion difficulties and MongoDB's immature Aggregation Framework. The document concludes that while MongoDB is suitable for some applications, it is not well-suited to complex queries like those in TPC-H due to its lack of standard query language and server-side processing abilities.
Query Parallelism in PostgreSQL: What's coming next? - PGConf APAC
This presentation was delivered by Dilip Kumar (a PostgreSQL contributor) at pgDay Asia 2017. The presentation talks about the parallel query features released in v9.6, the infrastructure for the parallel query feature built in previous versions, and the roadmap for parallel query.
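A small sketch of how the 9.6 feature surfaces to users (table name hypothetical): once workers are permitted, the planner may put a Gather node on top of a parallel sequential scan:

```sql
-- Allow up to 4 workers per Gather node (the 9.6-era setting).
SET max_parallel_workers_per_gather = 4;

-- For a large enough table, EXPLAIN shows Gather with "Workers Planned"
-- and a Parallel Seq Scan feeding a partial aggregate.
EXPLAIN (ANALYZE)
SELECT count(*) FROM measurements WHERE value > 100;
```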
Tempto is a product test framework that allows developers to write and execute tests for SQL databases running on Hadoop. Individual test requirements such as data generation, HDFS file copy/storage of generated data, and schema creation are expressed declaratively and are automatically fulfilled by the framework. Developers can write tests using Java (with a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results. We will show how we use it for Presto product tests.
Benchto is a benchmark framework that provides an easy and manageable way to define, run, and analyze macro benchmarks in a clustered environment. Understanding the behavior of distributed systems is hard and requires good visibility into the state of the cluster and the internals of the tested system. This project was developed for repeatable benchmarking of Hadoop SQL engines, most importantly Presto.
Presto at Facebook - Presto Meetup @ Boston (10/6/2015) - Martin Traverso
This document summarizes Presto, an analytics engine used at Facebook. It provides ad-hoc querying for data warehouses and batch processing. It is used for analytics across Facebook's data warehouses and specialized data stores. The document outlines Presto's architecture, deployment, usage statistics, features, and enhancements made for specific Facebook use cases including user-facing products, large datasets, and reliable data loading.
Ambry is an open source object store that is responsible for storing all media content at LinkedIn. This talk goes over the development of Ambry at LinkedIn and covers its architecture in some detail.
Architecture for building scalable and highly available Postgres Cluster - Ashnikbiz
As PostgreSQL has made its way into business-critical applications, many customers who are using Oracle RAC for high availability and load balancing have asked for similar functionality when using PostgreSQL.
In this Hangout session we discuss architectures and alternatives, based on real-life experience, for achieving high availability and load balancing when you deploy PostgreSQL. We also present some of the key tools and how to deploy them effectively in this architecture.
Oracle 12c Parallel Execution New Features - Randolf Geist
This document discusses new parallel execution features introduced in Oracle 12c. It begins with an introduction to key aspects of parallel execution, including the producer-consumer model and data distribution skew. The document then covers major new 12c features such as hybrid hash distribution, concurrent UNION ALL, and the 1 slave distribution method. It concludes with a question and answer section.
Presto is a distributed SQL query engine that Treasure Data provides as a service. Taro Saito discussed the internals of the Presto service at Treasure Data, including how the TD Presto connector optimizes scan performance from storage systems and how the service manages multi-tenancy and resource allocation for customers. Key challenges in providing a database as a service were also covered, such as balancing cost and performance.
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa... - PostgreSQL-Consulting
Even an experienced PostgreSQL DBA cannot always say that upgrading between major versions of Postgres is an easy task, especially if there are special requirements such as downtime limitations, or if something goes wrong. For less experienced DBAs, anything more complex than dump/restore can be frustrating.
In this talk I will describe why we need a special procedure to upgrade between major versions, how that can be achieved, and what sort of problems can occur. I will review all possible ways to upgrade your cluster, from classical pg_upgrade to old-school Slony or modern methods like logical replication. For all approaches, I will give a brief explanation of how each works (limited by the scope of this talk, of course), examples of how to perform the upgrade, and some advice on potentially problematic steps. Besides that, I will touch upon the integration of upgrade tools and procedures with other software — connection brokers, operating system package managers, automation tools, etc. This talk would not be complete if I did not cover cases when something goes wrong and how to deal with them.
This document summarizes the speaker's log analysis system that uses Presto. It describes the components of the system in 2015 and how they have been updated in 2016. It also discusses how Presto is used, details about Prestogres, common issues, Presto configuration settings, upgrading Presto, and a new web application called yanagishima that was created for Presto.
Migrating Oracle database to PostgreSQL - Umair Mansoob
This document discusses migrating an Oracle database to PostgreSQL. It covers initial discovery of the Oracle database features and data types used. A migration assessment would analyze data type mapping, additional PostgreSQL features, and testing requirements. Challenges include porting PL/SQL code, minimizing downtime during migration, and comprehensive testing of applications on the new PostgreSQL platform. Migrating large data sets and ensuring performance for critical applications are also challenges.
This talk covers native compilation technology: why it is required and what it is.
It also shows how this technology can be applied to compile tables and procedures, achieving considerable performance gains with very minimal changes.
This document summarizes recent updates to Presto, including new data types, connectors, syntax, features, functions, and configuration options. Some key additions are support for DECIMAL, VARCHAR, and new data types; connectors for Redis, MongoDB, and other data sources; transaction support; and a variety of new SQL functions for strings, dates, aggregation, and more. Upcoming work includes prepared statements, a new optimizer, and other performance and usability improvements.
Devrim Gunduz gives a presentation on Write-Ahead Logging (WAL) in PostgreSQL. WAL logs all transactions to files called write-ahead logs (WAL files) before changes are written to data files. This allows for crash recovery by replaying WAL files. WAL files are used for replication, backup, and point-in-time recovery (PITR) by replaying WAL files to restore the database to a previous state. Checkpoints write all dirty shared buffers to disk and update the pg_control file with the checkpoint location.
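A hedged sketch of the WAL plumbing the talk describes, assuming the PostgreSQL 10+ function names (before v10 these were pg_current_xlog_location and pg_switch_xlog):

```sql
-- Where the server is currently writing WAL:
SELECT pg_current_wal_lsn();

-- Close the current WAL segment and start a new one (useful for archiving tests):
SELECT pg_switch_wal();

-- Request an immediate checkpoint, flushing dirty shared buffers to disk:
CHECKPOINT;
```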
PostgreSQL is one of the most loved databases, and that is why AWS could not hold back from offering PostgreSQL as RDS. There are some really nice features in RDS which can be good for DBAs and can inspire enterprises to build resilient solutions with PostgreSQL.
PostgreSQL has evolved from its origins in academic research projects in the 1970s-1980s to a widely used open source database today. It has a large and active user community supporting deployments across industries and organization sizes. The future of PostgreSQL remains bright, as it continues to add new features and performance improvements while maintaining its low cost, flexibility, and reliability advantages over closed source databases. Major areas of focus for ongoing PostgreSQL development include application-specific data types, advanced indexing techniques, and improved single and multi-node scalability.
Security Best Practices for your Postgres Deployment - PGConf APAC
These slides were used by Sameer Kumar of Ashnik for presenting his topic at pgDay Asia 2016. He took the audience through some of the security best practices for deploying and hardening PostgreSQL.
How to teach an elephant to rock'n'roll - PGConf APAC
The document discusses techniques for optimizing PostgreSQL queries, including:
1. Using index only scans to efficiently skip large offsets in queries instead of scanning all rows.
2. Pulling the LIMIT clause under joins and aggregates to avoid processing unnecessary rows.
3. Employing indexes creatively to perform DISTINCT operations by scanning the index instead of the entire table.
4. Optimizing DISTINCT ON queries by looping through authors and returning the latest row for each instead of a full sort.
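A hedged SQL sketch of two of these techniques (table and column names hypothetical): keyset pagination instead of a large OFFSET, and DISTINCT ON for the latest row per author:

```sql
-- Skip a large offset by continuing from the last key seen; with an index
-- on id this reads only the rows it returns instead of discarding 100000.
SELECT id, title
FROM posts
WHERE id > 100000            -- last id from the previous page
ORDER BY id
LIMIT 50;

-- Latest post per author: DISTINCT ON keeps the first row per author under
-- this ordering; an index on (author_id, created_at DESC) avoids a full sort.
SELECT DISTINCT ON (author_id) author_id, title, created_at
FROM posts
ORDER BY author_id, created_at DESC;
```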
Introduction to Vacuum Freezing and XID - PGConf APAC
These are slides which were used by Masahiko Sawada of NTT, Japan for his presentation at pgDay Asia. He spoke about the internals of VACUUM and the XID wraparound issue in PostgreSQL.
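A hedged sketch of the kind of check the talk motivates: watching XID age so anti-wraparound vacuum never becomes an emergency (the table name is hypothetical):

```sql
-- Age of the oldest unfrozen XID per database; values creeping toward
-- autovacuum_freeze_max_age (200 million by default) mean aggressive
-- vacuums are coming.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Freeze a table's tuples proactively, ahead of the threshold:
VACUUM (FREEZE) my_table;
```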
This document discusses using drones and PostgreSQL/PostGIS for agricultural applications. It describes how drones can capture imaging data for tasks like measuring crop health through NDVI analysis. PostgreSQL is useful for organizing the large amounts of drone data, like flight plans, sensor readings, and imagery. The document provides an example of importing this data into PostgreSQL and using PostGIS functions to process imagery, extract waypoints of problem areas, and more.
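A hedged sketch of the waypoint-extraction idea (the table, columns, and NDVI threshold are hypothetical; the PostGIS functions are real):

```sql
-- Coordinates of sampled points with an unhealthy vegetation index,
-- restricted to within 500 m of a reference location.
SELECT id,
       ST_Y(geom) AS lat,
       ST_X(geom) AS lon,
       ndvi
FROM crop_samples
WHERE ndvi < 0.3
  AND ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(103.85, 1.29), 4326)::geography,
                 500);
```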
Swapping Pacemaker Corosync with repmgr - PGConf APAC
These slides were used by Wei Shan from GMO GlobalSign while presenting at pgDay Asia 2016. He discussed challenges with the maintenance of Pacemaker/Corosync HA clusters and how he migrated over to repmgr. He also gave a short demo.
Magnus Hagander
PostgreSQL supports several options for securing communications when deployed outside the typical webserver/database combination. This talk will go into some details about the features that make this possible, with some extra focus on the changes in 8.4. The main areas discussed are:
* Securing the channel between client and server using SSL, including an overview of the threats and how to secure against them
* Securing the login process, using LDAP, Kerberos or SSL certificates, including the use of smartcards to log into the database
The talk will not focus on security and access control inside the database once the user is connected and authenticated.
Past, Present, and Future Analysis of the Architectural & Engineering Design ... - Lisa Dehner
This report provides a macro view of the architectural and engineering design industry in the U.S. in order to understand strategies employed by both successful and unsuccessful firms over the past 50 years. It also evaluates technological trends that design firms must embrace in order to maintain a competitive edge moving forward into the future.
Java Generics Past, Present and Future - Richard Warburton, Raoul-Gabriel Urma - JAXLondon_Conference
This document summarizes the past, present, and future of generics in Java and other languages. In the past, generics were added to Java to provide compile-time type safety. Presently, Java generics are commonly used with collections but wildcards are used less. Future areas of exploration include intersection types, declaration-site variance, value types, and unbounded wildcards. Generics usage continues to increase in complexity as new language features are added.
The digital universe is huge and is growing at a stellar rate, and along with it grows the data generated every second. By 2020, there will be nearly as many digital bits as there are stars in this universe; that effectively means infinite, as per reports published by IDC in 2014. InMobi has grown leaps and bounds globally in the past few years, and that has caused the data here to grow exponentially. There are thousands of advertisers and publishers on the InMobi network, and handling the OLTP (200-300 GB) and OLAP (14 TB) workloads demands high availability and the best performance. To ensure smoothness and 24/7 availability of our production database servers, we use a lot of open source technologies to keep an eye on all the PostgreSQL servers running across different data centres. We have one of the biggest PostgreSQL master-slave streaming replication production setups, and it is very important for us to monitor database performance, production traffic, and some analytics on top of each and every database server @InMobi.
PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem... - Mark Wong
Fifth presentation in a speaker series sponsored by the Portland State University Computer Science Department. The series covers PostgreSQL performance with an OLTP (on-line transaction processing) workload called Database Test 2 (DBT-2). This presentation goes through results of different hardware RAID configurations to show why it is important to test your own hardware: it might be performing in a way you don't expect.
The document discusses achieving PCI compliance when using PostgreSQL for databases. It provides an overview of PCI requirements, how they apply to databases, and how PostgreSQL features like encryption, access control, and logging can help fulfill the requirements. Specific examples are given for how to implement encryption of cardholder data, restrict access according to the principle of least privilege, and maintain regularly updated software in PostgreSQL.
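As a hedged sketch of the encryption point (table, column, and key handling are hypothetical; real deployments keep keys outside SQL), pgcrypto's symmetric PGP functions encrypt at the column level:

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE payment_cards (
    id         serial PRIMARY KEY,
    cardholder text,
    pan_enc    bytea            -- encrypted card number
);

-- pgp_sym_encrypt returns bytea; the literal key here stands in for one
-- fetched from a key management system.
INSERT INTO payment_cards (cardholder, pan_enc)
VALUES ('A Customer', pgp_sym_encrypt('4111111111111111', 'key-from-kms'));

-- Decrypt only where strictly needed, under a privileged role.
SELECT cardholder, pgp_sym_decrypt(pan_enc, 'key-from-kms') AS pan
FROM payment_cards;
```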
PgDay Asia 2016 - Security Best Practices for your Postgres Deployment - Ashnikbiz
Ashnik Database Solution Architect Sameer Kumar, an open source database evangelist, talked about "Security Best Practices for your Postgres Deployment" at the pgDay Asia event held in Singapore in March 2016.
Key areas he presented were:
- Security Model
- Security Features in Postgres
- Securing the access
- Avoiding common attacks
- Access Control and Securing data
- Logging and Auditing
- Patching – OS and PostgreSQL
This document provides an overview of microservices from past to present to future. It discusses how microservices evolved from earlier concepts like SOA and how new technologies like containers and platforms helped popularize microservices. The key aspects of microservices architecture are defined as isolation and flexibility. Current trends include the rise of platforms like Kubernetes and serverless computing. Issues around data management, communication styles, and industry adoption are also covered at a high level.
This presentation explores a broad cross-section of enterprise Postgres deployments to identify key usage patterns and reveals important aspects of performance, scalability, and availability including:
* Challenges organizations encounter most frequently during the stages of database development, deployment and maintenance
* Tuning parameters used most frequently to improve performance of production databases
* Frequently problematic database maintenance processes and configuration parameters
* Most commonly-used database back-up and recovery strategies
These slides were used by Victor from Tantan, a company that provides a dating app which is very popular in China. He spoke about a key feature of PostGIS (a geospatial extension of PostgreSQL), which they used for finding the perfect match.
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory and disk configuration parameters, designing schemas and queries efficiently, and leveraging caching strategies.
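A hedged sketch of the configuration step using ALTER SYSTEM (the values are illustrative, not recommendations):

```sql
-- Persist settings to postgresql.auto.conf rather than hand-editing files.
ALTER SYSTEM SET shared_buffers = '8GB';         -- takes effect after restart
ALTER SYSTEM SET effective_cache_size = '24GB';  -- planner hint, reload suffices
ALTER SYSTEM SET work_mem = '64MB';

-- Apply reloadable changes and confirm:
SELECT pg_reload_conf();
SHOW work_mem;
```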
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky - PostgreSQL-Consulting
This talk is prepared as a bunch of slides, where each slide describes a really bad way people can screw up their PostgreSQL database and provides a weight - how frequently I saw that kind of problem. Right before the talk I will reshuffle the deck to draw ten random slides and explain why such practices are bad and how to avoid running into them.
At point A, the entire QuerySet of 2,500,000 Order objects would be loaded into memory. This defeats Django's lazy loading and is extremely inefficient. It's better to use QuerySet methods like update() to perform updates without iterating.
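QuerySet.update() works because it compiles to a single UPDATE executed inside the database, so no objects are materialized in application memory. A hedged sketch of the equivalent SQL (table and columns hypothetical):

```sql
-- One server-side statement touches all matching rows; nothing is
-- fetched into the application, and lazy loading is never defeated.
UPDATE orders
SET status = 'archived'
WHERE created_at < now() - interval '1 year';
```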
1. Table calculations allow you to perform calculations on the fly within Looker reports to add new columns or metrics without modifying the underlying data model. They use Looker expressions rather than LookML.
2. The document provides examples of using mathematical functions like sum and rand, string functions like contains, and date functions like diff_days in table calculations to calculate new metrics like percentages of users by gender, average items delivered per week, and time to delivery.
3. Tips are provided like avoiding sorting on reports with many rows, limiting result sets before aggregation, and hiding table calculations from visualizations.
Driving application development through behavior driven development - Einar Ingebrigtsen
This document discusses Behavior Driven Development (BDD) and how it can be used to drive application development. It introduces BDD, focusing on behaviors of the system rather than tests. It discusses key aspects of BDD like Gherkin, units, test doubles, writing testable code, frameworks like SpecFlow and recommended reading. The overall message is that BDD changes the way software is developed by shifting the focus to behaviors and improving collaboration.
When it comes to user experience, a snappy application beats a glamorous one. Nothing frustrates an end user more than a slow application. Did you know that any wait time greater than one second will break a user's concentration and cause them to feel frustration? How can we create applications to meet user expectations? This class will cover all things performance, from design to delivery. We will go over application design, user interface guidelines, caching guidelines, code optimizations, and query optimizations.
This document summarizes Brian Overstreet's talk on scaling Pinterest's monitoring system over time as the company and traffic grew. It describes how Pinterest started with just Ganglia for system metrics and no application metrics. They introduced Graphite but faced challenges with packet loss and metrics being dropped. They then introduced OpenTSDB which users were happier with due to its querying speed. Pinterest developed an agent-based pipeline using Kafka and Storm to address packet loss issues and allow over 1.5 million points per second to be ingested by OpenTSDB. Key lessons included the need to educate users, control incoming metrics, and ensure the monitoring system scales with engineers rather than just site users.
This document discusses approaches for improving Django performance. It notes that front-end performance issues typically account for 80-90% of response time and recommends caching static assets, bundling/minifying assets, and using a CDN. For back-end issues, it recommends profiling views to identify SQL or Python bottlenecks and provides techniques like select_related, prefetch_related, and caching to address different problem areas. The key message is that performance work requires understanding where time is actually being spent before applying optimizations.
Data Integration Basics: Merging & Joining Data - Safe Software
Are you tired of dealing with data trapped in silos? Join our upcoming webinar to learn how to efficiently merge and join disparate datasets, transforming your data integration capabilities. This webinar is designed to empower you with the knowledge and skills needed to efficiently integrate data from various sources, allowing you to draw more value from your data.
With FME, merging and joining different types of data—whether it’s spreadsheets, databases, or spatial data—becomes a straightforward process. Our expert presenters will guide you through the essential techniques and best practices.
In this webinar, you will learn:
- Which transformers work best for your specific data types.
- How to merge attributes from multiple datasets into a single output.
- Techniques to automate these processes for greater efficiency.
Don’t miss out on this opportunity to enhance your data integration skills. By the end of this webinar, you’ll have the confidence to break down data silos and integrate your data seamlessly, boosting your productivity and the value of your data.
How to manage a system in which the schema of the data cannot be defined “a priori”? How to quickly search for entities whose data is spread across multiple rows? In this session we address all these issues, which are historically among the most complex to manage, yet very common and very delicate with regard to performance. From EAV to Sparse Columns, we'll see all the possible techniques for doing it in the best way possible, from usability, performance, and maintenance points of view.
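A hedged, portable sketch of the EAV side of that discussion (all names hypothetical): attributes are stored as rows, and a conditional aggregate pivots an entity spread across multiple rows back into one line:

```sql
-- Entity-Attribute-Value: attributes whose set isn't known a priori.
CREATE TABLE entity_attrs (
    entity_id int  NOT NULL,
    attr      text NOT NULL,
    value     text,
    PRIMARY KEY (entity_id, attr)
);

-- Pivot each entity's rows into a single line with CASE aggregates,
-- which is how you search entities whose data spans multiple rows.
SELECT entity_id,
       max(CASE WHEN attr = 'color'  THEN value END) AS color,
       max(CASE WHEN attr = 'weight' THEN value END) AS weight
FROM entity_attrs
GROUP BY entity_id
HAVING max(CASE WHEN attr = 'color' THEN value END) = 'red';
```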
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy - IDERA Software
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: http://ow.ly/qLFi50A5aPp
Have you ever wondered what it takes to keep an Always On availability group running and the users and administrators who depend on it happy?
Join IDERA and Matt Gordon as he uses his experience maintaining several production Always On Availability Groups as an example to provide you some battle-tested information and hopefully save you some sleepless nights. From security tips to maintenance advice, come hear about some less than obvious tips that will keep users happy and the DBA’s phone quiet. This will be an interactive Geek Sync you will not want to miss.
When setting up a new project we have some tips and tricks to help you do this in the best way possible, incl. infrastructure, database, standard attributes, logging, code alignment, and service center.
Artur Suchwalko “What are common mistakes in Data Science projects and how to... - Lviv Startup Club
Common mistakes in data science projects include:
1) Not properly defining the business problem or focusing on optimizing the wrong process.
2) Not adequately preparing the data or understanding how it was generated.
3) Rushing the modeling process or implementation without proper testing.
4) Choosing complex methods or "AI" solutions when simpler approaches may work better.
5) Not involving experienced people or adequately educating the team.
To avoid these mistakes, it is important to carefully analyze the business problem, data, modeling process, and make sure the right people are involved.
Making operations visible - Nick Galbreath - Devopsdays
This document provides an overview of a presentation given by Nick Galbreath at DevOpsDays Tokyo 2013 about making operations visible. The presentation encourages organizations to expose more operational metrics and business data through systems like Graphite and StatsD to improve communication and collaboration between teams. It provides examples of how to collect and visualize different types of data from applications, systems, and business processes. The goal is to overcome excuses for lack of visibility and have organizations complete the "One Machine, One Day, One Person Challenge" to start capturing and sharing their key operational and business metrics.
Making operations visible - devopsdays tokyo 2013 - Nick Galbreath
This document provides an overview of a presentation given by Nick Galbreath at DevOpsDays Tokyo 2013 about making operations visible. The presentation encourages organizations to expose more operational metrics and business data through systems like Graphite and StatsD to improve communication and collaboration between teams. It provides examples of how to collect and visualize different types of data from applications, systems, and business processes. The goal is to overcome excuses for lack of visibility and have organizations complete the "One Machine, One Day, One Person Challenge" to start exposing all of their operational metrics.
It Sounded Good on Paper - Lessons Learned with Puppet - Jeffery Smith
This talk is a 12-point guide to the things we did wrong during our journey with Puppet. We hope sharing it helps people avoid the same mistakes in the future.
Best practices with development of enterprise-scale SharePoint solutions - Pa... - SPC Adriatics
This session discusses and shares best practices and rules for developing enterprise-scale SharePoint solutions, which need to be highly performant, scalable, and secure. You will learn how to design and create SharePoint solutions capable of supporting a large number of users and a huge number of transactions. Moreover, you will understand how to tune performance, and will see common dos and don’ts from real SharePoint projects. All the topics and samples target server-side code and full-trust code solutions in an on-premises environment.
This document summarizes Terry Bunio's presentation on breaking and fixing broken data. It begins by thanking sponsors and providing information about Terry Bunio and upcoming SQL events. It then discusses the three types of broken data: inconsistent, incoherent, and ineffectual data. For each type, it provides an example and suggestions on how to identify and fix the issues. It demonstrates how to use tools like Oracle Data Modeler, execution plans, SQL Profiler, and OStress to diagnose problems to make data more consistent, coherent and effective.
Performance modeling provides important insights for capacity planning and system sizing without costly full-scale testing. While sophisticated mathematical modeling was common in the past, today's complex systems are difficult to model formally and existing tools are outdated. However, minimal modeling with common-sense approximations using metrics like resource usage per transaction and hardware capacity can still be useful. Keeping even informal models in mind helps performance engineers understand systems, but complex systems benefit from documenting models. Reviving the art of performance modeling can add value to modern continuous performance testing approaches.
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme... - PGConf APAC
Speaker: Rajni Baliyan
As the volume of data of a personal nature and the commodification of the information collected and analysed increase, so does the focus on privacy and data security. Many countries are examining international and domestic laws in order to protect consumers and organisations alike.
The Australian Senate has recently passed a bill containing mandatory requirements to notify the privacy commissioner and consumers when data is at risk of causing serious harm in the case of a data breach occurring.
Europe has also announced new laws that allow consumers more control over their data. These laws allow consumers to tell companies to erase any data held about them.
These new laws will have a significant impact on organisations that store personal information.
This talk will examine some of these legislative changes and how specific PostgreSQL features can assist organisations in meeting their obligations and avoid heavy fines associated with breaching them.
While physical replication in PostgreSQL is quite robust, it doesn’t fit well in the picture when:
- You need partial replication only
- You want to replicate between different major versions of PostgreSQL
- You need to replicate multiple databases to the same target
- Transformation of the data is needed
- You want to replicate in order to upgrade without downtime
The answer to these use cases is logical replication.
This talk will discuss and cover these use cases followed by a logical replication demo.
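A minimal sketch of the built-in mechanics (PostgreSQL 10+; names and connection string hypothetical):

```sql
-- On the source: publish only the tables you need (partial replication).
CREATE PUBLICATION shop_pub FOR TABLE orders, customers;

-- On the target, which may run a different major version; the target
-- tables must already exist with a compatible schema.
CREATE SUBSCRIPTION shop_sub
    CONNECTION 'host=source-db dbname=shop user=repl password=secret'
    PUBLICATION shop_pub;
```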
PGConf APAC 2018 - Lightning Talk #3: How To Contribute to PostgreSQL - PGConf APAC
This document outlines various ways to contribute to the PostgreSQL open source database project. It discusses that PostgreSQL needs support from individuals and companies to continue developing and competing against commercial databases. Contributing provides benefits like being listed as a contributor or sponsor on PostgreSQL's website. The document then lists several contribution methods like making donations, participating in surveys, providing hardware/infrastructure, helping with documentation, answering user questions, reporting bugs, and writing code in the form of tools, extensions, or patches.
The document discusses implementing centralized authorization in PostgreSQL by synchronizing user roles and privileges with an LDAP server. It provides a step-by-step approach to setting up LDAP authentication in PostgreSQL and using scripts to synchronize user roles and privileges between the database and LDAP based on group membership. The synchronization scripts create roles for each LDAP user, grant privileges to roles based on mapping rules, and handle role inheritance.
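A hedged sketch of the statements such a synchronization script might emit per mapped LDAP user (role and group names hypothetical):

```sql
-- Privileges live on a group role; LDAP users are granted membership.
CREATE ROLE app_readers NOLOGIN;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readers;

-- For each member of the LDAP group: a login role (no local password,
-- authentication happens against LDAP via pg_hba.conf) plus membership.
CREATE ROLE "jdoe" LOGIN;
GRANT app_readers TO "jdoe";
```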
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018 - PGConf APAC
There's no need to re-invent the wheel! Dozens of people have already tried...and succeeded. This talk is a categorized and illustrated overview of the most popular and/or useful PostgreSQL-specific scripts, utilities, and whole toolsets that DBAs should be aware of for solving daily tasks. Including performance monitoring, log management/analysis, and identifying/fixing the most common administration problems around general performance metrics, tuning, locking, indexing, and bloat, leaving out high-availability topics. Covered are venerable oldies from wiki.postgresql.org as well as my newer favourites from GitHub.
Speaker: Alexander Kukushkin
Kubernetes is a solid leader among different cloud orchestration engines and its adoption rate is growing on a daily basis. Naturally people want to run both their applications and databases on the same infrastructure.
There are a lot of ways to deploy and run PostgreSQL on Kubernetes, but most of them are not cloud-native. Around one year ago Zalando started to run an HA setup of PostgreSQL on Kubernetes managed by Patroni. Those experiments were quite successful and produced a Helm chart for Patroni. That chart was useful, albeit with a single problem: Patroni depended on Etcd, ZooKeeper or Consul.
Few people look forward to deploying two applications instead of one and supporting them later on. In this talk I would like to introduce Kubernetes-native Patroni. I will explain how Patroni uses the Kubernetes API to run a leader election and store the cluster state. I’m going to live-demo a deployment of an HA PostgreSQL cluster on Minikube and share our own experience of running more than 130 clusters on Kubernetes.
Patroni is a Python open-source project developed by Zalando in cooperation with other contributors on GitHub: https://github.com/zalando/patroni
PGConf APAC 2018 - High Performance JSON: PostgreSQL vs. MongoDB - PGConf APAC
Speakers: Dominic Dwyer & Wei Shan Ang
This talk was presented at Percona Live Europe 2017. However, we did not have enough time to test against more scenarios. We will be giving an updated talk with more comprehensive tests and numbers. We hope to run it against CitusDB and MongoRocks as well to provide a comprehensive comparison.
https://www.percona.com/live/e17/sessions/high-performance-json-postgresql-vs-mongodb
PGConf APAC 2018 - Monitoring PostgreSQL at Scale - PGConf APAC
Speaker: Lukas Fittl
Your PostgreSQL database is one of the most important pieces of your architecture - yet the level of introspection available in Postgres is often hard to work with. It's easy to get very detailed information, but what should you really watch out for, send reports on, and alert on?
In this talk we'll discuss how query performance statistics can be made accessible to application developers, critical entries one should monitor in the PostgreSQL log files, how to collect EXPLAIN plans at scale, how to watch over autovacuum and VACUUM operations, and how to flag issues based on schema statistics.
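For the query-statistics point, a hedged sketch against pg_stat_statements (column names as in pre-v13 releases; v13 renamed total_time to total_exec_time):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The ten statements consuming the most total execution time.
SELECT calls,
       round(total_time::numeric, 1)           AS total_ms,
       round((total_time / calls)::numeric, 2) AS avg_ms,
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```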
We'll also talk a bit about monitoring multi-server setups, first going into high availability, read standbys, and logical replication, and then reviewing what monitoring looks like for sharded databases like Citus.
The talk will primarily describe free/open-source tools and statistics views readily available from within Postgres.
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL - PGConf APAC
Speaker: Joe Conway
There are many use cases for text search and pattern matching, and there are also a wide variety of techniques available in PostgreSQL to perform text search and pattern matching. Figuring out the best "match" between use case and technique can be confusing. This talk will review the possibilities and provide guidance regarding when to use what method, and especially how to properly deal with the related index methods to ensure speedy searches. This talk covers:
* The primary available search methods
* Examples illustrating when to use each
* Extensive discussion of index use
* Timing comparisons using realistic examples
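As a hedged sketch of one of those method/index pairings (table and search terms hypothetical), full text search backed by a GIN expression index:

```sql
-- Index the text once; the query must repeat the same expression and
-- configuration for the index to be used.
CREATE INDEX docs_fts_idx ON docs
    USING GIN (to_tsvector('english', body));

SELECT id,
       ts_rank(to_tsvector('english', body), q) AS rank
FROM docs,
     to_tsquery('english', 'waldo & hiding') AS q
WHERE to_tsvector('english', body) @@ q
ORDER BY rank DESC;
```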
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo... - PGConf APAC
Speaker: Ian Barwick
PostgreSQL and reliability go hand-in-hand - but your data is only truly safe with a solid and trusted backup system in place, and no matter how good your application is, it's useless if it can't talk to your database.
In this talk we'll demonstrate how to set up a reliable replication cluster using open source tools closely associated with the PostgreSQL project. The talk will cover the following areas:
- how to set up and manage a replication cluster with `repmgr`
- how to set up and manage reliable backups with `Barman`
- how to manage failover and application connections with `repmgr` and `PgBouncer`
Ian Barwick has worked for 2ndQuadrant since 2014 and, as well as making various contributions to PostgreSQL itself, is the lead `repmgr` developer. He lives in Tokyo, Japan.
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P... - PGConf APAC
Speaker: Muhammad Usama
Pgpool-II has been around to complement PostgreSQL for over a decade and provides many features like connection pooling, failover, query caching, load balancing, and HA. High availability (HA) is very critical to most enterprise applications; clients need the ability to automatically reconnect to a secondary node when the master node goes down.
This is where the Pgpool-II watchdog feature comes in: the core Pgpool-II feature that provides HA by eliminating the SPOF is the watchdog. This watchdog feature has been around for a while, but it went through major overhauling and enhancements in recent releases. This talk aims to explain the watchdog feature and the recent enhancements that went into it, and to describe how it can be used to provide PostgreSQL HA and automatic failover.
There is a rising trend of enterprise deployments shifting to cloud-based environments, and Pgpool-II can be used in the cloud without any issues. In this talk we will give some ideas of how Pgpool-II is used to provide PostgreSQL HA in cloud environments.
Finally, we will summarise the major features that have been added in the recent major release of Pgpool-II and what's in the pipeline for the next major release.
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds - PGConf APAC
Speaker: Oskari Saarenmaa
Aiven PostgreSQL is available in five different public cloud providers' infrastructure in more than 60 regions around the world, including 18 in APAC. This has given us a unique opportunity to benchmark and compare performance of similar configurations in different environments.
We'll share our benchmark methods and results, comparing various PostgreSQL configurations and workloads across different clouds.
This document discusses migrating Oracle databases to EDB Postgres. It outlines the steps to migrate, including assessing the database, preparing the environment, migrating database objects and data, porting applications, testing, integrating, and rolling out the migration. It then provides two case studies of large companies that migrated from Oracle to EDB Postgres to significantly lower costs while still meeting their business and technical requirements.
About a year ago I was caught in the line of fire when a production system abruptly started misbehaving:
- A batch process which would finish in 15 minutes started taking 1.5 hours
- OLTP read queries on the standby started being cancelled
- We faced sudden slowness on the primary server and were forced to do a forced switchover to the standby
We were able to figure out that some peculiarities of the application code and batch process were responsible for this. But we could not fix the application code (as it is a packaged application).
In this talk I would like to share more details of how we debugged it, what problem we were facing, and how we applied a workaround for it. We also learnt that a query returning in 10 minutes may not be as dangerous as a query returning in 10 seconds but executed hundreds of times in an hour.
I will share in detail:
- How to map process/top stats from the OS to pg_stat_activity
- How to get and read an explain plan
- How to judge if a query is costly
- What tools helped us
- A peculiar autovacuum/vacuum vs. replication conflict we ran into
- Various parameters to tune the autovacuum and auto-analyze processes
- What we have done to work around the problem
- What we have put in place for better monitoring and information gathering
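As a hedged illustration of the first item (the pid is hypothetical): pg_stat_activity.pid is the backend's OS process id, so a hot process from top can be matched directly to the SQL it is running:

```sql
-- Map a busy OS pid from top/ps onto its session, state, and query.
SELECT pid, usename, state,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE pid = 12345;
```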
The document discusses PostgreSQL version 11 and future development. It provides a history of PostgreSQL and its predecessors, describing the development process and community. It summarizes key features committed to version 11, including improvements to partitioning, parallelization, performance and logical replication. It also outlines features proposed for future versions, with a focus on continued enhancements to partitioning and query planning.
This presentation was used by Blair during his talk on Aurora and PostgreSQL compatibility for Aurora at pgDay Asia 2017. The talk was part of a dedicated PostgreSQL track at FOSSASIA 2017.
These are the slides which were used by Kumar Rajeev Rastogi of Huawei for his presentation at pgDay Asia 2016. He presented a great idea about native compilation to improve CPU efficiency.
A brief introduction to OpenTelemetry, with a practical example of auto-instrumenting a Java web application with the Grafana stack (Loki, Grafana, Tempo, and Mimir).
AI and Deep Learning with NVIDIA Technologies - SandeepKS52
Artificial intelligence and deep learning are transforming various fields by enabling machines to learn from data and make decisions. Understanding how to prepare data effectively is crucial, as it lays the foundation for training models that can recognize patterns and improve over time. Once models are trained, the focus shifts to deployment, where these intelligent systems are integrated into real-world applications, allowing them to perform tasks and provide insights based on new information. This exploration of AI encompasses the entire process from initial concepts to practical implementation, highlighting the importance of each stage in creating effective and reliable AI solutions.
Integrating Survey123 and R&H Data Using FME - Safe Software
West Virginia Department of Transportation (WVDOT) actively engages in several field data collection initiatives using Collector and Survey 123. A critical component for effective asset management and enhanced analytical capabilities is the integration of Geographic Information System (GIS) data with Linear Referencing System (LRS) data. Currently, RouteID and Measures are not captured in Survey 123. However, we can bridge this gap through FME Flow automation. When a survey is submitted through Survey 123 for ArcGIS Portal (10.8.1), it triggers FME Flow automation. This process uses a customized workbench that interacts with a modified version of Esri's Geometry to Measure API. The result is a JSON response that includes RouteID and Measures, which are then applied to the feature service record.
How the US Navy Approaches DevSecOps with Raise 2.0Anchore
Join us as Anchore's solutions architect reveals how the U.S. Navy successfully approaches the shift left philosophy to DevSecOps with the RAISE 2.0 Implementation Guide to support its Cyber Ready initiative. This session will showcase practical strategies for defense application teams to pivot from a time-intensive compliance checklist and mindset to continuous cyber-readiness with real-time visibility.
Learn how to break down organizational silos through RAISE 2.0 principles and build efficient, secure pipeline automation that produces the critical security artifacts needed for Authorization to Operate (ATO) approval across military environments.
MOVIE RECOMMENDATION SYSTEM, UDUMULA GOPI REDDY, Y24MC13085.pptxMaharshi Mallela
A movie recommendation system is a software application or algorithm designed to suggest movies to users based on their preferences, viewing history, or other relevant factors. The primary goal of such a system is to enhance the user experience by providing personalized and relevant movie suggestions.
AI-Powered Compliance Solutions for Global Regulations | Certivocertivoai
Certivo offers AI-powered compliance solutions designed to help businesses in the USA, EU, and UK simplify complex regulatory demands. From environmental and product compliance to safety, quality, and sustainability, our platform automates supplier documentation, manages certifications, and integrates with ERP/PLM systems. Ensure seamless RoHS, REACH, PFAS, and Prop 65 compliance through predictive insights and multilingual support. Turn compliance into a competitive edge with Certivo’s intelligent, scalable, and audit-ready platform.
Who will create the languages of the future?Jordi Cabot
Will future languages be created by language engineers?
Can you "vibe" a DSL?
In this talk, we will explore the changing landscape of language engineering and discuss how Artificial Intelligence and low-code/no-code techniques can play a role in this future by helping in the definition, use, execution, and testing of new languages, even empowering non-tech users to create their own language infrastructure, perhaps without them even realizing it.
Women in Tech: Marketo Engage User Group - June 2025 - AJO with AWSBradBedford3
Creating meaningful, real-time engagement across channels is essential to building lasting business relationships. Discover how AWS, in collaboration with Deloitte, set up one of Adobe's first instances of Journey Optimizer B2B Edition to revolutionize customer journeys for B2B audiences.
This session will share the use cases the AWS team has implemented leveraging Adobe's Journey Optimizer B2B alongside Marketo Engage and Real-Time CDP B2B to deliver unified, personalized experiences and drive impactful engagement.
They will discuss how they are positioning AJO B2B in their marketing strategy and how AWS is imagining AJO B2B and Marketo will continue to work together in the future.
Whether you’re looking to enhance customer journeys or scale your B2B marketing efforts, you’ll leave with a clear view of what can be achieved to help transform your own approach.
Speakers:
Britney Young, Senior Technical Product Manager, AWS
Erine de Leeuw, Technical Product Manager, AWS
Generative Artificial Intelligence and its ApplicationsSandeepKS52
The exploration of generative AI begins with an overview of its fundamental concepts, highlighting how these technologies create new content and ideas by learning from existing data. Following this, the focus shifts to the processes involved in training and fine-tuning models, which are essential for enhancing their performance and ensuring they meet specific needs. Finally, the importance of responsible AI practices is emphasized, addressing ethical considerations and the impact of AI on society, which are crucial for developing systems that are not only effective but also beneficial and fair.
In today's world, artificial intelligence (AI) is transforming the way we learn.
This talk will explore how we can use AI tools to enhance our learning experiences, by looking at some (recent) research that has been done on the matter.
But as we embrace these new technologies, we must also ask ourselves:
Are we becoming less capable of thinking for ourselves?
Do these tools make us smarter, or do they risk dulling our critical thinking skills?
This talk will encourage us to think critically about the role of AI in our education. Together, we will discover how to use AI to support our learning journey while still developing our ability to think critically.
Agentic Techniques in Retrieval-Augmented Generation with Azure AI SearchMaxim Salnikov
Discover how Agentic Retrieval in Azure AI Search takes Retrieval-Augmented Generation (RAG) to the next level by intelligently breaking down complex queries, leveraging full conversation history, and executing parallel searches through a new LLM-powered query planner. This session introduces a cutting-edge approach that delivers significantly more accurate, relevant, and grounded answers—unlocking new capabilities for building smarter, more responsive generative AI applications.
Traditional Retrieval-Augmented Generation (RAG) pipelines work well for simple queries—but when users ask complex, multi-part questions or refer to previous conversation history, they often fall short. That’s where Agentic Retrieval comes in: a game-changing advancement in Azure AI Search that brings LLM-powered reasoning directly into the retrieval layer.
This session unveils how agentic techniques elevate your RAG-based applications by introducing intelligent query planning, subquery decomposition, parallel execution, and result merging—all orchestrated by a new Knowledge Agent. You’ll learn how this approach significantly boosts relevance, groundedness, and answer quality, especially for sophisticated enterprise use cases.
Key takeaways:
- Understand the evolution from keyword and vector search to agentic query orchestration
- See how full conversation context improves retrieval accuracy
- Explore measurable improvements in answer relevance and completeness (up to 40% gains!)
- Get hands-on guidance on integrating Agentic Retrieval with Azure AI Foundry and SDKs
- Discover how to build scalable, AI-first applications powered by this new paradigm
Whether you're building intelligent copilots, enterprise Q&A bots, or AI-driven search solutions, this session will equip you with the tools and patterns to push beyond traditional RAG.
Application Modernization with Choreo - The AI-Native Internal Developer Plat...WSO2
In this slide deck, we explore the challenges and best practices of application modernization. We also take a deep dive into how an internal developer platform as a service like Choreo can fast-track your modernization journey with AI capabilities and end-to-end workflow automation.
2. Best practices are just boring
• Never follow them; try worst practices instead
• Only those practices can really help you screw things up most effectively
• PostgreSQL consultants are nice people, so try to make them happy
3. 1. Use as many count(*) as you can
• The figure 301083021830123921 is very informative for the end user
• If it changes a second later to 30108302894839434020, it is still informative
• select count(*) from sometable is quite a light query
• Tuple estimation from pg_catalog can never be precise enough for you
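In reality, when an approximate figure is enough, the estimate PostgreSQL already keeps in pg_catalog is nearly free compared with a full count(*) scan. A minimal sketch, assuming the table has been analyzed recently:

  -- Approximate row count from the catalog instead of count(*):
  SELECT reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'sometable';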
4. 2. Try to create as many indexes as you can
• Indexes consume no disk space
• Indexes consume no shared_buffers
• There is no overhead on DML if each and every column in a table is covered with a bunch of indexes
• The optimizer will definitely choose your index once you have created it
• Keep calm and create more indexes
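In reality, every index costs disk space, cache and DML time, and the optimizer may never touch it. A minimal sketch for spotting the damage, using the standard pg_stat_user_indexes statistics view:

  -- Size and usage of every user index; idx_scan = 0 after a
  -- representative workload suggests pure overhead:
  SELECT indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
         idx_scan
  FROM pg_stat_user_indexes
  ORDER BY pg_relation_size(indexrelid) DESC;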
5. 3. Turn autovacuum off
• It is quite an auxiliary process; you can easily stop it
• There is no problem at all in having 100 GB of data in a database which is 1 TB in size
• 2-3 TB RAM servers are cheap, and IO is the fastest thing in modern computing
• Besides that, everyone likes Big Data
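In reality, with autovacuum off the dead tuples just pile up. A minimal sketch for checking how much of that 1 TB is garbage:

  -- Tables with the most dead tuples and their last autovacuum run:
  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;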
6. 4. Reinvent Slony
• If you need some data replication to another database, try to implement it from scratch
• That allows you to run into all the problems PostgreSQL has had since Slony was introduced
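In reality, modern PostgreSQL (10 and later) ships logical replication built in, so there is nothing left to reinvent. A minimal sketch, where the table name, publication name and connection string are all placeholders:

  -- On the source database:
  CREATE PUBLICATION mypub FOR TABLE sometable;
  -- On the target database:
  CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=mydb user=repuser'
    PUBLICATION mypub;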
7. 5. Move joins to your application
• Just select * from a couple of tables into the application written in your favorite programming language
• Then join them at the application level
• Now you only need to implement nested loop join, hash join and merge join, as well as a query optimizer and page cache
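In reality, one JOIN keeps all of that inside the database, where the planner picks the join strategy for you. A minimal sketch with illustrative table names:

  -- One round trip; the planner chooses nested loop, hash or merge:
  SELECT o.id, o.total, c.name
  FROM orders o
  JOIN customers c ON c.id = o.customer_id;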
8. 6. Never use graphical monitoring
• You do not need graphs
• Because it is an easy task to guess what happened yesterday at 2 a.m. using the command line and grep only
9. 7. Never use Foreign Keys
• Consistency control at the application level always works as expected
• You will never get data inconsistency without constraints
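In reality, one line of DDL replaces all those scattered application-level checks. A minimal sketch with illustrative table names:

  -- The database now rejects orphaned rows by itself:
  ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id);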
10. 8. Always use text type for all columns
• It is always fun to reimplement date or IP validation in your code
• You will never make a mistake converting "12-31-2015 03:01AM" to "15:01 12 of undef 2015" using text fields
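In reality, proper types make the database reject garbage for you. A minimal sketch with an illustrative table:

  -- timestamptz and inet validate input on INSERT:
  CREATE TABLE events (
      happened_at timestamptz NOT NULL,  -- rejects '12-31-2015 99:99AM'
      client_addr inet NOT NULL          -- rejects '999.999.0.1'
  );
  -- INSERT INTO events VALUES ('not a date', '10.0.0.1');  -- ERROR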