Federated
PostgreSQL
Who Am I?
● Jim Mlodgenski
– jimm@openscg.com
– @jim_mlodgenski
● Co-organizer of
– NYC PUG (www.nycpug.org)
– Philly PUG (www.phlpug.org)
● CTO, OpenSCG
– www.openscg.com
http://nyc.pgconf.us
What is a federated database?
“A federated database system is a type of meta-database
management system (DBMS), which transparently maps
multiple autonomous database systems into a single federated
database. The constituent databases are interconnected via a
computer network and may be geographically decentralized. ...
There is no actual data integration in the constituent disparate
databases as a result of data federation.”
-Wikipedia
How does PostgreSQL do it?
● Uses Foreign Data Wrappers (FDW)
● Used with SQL/MED
– Management of External Data
– New ANSI SQL 2003 extension
– Standard way of handling remote objects in SQL databases
● Wrappers are used by SQL/MED to access remote data sources
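The SQL/MED objects listed above (wrappers, servers, foreign tables) are recorded in the PostgreSQL catalogs. As a small sketch, assuming a session on a database where some of these objects have already been created, the following queries list them:

```sql
-- Foreign data wrappers installed in this database
SELECT fdwname FROM pg_foreign_data_wrapper;

-- Foreign servers, the wrapper each one uses, and their options
SELECT s.srvname, w.fdwname, s.srvoptions
FROM pg_foreign_server s
JOIN pg_foreign_data_wrapper w ON w.oid = s.srvfdw;

-- Foreign tables and the server each one belongs to
SELECT foreign_table_schema, foreign_table_name, foreign_server_name
FROM information_schema.foreign_tables;
```

In psql, the `\dew`, `\des` and `\det` meta-commands show similar information.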
Types of Foreign Data Wrappers
● SQL
● NoSQL
● File
● Miscellaneous
● PostgreSQL
SQL Wrappers
● Oracle
● SQLite
● MySQL
● JDBC
● Informix
● ODBC
● Firebird
SQL Wrappers
CREATE SERVER oracle_server FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver 'ORACLE_DBNAME');
CREATE USER MAPPING FOR CURRENT_USER
    SERVER oracle_server
    OPTIONS (user 'scott', password 'tiger');
CREATE FOREIGN TABLE fdw_test (
    userid   numeric,
    username text,
    email    text
)
SERVER oracle_server
OPTIONS (schema 'scott', table 'fdw_test');
postgres=# select * from fdw_test;
 userid | username |      email
--------+----------+------------------
      1 | scott    | scott@oracle.com
(1 row)
NoSQL Wrappers
● MongoDB
● Redis
● CouchDB
● Neo4j
● MonetDB
● Tycoon
NoSQL Wrappers
CREATE SERVER mongo_server FOREIGN DATA WRAPPER
mongo_fdw OPTIONS (address '192.168.122.47', port '27017');
CREATE FOREIGN TABLE databases (
_id NAME,
name TEXT
)
SERVER mongo_server
OPTIONS (database 'mydb', collection 'pgData');
test=# select * from databases ;
           _id            |    name
--------------------------+------------
 52fd49bfba3ae4ea54afc459 | mongo
 52fd49bfba3ae4ea54afc45a | postgresql
 52fd49bfba3ae4ea54afc45b | oracle
 52fd49bfba3ae4ea54afc45c | mysql
 52fd49bfba3ae4ea54afc45d | redis
 52fd49bfba3ae4ea54afc45e | db2
(6 rows)
File Wrappers
● Delimited files
● Fixed length files
● JSON files
File Wrappers
CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE leads (
first_name text, last_name text,
company_name text, address text,
city text, county text,
state text, zip text,
phone1 text, phone2 text,
email text, web text
) SERVER pg_load
OPTIONS ( filename '/tmp/us-500.csv', format 'csv', header 'TRUE' );
test=# select first_name || ' ' || last_name as full_name, email from leads limit 3;
     full_name     |             email
-------------------+-------------------------------
 James Butt        | jbutt@gmail.com
 Josephine Darakjy | josephine_darakjy@darakjy.org
 Art Venere        | art@venere.org
(3 rows)
Miscellaneous Wrappers
● Hadoop
● LDAP
● S3
● WWW
● PG-Strom
Hadoop Wrapper
CREATE SERVER hive_server FOREIGN DATA WRAPPER hive_fdw
    OPTIONS (address '127.0.0.1', port '10000');
CREATE USER MAPPING FOR PUBLIC SERVER hive_server;
CREATE FOREIGN TABLE order_line (
    ol_w_id        integer,
    ol_d_id        integer,
    ol_o_id        integer,
    ol_number      integer,
    ol_i_id        integer,
    ol_delivery_d  timestamp,
    ol_amount      decimal(6,2),
    ol_supply_w_id integer,
    ol_quantity    decimal(2,0),
    ol_dist_info   varchar(24)
) SERVER hive_server OPTIONS (table 'order_line');
INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
EXTRACT(YEAR FROM ol_delivery_d) as year,
EXTRACT(MONTH FROM ol_delivery_d) as month,
sum(ol_amount) as amount
FROM order_line
GROUP BY 1, 2, 3;
Hadoop Wrapper
● Hadoop foreign tables can also be writable
CREATE FOREIGN TABLE audit (
    audit_id bigint,
    event_d  timestamp,
    "table"  varchar,
    action   varchar,
    "user"   varchar
) SERVER hive_server
OPTIONS (table 'audit',
         flume_port '44444');
INSERT INTO audit
VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
Hadoop Wrapper
● It also works with HBase tables
CREATE FOREIGN TABLE hive_hbase_table (
    key   varchar,
    value varchar
) SERVER localhive
OPTIONS (table 'hbase_table', hbase_address 'localhost',
         hbase_port '9090', hbase_mapping ':key,cf:val');
INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hive_hbase_table WHERE key='key1';
SELECT * from hive_hbase_table;
WWW Wrapper
CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw
    OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');
CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;
CREATE FOREIGN TABLE www_fdw_google_search (
    q text, GsearchResultClass text, unescapedUrl text, url text,
    visibleUrl text, cacheUrl text, title text, titleNoFormatting text, content text
) SERVER www_fdw_server_google_search;
select url,substring(title,1,25)||'...',substring(content,1,25)||'...'
from www_fdw_google_search where q='postgresql fdw';
                             url                             |           ?column?           |           ?column?
-------------------------------------------------------------+------------------------------+------------------------------
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers       | Foreign data wrappers - <... | Jan 24, 2014 <b>...</b> 1...
 http://www.postgresql.org/docs/9.3/static/postgres-fdw.html | <b>PostgreSQL</b>: Docume... | F.31.1. <b>FDW</b> Option...
 http://www.postgresql.org/docs/9.3/static/fdwhandler.html   | <b>PostgreSQL</b>: Docume... | Foreign Data Wrapper Call...
 http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/    | A look at Foreign Data Wr... | Aug 5, 2013 <b>...</b> An...
(4 rows)
PostgreSQL Wrapper
● The most functional FDW by far
● Replaces much of the functionality of dblink
● Shipped as a contrib module
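Because postgres_fdw ships as a contrib module, it has to be installed into each database before a server can be defined with it. A minimal sketch, assuming the contrib packages are present on the host and the session has the privileges to create extensions:

```sql
-- Install the wrapper into the current database
CREATE EXTENSION postgres_fdw;

-- Confirm that the extension is registered
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'postgres_fdw';
```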
PostgreSQL Wrapper
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5432', dbname 'test2');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server;
CREATE FOREIGN TABLE bird_strikes (
aircraft_type varchar, airport varchar, altitude varchar, aircraft_model varchar,
num_wildlife_struck varchar, impact_to_flight varchar, effect varchar,
location varchar, flight_num varchar, flight_date timestamp,
record_id int, indicated_damage varchar, freeform_en_route varchar, num_engines varchar,
airline varchar, origin_state varchar, phase_of_flight varchar, precipitation varchar,
wildlife_collected boolean, wildlife_sent_to_smithsonian boolean, remarks varchar,
reported_date timestamp, wildlife_size varchar, sky_conditions varchar, wildlife_species varchar,
when_time_hhmm varchar, time_of_day varchar, pilot_warned varchar,
cost_out_of_service varchar, cost_other varchar, cost_repair varchar, cost_total varchar,
miles_from_airport varchar, feet_above_ground varchar, num_human_fatalities integer,
num_injured integer, speed_knots varchar
) SERVER postgres_server OPTIONS (table_name 'bird_strikes');
PostgreSQL Wrapper
● Only requests columns that are needed
test=# explain verbose select airport, flight_date from bird_strikes;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
(3 rows)
PostgreSQL Wrapper
● Sends a WHERE clause
test=# explain verbose select airport, flight_date from
bird_strikes where flight_date > '2011-01-01';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(3 rows)
PostgreSQL Wrapper
● Sends built-in immutable functions
test=# explain verbose select airport, flight_date from bird_strikes where flight_date
> '2011-01-01' and length(airport) < 10;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10))
(3 rows)
PostgreSQL Wrapper
● Writable (INSERT, UPDATE, DELETE)
test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
   Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
         Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid
         Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE
(5 rows)
PostgreSQL Wrapper
● Writes are transactional
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
test=# BEGIN;
BEGIN
test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339;
UPDATE 1
test=# ROLLBACK;
ROLLBACK
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
Limitations
● Aggregates are not pushed down
test=# explain verbose select count(*) from bird_strikes;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=220.92..220.93 rows=1 width=0)
   Output: count(*)
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..212.39 rows=3413 width=0)
         Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots
         Remote SQL: SELECT NULL FROM public.bird_strikes
(5 rows)
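One possible workaround, sketched here as an assumption rather than something shown in the talk, is to define a view on the remote database that performs the aggregate and point a second foreign table at that view. The names bird_strikes_count and strike_count are hypothetical; postgres_server is the server defined earlier.

```sql
-- On the remote database (test2): the aggregate runs where the data lives.
CREATE VIEW bird_strikes_count AS
    SELECT count(*) AS strike_count FROM bird_strikes;

-- On the local database: a foreign table that maps to the remote view.
CREATE FOREIGN TABLE bird_strikes_count (
    strike_count bigint
) SERVER postgres_server OPTIONS (table_name 'bird_strikes_count');

-- Only the single aggregated row crosses the network.
SELECT strike_count FROM bird_strikes_count;
```

The trade-off is that each aggregate needs its own remote view, so this suits a few fixed reports better than ad hoc queries.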
Limitations
● ORDER BY, GROUP BY, LIMIT not pushed down
test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=169.66..169.67 rows=5 width=40)
   Output: flight_num, flight_date
   ->  Sort  (cost=169.66..172.86 rows=1280 width=40)
         Output: flight_num, flight_date
         Sort Key: bird_strikes.flight_date
         ->  Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
               Output: flight_num, flight_date
               Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes
(8 rows)
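When the sort and limit really need to run remotely, dblink can still send the complete statement as text. This is a sketch under assumptions: the connection string repeats the host, port and dbname used for postgres_server, and authentication is assumed to work without a password.

```sql
-- dblink is a separate contrib module
CREATE EXTENSION dblink;

-- The whole query, including ORDER BY and LIMIT, executes on the remote side;
-- only five rows are returned to the local server.
SELECT *
FROM dblink('host=localhost port=5432 dbname=test2',
            'SELECT flight_num, flight_date
               FROM bird_strikes
              ORDER BY flight_date
              LIMIT 5')
     AS t(flight_num varchar, flight_date timestamp);
```

Unlike a foreign table, the column list has to be spelled out in the AS clause on every call.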
Limitations
● Joins not pushed down
test=# explain verbose select s.name, b.flight_date
test-# from bird_strikes b, state_code s
test-# where b.location = s.abbreviation and flight_date > '2011-01-01';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=239.88..349.95 rows=1986 width=40)
   Output: s.name, b.flight_date
   Hash Cond: ((s.abbreviation)::text = (b.location)::text)
   ->  Foreign Scan on public.state_code s  (cost=100.00..137.90 rows=930 width=64)
         Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status, s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region, s.census_region, s.census_region_name, s.census_division, s.census_devision_name, s.circuit_court
         Remote SQL: SELECT name, abbreviation FROM public.state_code
   ->  Hash  (cost=134.54..134.54 rows=427 width=40)
         Output: b.flight_date, b.location
         ->  Foreign Scan on public.bird_strikes b  (cost=100.00..134.54 rows=427 width=40)
               Output: b.flight_date, b.location
               Remote SQL: SELECT location, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(11 rows)
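Since both sides of the join are fetched separately, a small lookup table such as state_code can be cached locally so that only bird_strikes is scanned remotely. A minimal sketch, assuming PostgreSQL 9.3 or later (materialized views) and a hypothetical name state_code_local:

```sql
-- Copy the two columns the join needs into a local materialized view
CREATE MATERIALIZED VIEW state_code_local AS
    SELECT name, abbreviation FROM state_code;

-- Join the foreign table against the local copy
SELECT s.name, b.flight_date
FROM bird_strikes b
JOIN state_code_local s ON b.location = s.abbreviation
WHERE b.flight_date > '2011-01-01';

-- Re-fetch from the remote server when the lookup data changes
REFRESH MATERIALIZED VIEW state_code_local;
```

The join still runs locally. The copy only removes one of the two remote scans and can go stale between refreshes.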
Limitations (Gotcha)
● Sometimes the foreign tables don't act like tables
test=# SELECT l.*, w.lat, w.lng
FROM leads l, www_fdw_geocoder_google w
WHERE w.address = l.address || ',' || l.city || ',' || l.state;
 first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng
------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+-----
(0 rows)
Limitations (Gotcha)
                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=187.47..215.47 rows=1000 width=448)
   Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng
   Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address)
   ->  Sort  (cost=37.64..38.14 rows=200 width=384)
         Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         ->  Foreign Scan on public.leads l  (cost=0.00..30.00 rows=200 width=384)
               Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, ((((l.address || ','::text) || l.city) || ','::text) || l.state)
               Foreign File: /tmp/us-500.csv
               Foreign File Size: 81485
   ->  Sort  (cost=149.83..152.33 rows=1000 width=96)
         Output: w.lat, w.lng, w.address
         Sort Key: w.address
         ->  Foreign Scan on public.www_fdw_geocoder_google w  (cost=0.00..100.00 rows=1000 width=96)
               Output: w.lat, w.lng, w.address
               WWW API: Request
(16 rows)
Limitations (Gotcha)
CREATE OR REPLACE FUNCTION google_geocode(
    OUT first_name text, OUT last_name text, OUT company_name text, OUT address text, OUT city text, OUT county text,
    OUT state text, OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text, OUT web text, OUT lat text, OUT lng text)
RETURNS SETOF RECORD AS $$
DECLARE
    r     record;
    f_adr text;
    l_lat text;
    l_lng text;
BEGIN
    -- Drive the lookup from the leads table, one row at a time
    FOR r IN SELECT * FROM leads LOOP
        f_adr := r.address || ',' || r.city || ',' || r.state;
        -- Query the web service with a literal address for this lead
        EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
            INTO l_lat, l_lng
            USING f_adr;
        -- Copy the lead and its coordinates into the OUT parameters
        SELECT r.first_name, r.last_name, r.company_name, r.address, r.city, r.county, r.state, r.zip,
               r.phone1, r.phone2, r.email, r.web, l_lat, l_lng
          INTO first_name, last_name, company_name, address, city, county, state, zip,
               phone1, phone2, email, web, lat, lng;
        RETURN NEXT;
    END LOOP;
END $$ LANGUAGE plpgsql;
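Once defined, the function can be queried like a table, which stands in for the join that returned no rows. A short usage sketch, assuming the leads and www_fdw_geocoder_google foreign tables from the earlier slides exist:

```sql
-- Each lead comes back with the coordinates looked up for its address
SELECT first_name, last_name, city, state, lat, lng
FROM google_geocode();
```

A PL/pgSQL set-returning function builds its whole result before returning, so adding LIMIT to the outer query trims the output but does not reduce the number of web requests made.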
Writing a new FDW
● Might not need to write one if there is an HTTP interface
● Use blackhole_fdw as a template
– https://bitbucket.org/adunstan/blackhole_fdw
Writing a new FDW
Datum blackhole_fdw_handler(PG_FUNCTION_ARGS){
    ...
    /* these are required */
    fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
    fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
    fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
    fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
    fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
    fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
    fdwroutine->EndForeignScan = blackholeEndForeignScan;
    /* remainder are optional - use NULL if not required */
    /* support for insert / update / delete */
    fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
    fdwroutine->PlanForeignModify = blackholePlanForeignModify;
    fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
    fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
    fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
    fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
    fdwroutine->EndForeignModify = blackholeEndForeignModify;
    /* support for EXPLAIN */
    fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
    fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;
    /* support for ANALYZE */
    fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;
    PG_RETURN_POINTER(fdwroutine);
}
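The handler function only becomes usable once it is registered in SQL. A minimal sketch of that step, assuming the compiled library is installed as blackhole_fdw in PostgreSQL's library directory; the server name blackhole_server is hypothetical:

```sql
-- Expose the C handler to SQL; the link symbol defaults to the function name
CREATE FUNCTION blackhole_fdw_handler()
RETURNS fdw_handler
AS '$libdir/blackhole_fdw'
LANGUAGE C STRICT;

-- Register the wrapper and tie it to the handler
CREATE FOREIGN DATA WRAPPER blackhole_fdw
    HANDLER blackhole_fdw_handler;

-- From here on it is used like any other wrapper
CREATE SERVER blackhole_server FOREIGN DATA WRAPPER blackhole_fdw;
```

Wrappers packaged as extensions normally run these statements from their install script, so users only need CREATE EXTENSION.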
Future
● Even more Wrappers
● Check Constraints on Foreign Tables
– Allows partitioning
● Joins
– Custom Scan API
  ● Probably will not be the way to do this, but progress being made
Questions?
jimm@openscg.com
@jim_mlodgenski

More Related Content

What's hot (20)

The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
Alexander Korotkov
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 
Observability with HAProxy
Observability with HAProxy
HAProxy Technologies
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
PGConf APAC
 
MongoDB Performance Tuning
MongoDB Performance Tuning
Puneet Behl
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
DataWorks Summit
 
Monitoring Gengo using Saas
Monitoring Gengo using Saas
Yosuke Tomita
 
Reducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQL
Kenny Gryp
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
Enkitec
 
Highly efficient backups with percona xtrabackup
Highly efficient backups with percona xtrabackup
Nilnandan Joshi
 
Indexes in postgres
Indexes in postgres
Louise Grandjonc
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
MongoDB Backup & Disaster Recovery
MongoDB Backup & Disaster Recovery
Elankumaran Srinivasan
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
Vengata Guruswamy
 
OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
EDB
 
MongoDB Database Replication
MongoDB Database Replication
Mehdi Valikhani
 
Incremental Processing on Large Analytical Datasets with Prasanna Rajaperumal...
Incremental Processing on Large Analytical Datasets with Prasanna Rajaperumal...
Databricks
 
Mastering PostgreSQL Administration
Mastering PostgreSQL Administration
EDB
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
Sreenivas Makam
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
Alexander Korotkov
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
PGConf APAC
 
MongoDB Performance Tuning
MongoDB Performance Tuning
Puneet Behl
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
DataWorks Summit
 
Monitoring Gengo using Saas
Monitoring Gengo using Saas
Yosuke Tomita
 
Reducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQL
Kenny Gryp
 
Oracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
Enkitec
 
Highly efficient backups with percona xtrabackup
Highly efficient backups with percona xtrabackup
Nilnandan Joshi
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
Vengata Guruswamy
 
OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
EDB
 
MongoDB Database Replication
MongoDB Database Replication
Mehdi Valikhani
 
Incremental Processing on Large Analytical Datasets with Prasanna Rajaperumal...
Incremental Processing on Large Analytical Datasets with Prasanna Rajaperumal...
Databricks
 
Mastering PostgreSQL Administration
Mastering PostgreSQL Administration
EDB
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
Sreenivas Makam
 

Similar to Postgresql Federation (20)

Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019
Ibrar Ahmed
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Ashnikbiz
 
PostgreSQL 9.5 Foreign Data Wrappers
PostgreSQL 9.5 Foreign Data Wrappers
Nicholas Kiraly
 
Really Big Elephants: PostgreSQL DW
Really Big Elephants: PostgreSQL DW
PostgreSQL Experts, Inc.
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
Jim Mlodgenski
 
One Database To Rule 'em All
One Database To Rule 'em All
Stefanie Janine Stölting
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper Enhancements
Shigeru Hanada
 
02 20180605 meetup_fdw_v1
02 20180605 meetup_fdw_v1
Frederic Bamiere
 
Meet the-other-elephant
Meet the-other-elephant
Stefanie Janine Stölting
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
Amit Langote
 
SQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQL
Peter Eisentraut
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
Ashnikbiz
 
Cjoin
Cjoin
blogboy
 
Postgres.foreign.data.wrappers.2015
Postgres.foreign.data.wrappers.2015
EDB
 
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam
 
PostgreSQL As Data Integration Tool
PostgreSQL As Data Integration Tool
Stefanie Janine Stölting
 
Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM
Mark Rees
 
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Grant McAlister
 
Writing A Foreign Data Wrapper
Writing A Foreign Data Wrapper
psoo1978
 
Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019
Ibrar Ahmed
 
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Countdown to PostgreSQL v9.5 - Foriegn Tables can be part of Inheritance Tree
Ashnikbiz
 
PostgreSQL 9.5 Foreign Data Wrappers
PostgreSQL 9.5 Foreign Data Wrappers
Nicholas Kiraly
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
Jim Mlodgenski
 
Foreign Data Wrapper Enhancements
Foreign Data Wrapper Enhancements
Shigeru Hanada
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
Amit Langote
 
SQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQL
Peter Eisentraut
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
Ashnikbiz
 
Postgres.foreign.data.wrappers.2015
Postgres.foreign.data.wrappers.2015
EDB
 
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam
 
Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM
Mark Rees
 
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Grant McAlister
 
Writing A Foreign Data Wrapper
Writing A Foreign Data Wrapper
psoo1978
 
Ad

More from Jim Mlodgenski (11)

Strategic autovacuum
Strategic autovacuum
Jim Mlodgenski
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Jim Mlodgenski
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
Jim Mlodgenski
 
Profiling PL/pgSQL
Profiling PL/pgSQL
Jim Mlodgenski
 
Debugging Your PL/pgSQL Code
Debugging Your PL/pgSQL Code
Jim Mlodgenski
 
An Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL Triggers
Jim Mlodgenski
 
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
Jim Mlodgenski
 
Introduction to PostgreSQL
Introduction to PostgreSQL
Jim Mlodgenski
 
Scaling PostreSQL with Stado
Scaling PostreSQL with Stado
Jim Mlodgenski
 
Multi-Master Replication with Slony
Multi-Master Replication with Slony
Jim Mlodgenski
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
Jim Mlodgenski
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Jim Mlodgenski
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
Jim Mlodgenski
 
Debugging Your PL/pgSQL Code
Debugging Your PL/pgSQL Code
Jim Mlodgenski
 
An Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL Triggers
Jim Mlodgenski
 
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
Jim Mlodgenski
 
Introduction to PostgreSQL
Introduction to PostgreSQL
Jim Mlodgenski
 
Scaling PostreSQL with Stado
Scaling PostreSQL with Stado
Jim Mlodgenski
 
Multi-Master Replication with Slony
Multi-Master Replication with Slony
Jim Mlodgenski
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
Jim Mlodgenski
 
Ad

Recently uploaded (20)

Crypto Super 500 - 14th Report - June2025.pdf
Crypto Super 500 - 14th Report - June2025.pdf
Stephen Perrenod
 
PyData - Graph Theory for Multi-Agent Integration
PyData - Graph Theory for Multi-Agent Integration
barqawicloud
 
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
 
Kubernetes Security Act Now Before It’s Too Late
Kubernetes Security Act Now Before It’s Too Late
Michael Furman
 
Down the Rabbit Hole – Solving 5 Training Roadblocks
Down the Rabbit Hole – Solving 5 Training Roadblocks
Rustici Software
 
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
Edge AI and Vision Alliance
 
FME for Good: Integrating Multiple Data Sources with APIs to Support Local Ch...
FME for Good: Integrating Multiple Data Sources with APIs to Support Local Ch...
Safe Software
 
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
Edge AI and Vision Alliance
 
Edge-banding-machines-edgeteq-s-200-en-.pdf
Edge-banding-machines-edgeteq-s-200-en-.pdf
AmirStern2
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
Edge AI and Vision Alliance
 
FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Seminar: Perspectives on Passkeys & Consumer Adoption.pptx
FIDO Alliance
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Reducing Conflicts and Increasing Safety Along the Cycling Networks of East-F...
Safe Software
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
Introduction to Typescript - GDG On Campus EUE
Introduction to Typescript - GDG On Campus EUE
Google Developer Group On Campus European Universities in Egypt
 
Data Validation and System Interoperability
Data Validation and System Interoperability
Safe Software
 
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance Seminar State of Passkeys.pptx
FIDO Alliance
 

Postgresql Federation

  • 2. Who Am I?
    ● Jim Mlodgenski
      – jimm@openscg.com
      – @jim_mlodgenski
    ● Co-organizer of
      – NYC PUG (www.nycpug.org)
      – Philly PUG (www.phlpug.org)
    ● CTO, OpenSCG
      – www.openscg.com
  • 4. What is a federated database? “A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. ... There is no actual data integration in the constituent disparate databases as a result of data federation.” -Wikipedia
  • 5. How does PostgreSQL do it?
    ● Uses Foreign Data Wrappers (FDW)
    ● Used with SQL/MED
      – Management of External Data
      – New ANSI SQL 2003 extension
      – Standard way of handling remote objects in SQL databases
    ● Wrappers used by SQL/MED to access remote data sources
  • 6. Types of Foreign Data Wrappers
    ● SQL
    ● NoSQL
    ● File
    ● Miscellaneous
    ● PostgreSQL
  • 8. SQL Wrappers
    CREATE SERVER oracle_server FOREIGN DATA WRAPPER oracle_fdw
        OPTIONS (dbserver 'ORACLE_DBNAME');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER oracle_server
        OPTIONS (user 'scott', password 'tiger');

    CREATE FOREIGN TABLE fdw_test (
        userid   numeric,
        username text,
        email    text
    )
    SERVER oracle_server
    OPTIONS (schema 'scott', table 'fdw_test');

    postgres=# select * from fdw_test;
     userid | username |      email
    --------+----------+------------------
          1 | scott    | scott@oracle.com
    (1 row)
  • 10. NoSQL Wrappers
    CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw
        OPTIONS (address '192.168.122.47', port '27017');

    CREATE FOREIGN TABLE databases (
        _id  NAME,
        name TEXT
    )
    SERVER mongo_server
    OPTIONS (database 'mydb', collection 'pgData');

    test=# select * from databases ;
               _id            |    name
    --------------------------+------------
     52fd49bfba3ae4ea54afc459 | mongo
     52fd49bfba3ae4ea54afc45a | postgresql
     52fd49bfba3ae4ea54afc45b | oracle
     52fd49bfba3ae4ea54afc45c | mysql
     52fd49bfba3ae4ea54afc45d | redis
     52fd49bfba3ae4ea54afc45e | db2
    (6 rows)
  • 11. File Wrappers
    ● Delimited files
    ● Fixed length files
    ● JSON files
  • 12. File Wrappers
    CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw;

    CREATE FOREIGN TABLE leads (
        first_name   text,
        last_name    text,
        company_name text,
        address      text,
        city         text,
        county       text,
        state        text,
        zip          text,
        phone1       text,
        phone2       text,
        email        text,
        web          text
    )
    SERVER pg_load
    OPTIONS (filename '/tmp/us-500.csv', format 'csv', header 'TRUE');

    test=# select first_name || ' ' || last_name as full_name, email from leads limit 3;
         full_name     |             email
    -------------------+-------------------------------
     James Butt        | [email protected]
     Josephine Darakjy | [email protected]
     Art Venere        | [email protected]
    (3 rows)
  • 14. Hadoop Wrapper
    CREATE SERVER hive_server FOREIGN DATA WRAPPER hive_fdw
        OPTIONS (address '127.0.0.1', port '10000');

    CREATE USER MAPPING FOR PUBLIC SERVER hive_server;

    CREATE FOREIGN TABLE order_line (
        ol_w_id        integer,
        ol_d_id        integer,
        ol_o_id        integer,
        ol_number      integer,
        ol_i_id        integer,
        ol_delivery_d  timestamp,
        ol_amount      decimal(6,2),
        ol_supply_w_id integer,
        ol_quantity    decimal(2,0),
        ol_dist_info   varchar(24)
    )
    SERVER hive_server
    OPTIONS (table 'order_line');

    INSERT INTO item_sale_month
    SELECT ol_i_id as i_id,
           EXTRACT(YEAR FROM ol_delivery_d) as year,
           EXTRACT(MONTH FROM ol_delivery_d) as month,
           sum(ol_amount) as amount
      FROM order_line
     GROUP BY 1, 2, 3;
  • 15. Hadoop Wrapper
    ● Hadoop foreign tables can also be writable

    -- "table" and "user" are reserved words, so they are quoted as column names
    CREATE FOREIGN TABLE audit (
        audit_id bigint,
        event_d  timestamp,
        "table"  varchar,
        action   varchar,
        "user"   varchar
    )
    SERVER hive_server
    OPTIONS (table 'audit', flume_port '44444');

    INSERT INTO audit
    VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
  • 16. Hadoop Wrapper
    ● It also works with HBase tables

    CREATE FOREIGN TABLE hive_hbase_table (
        key   varchar,
        value varchar
    )
    SERVER localhive
    OPTIONS (table 'hbase_table',
             hbase_address 'localhost',
             hbase_port '9090',
             hbase_mapping ':key,cf:val');

    INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
    INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
    UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
    DELETE FROM hive_hbase_table WHERE key='key1';
    SELECT * from hive_hbase_table;
  • 17. WWW Wrapper
    CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw
        OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');

    CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;

    CREATE FOREIGN TABLE www_fdw_google_search (
        q                  text,
        GsearchResultClass text,
        unescapedUrl       text,
        url                text,
        visibleUrl         text,
        cacheUrl           text,
        title              text,
        titleNoFormatting  text,
        content            text
    )
    SERVER www_fdw_server_google_search;

    select url, substring(title,1,25)||'...', substring(content,1,25)||'...'
      from www_fdw_google_search
     where q='postgresql fdw';
                                 url                             |           ?column?           |           ?column?
    -------------------------------------------------------------+------------------------------+------------------------------
     http://wiki.postgresql.org/wiki/Foreign_data_wrappers       | Foreign data wrappers - <... | Jan 24, 2014 <b>...</b> 1...
     http://www.postgresql.org/docs/9.3/static/postgres-fdw.html | <b>PostgreSQL</b>: Docume... | F.31.1. <b>FDW</b> Option...
     http://www.postgresql.org/docs/9.3/static/fdwhandler.html   | <b>PostgreSQL</b>: Docume... | Foreign Data Wrapper Call...
     http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/    | A look at Foreign Data Wr... | Aug 5, 2013 <b>...</b> An...
    (4 rows)
  • 18. PostgreSQL Wrapper
    ● The most functional FDW by far
    ● Replaces much of the functionality of dblink
    ● Shipped as a contrib module
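Because the wrapper ships as a contrib module, it normally only has to be enabled in each database that uses it. The following is a minimal sketch, assuming the contrib packages are already installed on the server; it uses only standard PostgreSQL commands and catalogs.

```sql
-- Enable the wrapper in the current database (needs the contrib package installed).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Confirm that the wrapper is now registered.
SELECT fdwname FROM pg_foreign_data_wrapper;
```

After this, CREATE SERVER ... FOREIGN DATA WRAPPER postgres_fdw works as on the next slide.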
  • 19. PostgreSQL Wrapper
    CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5432', dbname 'test2');

    CREATE USER MAPPING FOR PUBLIC SERVER postgres_server;

    CREATE FOREIGN TABLE bird_strikes (
        aircraft_type varchar, airport varchar, altitude varchar,
        aircraft_model varchar, num_wildlife_struck varchar,
        impact_to_flight varchar, effect varchar, location varchar,
        flight_num varchar, flight_date timestamp, record_id int,
        indicated_damage varchar, freeform_en_route varchar,
        num_engines varchar, airline varchar, origin_state varchar,
        phase_of_flight varchar, precipitation varchar,
        wildlife_collected boolean, wildlife_sent_to_smithsonian boolean,
        remarks varchar, reported_date timestamp, wildlife_size varchar,
        sky_conditions varchar, wildlife_species varchar,
        when_time_hhmm varchar, time_of_day varchar, pilot_warned varchar,
        cost_out_of_service varchar, cost_other varchar, cost_repair varchar,
        cost_total varchar, miles_from_airport varchar,
        feet_above_ground varchar, num_human_fatalities integer,
        num_injured integer, speed_knots varchar
    )
    SERVER postgres_server
    OPTIONS (table_name 'bird_strikes');
  • 20. PostgreSQL Wrapper
    ● Only requests columns that are needed

    test=# explain verbose select airport, flight_date from bird_strikes;
                                      QUERY PLAN
    -------------------------------------------------------------------------------
     Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
       Output: airport, flight_date
       Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
    (3 rows)
  • 21. PostgreSQL Wrapper
    ● Sends a WHERE clause

    test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01';
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
       Output: airport, flight_date
       Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
    (3 rows)
  • 22. PostgreSQL Wrapper
    ● Sends built-in immutable functions

    test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01' and length(airport) < 10;
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
       Output: airport, flight_date
       Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10))
    (3 rows)
  • 23. PostgreSQL Wrapper
    ● Writable (INSERT, UPDATE, DELETE)

    test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339;
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Update on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
       Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1
       ->  Foreign Scan on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
             Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid
             Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE
    (5 rows)
  • 24. PostgreSQL Wrapper
    ● Writes are transactional

    test=# select airport from bird_strikes where record_id = 313339;
     airport
    ---------
     Unknown
    (1 row)

    test=# BEGIN;
    BEGIN
    test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339;
    UPDATE 1
    test=# ROLLBACK;
    ROLLBACK
    test=# select airport from bird_strikes where record_id = 313339;
     airport
    ---------
     Unknown
    (1 row)
  • 25. Limitations
    ● Aggregates are not pushed down

    test=# explain verbose select count(*) from bird_strikes;
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Aggregate  (cost=220.92..220.93 rows=1 width=0)
       Output: count(*)
       ->  Foreign Scan on public.bird_strikes  (cost=100.00..212.39 rows=3413 width=0)
             Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots
             Remote SQL: SELECT NULL FROM public.bird_strikes
    (5 rows)
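One possible way around this limitation, sketched here under assumptions and not taken from the talk, is to aggregate in a view on the remote server and map a foreign table onto that view; postgres_fdw's table_name option can point at a view as well as a table. The name bird_strike_counts is hypothetical, while postgres_server and bird_strikes are the objects defined on the earlier slides.

```sql
-- On the remote database (test2): aggregate where the data lives.
CREATE VIEW bird_strike_counts AS
    SELECT airport, count(*) AS strikes
      FROM bird_strikes
     GROUP BY airport;

-- On the local database: expose the remote view as a foreign table.
CREATE FOREIGN TABLE bird_strike_counts (
    airport varchar,
    strikes bigint
)
SERVER postgres_server
OPTIONS (table_name 'bird_strike_counts');

-- The remote side returns one row per airport instead of every strike record;
-- the ORDER BY and LIMIT below are still applied locally.
SELECT airport, strikes
  FROM bird_strike_counts
 ORDER BY strikes DESC
 LIMIT 5;
```

The trade-off is that each aggregate shape needs its own remote view, so this suits a few fixed reporting queries better than ad hoc analysis.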
  • 26. Limitations
    ● ORDER BY, GROUP BY, LIMIT not pushed down

    test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5;
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Limit  (cost=169.66..169.67 rows=5 width=40)
       Output: flight_num, flight_date
       ->  Sort  (cost=169.66..172.86 rows=1280 width=40)
             Output: flight_num, flight_date
             Sort Key: bird_strikes.flight_date
             ->  Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
                   Output: flight_num, flight_date
                   Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes
    (8 rows)
  • 27. Limitations
    ● Joins not pushed down

    test=# explain verbose select s.name, b.flight_date
    test-# from bird_strikes b, state_code s
    test-# where b.location = s.abbreviation and flight_date > '2011-01-01';
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Hash Join  (cost=239.88..349.95 rows=1986 width=40)
       Output: s.name, b.flight_date
       Hash Cond: ((s.abbreviation)::text = (b.location)::text)
       ->  Foreign Scan on public.state_code s  (cost=100.00..137.90 rows=930 width=64)
             Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status, s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region, s.census_region, s.census_region_name, s.census_division, s.census_devision_name, s.circuit_court
             Remote SQL: SELECT name, abbreviation FROM public.state_code
       ->  Hash  (cost=134.54..134.54 rows=427 width=40)
             Output: b.flight_date, b.location
             ->  Foreign Scan on public.bird_strikes b  (cost=100.00..134.54 rows=427 width=40)
                   Output: b.flight_date, b.location
                   Remote SQL: SELECT location, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
    (11 rows)
  • 28. Limitations (Gotcha)
    ● Sometimes the foreign tables don't act like tables

    test=# SELECT l.*, w.lat, w.lng
             FROM leads l, www_fdw_geocoder_google w
            WHERE w.address = l.address || ',' || l.city || ',' || l.state;
     first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng
    ------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+-----
    (0 rows)
  • 29. Limitations (Gotcha)
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Merge Join  (cost=187.47..215.47 rows=1000 width=448)
       Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng
       Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address)
       ->  Sort  (cost=37.64..38.14 rows=200 width=384)
             Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, (((((l.address || ','::text) || l.city) || ','::text) || l.state))
             Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state))
             ->  Foreign Scan on public.leads l  (cost=0.00..30.00 rows=200 width=384)
                   Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, ((((l.address || ','::text) || l.city) || ','::text) || l.state)
                   Foreign File: /tmp/us-500.csv
                   Foreign File Size: 81485
       ->  Sort  (cost=149.83..152.33 rows=1000 width=96)
             Output: w.lat, w.lng, w.address
             Sort Key: w.address
             ->  Foreign Scan on public.www_fdw_geocoder_google w  (cost=0.00..100.00 rows=1000 width=96)
                   Output: w.lat, w.lng, w.address
                   WWW API: Request
    (16 rows)
  • 30. Limitations (Gotcha)
    CREATE OR REPLACE FUNCTION google_geocode(
        OUT first_name text, OUT last_name text, OUT company_name text,
        OUT address text, OUT city text, OUT county text, OUT state text,
        OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text,
        OUT web text, OUT lat text, OUT lng text)
    RETURNS SETOF RECORD AS $$
    DECLARE
        r     record;
        f_adr text;
        l_lat text;
        l_lng text;
    BEGIN
        FOR r IN SELECT * FROM leads LOOP
            f_adr := r.address || ',' || r.city || ',' || r.state;

            EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
               INTO l_lat, l_lng
              USING f_adr;

            SELECT r.first_name, r.last_name, r.company_name, r.address, r.city,
                   r.county, r.state, r.zip, r.phone1, r.phone2, r.email, r.web,
                   l_lat, l_lng
              INTO first_name, last_name, company_name, address, city,
                   county, state, zip, phone1, phone2, email, web, lat, lng;

            RETURN NEXT;
        END LOOP;
    END
    $$ LANGUAGE plpgsql;
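As a usage sketch, the function above can stand in for the join that returned no rows. It passes each address to the web service as a constant, one lead at a time, instead of relying on a join condition the wrapper never sees.

```sql
-- Replaces the failing join: every lead is geocoded with its own request.
SELECT first_name, last_name, lat, lng
  FROM google_geocode();
```

Note that PL/pgSQL builds the entire result set of a RETURN NEXT function before handing it back, so adding LIMIT to this query would not reduce the number of geocoder requests; every row in leads is still looked up.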
  • 31. Writing a new FDW
    ● Might not need to write one if there is an HTTP interface
    ● Use the Blackhole FDW as a template
      – https://bitbucket.org/adunstan/blackhole_fdw
  • 32. Writing a new FDW
    Datum
    blackhole_fdw_handler(PG_FUNCTION_ARGS)
    {
        ...
        /* these are required */
        fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
        fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
        fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
        fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
        fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
        fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
        fdwroutine->EndForeignScan = blackholeEndForeignScan;

        /* remainder are optional - use NULL if not required */

        /* support for insert / update / delete */
        fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
        fdwroutine->PlanForeignModify = blackholePlanForeignModify;
        fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
        fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
        fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
        fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
        fdwroutine->EndForeignModify = blackholeEndForeignModify;

        /* support for EXPLAIN */
        fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
        fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;

        /* support for ANALYSE */
        fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;

        PG_RETURN_POINTER(fdwroutine);
    }
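The C handler only becomes usable once it is registered in SQL. The sketch below shows the usual registration pattern as it would appear in an extension script; it is an illustration and not copied from the blackhole_fdw repository. The server and table names blackhole_server and blackhole_test are hypothetical.

```sql
-- Expose the C handler; CREATE EXTENSION substitutes 'MODULE_PATHNAME'
-- with the library path named in the extension's control file.
CREATE FUNCTION blackhole_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

-- Tie the wrapper name to its handler function.
CREATE FOREIGN DATA WRAPPER blackhole_fdw
    HANDLER blackhole_fdw_handler;

-- The wrapper can then back servers and foreign tables like any other FDW.
CREATE SERVER blackhole_server FOREIGN DATA WRAPPER blackhole_fdw;

CREATE FOREIGN TABLE blackhole_test (
    id      integer,
    payload text
)
SERVER blackhole_server;
```

A VALIDATOR function can also be attached to the wrapper to check option names, but it is optional.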
  • 33. Future
    ● Even more Wrappers
    ● Check Constraints on Foreign Tables
      – Allows partitioning
    ● Joins
      – Custom Scan API
        ● Probably will not be the way to do this, but progress being made