Kamil Bajda-Pawlikowski
Co-founder and CTO
www.starburstdata.com
Fast SQL-on-Anything
Alluxio meetup
2018 @ CA
Presto is SQL on anything
Query anything, anywhere
© 2018 2
Presto Users
https://github.com/prestodb/presto/wiki/Presto-Users
Presto in production
Facebook: 1000s of nodes, HDFS (ORC, RCFile), sharded MySQL, 1000s of users
Uber: 800+ nodes (2 clusters on premises) with 200K+ queries daily over HDFS (Parquet/ORC)
Twitter: 800+ nodes (several clusters on premises) for HDFS (Parquet)
LinkedIn: 350+ nodes (2 clusters on premises), 40K+ queries daily over HDFS (ORC), 600+ users
Netflix: 250+ nodes in AWS, 40+ PB in S3 (Parquet)
Lyft: 200+ nodes in AWS, 20K+ queries daily, 20+ PBs in Parquet
Yahoo! Japan: 200+ nodes (4 clusters on premises) for HDFS (ORC), ObjectStore, and Cassandra
FINRA: 120+ nodes in AWS, 4PB in S3 (ORC), 200+ users
Why Presto?
Community-driven open source project
High performance ANSI SQL engine
• New Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and storage
• Scale storage and compute independently
• No ETL or data integration necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
Beyond ANSI SQL
Presto offers a wide variety of built-in functions including:
● regular expression functions
● lambda expressions and functions
● geospatial functions
Complex data types:
● JSON
● ARRAY
● MAP
● ROW / STRUCT
SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0); -- [5, 7]
SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]
SELECT c.city_id, count(*) as trip_count
FROM trips_table as t
JOIN city_table as c
ON st_contains(c.geo_shape,
st_point(t.dest_lng, t.dest_lat))
WHERE t.trip_date = '2018-05-01'
GROUP BY 1;
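For readers who do not write Presto SQL every day, the three one-line function examples above behave roughly like the Python below. This is only an analogy to show the semantics (in particular that the NULL element fails the `x > 0` test and is dropped); it is not Presto code and does not reflect how Presto evaluates lambdas internally.

```python
import re

# regexp_extract_all('1a 2b 14m', '\d+'): every substring matching the pattern
digits = re.findall(r"\d+", "1a 2b 14m")

# filter(ARRAY[5, -6, NULL, 7], x -> x > 0): NULL > 0 is not true, so NULL is dropped
positives = [x for x in [5, -6, None, 7] if x is not None and x > 0]

# transform(ARRAY[5, 6], x -> x + 1): apply the lambda to each element
incremented = [x + 1 for x in [5, 6]]

print(digits, positives, incremented)
# prints: ['1', '2', '14'] [5, 7] [6, 7]
```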
Tools, bindings, extensibility
JDBC / ODBC drivers for BI/SQL tools
C/C++, Go, Java, Node.js, Python, PHP, R and Ruby on Rails
UDFs, UDAFs, Connector SPI
Presto on AWS
Fully integrated with AWS:
● Amazon S3
● AWS Glue Catalog
● Autoscaling
● AWS Marketplace
https://www.starburstdata.com/presto-aws-cloud/
https://www.starburstdata.com/technical-blog/presto-available-on-aws-marketplace/
Presto on Azure
Fully integrated with Azure HDInsight:
● Azure Blob Storage
● Azure Data Lake Storage
● External Hive Metastore
● Microsoft PowerBI
https://www.starburstdata.com/presto-azure/
https://azure.microsoft.com/en-us/blog/azure-hdinsight-and-starburst-brings-presto-to-microsoft-azure-customers/
Presto Performance
Built for Performance
Query Execution Engine:
• MPP-style pipelined in-memory execution
• Columnar and vectorized data processing
• Runtime query bytecode compilation
• Memory efficient data structures
• Multi-threaded multi-core execution
• Optimized readers for columnar formats (ORC and Parquet)
• Now also Cost-Based Optimizer
CBO in a nutshell
Cost-Based Optimizer v1 includes:
• support for statistics stored in Hive Metastore
• join reordering based on selectivity estimates and cost
• automatic join type selection (repartitioned vs broadcast)
• automatic left/right side selection for joined tables
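The last two bullets (join type selection and left/right side selection) can be illustrated with a toy decision rule. The sketch below is a deliberately simplified assumption, not Presto's actual cost model: `TableStats`, `plan_join`, the `broadcast_limit_bytes` threshold, and the table sizes are all hypothetical names and numbers made up for the example. It only shows the general idea that size estimates decide which input is built into a hash table and whether that input is broadcast or both sides are repartitioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table size estimate, e.g. derived from metastore statistics."""
    name: str
    estimated_bytes: int


def plan_join(a: TableStats, b: TableStats, broadcast_limit_bytes: int) -> dict:
    """Pick the build side and the join distribution from size estimates.

    The smaller input becomes the build (hash table) side. If it fits under
    broadcast_limit_bytes it is replicated to every worker; otherwise both
    inputs are repartitioned on the join key.
    """
    build, probe = (a, b) if a.estimated_bytes <= b.estimated_bytes else (b, a)
    if build.estimated_bytes <= broadcast_limit_bytes:
        distribution = "broadcast"
    else:
        distribution = "repartitioned"
    return {"probe": probe.name, "build": build.name, "distribution": distribution}


# Assumed sizes, chosen only for illustration.
trips = TableStats("trips_table", 500 * 1024**3)  # ~500 GiB
cities = TableStats("city_table", 20 * 1024**2)   # ~20 MiB

print(plan_join(trips, cities, broadcast_limit_bytes=100 * 1024**2))
# prints: {'probe': 'trips_table', 'build': 'city_table', 'distribution': 'broadcast'}
```

A real optimizer weighs CPU, memory, and network cost across many candidate plans; the single threshold here stands in for that comparison.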
https://www.starburstdata.com/technical-blog/
Presto CBO Speedup
Duration of TPC-DS queries (lower is better)
https://www.starburstdata.com/presto-benchmarks/
Cloud cost reduction
● on average 7x improvement vs EMR Presto
● EMR Presto cannot execute many TPC-DS queries
● All TPC-DS queries pass on Starburst Presto
https://www.starburstdata.com/presto-aws/
Further reading
https://www.starburstdata.com/technical-blog/
https://fivetran.com/blog/warehouse-benchmark
https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-emr-sql/
http://bytes.schibsted.com/bigdata-sql-query-engine-benchmark/
https://virtuslab.com/blog/benchmarking-spark-sql-presto-hive-bi-processing-googles-cloud-dataproc/
Why Starburst?
Starburst Data
Founded by Presto committers:
● Over 3 years of contributions to Presto
● Presto distribution for on-premises and cloud environments
● Supporting customers in production
● Enterprise subscription add-ons
Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer
https://www.starburstdata.com/presto-enterprise/
Thank You!
Twitter: @starburstdata @prestodb
Blog: www.starburstdata.com/technical-blog/
Newsletter: www.starburstdata.com/newsletter

Presto Fast SQL on Anything