Falcon - built for speed

Ann Harrison
Kevin Lewis


                           MySQL Users' Conference April 2009
If it's so fast, why isn't it done yet?
Talk overview
 Falcon at a glance
 Project history
 Multi-threading for the database developer
 Cycle locking
Falcon at a glance – read first record
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
Falcon at a glance – read complete
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
Falcon at a glance – read again
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
Falcon at a glance – write new record
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
Falcon at a glance – commit
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
Falcon at a glance – write complete
 [Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Serial Log Files, Database Tablespaces]
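The six diagrams above follow one record through the components. A minimal Python sketch of that flow, under assumptions the slides do not spell out: reads check the record cache before the page cache and the files, new records stay in the record cache until commit, commit appends them to the serial log, and a background step later applies the log to pages and files. ToyEngine and all of its method names are hypothetical, not Falcon's API.

```python
class ToyEngine:
    """Toy model of the read and write paths (assumed flow, not Falcon code)."""

    def __init__(self, tablespace):
        self.tablespace = dict(tablespace)  # stands in for the database files
        self.page_cache = {}                # stands in for cached pages
        self.record_cache = {}              # committed and pending records
        self.serial_log = []                # committed changes, in order
        self.uncommitted = set()            # keys written but not committed

    def read(self, key):
        # "read again": a record already in the record cache is served from it.
        if key in self.record_cache:
            return self.record_cache[key], "record cache"
        # "read first record": fetch through the page cache, then cache the record.
        if key not in self.page_cache:
            self.page_cache[key] = self.tablespace[key]
        self.record_cache[key] = self.page_cache[key]
        return self.record_cache[key], "page cache"

    def write(self, key, value):
        # "write new record": the change lives only in memory until commit.
        self.record_cache[key] = value
        self.uncommitted.add(key)

    def commit(self):
        # "commit": pending changes go to the serial log, not to the pages.
        for key in sorted(self.uncommitted):
            self.serial_log.append((key, self.record_cache[key]))
        self.uncommitted.clear()

    def migrate(self):
        # "write complete": a background step applies the log to pages and files.
        for key, value in self.serial_log:
            self.page_cache[key] = value
            self.tablespace[key] = value
        self.serial_log.clear()
```

In this sketch a commit touches only the log; the tablespace changes when migrate() runs.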
Falcon history
 Origin
   Transactional SQL Engine for Web App Environment
   Bought by MySQL in 2006
 MVCC
   Consistent Read
   Versions control write access
   Memory only – no steal
 Indexes and data separate
 Data encoded on disk and in memory
 Fine grained multi-threading
Falcon Goals circa 2006

Exploit large memory for more than just a bigger cache
Use threads and processors for data migration
Eliminate tradeoffs, minimize tuning
Scale gracefully to very heavy loads
Support web applications
Web application characteristics
 Large archive of data
 Smaller active set
 High read:write ratio
 Uneven, bursty activity
What we did instead

 Enforce limit on record cache size
 Respond to simple atypical loads
   Autocommit single record access
   Repeat “insert ... select”
   Single pass read of large data set
 Challenge InnoDB on DBT2
   Large working set
   Continuous heavy load
 Hired the world's most vicious test designer
Record Cache
 Record Cache contains:
   Committed records with no versions

   New, uncommitted records

   Records with multiple versions
Record Cache Cleanup – step 1
 Clean up old, committed single-version
 records
 Scavenger
 Runs on schedule or demand
 Removes oldest mature records
 Settable limits – start and stop
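A rough Python sketch of a scavenger driven by start and stop limits. It assumes the limits are byte counts and that "oldest" means insertion order; RecordCache, start_bytes and stop_bytes are hypothetical names, and the real scavenger's notion of a "mature" record is not modeled.

```python
from collections import OrderedDict


class RecordCache:
    """Toy cache of committed single-version records, oldest first."""

    def __init__(self, start_bytes, stop_bytes):
        # Hypothetical stand-ins for the "settable limits - start and stop".
        assert stop_bytes <= start_bytes
        self.start_bytes = start_bytes
        self.stop_bytes = stop_bytes
        self.records = OrderedDict()  # record id -> payload bytes
        self.size = 0

    def add(self, record_id, payload):
        old = self.records.pop(record_id, None)
        if old is not None:
            self.size -= len(old)     # replacing a record: undo its old size
        self.records[record_id] = payload
        self.size += len(payload)
        if self.size > self.start_bytes:
            self.scavenge()           # "on demand" trigger

    def scavenge(self):
        """Drop the oldest records until the cache is at or below the stop limit."""
        removed = []
        while self.size > self.stop_bytes and self.records:
            record_id, payload = self.records.popitem(last=False)
            self.size -= len(payload)
            removed.append(record_id)
        return removed
```

Separate start and stop limits give hysteresis: one pass frees a batch of space, so the scavenger does not run again on every insert.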
Record Cache Cleanup – step 2
 Clean out record versions too old
 to be useful

 Prune
   Remove old, unneeded versions
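One way to picture pruning, as a Python sketch. It assumes a version chain ordered newest first, with each version tagged by the id of the transaction that committed it, and assumes a version is unneeded once a newer version is already visible to the oldest active transaction. The function name and the visibility rule are illustrative assumptions, not taken from Falcon.

```python
def prune(chain, oldest_active):
    """Split a newest-first list of commit ids into (kept, dropped).

    Assumed rule: the newest version committed before the oldest active
    transaction is visible to every reader, so anything older is unreachable.
    """
    kept = []
    for i, commit_id in enumerate(chain):
        kept.append(commit_id)
        if commit_id < oldest_active:
            return kept, chain[i + 1:]
    return kept, []


kept, dropped = prune([50, 40, 30, 20, 10], oldest_active=35)
print(kept, dropped)
```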
Record Cache Cleanup – step 3
Clean up a cache full of new records

Chill
  Copy new record data to log
  Done by transaction thread
  Settable start size
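A minimal Python sketch of chilling, assuming the start size is a byte threshold per transaction and that a chilled record keeps only its log offset and length in memory. The Transaction class, chill_threshold and fetch are hypothetical names; a bytearray stands in for the serial log.

```python
class Transaction:
    """Toy transaction that moves new record data to a log when it grows."""

    def __init__(self, log, chill_threshold):
        self.log = log                      # bytearray standing in for the serial log
        self.chill_threshold = chill_threshold
        self.records = {}                   # id -> bytes, or (offset, length) once chilled
        self.warm_bytes = 0                 # new record data still held in memory

    def insert(self, record_id, data):
        self.records[record_id] = data
        self.warm_bytes += len(data)
        if self.warm_bytes >= self.chill_threshold:
            self.chill()                    # done by the transaction's own thread

    def chill(self):
        """Copy in-memory record data to the log and keep only its location."""
        for record_id, value in list(self.records.items()):
            if isinstance(value, bytes):
                offset = len(self.log)
                self.log.extend(value)
                self.records[record_id] = (offset, len(value))
        self.warm_bytes = 0

    def fetch(self, record_id):
        """Return record data, reading it back from the log if it was chilled."""
        value = self.records[record_id]
        if isinstance(value, bytes):
            return value
        offset, length = value
        return bytes(self.log[offset:offset + length])
```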
Record Cache Cleanup – step 4
 Clean up multiple versions of a
 single record created by a single
 transaction

 Remove intermediate versions
   Created by a single transaction
   Rolled back to save point
   Repeated updates
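A small Python sketch of collapsing intermediate versions. It assumes a newest-first chain of (transaction id, data) pairs and keeps only the newest version in each run written by the same transaction. Savepoint bookkeeping is deliberately left out, and the function name is hypothetical.

```python
def remove_intermediate(chain):
    """Keep only the newest version from each run by the same transaction."""
    kept = []
    for txn_id, data in chain:
        if kept and kept[-1][0] == txn_id:
            continue  # older version written by the same transaction: superseded
        kept.append((txn_id, data))
    return kept


chain = [(7, "third update"), (7, "second update"), (7, "first update"), (5, "committed")]
print(remove_intermediate(chain))
```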
Record Cache Cleanup – step 5
 Clean up records with multiple
 versions, still potentially visible
 Backlog
    Copy entire record tree to disk
    Expensive
    Not yet working
Simple, atypical loads
 Challenge:
   Autocommit single record access
      Record cache is useless
      Record encoding is useless
      Transaction creation / destruction is too expensive

 Response:
   Reuse read only transactions

 Result:
   Multi-threaded bookkeeping nightmare
Simple, atypical loads
 Challenge:
   Repeat “insert ... select...”
 Fill cache with old and new records
 First solution
   Scavenge old records
   Chill new records
 Second solution
   Move the record headers out
   Also helps index creation
Simple, atypical loads

 Single pass read of large data set
   Read more records than
   Read them over and over
   Caches are useless
   Encoding is overhead
 Response:
   Make encoding optional?
Challenge InnoDB on DBT2
 Initial results were not encouraging (2007)
 [Chart: transactions (0 to 30000) vs. connections (10 to 200); series Falcon2007, InnoDB2007]
Challenge InnoDB on DBT2
 But Falcon has improved a lot since April 2007
 [Chart: transactions (0 to 30000) vs. connections (10 to 200); series Falcon2007, InnoDB2007, Falcon2009]
Challenge InnoDB on DBT2
 So did InnoDB
 [Chart: transactions (0 to 30000) vs. connections (10 to 200); series Falcon2007, InnoDB2007, Falcon2009, InnoDB2009]
Bug trends
Multi-threading
 Databases are a natural fit for multi-threading
   Connections
   Gophers
   Scavenger
   Disk reader/writer
 Except for shared structures
 Locking blocks parallel operations

 Challenge – sharing without locking
Multi-threading
 Non-locking operation



 Purge old record versions
Multi-threading
 Locking operation

 Remove intermediate versions

 What granularity of lock?
Multi-threading – Lock granularity

 One per record:
   Too many interlocked instructions

 One per record group:
   Thread reading one record prevents scavenge of another

 No answer is right – more options?
Cycle locking – read record chain
 Before starting to read a record chain,
 get a shared lock on a “cycle”
         Cycle 1 = 3 (shared)   Cycle 2 (inactive)


 Transaction A
 Transaction B
 Transaction C
Cycle locking – clean a record chain
 Before starting to read a record chain,
 get a shared lock on a “cycle”
         Cycle 1 = 4 (shared)   Cycle 2 (inactive)


 Transaction A active in Cycle 1
 Transaction B active in Cycle 1
 Transaction C active in Cycle 1
 Scavenger unlinks versions
 from record chain and links them
 to a “to be deleted” list.
Cycle locking – records relinked


        Cycle 1 = 1 (shared)   Cycle 2 (inactive)

 Transaction A releases lock
 Transaction B releases lock
 Transaction C still active
 Scavenger releases lock
Cycle locking – swap cycles
 New access locks cycle 2

        Cycle 1 = 1 (shared)   Cycle 2 = 1 (shared)

 Transaction C holds Cycle 1 lock
 Cycle Manager requests exclusive
 on Cycle 1 (pumps cycle)
 Transaction A acquires Cycle 2 lock
Cycle locking – cleanup phase


        Cycle 1 = 0 (shared, exclusive)   Cycle 2 = 2 (shared)
 Transaction C releases lock
 Transaction B acquires Cycle 2 lock
 Cycle manager exclusive Cycle 1
Cycle locking – cleanup complete


         Cycle 1 (exclusive)   Cycle 2 = 2 (shared)

 Transaction C acquires Cycle 2 lock
 Cycle manager exclusive Cycle 1
 Remove unlinked, unloved, old
 versions
 When cleanup is done, Cycle
 manager releases cycle 1
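The cycle sequence on the last six slides can be replayed with a small single-threaded Python model. This is a sketch under stated assumptions, not Falcon's implementation: counters stand in for real shared and exclusive locks, so cleanup runs when the old cycle's count reaches zero rather than when a blocked exclusive request is granted. CycleManager and its method names are hypothetical, and the class is not thread-safe.

```python
class CycleManager:
    """Toy two-cycle scheme: unlinked versions are freed only after every
    reader that could still see them has left the old cycle."""

    def __init__(self):
        self.shared = [0, 0]   # shared-lock counts for cycle 1 and cycle 2
        self.current = 0       # index of the cycle that new accesses join
        self.to_delete = []    # versions unlinked during the current cycle
        self.draining = []     # versions waiting for the old cycle to empty

    def acquire(self):
        """Take a shared lock on the current cycle and return its index."""
        cycle = self.current
        self.shared[cycle] += 1
        return cycle

    def release(self, cycle):
        """Drop a shared lock; returns any versions that became safe to free."""
        self.shared[cycle] -= 1
        return self._try_cleanup()

    def unlink(self, version):
        """Scavenger step: park a version already unlinked from its chain."""
        self.to_delete.append(version)

    def swap(self):
        """Send new accesses to the other cycle; the old one starts draining."""
        if self.shared[1 - self.current] != 0:
            return []          # the other cycle still has readers: refuse the swap
        self.draining, self.to_delete = self.to_delete, []
        self.current = 1 - self.current
        return self._try_cleanup()

    def _try_cleanup(self):
        old = 1 - self.current
        if self.draining and self.shared[old] == 0:
            freed, self.draining = self.draining, []
            return freed       # no reader can still reference these versions
        return []
```

Replaying the slides: A, B, C and the scavenger acquire cycle 1; the scavenger unlinks a version; everyone but C releases; swap() sends new accesses to cycle 2; the unlinked version is returned for deletion only when C releases its cycle 1 lock. Readers pay one counter update per record-chain access instead of one lock per record or record group.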
Questions
