Efficient Shared Data in Perl
Perrin Harkins
What’s your problem?
  • Apache is multi-process
  • Process assignment is random
  • Information wants to be shared
  • Inter-process data sharing is ad hoc
Sharing is good for
  • Sessions
  • Caching
  • Usually transient data
      • Otherwise, use an RDBMS
Approaches
  • Files
      • One big file
      • One file per record
  • DBM
  • Shared memory
      • Seems like the obvious choice, but…
  • RDBMS
Playing well together
  • Atomic updates
      • Prevents corruption
  • Exclusive locking
      • Prevents lost updates
      • Without this, last save wins
  • [Diagram: Perl Fund, Blossom, Buttercup; values $100, $105, $2100, $100]
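The lost-update problem on this slide can be illustrated with a small sketch that uses only core Perl. It assumes a balance kept in a plain file and two writers that add to it in turn; the file layout, the `add_to_fund` name, and the amounts are assumptions made for illustration, not taken from the talk. The exclusive `flock` is held across the whole read-modify-write cycle, so a second cooperating process cannot read the old balance in between.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);

# Add $amount to the balance stored in $file while holding an exclusive lock,
# so no other cooperating process can read the old value in between.
sub add_to_fund {
    my ($file, $amount) = @_;
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";

    my $balance = <$fh>;
    $balance = 0 unless defined $balance;
    chomp $balance;
    $balance += $amount;

    seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
    truncate($fh, 0)       or die "Cannot truncate: $!";
    print {$fh} "$balance\n";
    close($fh) or die "Cannot close $file: $!";   # flushes, then releases the lock
    return $balance;
}

# Hypothetical scenario: the fund starts at 100.
my ($tmp_fh, $fund_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "100\n";
close($tmp_fh) or die "Cannot close $fund_file: $!";

add_to_fund($fund_file, 5);                          # first writer
my $final_balance = add_to_fund($fund_file, 2000);   # second writer
print "Final balance: $final_balance\n";
```

Note that `flock` is advisory: it only protects against other processes that also take the lock before touching the file.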
Cache::Cache
  • Consistent interface to multiple storage methods
      • File system
      • Shared memory via IPC::ShareLite
  • Many cache-related features built in
      • Expiration times
      • Size limit
      • Multiple namespaces
Cache::Cache, continued
  • Atomic updates
  • Easy to install
      • No compiler needed for file-based storage
  • Benchmarks are on backend storage classes
      • Cache::FileBackend not Cache::FileCache
Cache::Mmap
  • Uses one big mmap’ed file
  • Many tuning options
      • Size of blocks
      • Size of locking regions
  • Optimization for scalar data
  • Uses locks internally
  • Requires compiler
MLDBM::Sync
  • Extension of MLDBM
  • Originally developed for Apache::ASP
  • Uses lock file, tie/untie
  • Choice of DBM types
      • SDBM is fastest, but limited
  • Tied interface
  • Locks on entire database
  • Explicit locking in API
  • Can run with standard library
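The "lock file, tie/untie" pattern named on this slide can be sketched with core modules only. This is not MLDBM::Sync's code: it is a minimal illustration that assumes SDBM_File as the DBM, Storable for complex values, and a separate lock file guarding the whole database. The helper names (`with_db`, `store_record`, `fetch_record`) and the paths are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);
my $db_path   = "$dir/sessions";        # SDBM creates sessions.dir and sessions.pag
my $lock_path = "$dir/sessions.lock";

# Run $code with the database tied under a whole-database lock, then untie.
sub with_db {
    my ($lock_mode, $code) = @_;
    open(my $lock_fh, '+>>', $lock_path) or die "Cannot open lock file: $!";
    flock($lock_fh, $lock_mode)          or die "Cannot lock: $!";

    tie(my %db, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $db_path: $!";
    my $result = $code->(\%db);
    untie %db;                      # release the DBM before unlocking

    close($lock_fh);                # closing the handle releases the lock
    return $result;
}

# Serialize the structure so it fits in a DBM that only stores strings.
sub store_record {
    my ($key, $data) = @_;
    return with_db(LOCK_EX, sub { $_[0]{$key} = freeze($data); 1 });
}

sub fetch_record {
    my ($key) = @_;
    my $frozen = with_db(LOCK_SH, sub { $_[0]{$key} });
    return defined $frozen ? thaw($frozen) : undef;
}

store_record('user42', { name => 'Blossom', cart => [ 'tea', 'cake' ] });
my $session = fetch_record('user42');
print "$session->{name} has ", scalar @{ $session->{cart} }, " items\n";
```

SDBM limits the combined size of a key and its value to roughly 1 KB per pair, which is one sense in which it is "limited"; larger serialized values need a different DBM type.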
BerkeleyDB
  • Not DB_File, BerkeleyDB.pm
  • Requires Berkeley DB library from sleepycat.com
      • Tricky to install on some systems
  • Tied or OO interface
  • No built-in support for complex data structures
  • Locks on entire database or on pages
  • Supports transactions
  • Shared memory cache
  • Tests are on BTree
IPC::MM
  • Interface for Engelschall’s mm
  • Implements shared BTree and Hash in C
  • Tied interface
  • Data is not persistent
  • Only shares between related processes
Tie::TextDir
  • Dirt-simple: one record per file
  • Keys must be legal file names
  • No compiler needed
  • Doesn’t handle complex data structures
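The one-record-per-file approach can be sketched in a few lines of core Perl. The sketch below is not Tie::TextDir's implementation; the function names and the key check are assumptions made for illustration. It writes each value to a temporary file and renames it into place, which is one common way to get atomic updates on a POSIX file system, and it rejects keys that are not usable as file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir tempfile);

my $store_dir = tempdir(CLEANUP => 1);   # hypothetical storage directory

# Reject keys that cannot safely be used as a file name.
sub check_key {
    my ($key) = @_;
    die "Illegal key '$key'\n"
        if $key eq '' || $key eq '.' || $key eq '..' || $key =~ m{[/\0]};
    return $key;
}

# Write the value to a temporary file in the same directory, then rename it
# into place, so readers see either the old record or the new one.
sub store_record {
    my ($key, $value) = @_;
    check_key($key);
    my ($fh, $tmp_name) = tempfile('.tmpXXXXXX', DIR => $store_dir);
    print {$fh} $value;
    close($fh) or die "Cannot close $tmp_name: $!";
    rename($tmp_name, "$store_dir/$key") or die "Cannot rename: $!";
    return 1;
}

sub fetch_record {
    my ($key) = @_;
    check_key($key);
    open(my $fh, '<', "$store_dir/$key") or return undef;
    local $/;                      # slurp the whole file
    my $value = <$fh>;
    close($fh);
    return $value;
}

store_record('greeting', 'hello');
my $greeting = fetch_record('greeting');
my $rejected = !eval { store_record('../escape', 'x'); 1 };
print "greeting=$greeting rejected=", ($rejected ? 'yes' : 'no'), "\n";
```

Values here are plain strings; storing a nested structure would need a serialization step in front of `store_record`.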
IPC::Shareable
  • Very Perlish and transparent
  • Shared memory
  • Lots going on under the hood
  • Explicit locking supported
  • Tied interface
  • Requires a compiler
DBD::SQLite
  • Fast, single-file SQL engine in a DBD
  • Full transaction support!
  • Locking between processes at database level
DBD::mysql
  • Adds network capabilities
  • Atomic updates or transactions
  • More work than most to set up
memcached
  • Networked daemon
  • Intended for clusters
  • Non-blocking I/O
  • Clients for Perl, PHP, Java
  • Requires a Linux kernel patch, until 2.6 is out
Testing Methodology
  • P4 2.53 GHz, 512MB RAM, Red Hat 9, ext3, Perl 5.8.0
  • Abstraction layer IPC::SharedHash
      • Implements new(), fetch(), store()
      • Handles serialization where necessary
      • Calls FETCH() and STORE() instead of using tied interface
  • mod_perl handler
  • ab (Apache Bench)
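The abstraction layer described on this slide might look roughly like the sketch below. It is a hypothetical wrapper, not the benchmark's actual IPC::SharedHash code: the `SharedHashSketch` package, its constructor arguments, and the `serialize` flag are all assumptions. It shows the two ideas the slide names: serializing values for backends that only hold strings, and calling `FETCH()` and `STORE()` on the object returned by `tie` rather than going through the tied hash. The core class Tie::StdHash stands in for a real shared backend.

```perl
use strict;
use warnings;

# Hypothetical wrapper, loosely modelled on the new()/fetch()/store()
# interface named on the slide.
package SharedHashSketch;

use Storable qw(freeze thaw);
use Tie::Hash;    # provides Tie::StdHash, used below as a stand-in backend

sub new {
    my ($class, %args) = @_;
    my %hash;
    # tie() returns the underlying object, so FETCH() and STORE() can be
    # called on it directly instead of going through the tied hash.
    my $backend = tie(%hash, $args{tie_class}, @{ $args{tie_args} || [] })
        or die "Cannot tie to $args{tie_class}";
    return bless {
        backend   => $backend,
        hash      => \%hash,      # keep the tied hash alive
        serialize => $args{serialize} ? 1 : 0,
    }, $class;
}

sub store {
    my ($self, $key, $value) = @_;
    # Freeze a reference to the value so scalars and structures are
    # handled the same way.
    $value = freeze(\$value) if $self->{serialize};
    $self->{backend}->STORE($key, $value);
    return 1;
}

sub fetch {
    my ($self, $key) = @_;
    my $value = $self->{backend}->FETCH($key);
    return undef unless defined $value;
    return $self->{serialize} ? ${ thaw($value) } : $value;
}

package main;

my $shared = SharedHashSketch->new(tie_class => 'Tie::StdHash', serialize => 1);
$shared->store(counter => 7);
$shared->store(profile => { name => 'Buttercup', roles => ['admin'] });

my $counter = $shared->fetch('counter');
my $profile = $shared->fetch('profile');
print "$counter $profile->{name} $profile->{roles}[0]\n";
```

Swapping `tie_class` and `tie_args` for a DBM or shared-memory class would let the same `fetch`/`store` calls exercise a different backend, which is the point of such a layer in a benchmark.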
Variables
  • Number of parallel clients
  • Percentage of writes
      • Sessions can have a lot of writes
      • Caches are mostly read, by definition
  • Locality of access
  • Scalars vs. complex data
Read-Only Sharing
Effect of Increasing Clients
Effect of Read/Write Ratio
Scalars vs. Complex Data Structures
Latest Results
Analysis
  • Why is shared memory so slow?
      • Still has to serialize
      • Moving too much data at once
  • What about IPC::MM?
      • Moves one at a time
      • Moving parts are in C
  • Why is the file system so fast?
      • Modern VM system
      • Kernel-managed caching
Analysis
  • Why is Tie::TextDir faster than Cache::FileBackend?
      • Digest::SHA1
      • Splitting into multiple directories not normally necessary on modern filesystems: /mu/lt/ip/ledirs
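The two costs named on this slide, hashing each key and spreading files over nested directories, can be seen in a small sketch. It uses the core Digest::SHA module in place of the Digest::SHA1 module the slide mentions, and the directory depth and two-character segment width are assumptions chosen for illustration, not the layout Cache::FileBackend necessarily uses.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);   # core module; the slide names Digest::SHA1

# Map a key to a nested path by hashing it and using the leading hex
# characters as directory names.
sub hashed_path {
    my ($root, $key, $depth) = @_;
    $depth = 3 unless defined $depth;
    my $digest = sha1_hex($key);
    my @dirs   = map { substr($digest, 2 * $_, 2) } 0 .. $depth - 1;
    return join('/', $root, @dirs, $digest);
}

# The flat alternative: use the key itself as the file name.
sub flat_path {
    my ($root, $key) = @_;
    return "$root/$key";
}

my $hashed = hashed_path('/tmp/cache', 'abc');
my $flat   = flat_path('/tmp/cache', 'abc');
print "$hashed\n$flat\n";
```

The hashed layout accepts any key and keeps directories small, at the price of a digest computation and extra directory lookups on every access; the flat layout skips both but requires keys that are legal file names.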
Problems with this test
  • Size of values not considered
  • Size of overall hash not considered correctly
  • BerkeleyDB should be tested with fancier lock mode
  • Needs a real network test for memcached and MySQL
  • Should try harder to reduce margin of error
A Word About Clustering
  • Shared filesystems
      • NFS
      • Samba/CIFS
  • RDBMS
      • Most reliable, well understood, easy integration
  • Replicated data
      • Multicast
      • Spread
What about threads?
  • Apache 2/mod_perl 2/Perl 5.8 bring threads to the table
  • Still not clear how this will work with complex data structures and objects
  • Threaded performance is mostly bad in 5.8
Questions to help you choose
  • Do you need to store complex data?
      • BerkeleyDB, Tie::TextDir, and IPC::MM require a wrapper for this
  • Are your keys valid filenames?
      • Tie::TextDir does not hash the keys
  • Do you need persistence?
      • IPC::MM is not persistent
  • Do you need explicit locking?
      • MLDBM::Sync, MySQL, BerkeleyDB
Questions to help you choose
  • No compiler?
      • Cache::FileBackend, Tie::TextDir, MLDBM::Sync if you have Storable
  • Need clustering?
      • DBD::mysql, memcached


Editor's Notes

  • #20: 10 processes, low locality, scalars
  • #22: 10 processes, high-locality, scalars