Millions Quotes Per Second.
       A story of pure Java
       market data vendor

© 2013, Roman Elizarov, Devexperts
Market Data Rates
[Chart: messages per second, axis from 0 to 10,000,000; series: US Equities, Indexes and Futures; OPRA]
Market Data Vendor

• Process data coming from exchange data feeds
   - Parse
   - Normalize
• Distribute data to customers
   - Gather into a single feed
   - Store and retrieve (for onDemand historical requests)
   - Serialize and transfer
   - Scatter to multiple consumers based on actual subscription
dxFeed High Level Picture

                                   CME, CBOT, NYMEX, COMEX,
                                ICE Futures U.S., CBOE, TSX, TSXV,
                                                MX




         Chicago ticker plant

                                                  10Gbit
                                    resilient redundant connectivity
                                              infrastructure
                                                                          NYSE, AMEX,
                                                                            NASDAQ,
                                                                           ISE, OPRA,
                                                                       FINRA, PinkSheets




                                               New York ticker plant


                        Direct cross-connect
                                                                                       Customer connection point
                                 SFTI
                                 TNS
                               SAVVIS
                             BT Radianz
                               Internet
A Bit of History

• Devexperts was founded in 2002
   - as an Upscale Financial IT company
• QDS project was born in 2003
   - to address market data distribution problem
   - in a high-performance way (initial design goal was 1M messages per second)
• dxFeed service was launched in 2008
   - to provide our customers with live market data directly from
     exchanges, using QDS for distribution
• dxFeed API was created on top of QDS in 2009
   - to provide an easier customer-facing API and enable 3rd party
     developers to integrate their code with dxFeed
[Word cloud: Threads, Portability, Community, Developers, Garbage Collection, Libraries and frameworks, Backwards-compatibility, Refactoring, Type Safety, Open source, Memory model, Reflection, Productivity, Tools, Readability, HotSpot JIT, Byte-code manipulation, Simplicity, The most popular language]
* Applies to any language
Java object layout

• String[] that is filled with some strings in Java
   - String[]: header, size, [0], [1], [2], [3], ...
   - Each element points to a separate String object: header, value, hash, ...
   - Each String's value points to a separate char[] object: header, size, 'T', 'E', 'S', 'T', ...
Memory layout solution

• Prefer array-based data-structures to linked ones
   - Most Java programs get an immediate performance boost by replacing all
     mentions of LinkedList with ArrayList
• Use Java arrays or ByteBuffer classes where it matters
   - They are guaranteed to be contiguous in memory
   - Lay out your data in arrays manually
• That's how QDS core is designed
   - All its critical data structures are rolled into int[] and Object[]
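As a rough illustration of laying data out in arrays manually, here is a minimal sketch that packs fixed-size records into one int[] instead of allocating one object per record. The class name PackedQuotes, the three-int record (symbol id, bid, ask in cents) and the growth policy are assumptions made for this example; they are not taken from QDS.

```java
import java.util.Arrays;

// Hypothetical sketch: quote records packed into one int[] rather than one object per quote.
final class PackedQuotes {
    // Each record occupies three consecutive ints: symbol id, bid, ask (prices in cents).
    private static final int STRIDE = 3;
    private int[] data = new int[16 * STRIDE];
    private int size; // number of records stored

    /** Appends a record and returns its index. */
    int add(int symbolId, int bidCents, int askCents) {
        int offset = size * STRIDE;
        if (offset == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow while staying contiguous
        }
        data[offset] = symbolId;
        data[offset + 1] = bidCents;
        data[offset + 2] = askCents;
        return size++;
    }

    int symbolId(int index) { return data[index * STRIDE]; }
    int bid(int index) { return data[index * STRIDE + 1]; }
    int ask(int index) { return data[index * STRIDE + 2]; }
    int size() { return size; }

    /** Sequential scan over contiguous memory: sum of (ask - bid) over all records. */
    long totalSpread() {
        long sum = 0;
        for (int i = 0, off = 0; i < size; i++, off += STRIDE) {
            sum += data[off + 2] - data[off + 1];
        }
        return sum;
    }
}
```

The scan in totalSpread touches memory strictly in order and follows no references, which is the property the slide is after.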
byte[] vs ByteBuffer

• byte[] is always heap-based
   - Faster for byte-oriented access
• ByteBuffer can be both “heap” and “direct”
   - Be especially careful with direct ByteBuffers
   - If you don't pool them, you may run out of native memory before Java
     GC has a chance to run
   - Can be faster for short-, int- or long-oriented access via get/putXXX
     methods
      • But make sure you use native byte order (BIG_ENDIAN is default)
   - Direct ByteBuffers don't need an extra buffer copy when doing
     input/output with NIO
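The byte-order and reuse points can be sketched with the standard java.nio API. The helper class NativeOrderBuffer and its method names are made up for this example; the only library calls used are ByteBuffer.allocateDirect, order, clear, flip, putInt and getInt.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing a reusable direct buffer in native byte order.
final class NativeOrderBuffer {
    /** Allocates a direct buffer and switches it from the BIG_ENDIAN default to the platform order. */
    static ByteBuffer allocate(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    /** Writes the values as 4-byte ints into the same buffer and flips it for reading. */
    static ByteBuffer writeInts(ByteBuffer buf, int... values) {
        buf.clear(); // reuse the buffer instead of allocating another direct one
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        return buf;
    }

    /** Reads back all remaining ints and returns their sum. */
    static long sumInts(ByteBuffer buf) {
        long sum = 0;
        while (buf.remaining() >= Integer.BYTES) {
            sum += buf.getInt();
        }
        return sum;
    }
}
```

Allocating once and calling clear() before each reuse is the simplest form of pooling; a real pool would hand out several such buffers.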
Measure, measure, measure
The cost of later change is too high
Garbage collection

• Makes your code much easier
   - to design
   - to debug
   - to maintain
• GC performs really well when
   - Objects are very short-lived
      • They are not promoted to old gen
      • They are reclaimed by high-throughput scavenge GC
   - Objects are very long-lived and are either not modified or contain only primitives
      • Scavenge GC does not waste time scanning them
Object allocation

• Allocation of small objects is fast
   - new String() is ~20 bytes on 64bit VM with compressed oops
      • not counting char[] object inside of it
   - ~4.5ns per allocation (on 2.6GHz i5)
• But becomes slower when you include amortized GC cost
• And can become much slower if you
   - have big static memory footprint
   - have “medium-lived” objects
   - have lots of threads (and thus a lot of GC roots and coordination)
   - use references (java.lang.ref) a lot
   - mutate your memory a lot, especially references (GC card marking)
Manual memory management

• When you would consider manual memory management in native
  code (custom object pools), consider doing the same in Java
• General advice
   - Pool large objects
      • They are expensive to allocate and to be collected by GC
   - Avoid small objects
      • Especially “medium-lived” ones
      • Lay them out in arrays if you need to store them
Object allocation action plan (1)

• Watch the percentage of time your system spends doing GC
   - -verbose:gc
     -XX:+PrintGCDetails
     -XX:+PrintGCTimeStamps
   - “jconsole” and “jvisualvm” tools show this information
   - It is available programmatically via GarbageCollectorMXBean
       • At Devexperts we collect it and report (push) in real-time via
          MARS (Monitoring and Reporting System) using a dedicated
          JVMSelfMonitoring plugin
       • Our support team have alerts configured on high GC % in our
          systems
• Act when it becomes too big
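Reading GC time programmatically can be sketched with the standard java.lang.management API. The class name GcWatch is hypothetical, and this is not the MARS plugin mentioned above; it only shows the MXBean calls such a monitor could build on.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: share of JVM uptime spent in garbage collection.
final class GcWatch {
    /** Accumulated collection time in ms across all collectors; -1 ("undefined") values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Percentage of JVM uptime spent in GC since the JVM started. */
    static double gcPercent() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptime;
    }
}
```

For alerting, a monitor would more likely sample totalGcMillis() periodically and report the delta over each interval, since the since-startup average hides recent spikes.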
Object allocation action plan (2)

• Tune GC to reduce overhead without code changes
• Identify the places where most allocations take place and optimize
  them
   - Use off-the-shelf Java profilers
   - Use Devexperts aprof for a full allocation picture at production speed
     http://code.devexperts.com/display/AProf/
Object reuse and sharing

• Pooling small objects is often a bad idea
   - Unless you are trying to quickly speed up code that heavily relies on
     lots of small objects
   - It's better to get rid of small objects altogether
        • If you see boxing in performance-critical code, get rid of it
• But reusing / sharing small objects is great
   - Strings are a typical candidate for data-processing code
• Common pitfalls (don't do it, unless you fully understand it)
   - String.intern
   - WeakReference
Actually, by their char arrays
String I/O

• Strings are often duplicated in memory
• Reading any string-denoted data from database, from file, from
  network – all produces new strings
• Where performance matters, reuse strings
   - For example, see the StringCache class at
     http://docs.dxfeed.com/dxlib/api/com/devexperts/util/StringCache.html
   - The key method is get(char[])
      • You can reuse char[] where data is read
      • And get an instance of String from cache if it is there
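The idea of looking a string up by the characters already sitting in a read buffer can be sketched as follows. This is a deliberately simplified, hypothetical SimpleStringCache (direct-mapped, not thread-safe, size assumed to be a power of two); it is not the dxlib StringCache and does not reproduce its API or eviction policy.

```java
// Hypothetical sketch of a char[]-keyed string cache; not the dxlib StringCache.
final class SimpleStringCache {
    private final String[] slots;
    private final int mask;

    /** The size is assumed to be a power of two so that masking selects a slot. */
    SimpleStringCache(int sizePowerOfTwo) {
        slots = new String[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    /** Returns a String equal to chars[offset .. offset+length), allocating only on a miss. */
    String get(char[] chars, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + chars[offset + i];
        }
        int index = (hash ^ (hash >>> 16)) & mask;
        String cached = slots[index];
        if (cached != null && matches(cached, chars, offset, length)) {
            return cached; // hit: no new String, no new char[]
        }
        String created = new String(chars, offset, length);
        slots[index] = created; // simplest eviction: overwrite whatever was in the slot
        return created;
    }

    private static boolean matches(String s, char[] chars, int offset, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != chars[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

The caller keeps reusing one char[] as its read buffer, so a repeated symbol costs a hash and a comparison instead of an allocation.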
Radical object / reference elimination

• Unroll complex objects into arrays
   - For example, a collection of strings can be represented in a single
     byte[]
• Renumber shared object instances
   - Represent string reference as int
   - That's what QDS core does for efficient String manipulation
      • Faster to compare
      • Faster to hash
      • Avoids slower “modify reference” operations (marks GC cards)
   - But requires hand-crafted memory management
      • QDS does reference counting, but custom GC is also feasible
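Renumbering with reference counting might look roughly like the sketch below. The SymbolTable class and its acquire/release methods are hypothetical and only illustrate the idea; QDS's actual data structures are not shown in the slides. For brevity the lookup side uses an ordinary HashMap, which itself boxes ids, something a performance-critical version would avoid.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: strings represented as int ids with manual reference counting.
final class SymbolTable {
    private final Map<String, Integer> idBySymbol = new HashMap<>();
    private String[] symbols = new String[8];
    private int[] refCounts = new int[8];
    private int[] freeIds = new int[8]; // stack of released ids available for reuse
    private int freeCount;
    private int nextId;

    /** Returns the id for the symbol and increments its reference count. */
    int acquire(String symbol) {
        Integer existing = idBySymbol.get(symbol);
        int id;
        if (existing != null) {
            id = existing;
        } else {
            id = freeCount > 0 ? freeIds[--freeCount] : nextId++;
            if (id >= symbols.length) {
                symbols = Arrays.copyOf(symbols, symbols.length * 2);
                refCounts = Arrays.copyOf(refCounts, refCounts.length * 2);
            }
            symbols[id] = symbol;
            idBySymbol.put(symbol, id);
        }
        refCounts[id]++;
        return id;
    }

    /** Decrements the reference count and frees the id when it reaches zero. */
    void release(int id) {
        if (--refCounts[id] == 0) {
            idBySymbol.remove(symbols[id]);
            symbols[id] = null;
            if (freeCount == freeIds.length) {
                freeIds = Arrays.copyOf(freeIds, freeIds.length * 2);
            }
            freeIds[freeCount++] = id;
        }
    }

    /** Resolves an id back to its string, or null if the id is not in use. */
    String symbol(int id) { return symbols[id]; }
}
```

Once symbols are ints, comparing two of them is a single integer comparison, and storing one into an int[] does not mark a GC card.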
Hardcore optimization

• Use sun.misc.Unsafe when everything else fails
   - It gives you full native speed
   - But with no range checks or type safety
      • You are on your own!
   - Good fit for integration with native data structures when needed
• QDS core uses it in a few places
   - Mainly to provide wait-free execution guarantees with appropriate
     synchronization for array-based data structures
   - But there is fallback code for cases when sun.misc.Unsafe is not
     available
Even more hardcore – hand-written SMT

• If you have to use linked data structures
   - Consider traversing multiple linked lists simultaneously in the same
     thread
   - Akin to hardware SMT, but in software
   - The code becomes much more complicated
   - But the performance can considerably increase




                     * Not a Java-specific optimization, but fun to mention here
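A minimal sketch of the interleaving idea, assuming two independent singly linked lists whose values are summed. The class InterleavedTraversal and its Node type are made up for this example. The interleaved loop issues two independent pointer loads per iteration, so the second cache miss can overlap the first; whether that pays off depends on the hardware and the data and has to be measured.

```java
// Hypothetical sketch: traversing two linked lists in one loop ("software SMT").
final class InterleavedTraversal {
    static final class Node {
        final long value;
        final Node next;
        Node(long value, Node next) { this.value = value; this.next = next; }
    }

    /** Baseline: one list at a time; each step waits for the previous node to load. */
    static long sumSequential(Node a, Node b) {
        long sum = 0;
        for (Node n = a; n != null; n = n.next) sum += n.value;
        for (Node n = b; n != null; n = n.next) sum += n.value;
        return sum;
    }

    /** Interleaved: advances both lists in the same loop with independent accumulators. */
    static long sumInterleaved(Node a, Node b) {
        long sumA = 0;
        long sumB = 0;
        while (a != null && b != null) {
            sumA += a.value;
            sumB += b.value;
            a = a.next;
            b = b.next;
        }
        for (; a != null; a = a.next) sumA += a.value; // tail of the longer list
        for (; b != null; b = b.next) sumB += b.value;
        return sumA + sumB;
    }

    /** Builds a list holding from, from+1, ..., to-1 (empty if from >= to). */
    static Node range(int from, int to) {
        Node head = null;
        for (int i = to - 1; i >= from; i--) head = new Node(i, head);
        return head;
    }
}
```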
Threads and scalability

• Share data across the threads to further reduce memory footprint
   - But carefully design and implement this sharing
• Learn and love Java Memory Model
   - It makes your correctly-synchronized multi-threaded code fully
     portable across CPU architectures
• QDS core is a thread-safe data structure with a mix of lock-free,
  fine-grained and coarse-grained locking approaches, which
  makes it vertically scalable
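As one small example of the lock-free end of that mix, the sketch below keeps a per-slot maximum in an array using compare-and-set from java.util.concurrent.atomic. The LockFreeMax class and its demo method are hypothetical. The slides say QDS uses sun.misc.Unsafe for its array-based structures; AtomicLongArray is swapped in here as the portable standard-library equivalent.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: lock-free "store maximum" on an array slot via compare-and-set.
final class LockFreeMax {
    private final AtomicLongArray maxBySlot;

    LockFreeMax(int slots) { maxBySlot = new AtomicLongArray(slots); }

    /** Raises the slot to value if value is larger; retries on CAS failure and never blocks. */
    void update(int slot, long value) {
        long current;
        do {
            current = maxBySlot.get(slot);
            if (value <= current) {
                return; // another thread already stored something at least as large
            }
        } while (!maxBySlot.compareAndSet(slot, current, value));
    }

    long get(int slot) { return maxBySlot.get(slot); }

    /** Runs several writer threads against slot 0 and returns the final maximum. */
    static long demo(int threads, int perThread) {
        LockFreeMax m = new LockFreeMax(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= perThread; i++) m.update(0, base + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join gives a happens-before edge, so the final read is well-defined
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return m.get(0);
    }
}
```

Because the code relies only on the guarantees of the Java Memory Model (atomic CAS and Thread.join), the result is the same on any CPU architecture.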
Be careful with threads and locks

• Thread switches introduce a considerable latency (~20us)
• Lock contention forces even more thread switches
• It is not a Java-specific concern, but a common Java-specific problem, since Java
  makes threads easier for programmers to use (and many do use them)

[Diagram: lock contention timeline]
   1. Enter lock
   2. Context switch
   3. Try to lock
   4. Context switch
   5. Exit lock
   6. Context switch and enter lock
Data flow for horizontal scalability

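[Diagram: a Multiplexor (QDTicker) between the upstream feed and two downstream QDTickers]
• The Multiplexor subscribes upstream to: IBM, GE, QQQQ, MSFT, INTC, SPX
• IBM, GE ticks arrive at the Multiplexor
• First downstream QDTicker subscribes to IBM, GE, QQQQ, MSFT and receives IBM, GE ticks
   - Its consumers: (IBM, MSFT, QQQQ) and (GE, IBM)
• Second downstream QDTicker subscribes to GE, INTC, SPX and receives GE ticks
   - Its consumers: (GE, INTC) and (SPX, INTC)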
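The gather-and-scatter flow in this picture can be approximated with a small subscription-based dispatcher. MiniMultiplexor and TickListener are hypothetical names for this sketch and have nothing to do with the real QDS Multiplexor or QDTicker classes; the sketch only shows that the upstream subscription is the union of downstream ones and that each tick goes only to consumers that asked for the symbol.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of subscription-based scatter; not the QDS Multiplexor.
final class MiniMultiplexor {
    interface TickListener {
        void onTick(String symbol, double price);
    }

    private final Map<String, List<TickListener>> listenersBySymbol = new HashMap<>();

    /** Registers a downstream consumer for the given symbols. */
    void subscribe(TickListener listener, String... symbols) {
        for (String s : symbols) {
            listenersBySymbol.computeIfAbsent(s, k -> new ArrayList<>()).add(listener);
        }
    }

    /** Union of all downstream subscriptions: what this node must request upstream. */
    Set<String> upstreamSubscription() {
        return new TreeSet<>(listenersBySymbol.keySet());
    }

    /** Delivers a tick only to consumers subscribed to the symbol; returns the delivery count. */
    int publish(String symbol, double price) {
        List<TickListener> listeners = listenersBySymbol.get(symbol);
        if (listeners == null) {
            return 0; // nobody downstream asked for this symbol
        }
        for (TickListener l : listeners) {
            l.onTick(symbol, price);
        }
        return listeners.size();
    }
}
```

With the subscriptions from the diagram, an IBM tick reaches only the first consumer, while a GE tick reaches both.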
HotSpot Server VM

• Run “java -server” (it is a default on server-class machines)
• Does
   - Very deep code inlining
   - Loop unrolling
   - Optimize virtual and interface calls based on collected profile
   - Escape analysis for synchronization and allocation elimination
• Embrace it!
   - Don't fear writing your code in a nice object-oriented way
      • In most cases, that is
      • Do still avoid too much “object orientation” in the most
        performance-sensitive places
HotSpot challenges

• It is harder to profile, stress-test, and tune code
   - You need to “warm up” the code to get meaningful results
   - Small changes in code can lead to big differences that are hard to
     explain
   - Compilation of less busy code can trigger at any time and cause
     unexpected latency spikes
• Don't do micro-tests
   - Test the whole system together instead
• Do micro-tests
   - To learn which code patterns are better across the board
   - Small savings add up
Looking at generated assembly code

• -XX:+UnlockDiagnosticVMOptions
  -XX:CompileCommand=print,*<class-name>.<method-name>
  -XX:PrintAssemblyOptions=intel
• You will need the “hsdis” library, which contains the actual
  disassembler code, added to your JRE/JDK
   - But you have to build it yourself:
     http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/share/tools/hsdis/README
Use native profilers

• Java profilers are great tools, but they don't use processor
  performance counters and lack the ability to recognize
  problems such as memory pressure
   - And they don't always produce a clear picture
   - All “cpu time” is reported at the nearest “safe point”, not at the actual
     code line that consumed CPU
• Use native profilers to figure it out
   - Sun Studio Performance Analyzer
   - Intel VTune Amplifier
   - AMD CodeAnalyst
General (1)

• Classic data structures and algorithms
   - Use CPU and memory efficient data structures and algorithms
   - Know and love hash tables
      • They are the most useful data structure in a typical business
        application
• Lock-free data structures will help you to scale vertically
• Every byte counts. Remember about bytes.
   - QDS core compactly represents data as 4-byte integers while working
     with them in memory
   - QDS uses compact byte-level compression on the wire
   - Even more compact bit-level compression is used in long-term store
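To make the hash-table and "every byte counts" points concrete, here is a minimal sketch of an open-addressing int-to-int hash table stored in a single int[], with no entry objects and no boxing. The class IntIntMap, its reserved EMPTY key and the 0.5 load factor are assumptions made for this example, not a description of QDS internals.

```java
// Hypothetical sketch: open-addressing int -> int hash table stored in one int[].
final class IntIntMap {
    private static final int EMPTY = Integer.MIN_VALUE; // reserved key marking a free slot
    private int[] table = newTable(16); // key at index 2*i, value at index 2*i + 1
    private int size;

    private static int[] newTable(int slots) { // slots must be a power of two
        int[] t = new int[slots * 2];
        for (int i = 0; i < t.length; i += 2) t[i] = EMPTY;
        return t;
    }

    private static int slot(int key, int slots) {
        int h = key * 0x9E3779B9; // multiplicative hash spreads sequential keys
        return (h ^ (h >>> 16)) & (slots - 1);
    }

    void put(int key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if ((size + 1) * 2 > table.length / 2) rehash(); // keep load factor at or below 0.5
        int slots = table.length / 2;
        int i = slot(key, slots);
        while (table[2 * i] != EMPTY && table[2 * i] != key) {
            i = (i + 1) & (slots - 1); // linear probing stays in nearby memory
        }
        if (table[2 * i] == EMPTY) {
            table[2 * i] = key;
            size++;
        }
        table[2 * i + 1] = value;
    }

    int get(int key, int defaultValue) {
        int slots = table.length / 2;
        int i = slot(key, slots);
        while (table[2 * i] != EMPTY) {
            if (table[2 * i] == key) return table[2 * i + 1];
            i = (i + 1) & (slots - 1);
        }
        return defaultValue;
    }

    int size() { return size; }

    private void rehash() {
        int[] old = table;
        table = newTable(old.length); // old.length is twice the old slot count, so slots double
        size = 0;
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != EMPTY) put(old[i], old[i + 1]);
        }
    }
}
```

Each entry costs 8 bytes of table space (16 at a 0.5 load factor), compared with several separate objects per entry in a HashMap<Integer, Integer>.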
General (2)

• Burst handling
   - Process data in batches to amortize batch overhead across messages
   - QDS increases batch size under load to decrease overhead
• Architecture
   - Use layers
   - Lower layers of the architecture should generally be used in more places
     and be more optimized
   - The outer layer, dxFeed API, is the easiest one to use and understand
     and most object-oriented, but less optimized
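The burst-handling bullet can be sketched with a standard BlockingQueue: the consumer takes whatever has accumulated, up to a limit, so batches stay small when traffic is light and grow automatically when a backlog builds. The class BatchingConsumer and the idea of returning batch sizes are assumptions for illustration; the slides do not show how QDS sizes its batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: batch size follows the backlog, amortizing per-batch overhead.
final class BatchingConsumer {
    /** Drains the queue in batches of at most maxBatch; returns the size of each batch processed. */
    static List<Integer> drainInBatches(BlockingQueue<String> queue, int maxBatch) {
        List<Integer> batchSizes = new ArrayList<>();
        List<String> batch = new ArrayList<>(maxBatch);
        while (queue.drainTo(batch, maxBatch) > 0) {
            // Per-batch costs (a lock acquisition, a flush, a system call) would be paid once here
            // and shared by every message in the batch.
            batchSizes.add(batch.size());
            batch.clear();
        }
        return batchSizes;
    }
}
```

With 2 queued messages and a limit of 100 this processes one batch of 2; with 250 queued it processes batches of 100, 100 and 50.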
Architecture layers


        JS API

          dxFeed API           Tools      Gateways

                       QDS Core

                  Transport Protocol

                                   ZLIB        SSL

        Sockets          NIO              Files, etc
QDS API (1)
print quote bid/ask on the screen
QDS API (2)
QDS API Summary

• Pros
   - High-performance design
   - Flexible (can be used in various ways)
      • QDS Multiplexor is an application on top of QDS API
      • As well as all other command-line QDS tools
   - Extensible with clear separation of interfaces and implementation
• Cons
   - Verbose, lots of code to do simple things
   - Error-prone (easy to get wrong and to introduce subtle bugs)
• Everybody needs Quote, Trade, etc. with an easy-to-use API
   - Hence, dxFeed API was born
dxFeed API
print quote bid/ask on the screen
Contact me by email: elizarov at devexperts.com
