Optimizing your Java Applications for multi-core hardware. Prashanth K Nageshappa ([email_address]), Java Technologies, IBM
Agenda: Evolution of Processor Architecture; Why should I care?; Think about Scalability; How to exploit Parallelism in Java; JVM optimizations for multi-core scalability
As The World Gets Smarter, Demands On IT Will Grow. Smart energy grids; smart healthcare; smart food systems; intelligent oil field technologies; smart supply chains; smart retail. IT infrastructure must grow to meet these demands: global scope, processing scale, efficiency. 1 trillion: devices will be connected to the internet by 2011. 25 billion: global trading systems are under extreme stress, handling billions of market data messages each day. 70%: on average is spent on maintaining current IT infrastructure versus adding new capabilities. 10x: digital data is projected to grow tenfold from 2007 to 2011.
Hardware Trends: increasing transistor density; clock speed leveling off; more cores; Non-Uniform Memory Access (NUMA); main memory getting larger.
In 2010 POWER Systems Brings Massive Parallelism (hardware threads per server):
POWER4™ (2001, 180 nm): 1 thread/core, 2 cores/chip, 16 sockets/server, 32 threads
POWER5™ (2004, 130 nm): 2 threads/core, 2 cores/chip, 32 sockets/server, 128 threads
POWER6™ (2007, 65 nm): 2 threads/core, 2 cores/chip, 32 sockets/server, 128 threads
POWER7™ (2010, 45 nm): 4 threads/core, 8 cores/chip, 32 sockets/server, 1024 threads
Agenda: Evolution of Processor Architecture; Why should I care?; Think about Scalability; How to exploit Parallelism in Java; JVM optimizations for multi-core scalability
Why should I care? Your application may be re-used. Better performance. Better leverage of additional resources: cores, hardware threads, memory, etc.
Think about scalability: Serial bottlenecks inhibit scalability. Organize your application into parallel tasks: consider the TaskExecutor API; too many threads can be just as bad as too few. Do not rely on the JVM to discover opportunities: there is no automatic parallelization, and the Java class libraries do not exploit vector processor capabilities.
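A minimal sketch of organizing work into parallel tasks, assuming the slide's "TaskExecutor API" refers to the executor framework in java.util.concurrent (ExecutorService, Executors, Future). The class name ParallelSumSketch, the method parallelSum, and the chunking scheme are hypothetical, chosen only for illustration. The pool is sized to the number of available processors, reflecting the point that too many threads can be as bad as too few.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of the original slides.
class ParallelSumSketch {

    // Sums 1..n by splitting the range into chunks and running each chunk as a task.
    static long parallelSum(long n, int chunks) {
        // One worker per available hardware thread, rather than one thread per task.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunkSize = Math.max(1, (n + chunks - 1) / chunks);
            for (long start = 1; start <= n; start += chunkSize) {
                final long lo = start;
                final long hi = Math.min(n, start + chunkSize - 1);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i;
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000, 8));
    }
}
```

Using more chunks than worker threads also helps when chunks take uneven time, since idle workers can pick up the remaining chunks.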
Think about scalability: Load imbalance: workload not evenly distributed; consider breaking large tasks into smaller ones; change serial algorithms to parallel ones. Tracing and I/O: a bottleneck unless updates are infrequent or the log is striped (RAID); blocking disk/console I/O inhibits scalability.
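One common way to keep blocking trace or log I/O off the application threads is to hand messages to a single writer thread through a queue. The slides do not prescribe this technique; the sketch below is an illustration under that assumption. AsyncLogSketch, its methods, and the in-memory list standing in for a log file are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class; the in-memory sink stands in for a file or console.
class AsyncLogSketch {
    // Sentinel object, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("<<stop>>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = new ArrayList<>();
    private final Thread writer;

    AsyncLogSketch() {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take(); // only the writer thread blocks here
                    if (msg == STOP) {
                        return;
                    }
                    sink.add(msg); // a real logger would do the slow I/O here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    // Called by application threads; enqueues and returns without doing I/O.
    void log(String msg) {
        queue.add(msg);
    }

    // Drains outstanding messages, stops the writer, and returns what was written.
    List<String> close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sink;
    }
}
```

The queue here is unbounded for brevity; a bounded queue would be needed if producers can outrun the writer for long periods.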
Synchronization and locking: J9's three-tiered locking: spin, yield, OS. Avoid synchronization in static methods. Consider breaking long synchronized blocks into several smaller ones; this may be bad if it results in many context switches. The Java Lock Monitor (JLM) tool can help: http://perfinsp.sourceforge.net/jlm.html
Synchronization and locking: Volatiles: the compiler will not cache the value; creates a memory barrier. Avoid synchronized container classes: building scalable data structures is difficult; use java.util.concurrent (j/u/c). Non-blocking object access: possible with j/u/c.
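The volatile point above can be sketched with a stop flag: because the field is volatile, the worker must re-read it on every loop iteration and is guaranteed to see the write made by another thread. The class VolatileFlagSketch and its members are hypothetical names for illustration.

```java
// Hypothetical example class; not part of the original slides.
class VolatileFlagSketch {
    // volatile: reads are not cached in a register, and writes are visible to other threads.
    private volatile boolean stopRequested = false;

    // Touched only by the worker thread, so it needs no synchronization.
    private long iterations = 0;

    void requestStop() {
        stopRequested = true;
    }

    // Starts a worker, asks it to stop, and reports whether it exited within the timeout.
    static boolean runAndStop(long timeoutMillis) {
        VolatileFlagSketch state = new VolatileFlagSketch();
        Thread worker = new Thread(() -> {
            while (!state.stopRequested) {
                state.iterations++;
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            Thread.sleep(10);
            state.requestStop();
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```

Without volatile, the JIT compiler would be allowed to hoist the read of the flag out of the loop, and the worker might never terminate.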
Agenda: Evolution of Processor Architecture; Why should I care?; Think about Scalability; How to exploit Parallelism in Java; JVM optimizations for multi-core scalability
java.util.concurrent package: Introduced in Java SE 5. An alternative strong form of synchronization: lighter weight and better scalability compared to intrinsic locks. Contents: java.util.concurrent.atomic.*; java.util.concurrent.locks.*; concurrent collections; synchronizers; TaskExecutor.
j/u/c/atomic.*: Atomic primitives. A strong form of synchronization that does not use locks (non-blocking). Exploits atomic hardware instructions such as compare-and-swap. Supports compound actions. Classes: AtomicBoolean, AtomicInteger, AtomicIntegerArray, AtomicIntegerFieldUpdater, AtomicLong, AtomicLongArray, AtomicLongFieldUpdater, AtomicMarkableReference, AtomicReference, AtomicReferenceArray, AtomicReferenceFieldUpdater, AtomicStampedReference.
j/u/c/atomic.*: Getters and setters: get, set, lazySet. Updates: getAndSet; getAndAdd/getAndIncrement/getAndDecrement; addAndGet/incrementAndGet/decrementAndGet. CAS: compareAndSet/weakCompareAndSet. Conversions: toString, intValue, longValue, floatValue, doubleValue.
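The atomic operations listed above can be sketched as follows: incrementAndGet gives a lock-free shared counter, and a compareAndSet retry loop builds a compound "update if larger" action. AtomicSketch, countWithThreads, and updateMax are hypothetical names for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; not part of the original slides.
class AtomicSketch {

    // Several threads bump one counter without any lock.
    static long countWithThreads(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    // Compound action built from CAS: raise max to candidate if candidate is larger.
    static void updateMax(AtomicInteger max, int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return; // nothing to do
            }
            if (max.compareAndSet(current, candidate)) {
                return; // our update won
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }
}
```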
j/u/c/locks.*: Problems with intrinsic locks: impossible to back off from a lock attempt (deadlock); lack of features (read vs write, fairness policies); block-structured (must lock and release in the same method). j/u/c/locks: greater flexibility for locks and conditions; non-block-structured; provides reader-writer locks (why block other readers?); better scalability.
j/u/c/locks.*: Interfaces: Condition, Lock, ReadWriteLock. Classes: ReentrantLock, ReentrantReadWriteLock, LockSupport, AbstractQueuedSynchronizer.
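Two of the capabilities the lock slides mention can be sketched briefly: a ReentrantReadWriteLock that lets readers proceed together, and ReentrantLock.tryLock with a timeout, which allows a thread to back off instead of waiting forever. LockSketch and its methods are hypothetical names; attemptWhileHeld exists only to demonstrate the back-off path.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class; not part of the original slides.
class LockSketch {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    // Many readers may hold the read lock at the same time.
    int read() {
        rw.readLock().lock();
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }

    // The write lock is exclusive.
    void write(int newValue) {
        rw.writeLock().lock();
        try {
            value = newValue;
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Runs the action only if the lock is acquired within the timeout; otherwise backs off.
    static boolean runIfAvailable(ReentrantLock lock, long timeoutMillis, Runnable action) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Demonstration: another thread holds the lock, so the timed attempt gives up.
    static boolean attemptWhileHeld() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                done.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();
            return runIfAvailable(lock, 50, () -> { });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            done.countDown();
        }
    }
}
```

With an intrinsic lock (synchronized), the second thread in attemptWhileHeld would have no way to give up; it would block until the holder released the monitor.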
j/u/c.* - Concurrent Collections: concurrent, thread-safe implementations of several collections. HashMap -> ConcurrentHashMap; TreeMap -> ConcurrentSkipListMap; ArrayList -> CopyOnWriteArrayList; ArraySet -> CopyOnWriteArraySet; Queues -> ConcurrentLinkedQueue or one of the blocking queues.
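As an illustration of swapping HashMap for ConcurrentHashMap, the sketch below counts words from several threads with no external locking. It assumes Java 8 or later for ConcurrentHashMap.merge, which the 2010 slides predate. ConcurrentMapSketch and countWords are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; not part of the original slides.
class ConcurrentMapSketch {

    // Each thread handles an interleaved slice of the input and updates the shared map.
    static ConcurrentHashMap<String, Integer> countWords(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int i = offset; i < words.length; i += threads) {
                    // merge performs the read-modify-write atomically for the key.
                    counts.merge(words[i], 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts;
    }
}
```

A plain HashMap wrapped in Collections.synchronizedMap would also be correct only if each get-then-put pair were guarded by one lock, which serializes all writers.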
Strains on the VM: Excessive use of temporary memory can lead to increased garbage collector activity; a stop-the-world GC pauses the application. Excessive class loading: updating the class hierarchy; invalidating JIT optimizations; consider creating a “startup” phase. Transitions between Java and native code: VM access lock.
Memory Footprint: Little control over object allocation in Java. Small, short-lived objects are easier to cache; large, long-lived objects are likely to cause cache misses; the Memory Analysis Tool (MAT) can help. Consider using large pages to reduce TLB misses: -Xlp, requires OS support. Tune your heap settings: heap lock contention with a flat heap.
Affinitizing JVMs: Can exploit the cache hierarchy on a subset of cores. The JVM working set can fit within the physical memory of a single node in a NUMA system. Linux: taskset, numactl. Windows: start.
Is my application scalable? Low CPU means resources are not maximized: evaluate whether the application has too few or too many threads; locks and synchronization; network connections, I/O; thrashing (the working set is too large for physical memory). High CPU is generally good, as long as resources are spent in application threads doing meaningful work: evaluate where time is being spent (garbage collection, VM/JIT, OS kernel functions, other processes). Tune, tune, tune.
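As a rough first check on where time is going, the standard java.lang.management beans report accumulated garbage collection time. The sketch below is not a substitute for the profiling tools the deck names; GcTimeSketch and its methods are hypothetical, and the ratio is only an approximation because collector time and JVM uptime are measured independently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical example class; not part of the original slides.
class GcTimeSketch {

    // Approximate accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time divided by JVM uptime; a rough indicator, not a precise measurement.
    static double gcShareOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms, share of uptime: %.3f%n",
                totalGcMillis(), gcShareOfUptime());
    }
}
```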
Write Once, Tune Everywhere: Tools: HealthCenter, GCMV, MAT (http://www.ibm.com/developerworks/java/jdk/tools/). Dependence on the operating system: memory allocation, socket layer. Tune for hardware capabilities: How many cores? How much memory? What is the limit on network access? Are there storage bottlenecks?
Agenda: Evolution of Processor Architecture; Why should I care?; Think about Scalability; How to exploit Parallelism in Java; JVM optimizations for multi-core scalability
IBM Java Execution Model is Built for Parallelism. JIT Compiler: generates high-performance code for application threads; customizes execution to the underlying hardware; optimizes locking performance; asynchronous compilation thread. Application Threads: Java software threads are executed on multiple hardware threads; thread-safe libraries with scalable concurrency support for parallel programming. Garbage Collector: manages memory on behalf of the application; must balance throughput against observed pauses; exploits multiple hardware threads.
Configurable Garbage Collection policies: Multiple policies to match varying user requirements: pause time, throughput, memory footprint and GC overhead. All modes exploit parallel execution: dynamic adaptation to the number of available hardware cores and threads; GC scalability independent of user application scalability; very low overhead (<3%) on typical workloads.
How do GC policies compare? - optthruput (-Xgcpolicy:optthruput): Optimize Throughput. Highly parallel GC + streamlined application thread execution. May cause longer pause times. [Figure: timeline of Thread 1 to Thread n, showing Java execution and GC phases.] Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
How do GC policies compare? - optavgpause (-Xgcpolicy:optavgpause): Optimize Pause Time. GC cleans up concurrently with application thread execution. Sacrifice some throughput to reduce average pause times. [Figure: timeline of Thread 1 to Thread n, showing Java execution, concurrent tracing and GC phases.] Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
How do GC policies compare? - gencon (-Xgcpolicy:gencon): Balanced. Clean up many short-lived objects concurrent with application threads. Some pauses needed to collect longer-lived objects. [Figure: timeline of Thread 1 to Thread n, showing Java execution, scavenge GC, concurrent tracing and global GC phases.] Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
How do GC policies compare? - subpool (-Xgcpolicy:subpool): Scalable. Scalable GC focused on the larger multiprocessor machines. Improved object allocation algorithm. May not be appropriate for small-to-midsize configurations. Uses multiple free lists. Tries to predict the size of future allocation requests based on earlier allocation requests. Recreates free lists at the end of each GC based on these predictions. While allocating objects on the heap, free chunks are chosen using a “best fit” method, as opposed to the “first fit” method used in other algorithms. Concurrent marking is disabled.
JVM optimizations for multi-core scalability: Lock removal across JVM and class libraries. java.util.concurrent package optimizations. Better working set for cache efficiency. Stack allocation. Remove/optimize synchronization. Thread-local storage for send/receive buffers. Non-blocking containers. Asynchronous JIT compilation on a separate thread. Right-sized application runtimes.
Thank You (Merci, Grazie, Gracias, Obrigado, Danke). Questions? Email: [email_address] http://www.ibm.com/developerworks/java/
Special notices ©  IBM Corporation 2010. All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views.  They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant.  While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.  Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.  Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment.  The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.  
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Optimizing your java applications for multi core hardware

  • 1. Optimizing your Java Applications for multi-core hardware Prashanth K Nageshappa [email_address] Java Technologies IBM
  • 2. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 3. As The World Gets Smarter, Demands On IT Will Grow Smart energy grids Smart healthcare Smart food systems Intelligent oil field technologies Smart supply chains Smart retail IT infrastructure must grow to meet these demands global scope, processing scale, efficiency Digital data is projected to grow tenfold from 2007 to 2011. Devices will be connected to the internet by 2011 1 Trillion Global trading systems are under extreme stress, handling billions of market data messages each day 25 Billion 70% on average is spent on maintaining current IT infrastructure versus adding new capabilities 10x
  • 4. Hardware Trends Increasing transistor density Clock Speed leveling off More number of cores Non-Uniform Memory Access Main memory getting larger
  • 5. In 2010 POWER Systems Brings Massive Parallelism 2001 180 nm 2004 130 nm 2007 65 nm 2010 45 nm POWER7™ 4 threads/core 8 cores/chip 32 sockets/server 1024 threads POWER6™ 2 threads/core 2 cores/chip 32 sockets/server 128 threads POWER5™ 2 threads/core 2 cores/chip 32 sockets/server 128 threads POWER4™ 1 thread/core 2 cores/chip 16 sockets/server 32 threads Threads
  • 6. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 7. Why should I care? Your application may be re-used Better performance Better leverage additional resources Cores, hardware threads, memory etc
  • 8. Think about scalability Serial bottlenecks inhibit scalability Organize your application into parallel tasks Consider TaskExecutor API Too many threads can be just as bad as too few Do not rely on JVM to discover opportunities No automatic parallelization Java class libraries do not exploit vector processor capabilities
  • 9. Think about scalability Load imbalance Workload not evenly distributed Consider breaking large tasks into smaller ones Change serial algorithms to parallel ones Tracing and I/O Bottleneck unless infrequent updates or log is striped (RAID) Blocking disk/console I/O inhibit scalability
  • 10. Synchronization and locking J9's Three-tiered locking Spin Yield OS Avoid synchronization in static methods Consider breaking long synchronized blocks into several smaller ones May be bad if results in many context switches Java Lock Monitor (JLM) tool can help http://perfinsp.sourceforge.net/jlm.html
  • 11. Synchronization and locking Volatiles Compiler will not cache the value Creates memory barrier Avoid synchronized container classes Building scalable data structures is difficult Use java.util.concurrent (j/u/c) Non-blocking object access Possible with j/u/c
  • 12. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 13. java.util.concurrent package Introduced in Java SE 5 Alternative strong synchronization Lighter weight, better scalability Comparing to intrinsic locks java.util.concurrent.atomic.* java.util.concurrent.locks.* ConcurrentCollections Synchronizers TaskExecutor
  • 14. j/u/c/atomic.* Atomic primitives Strong form of synchronization But does not use lock – non blocking Exploit atomic instructions such as compare-and-swap in hardware Supports compounded actions AtomicLongFieldUpdater AtomicMarkableReference AtomicReference AtomicReferenceArray AtomicReferenceFieldUpdater AtomicStampedReference AtomicBoolean AtomicInteger AtomicIntegerArray AtomicIntegerFieldUpdater AtomicLong AtomicLongArray
  • 15. j/u/c/atomic.* Getter and setters get set lazySet Updates getAndSet getAndAdd/getAndIncrement/getAndDecrement addAndGet/incrementAndGet/decrementAndGet CAS compareAndSet/weakCompareAndSet Conversions toString, intValue, longValue, floatValue, doubleValue
  • 16. j/u/c/locks.* Problems with intrinsic locks Impossible to back off from a lock attempt Deadlock Lack of features Read vs write Fairness policies Block-structured Must lock and release in the same method j/u/c/locks Greater flexibility for locks and conditions Non-block-structured Provides reader-writer locks Why block other readers? Better scalability
  • 17. j/u/c/locks.* Interfaces: Condition Lock ReadWriteLock Classes: ReentrantLock ReentrantReadWriteLock LockSupport AbstractQueuedSynchronizer
  • 18. j/u/c.* - Concurrent Collections Concurrent, thread safe implementations of several collections HashMap -> ConcurrentHashMap TreeMap -> ConcurrentSkipListMap ArrayList -> CopyOnWriteArrayList ArraySet -> CopyOnWriteArraySet Queues -> ConcurrentLinkedQueue or one of the blocking queues
  • 19. Strains on the VM Excessive use of temporary memory can lead to increased garbage collector activity Stop the world GC pauses the application Excessive class loading Updating class hierarchy Invalidating JIT optimizations Consider creating a “startup” phase Transitions between Java and native code VM access lock
  • 20. Memory Footprint Little control over object allocation in Java Small short lived objects are easier to cache Large long lived objects likely to cause cache misses Memory Analysis Tool (MAT) can help Consider using large pages for TLB misses -Xlp, requires OS support Tune your heap settings Heap lock contention with flat heap
  • 21. Affinitizing JVMs Can exploit cache hierarchy on a subset of cores JVM working set can fit within the physical memory of a single node in a NUMA system Linux: taskset, numactl Windows: start
  • 22. Is my application scalable? Low CPU means resources are not maximized Evaluate if application has too few/many threads Locks and synchronization Network connections, I/O Thrashing working set is too large for physical memory High CPU is generally good, as long as resources are spent in application threads, doing meaningful work Evaluate where time is being spent Garbage collection VM/JIT OS Kernel functions Other processes Tune, tune, tune
  • 23. Write Once, Tune Everywhere HealthCenter, GCMV, MAT http://www.ibm.com/developerworks/java/jdk/tools/ Dependence on operating System Memory allocation Socket layer Tune for hardware capabilities How many cores? How much memory? What is the limit on network access? Are there storage bottlenecks?
  • 24. Agenda Evolution of Processor Architecture Why should I care? Think about Scalability How to exploit Parallelism in Java JVM optimizations for multi-core scalability
  • 25. IBM Java Execution Model is Built for Parallelism JIT Compiler Garbage Collector Application Threads Generates high performance code for application threads Customizes execution to underlying hardware Optimizes locking performance Asynchronous compilation thread Java software threads are executed on multiple hardware threads Thread safe libraries with scalable concurrency support for parallel programming Manages memory on behalf of the application Must balance throughput against observed pauses Exploits many multiple hardware threads
  • 26. Configurable Garbage Collection policies Multiple policies to match varying user requirements Pause time, Throughput, Memory footprint and GC overhead All modes exploit parallel execution Dynamic adaptation to number of available hardware cores & threads GC scalability independent from user application scalability Very low overhead (<3%) on typical workloads
  • 27. How do GC policies compare? - optthruput Time Thread 1 Thread 2 Thread 3 Thread n GC Java Optimize Throughput Highly parallel GC + streamlined application thread execution May cause longer pause times -Xgcpolicy:optthruput Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies.
  • 28. How do GC policies compare? - optavgpause Time GC Java Concurrent Tracing Optimize Pause Time GC cleans up concurrently with application thread execution Sacrifice some throughput to reduce average pause times -Xgcpolicy:optavgpause Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies. Thread 1 Thread 2 Thread 3 Thread n
  • 29. How do GC policies compare? - gencon Time Global GC Java Concurrent Tracing Scavenge GC Balanced Clean up many short-lived objects concurrent with application threads Some pauses needed to collect longer-lived objects -Xgcpolicy:gencon Picture is only illustrative and doesn’t reflect any particular real-life application. The purpose is to show theoretical differences in pause times between GC policies. Thread 1 Thread 2 Thread 3 Thread n
  • 30. How do GC policies compare? - subpools Uses multiple free lists Tries to predict the size of future allocation requests based on earlier allocation requests. Recreates free lists at the end of each GC based on these predictions. While allocating objects on the heap, free chunks are chosen using a “best fit” method, as against the “first fit” method used in other algorithms. Concurrent marking is disabled Scalable Scalable GC focused on the larger multiprocessor machines Improved object allocation algorithm May not be appropriate for small-to-midsize configurations – Xgcpolicy:subpool
  • 31. JVM optimizations for multi-core scalability. Lock removal across JVM and class libraries; java.util.concurrent package optimizations; better working set for cache efficiency; stack allocation; remove/optimize synchronization; thread local storage for send/receive buffers; non-blocking containers; asynchronous JIT compilation on a separate thread; right-sized application runtimes.
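The items on this slide describe work done inside the JVM and class libraries, but two of the ideas (thread-local buffers and non-blocking containers) have direct application-level counterparts in the standard library. The following is a minimal sketch of that analogy, not the JVM's internal implementation; the class `NonBlockingDemo` and its `run` method are hypothetical, and the lambda syntax requires Java 8 or later.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class.
final class NonBlockingDemo {
    // One scratch buffer per thread, so workers never contend for it.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Runs `tasks` jobs on `threads` workers; returns how many results were queued. */
    static int run(int threads, int tasks) {
        // Lock-free queue: producers do not block each other on a monitor.
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            pool.execute(() -> {
                StringBuilder sb = BUFFER.get();
                sb.setLength(0); // reuse this thread's buffer
                sb.append("task-").append(id);
                results.offer(sb.toString());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results.size();
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("queued results: " + run(threads, 1000));
    }
}
```

Sizing the pool from `availableProcessors()` follows the earlier advice that too many threads can be as bad as too few; a real workload may need a different ratio.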
  • 32. Thank You (Merci, Grazie, Gracias, Obrigado, Danke). Questions? Email: [email_address] http://www.ibm.com/developerworks/java/
  • 33. Special notices © IBM Corporation 2010. All Rights Reserved. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. 
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: AIX, CICS, CICSPlex, DataPower, DB2, DB2 Universal Database, i5/OS, IBM, the IBM logo, IMS/ESA, Power Systems, Lotus, OMEGAMON, OS/390, Parallel Sysplex, pureXML, Rational, Redbooks, Sametime, SMART SOA, System z, Tivoli, WebSphere, and z/OS. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.