Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
A Day of Real-World Performance
10/2/2018
Andrew Holdsworth, Tom Kyte, Graham Wood
Some Computer Science Basics
Response Time v DB Time v Latency
[Figure: a time line from the End User across the Network to the Application Server, and across the Network to the Database Server, illustrating Total User Response Time]
Database Time – Total time spent in database
[Figure: time lines for User 1 and User 2, one on an idle system and one on a degraded system. Each time line is made up of On CPU, Run-queue, Db file sequential read, Latch Wait and Lock Wait segments, contrasting actual wait time with recorded wait time.]
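The gap between actual and recorded wait time can be illustrated with a simplified model. It assumes, as the figure suggests, that a wait is timed from the moment the process starts waiting until it is back on CPU, so any run-queue time after the wait has completed is charged to the wait. The function name and the millisecond values are hypothetical.

```python
def recorded_waits(actual_waits_ms, run_queue_ms):
    """Return the wait times an instrumented process would record.

    Simplified model (an assumption, not Oracle internals): the wait timer
    stops only when the process is running on CPU again, so time spent in
    the O/S run queue after the wait completes is added to every wait.
    """
    return [wait + run_queue_ms for wait in actual_waits_ms]


# Hypothetical actual durations (ms) of three single-block reads.
actual = [5.0, 5.0, 5.0]

idle = recorded_waits(actual, run_queue_ms=0.0)      # CPU free: no queueing
degraded = recorded_waits(actual, run_queue_ms=4.0)  # CPU saturated

print("idle:    ", idle)
print("degraded:", degraded)
print("inflation on degraded system:", sum(degraded) - sum(actual), "ms")
```

On the degraded system, taking the recorded waits at face value would blame the I/O subsystem for delay that was really CPU queueing.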
Latency - Some Important Numbers
Best Block Access Speeds

Block Location              Access Time    Unit
L2 CPU cache                ~1             nanoseconds (10^-9 s)
Virtual Memory              ~1             microseconds (10^-6 s)
NUMA Far Memory             ~10            microseconds (10^-6 s)
Flash Memory (PCI)          ~0.01          milliseconds (10^-3 s)
Flash Memory (Networked)    ~0.1           milliseconds (10^-3 s)
Disk I/O                    ~1-10          milliseconds (10^-3 s)
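To make the spread of these latencies concrete, the sketch below converts each best-case access time listed above to seconds and expresses it as a multiple of a chosen baseline. The dictionary keys and the helper name are illustrative; Disk I/O is split into the two ends of its 1-10 ms range.

```python
# Best-case access times from the latency list, converted to seconds.
# Disk I/O is represented by both ends of its 1-10 ms range.
ACCESS_TIME_S = {
    "L2 CPU cache": 1e-9,
    "Virtual Memory": 1e-6,
    "NUMA Far Memory": 10e-6,
    "Flash Memory (PCI)": 0.01e-3,
    "Flash Memory (Networked)": 0.1e-3,
    "Disk I/O (1 ms)": 1e-3,
    "Disk I/O (10 ms)": 10e-3,
}


def relative_to(baseline, table=ACCESS_TIME_S):
    """Express each access time as a multiple of the baseline location."""
    base = table[baseline]
    return {name: t / base for name, t in table.items()}


for name, factor in relative_to("L2 CPU cache").items():
    print(f"{name:26s} ~{factor:,.0f}x L2 cache")
```

A single disk read costs on the order of a million L2 cache accesses, and even PCI flash is roughly 100 times faster than the best disk I/O.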
Database Performance Core Principles
• The Oracle database has a process-based architecture, and to perform
efficiently each process requires:
– To be scheduled efficiently by the O/S until it completes the SQL statement,
or blocks on an operation required to complete the SQL statement, e.g. disk I/O
– If the process has to fight to get scheduled, needs to be scheduled for an
excessively long period due to SQL inefficiencies, or any blocking operation takes a
long time, then database performance will be poor
• Database performance engineers spend most of their time looking for
CPU-consuming processes and eliminating blocking events
Database Performance Core Principles
• To determine acceptable CPU utilization, take a probabilistic approach to the
subject.
– If a CPU is 50% busy, the chance of getting scheduled is 1 in 2
– If a CPU is 66% busy, the chance of getting scheduled is 1 in 3
– If a CPU is 80% busy, the chance of getting scheduled is 1 in 5
– If a CPU is 90% busy, the chance of getting scheduled is 1 in 10
• If these probabilities are used as an indicator of the predictability of user
response time, then the variance in user response time becomes noticeable at
about 60-65% CPU utilization
• This has been observed in production and laboratory conditions for many years.
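The "1 in N" figures follow from a single formula: a CPU that is busy a fraction u of the time is free a fraction (1 - u) of the time, so N = 1 / (1 - u). A minimal sketch of that arithmetic (the function name is illustrative):

```python
def one_in(utilization):
    """Return N such that the chance of finding the CPU free is about 1 in N.

    A CPU busy a fraction u of the time is free a fraction (1 - u) of the
    time, so N = 1 / (1 - u).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


for u in (0.50, 0.66, 0.80, 0.90):
    print(f"{u:.0%} busy -> about 1 in {one_in(u):.0f}")
```

The same 1 / (1 - u) factor is the mean response-time multiplier of a simple single-server (M/M/1) queue. That model is not part of the slides, but it is one way to see why response time deteriorates sharply rather than linearly as utilization climbs.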
Impact of Too Many Processes
Database Core Principles
[Chart: transaction rate (Tx/s, 0 to 16,000) against number of CPUs (4 to 32) for 1 process per core, and for 10 and 50 processes per core (average, maximum and minimum)]
Connection Pooling
Connection Pools
Performance Data
The workload is increased by
doubling the load. System appears
scalable up to 60% CPU on the DB
server.
Connection Pools
Performance Data
A checkpoint is initiated, creating
a CPU spike that results in
unpredictable response time
Connection Pools
Performance Data
A slight increase in the
workload results in a
disproportionate CPU increase
and response time degrades.
System monitoring tools
become unreliable
Connection Pools
Performance Data
Reducing the connection pool by 50% results in
more application server queuing and fewer DB
processes in a wait state. No observable
improvement in response time or transaction rate
(value or consistency)
Connection Pools
Performance Data
Connection pool reduced to 96.
Note improvement in response time
and transaction rate.
CPU utilization is reduced.
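A fixed-size pool moves excess demand into a queue in the application server instead of into runnable database processes competing for CPU. The sketch below is a minimal, hypothetical pool built on Python's `queue.Queue`; the placeholder strings stand in for real database connections, and the pool size used in the example is arbitrary rather than taken from the slides.

```python
import queue
import threading
import time


class FixedPool:
    """Hypothetical fixed-size pool: callers wait in the app tier when it is exhausted."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")  # placeholder connection objects
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    def acquire(self):
        conn = self._free.get()  # blocks here, in the application server
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        return conn

    def release(self, conn):
        with self._lock:
            self.in_use -= 1
        self._free.put(conn)


def worker(pool, results):
    conn = pool.acquire()
    try:
        time.sleep(0.001)  # stand-in for a short database call
        results.append(conn)
    finally:
        pool.release(conn)


def run(pool_size, n_requests):
    pool = FixedPool(pool_size)
    results = []
    threads = [threading.Thread(target=worker, args=(pool, results))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool, results
```

However many requests arrive, at most `pool_size` of them are active against the database at once; the rest queue in `acquire()`.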
Resource Management
Performance Data
By reducing CPU_COUNT with the resource
manager enabled (instance caging), the database
can be throttled back. Note the increase in
response time and in the wait event
resmgr:cpu quantum
Leaking
Leaking
• Intermittent error: “ORA-01000: maximum open cursors exceeded”.
The application server fails and must be restarted
• The DBA has suggested raising the init.ora parameter open_cursors
to 30,000 to make the problem “go away for a while”.
• Symptoms of cursor leaking
Observations
Performance Data
Leaking
Error message:
ORA-01000: maximum open
cursors exceeded
Wait event: “SQL*Net break/reset
to client”
Cursor Data
Leaking
Cursor list with Count > 1
implies “leaked” cursors
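Cursor leaks typically come from a close call that is skipped when an exception is raised. The toy model below is hypothetical (`ToySession`, `ToyCursor` and `MaxOpenCursorsExceeded` are invented stand-ins, not a real driver API); it contrasts a handler that leaks a cursor on every error with one that uses `contextlib.closing` so the cursor is closed on every path.

```python
from contextlib import closing


class MaxOpenCursorsExceeded(Exception):
    """Hypothetical stand-in for ORA-01000 in this toy model."""


class ToySession:
    """Hypothetical session that enforces an open-cursor limit."""

    def __init__(self, open_cursors):
        self.limit = open_cursors
        self.open = 0

    def cursor(self):
        if self.open >= self.limit:
            raise MaxOpenCursorsExceeded("maximum open cursors exceeded")
        self.open += 1
        return ToyCursor(self)


class ToyCursor:
    def __init__(self, session):
        self.session = session
        self.closed = False

    def execute(self, sql, fail=False):
        if fail:
            raise ValueError(f"simulated error running: {sql}")

    def close(self):
        if not self.closed:
            self.closed = True
            self.session.open -= 1


def leaky(session, fail):
    cur = session.cursor()
    cur.execute("select 1 from dual", fail)
    cur.close()  # never reached if execute() raises: the cursor leaks


def safe(session, fail):
    # closing() calls cur.close() whether or not execute() raises
    with closing(session.cursor()) as cur:
        cur.execute("select 1 from dual", fail)
```

With `leaky`, each failure permanently consumes a cursor until the limit is hit; raising `open_cursors` only postpones that point. With `safe`, the open-cursor count returns to zero after every call.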
Leaking
• After a period of time, system performance begins to decline and then
degrades rapidly
• After rapid degradation, the application servers time out and the system is
unavailable
• The DBA claims the database is not the problem and simply needs more
connections
– The init.ora parameter processes is increased to 20,000
Observations
Leaking
• Due to coding errors in exception handling, the application leaks
connections from the connection pool, making them programmatically
impossible to use
• This reduces the effective size of the connection pool
• The remaining connections are unable to keep up with the incoming
workload
• The rate of connection leakage accelerates until there are no usable
connections left in the pool
Session Leaking
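Session leaking follows the same pattern at the connection level: an error path returns without handing the connection back to the pool. The sketch below uses a hypothetical in-memory `ToyPool`; the `connection()` context manager, built with `contextlib.contextmanager`, releases the connection in a `finally` block so that no error path can lose it.

```python
from contextlib import contextmanager


class ToyPool:
    """Hypothetical pool of placeholder connections."""

    def __init__(self, size):
        self.free = [f"conn-{i}" for i in range(size)]

    def acquire(self):
        if not self.free:
            raise RuntimeError("no usable connections left in the pool")
        return self.free.pop()

    def release(self, conn):
        self.free.append(conn)

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # returned to the pool on success or failure


def handle_leaky(pool, request_fails):
    conn = pool.acquire()
    try:
        if request_fails:
            raise ValueError("bad request")
        pool.release(conn)
    except ValueError:
        return "error"  # bug: conn is never released, so the session leaks
    return "ok"


def handle_safe(pool, request_fails):
    try:
        with pool.connection():
            if request_fails:
                raise ValueError("bad request")
    except ValueError:
        return "error"
    return "ok"
```

Three failed requests are enough to drain a three-connection pool through `handle_leaky`, while `handle_safe` keeps the pool intact no matter how many requests fail.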
Leaking
• Potential indicators of session leaking:
– Frequent application server resets
– init.ora parameters processes and sessions set very high
– Configuration of large and dynamic connection pools
– Large number of idle connections connected to the database
– Free memory on database server continually reduced
– Presence of idle connection kill scripts or middleware configured to kill idle sessions
Session Leaking
Leaking
• Without warning, the database appears to hang and the application servers
time out simultaneously
• The DBA sees that all connections are waiting on a single lock held by a
process that has not been active for a while.
• Each time the problem occurs, the DBA responds by running a script to kill
the sessions of long-time lock holders, allowing the system to restart.
Observations
Leaking
• Lock leaking is usually a side effect of session leaking, where the exception
handling code fails to execute a commit or rollback.
• A leaked session may be programmatically lost while still holding locks and
uncommitted changes to the database.
Lock Leaking
Leaking
• Programming error impact:
– Potential system hangs: all connections queue up for the held lock
– Potential logical corruption of the database: end users may believe transactions were
committed when in fact they were not
– If sessions return to the connection pool with uncommitted changes, it is not
deterministic whether, or when, those changes are committed or rolled back. This is a
serious data integrity issue.
Lock Leaking
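The fix for lock leaking is to guarantee that every code path ends the transaction with a commit or a rollback before the connection is reused. The sketch below uses Python's standard `sqlite3` module purely as a convenient stand-in for an Oracle connection (the table and function names are hypothetical); `sqlite3.Connection` used as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 100)])
    conn.commit()
    return conn


def transfer_leaky(conn, fail):
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    if fail:
        raise RuntimeError("simulated application error")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
    conn.commit()  # skipped on error: the transaction (and its locks) stays open


def transfer_safe(conn, fail):
    # The connection context manager commits on success and rolls back
    # if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
        if fail:
            raise RuntimeError("simulated application error")
        conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

After a failed `transfer_leaky`, `conn.in_transaction` is still `True`: a half-finished transfer is pending, and whichever request next commits on that connection will make it permanent. After a failed `transfer_safe`, the transaction has been rolled back and the balances are unchanged.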
Leaking
• Developer Bugs
– Incorrect/untested exception handling
• Cursor, session and lock leaking
– High values for init.ora (open_cursors, processes, sessions)
– Idle process and lock holder kill scripts
– Oversized connection pools of largely idle processes
How to Develop High Performance Applications
Database / Middleware Interaction
Scenario
Database / Middleware Interaction
• Devices ship files.
• Files are read and processed by multiple application servers.
• Each application server uses multiple threads that connect to the database through a connection pool, which is distributed by a SCAN listener over two instances.
Problem
Database / Middleware Interaction
• It’s too slow
• It’s a problem with the database
– Look at all those waits
• Need to be able to process an order of magnitude more data
• Obviously need to move to Hadoop
Analysis
Database / Middleware Interaction
• Only a small amount of data is being processed.
• Both instances are essentially idle, with most processes waiting on RAC and concurrency waits.
Solution
Database / Middleware Interaction
• Remove all of those RAC waits by running against a single database instance.
Analysis
Database / Middleware Interaction
• Throughput up by a factor of 10
• RAC waits gone
• CPU time actually visible
• High concurrency waits
– Buffer busy
– TX index contention
Solution
Database / Middleware Interaction
• Reduce contention waits by processing each file entirely within a single application server
Analysis
Database / Middleware Interaction
• Throughput improved again
• Concurrency events reduced but still present
Solution
Database / Middleware Interaction
• Introduce affinity of a related set of records to a single thread by hashing
• All records for the same primary key are processed by a single thread, so there is no index contention for the same primary key value
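The hashing step can be sketched as follows. This is a minimal illustration that assumes records arrive as (primary_key, payload) pairs; the function names and the use of CRC-32 as the hash are assumptions, not details from the slides.

```python
import zlib
from collections import defaultdict


def thread_for(key, n_threads):
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % n_threads


def partition(records, n_threads):
    """Group (primary_key, payload) records into one work list per thread."""
    work = defaultdict(list)
    for key, payload in records:
        work[thread_for(key, n_threads)].append((key, payload))
    return work
```

Because every record for a given key lands in the same work list, only one thread ever modifies the index entries for that key, and each thread sees that key's records in their original file order.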
Analysis
Database / Middleware Interaction
• More throughput
• Log file sync is the predominant event
• CPU usage close to core count
Solution
Database / Middleware Interaction
• Reintroduce RAC to add more CPU resource
• Implement a separate service for each instance
• Connect each application server to one instance
