Spectrum Scale 4.1 System Administration
Quorums, Quorum Loss and Recovery
© Copyright IBM Corporation 2015
Unit objectives
After completing this unit, you should be able to:
• Review quorums
• Review some quorum best practices
• Describe recovery from quorum node and cluster configuration server failures
Spectrum Scale Quorum Nodes
Why do we need quorum nodes?
• Quorum nodes prevent multiple nodes from assuming the role of the file system manager
• A majority of the quorum nodes MUST remain active for the cluster to sustain normal file system activity
• This prevents a partitioned cluster in which two different nodes each believe they are in control and corrupt the file system
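The majority rule above can be stated as a one-line check. The following is a minimal Python sketch of that rule only: it models node quorum as a strict majority of the defined quorum nodes, ignores tiebreaker disks, and is not Spectrum Scale code; the function names are hypothetical. The second helper confirms that, however the quorum nodes are split into two partitions, at most one side can hold a majority, which is what keeps two nodes from both taking control.

```python
def has_quorum(total_quorum_nodes: int, active_quorum_nodes: int) -> bool:
    """Return True when a strict majority of the quorum nodes is active.

    Simplified model: node quorum only, tiebreaker disks are ignored.
    """
    if total_quorum_nodes <= 0:
        raise ValueError("cluster must define at least one quorum node")
    if not 0 <= active_quorum_nodes <= total_quorum_nodes:
        raise ValueError("active count must be between 0 and the total")
    # Strict majority: more than half of the defined quorum nodes.
    return active_quorum_nodes > total_quorum_nodes // 2


def no_split_brain(total_quorum_nodes: int) -> bool:
    """Check every two-way split of the quorum nodes: at most one side
    may ever hold a majority."""
    return all(
        not (has_quorum(total_quorum_nodes, side_a)
             and has_quorum(total_quorum_nodes, total_quorum_nodes - side_a))
        for side_a in range(total_quorum_nodes + 1)
    )


if __name__ == "__main__":
    print(has_quorum(3, 2))   # True: 2 of 3 is a majority
    print(has_quorum(3, 1))   # False: 1 of 3 is not
    print(has_quorum(4, 2))   # False: a 2/2 split leaves neither side in control
    print(no_split_brain(5))  # True
```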
Best Practices: Quorum
• Quorum: a set of 3-7 nodes used to determine the state of the cluster
• A small subset of nodes is designated as quorum nodes
• If there are only two nodes, tiebreaker disks can be used to determine quorum
• Recommendations
– Use an odd number of nodes
– Choose your most reliable nodes
– Choose nodes in separate cabinets and on separate power systems
– Use tiebreaker disks only on small clusters
– For small clusters, a simple VM can serve as a quorum node
– Quorum nodes should not be placed on extremely busy servers
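The recommendation to use an odd number of quorum nodes follows from the same majority arithmetic. In this minimal Python sketch (simplified model: node quorum only, no tiebreaker disks; the function name is hypothetical), adding a fourth quorum node to three does not let the cluster survive any additional failure.

```python
def failures_tolerated(quorum_nodes: int) -> int:
    """Quorum-node failures the cluster can absorb while a strict majority
    of the defined quorum nodes stays active (node quorum only)."""
    if quorum_nodes <= 0:
        raise ValueError("need at least one quorum node")
    majority = quorum_nodes // 2 + 1
    return quorum_nodes - majority


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} quorum nodes: tolerates {failures_tolerated(n)} failure(s)")
    # 3 and 4 nodes both tolerate 1 failure; 5 and 6 both tolerate 2.
```

Under this model, one or two quorum nodes cannot absorb any failure, which is consistent with using tiebreaker disks on very small clusters.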
RECOVERY SCENARIOS
How to recover: Node Quorum
How to recover from the loss of a majority of quorum nodes
• Scenario: You have 3 quorum nodes and 2 of them fail.
– The nodes go into the Arbitrating state
– Add 2 new quorum nodes
– mmchnode --quorum -N node4,node5
– Now you have 5 quorum nodes and 3 are running
– The running nodes return to the Active state
– Delete the missing quorum nodes
– mmdelnode -N badnode1,badnode2
– or
– mmchnode --nonquorum -N badnode1,badnode2
– Now you have 3 quorum nodes and 3 are running
– The running nodes remain in the Active state
• You may have to clean up the original quorum nodes once they recover.
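The node counts in this scenario can be checked with a toy model. This Python sketch does not call any Spectrum Scale command: `add_quorum` and `remove_quorum` only stand in for the effect of the `mmchnode`/`mmdelnode` steps on the quorum-node list, and the surviving node's name (`node1`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QuorumModel:
    """Toy model of the quorum-node list; all node names are hypothetical."""

    up: set[str] = field(default_factory=set)    # quorum nodes that are running
    down: set[str] = field(default_factory=set)  # quorum nodes that have failed

    def total(self) -> int:
        return len(self.up) + len(self.down)

    def state(self) -> str:
        # A strict majority of the defined quorum nodes must be running.
        return "Active" if len(self.up) > self.total() // 2 else "Arbitrating"

    def add_quorum(self, *nodes: str) -> None:
        # Stands in for designating healthy nodes as quorum nodes.
        self.up.update(nodes)

    def remove_quorum(self, *nodes: str) -> None:
        # Stands in for deleting failed nodes or making them nonquorum.
        self.down.difference_update(nodes)

    def snapshot(self) -> tuple[int, int, str]:
        return self.total(), len(self.up), self.state()


def walk_through() -> list[tuple[int, int, str]]:
    # Start: 3 quorum nodes, 2 of which have failed.
    cluster = QuorumModel(up={"node1"}, down={"badnode1", "badnode2"})
    steps = [cluster.snapshot()]
    cluster.add_quorum("node4", "node5")
    steps.append(cluster.snapshot())
    cluster.remove_quorum("badnode1", "badnode2")
    steps.append(cluster.snapshot())
    return steps


if __name__ == "__main__":
    for total, running, state in walk_through():
        print(f"{total} quorum nodes, {running} running -> {state}")
    # 3 quorum nodes, 1 running -> Arbitrating
    # 5 quorum nodes, 3 running -> Active
    # 3 quorum nodes, 3 running -> Active
```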
Recover from loss of primary configuration server
How to recover from the loss of the primary configuration server
• Scenario: The primary cluster configuration server fails
– The secondary configuration server steps in
– Reassign the primary
– mmchcluster -p newprimarynode
– Make sure all nodes use the new primary
– mmchcluster -p LATEST
• Use the same process if the secondary cluster configuration server fails.
How to recover: Loss of both configuration servers
How to recover from the permanent loss of both the primary and secondary configuration servers
• Scenario: The primary and secondary cluster configuration servers both fail
– Cluster status
• Data is online
• mmgetstate -a no longer works
– Reassign both configuration servers at the same time
– mmchcluster -p nodenewprimary -s nodenewsecondary
– The old servers still consider themselves in charge until you run
– mmchcluster -p LATEST
I/O Concepts: Node config parameters (1 of 2)
• Spectrum Scale configuration parameters
• File system configuration
• Storage configuration
• Interconnect tuning
• Application design
I/O Concepts: Node config parameters (2 of 2)
• Spectrum Scale configuration parameters
• pagepool
• maxFilesToCache
• maxStatCache
• prefetchThreads
• socketRcvBufferSize
• socketSndBufferSize
• seqDiscardThreshold
• worker1Threads
• generalWorkerThreads <<< Verify Role
• tokenMemLimit
Performance tuning: File system parameters
• File system configuration (mmcrfs options)
• -B BlockSize
• -n NumNodes
• -L LogFileSize
• Storage considerations
– RAID level
– RAID Strip Size (Segment Size)
– Storage Type: Flash, 10k RPM, 7,200 RPM
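When choosing the block size (`-B`), it is common practice to relate it to the RAID full-stripe width (segment size multiplied by the number of data disks), so that a full-block write does not trigger a RAID read-modify-write. The slide does not state this rule; treat it as an assumption to confirm against your storage vendor's guidance. A minimal Python sketch of the arithmetic, using a hypothetical 8+2P RAID 6 array:

```python
KIB = 1024
MIB = 1024 * KIB


def full_stripe_bytes(segment_kib: int, data_disks: int) -> int:
    """Full-stripe width = segment (strip) size x number of data disks.

    Parity disks are not counted: an 8+2P RAID 6 array has 8 data disks.
    """
    if segment_kib <= 0 or data_disks <= 0:
        raise ValueError("segment size and data disk count must be positive")
    return segment_kib * KIB * data_disks


def block_is_stripe_aligned(block_size: int, segment_kib: int, data_disks: int) -> bool:
    """True if each file system block covers a whole number of RAID stripes."""
    return block_size % full_stripe_bytes(segment_kib, data_disks) == 0


if __name__ == "__main__":
    # Hypothetical RAID 6 8+2P array with a 256 KiB segment size.
    print(full_stripe_bytes(256, 8) // MIB)          # 2
    print(block_is_stripe_aligned(2 * MIB, 256, 8))  # True
    print(block_is_stripe_aligned(1 * MIB, 256, 8))  # False: half a stripe
```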
Performance tuning: Storage
• Storage configuration
– I/O balance
• Even number of LUNs per path
• Even number of LUNs per NSD server
– RAID level
• Choose the right level for the I/O pattern
– Cache configuration
• Read-ahead is not recommended with -j scatter
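One way to keep I/O balanced across NSD servers is to rotate the order of each LUN's NSD server list so that each server is listed first for the same number of LUNs. The Python sketch below only computes such an assignment; it assumes clients prefer the first listed server when it is available, and all LUN and server names are hypothetical. It also shows why the LUN count should divide evenly by the number of servers.

```python
from __future__ import annotations

from collections import Counter


def nsd_server_lists(luns: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Give each LUN an NSD server list, rotating which server comes first."""
    if not servers:
        raise ValueError("need at least one NSD server")
    n = len(servers)
    return {lun: servers[i % n:] + servers[:i % n] for i, lun in enumerate(luns)}


def primary_counts(server_lists: dict[str, list[str]]) -> Counter:
    # Count how many LUNs list each server first.
    return Counter(order[0] for order in server_lists.values())


def is_balanced(server_lists: dict[str, list[str]], servers: list[str]) -> bool:
    counts = primary_counts(server_lists)
    per_server = [counts.get(s, 0) for s in servers]
    return max(per_server) == min(per_server)


if __name__ == "__main__":
    servers = ["nsdA", "nsdB"]
    eight = nsd_server_lists([f"lun{i:02d}" for i in range(8)], servers)
    nine = nsd_server_lists([f"lun{i:02d}" for i in range(9)], servers)
    print(is_balanced(eight, servers))  # True: 4 LUNs first on each server
    print(is_balanced(nine, servers))   # False: 5 versus 4
```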
Performance tuning: Network configuration
• Interconnect selection and tuning
– TCP/IP tuning
– InfiniBand configuration
• Test network performance with nsdperf
– Located in /usr/lpp/mmfs/samples/net/
– The README file serves as its documentation
– Additional tools are available on the Spectrum Scale wiki
Performance tuning: Application design
• Make sure your applications are cluster-aware
• Avoid, or optimize for, heavy operations
– Multiple nodes taking write locks on an entire shared file
• Use byte-range locking
• Use separate files
– Reduce the use of I/O-heavy operations
• fsync()
– Some single-node optimizations do not apply on a cluster
• mmap
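The byte-range locking advice can be illustrated with POSIX record locks. In this minimal Python sketch each writer locks only the fixed-size record it updates, using `fcntl.lockf` with a length and start offset, instead of locking the whole file. It runs on a local POSIX file system; the record size and file name are hypothetical, and because POSIX locks held by one process never conflict with each other, the benefit only appears when several processes or nodes write concurrently.

```python
from __future__ import annotations

import fcntl
import os
import tempfile

RECORD_SIZE = 4096  # hypothetical fixed record size owned by one writer


def write_record(path: str, index: int, payload: bytes) -> None:
    """Write one record while holding a lock on that record's bytes only."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than one record")
    offset = index * RECORD_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive lock on [offset, offset + RECORD_SIZE), not the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            os.pwrite(fd, payload.ljust(RECORD_SIZE, b"\0"), offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)


def demo(n_records: int = 4) -> list[bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        # Writers may arrive in any order; each touches only its own range.
        for i in reversed(range(n_records)):
            write_record(path, i, f"record-{i}".encode())
        with open(path, "rb") as f:
            data = f.read()
    return [data[i * RECORD_SIZE:(i + 1) * RECORD_SIZE].rstrip(b"\0")
            for i in range(n_records)]


if __name__ == "__main__":
    print(demo())
```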
Direct I/O
• What is it?
– Bypasses the Spectrum Scale cache and transfers data directly between disk and the user-space buffer, instead of staging the data in kernel memory.
• Why do I want it?
– Applications with low file system cache hit rates
– Applications with very large I/O
– Applications that know how to optimize their own I/O
– Example: Oracle Database environments
• How do I enable it?
– mmchattr -D yes <file_name>, or
– Specify the O_DIRECT file access mode on the open() of the file.
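Opening a file with `O_DIRECT` usually requires the buffer address, file offset, and transfer size to be aligned. The Python sketch below assumes a 4096-byte alignment (the real requirement depends on the file system and device), uses an anonymous `mmap` as a page-aligned buffer, and falls back to buffered I/O where the local file system rejects `O_DIRECT`. It is a generic POSIX illustration, not a Spectrum Scale API.

```python
from __future__ import annotations

import mmap
import os
import tempfile

ALIGN = 4096  # assumed alignment; the real value depends on the file system


def _write_all(fd: int, buf: mmap.mmap) -> None:
    if os.write(fd, buf) != len(buf):
        raise OSError("short write")


def write_direct(path: str, data: bytes) -> bool:
    """Write data, padded to ALIGN bytes; return True if O_DIRECT was used."""
    padded_len = max(ALIGN, -(-len(data) // ALIGN) * ALIGN)
    with mmap.mmap(-1, padded_len) as buf:  # anonymous mapping: page-aligned
        buf.write(data)  # the rest of the buffer stays zero-filled
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        if o_direct:
            try:
                fd = os.open(path, flags | o_direct, 0o644)
                try:
                    _write_all(fd, buf)
                finally:
                    os.close(fd)
                return True
            except OSError:
                pass  # e.g. EINVAL where O_DIRECT is unsupported
        fd = os.open(path, flags, 0o644)  # buffered fallback
        try:
            _write_all(fd, buf)
        finally:
            os.close(fd)
        return False


def demo() -> tuple[bool, bytes, int]:
    payload = b"hello, direct I/O"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "direct.dat")
        used_direct = write_direct(path, payload)
        with open(path, "rb") as f:
            content = f.read()
    return used_direct, content[:len(payload)], len(content)


if __name__ == "__main__":
    print(demo())
```

The file is padded to a whole aligned block; the sketch does not truncate it back to the logical data length.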
Unit summary
Having completed this unit, you should be able to:
• Review some Spectrum Scale best practices
• Discuss approaches to Spectrum Scale performance tuning
• Describe recovery from quorum node and cluster configuration server failures