PARALLEL AND DISTRIBUTED COMPUTING
FAULT TOLERANT DISTRIBUTED COMPUTING
FAULT TOLERANCE
• The ability of a system to continue operating without interruption despite the failure of one or more of its components
• Describes how an OS responds to, and allows for, malfunctions and failures
• A fault-tolerant system is meant to guarantee no break in service
• It recovers from failure completely and transparently
FAULT TOLERANCE
• Every gain in fault tolerance comes with a drawback somewhere else
• The system will be slower, take more disk space, use more machines, and incur other costs
• Therefore, fault tolerance is always a trade-off between cost and the degree of fault tolerance
FAILURE VS ERROR
• Failure: the system's behavior differs from its expected behavior
• A failure might involve the system being unreachable or producing incorrect output
• Error: an incorrectness in the system that may lead to a failure
• Errors do not necessarily cause failures; they can be detected in the system before they produce a failure
FAULT TOLERANCE
• Fault tolerance usually runs through several phases:
  • Error detection: an error has to be detected in order to avoid a failure
  • Damage confinement: the error must be prevented from spreading to other components
  • Error recovery: the error must be removed, otherwise the system would run into a failure
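
The three phases can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `run_with_fault_tolerance` and its `task`, `validate`, and `fallback` parameters are hypothetical and do not come from the slides.

```python
def run_with_fault_tolerance(task, validate, fallback):
    """Run task() and pass its result through the three phases."""
    try:
        result = task()
        # Error detection: check the result before it leaves this component.
        if not validate(result):
            raise ValueError("invalid result detected")
        return result
    except Exception:
        # Damage confinement: the error is caught here, so it does not
        # propagate to the caller or to other components.
        # Error recovery: the erroneous result is replaced by a fallback.
        return fallback()
```

For example, `run_with_fault_tolerance(lambda: -1, lambda r: r >= 0, lambda: 0)` detects the negative result and returns the fallback value instead of letting it spread.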
PROCESSOR FAULT
• Occurs when a processor behaves in an unexpected manner. Processor faults may be classified into three kinds:
1. Fail-stop: the processor has failed completely and will never respond; neighboring processors can detect the failed processor
2. Slowdown: the processor might run in a degraded form or might fail completely
3. Byzantine: the processor can fail, run in a degraded fashion for some time, or execute at normal speed while trying to make the computation fail
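
One common way for neighbors to detect a fail-stop processor is a heartbeat timeout. The sketch below assumes that approach; the class name `HeartbeatDetector` and its methods are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatDetector:
    """Suspect a processor once it has been silent longer than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # processor name -> time of last heartbeat

    def heartbeat(self, node, now):
        # Record that `node` was alive at time `now`.
        self.last_seen[node] = now

    def suspected(self, now):
        # Processors whose last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A timeout alone cannot tell a fail-stop processor from a slowed-down one, and it says nothing about Byzantine behavior, since a Byzantine processor may keep sending heartbeats while corrupting the computation.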
NETWORK FAULTS
• Occur when processors are prevented from communicating with each other. Link faults can cause new kinds of problems, such as:
  • One-way links: one processor can send messages to another but cannot receive messages from it
  • Network partition: a portion of the network is completely isolated from the rest
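
Both link problems can be made concrete with a small graph sketch. It assumes the working links are known as a set of `(sender, receiver)` pairs; the function names `one_way_links` and `partitions` are hypothetical.

```python
from collections import deque

def one_way_links(links):
    """Links that deliver messages in one direction only."""
    return sorted((a, b) for a, b in links if (b, a) not in links)

def partitions(nodes, links):
    """Group nodes that can still exchange messages in both directions."""
    # Two nodes are neighbors only if messages flow both ways.
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        if (b, a) in links:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects one isolated portion of the network.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            group.add(n)
            for m in neighbors[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        groups.append(sorted(group))
    return groups
```

More than one group in the result means the network is partitioned.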
ATTRIBUTES OF FAULT TOLERANT SYSTEM
A fault-tolerant system is a dependable system, which requires the following attributes:
1. Availability: the system is in a ready state and ready to deliver its functions. A highly available system is one that is most likely working at any given instant in time.
2. Reliability: the ability of a computer to run continuously without failure. It is defined over a time interval rather than at an instant. A reliable system works continuously without interruption.
3. Safety: when the system fails to carry out its processes correctly and its operations are incorrect, nothing catastrophic happens and other systems are not made faulty.
4. Maintainability: how easily failures can be noticed and fixed.
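
The difference between availability (an instant) and reliability (an interval) shows up in the usual textbook formulas. The sketch below is not from the slides: it assumes steady-state availability of MTTF / (MTTF + MTTR) and, for reliability, a constant failure rate (exponential model). The function names are hypothetical.

```python
import math

def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is ready: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def reliability(t_hours, mttf_hours):
    """Probability of running t hours with no failure, assuming a
    constant failure rate of 1 / MTTF (exponential model)."""
    return math.exp(-t_hours / mttf_hours)
```

Under these assumptions, a system that fails every 100 hours but is repaired within 1 hour is about 99% available, yet its chance of running 100 hours without interruption is only about 37%.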
TYPES OF FAILURE
CLASSIFICATION OF FAILURE
Transient:
Intermittent:
Permanent:
FAULT TOLERANCE MECHANISMS IN DISTRIBUTED SYSTEMS
• Replication-based fault tolerance technique
• Process-level redundancy technique
• Fusion-based redundancy technique
REPLICATION BASED FAULT TOLERANCE TECHNIQUE
• Replicate the data on another machine, so that the failure of one machine does not cause the whole system to stop
• Replicate the data on a different server
• Problems of replication:
  • Consistency: the major problem of replication is keeping the replicas consistent, because any client may update the data. Consistency of data is ensured by a model such as the sequential or causal memory consistency model
  • Degree of replication: a large number of replicas is needed in order to achieve high fault tolerance
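
To make the consistency and degree-of-replication problems concrete, here is a minimal in-memory sketch of a majority-quorum register. It is an assumed illustration, not one of the consistency models named above; the class `ReplicatedRegister` and its fields are hypothetical.

```python
class ReplicatedRegister:
    """A single value stored on n replicas, read and written by majority."""

    def __init__(self, n):
        self.replicas = [(0, None)] * n  # (version, value) per replica
        self.up = [True] * n             # which replicas are reachable

    def _quorum(self):
        alive = [i for i, ok in enumerate(self.up) if ok]
        # A strict majority is required so any two quorums overlap.
        if len(alive) <= len(self.replicas) // 2:
            raise RuntimeError("no majority of replicas reachable")
        return alive

    def write(self, value):
        alive = self._quorum()
        version = max(self.replicas[i][0] for i in alive) + 1
        for i in alive:
            self.replicas[i] = (version, value)

    def read(self):
        alive = self._quorum()
        # The highest version among a majority is the latest write.
        return max((self.replicas[i] for i in alive), key=lambda r: r[0])[1]
```

With n replicas this scheme keeps working as long as fewer than half of them are down, which is why a higher degree of fault tolerance requires more replicas.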
PROCESS LEVEL REDUNDANCY TECHNIQUES
• Faults that disappear without anything being done are called transient faults. Faults of this type are hard to identify
• To handle transient faults, software-based fault tolerance techniques are used
• PLR compares redundant processes to ensure correct execution
• Checkpointing and rollback are popular techniques in which the current state of the system is saved so that it can be restored later
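
Both ideas can be sketched briefly. Real process-level redundancy runs separate operating-system processes; the example below uses plain function calls as stand-ins, and the names `redundant_run` and `Checkpointed` are hypothetical.

```python
import copy
from collections import Counter

def redundant_run(replicas, x):
    """Run every replica on the same input and return the majority output."""
    outputs = [f(x) for f in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among redundant executions")
    return value

class Checkpointed:
    """Holds a state that can be saved and later restored."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        # Save a deep copy so later changes do not alter the checkpoint.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the most recently saved state.
        self.state = copy.deepcopy(self._saved)
```

If one of three redundant executions is hit by a transient fault, the other two outvote it; if no majority exists, the computation can be rolled back to the last checkpoint and retried.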
FUSION BASED TECHNIQUE
• The downside of replication is that multiple backups increase cost
• The fusion-based technique addresses this problem because it requires fewer backups
• Backup machines are fused to a given set of systems (NP problem)
• The fusion-based technique has very high overhead during the recovery process, so it is acceptable when the probability of a fault in the system is low
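
The trade-off between fewer backups and costlier recovery can be illustrated with XOR parity. This is a simplified stand-in chosen for illustration, not the actual fusion algorithm; it assumes integer-valued primaries and tolerates only one lost value. The names `fuse` and `recover` are hypothetical.

```python
from functools import reduce
from operator import xor

def fuse(primaries):
    """One fused backup for all primaries: the XOR of their values."""
    return reduce(xor, primaries, 0)

def recover(primaries, lost_index, fused):
    """Rebuild the lost value from the fused backup and every survivor."""
    survivors = (v for i, v in enumerate(primaries) if i != lost_index)
    return reduce(xor, survivors, fused)
```

A single fused value backs up any number of primaries, but recovery has to read every surviving primary, whereas a plain replica could be copied directly.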

Parallel and Distributed Computing Chapter 12
