1
CST402 DISTRIBUTED COMPUTING
MODULE 1
Syllabus- Distributed systems basics and
Computation model
Distributed System – Definition, Relation to computer system
components, Motivation, Primitives for distributed communication,
Design issues, Challenges and applications.
A model of distributed computations – Distributed program, Model of
distributed executions, Models of communication networks, Global
state of a distributed system, Cuts of a distributed computation, Past
and future cones of an event, Models of process communications.
2
Distributed System
A distributed system is a collection of independent entities that
cooperate to solve a problem that cannot be individually solved
A distributed system can be characterized as a collection of mostly
autonomous processors communicating over a communication
network
3
Distributed system has been
characterized in one of several ways
1. You know you are using one when the crash of a computer you have never heard of prevents you from doing any work (conversely, a well-designed distributed system also prevents losing data when a single computer crashes)
2. A collection of computers that do not share common memory or a common physical clock, that communicate by message passing over a communication network, and where each computer has its own memory and runs its own operating system
3. A collection of independent computers that appears to the users of the system as a
single coherent computer
4. A term that describes a wide range of computers, from weakly coupled systems such
as wide-area networks, to strongly coupled systems such as local area networks, to
very strongly coupled systems such as multiprocessor systems
4
Features of DS
No common physical clock: This is an important assumption because it introduces the element
of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors.
No shared memory : This is a key feature that requires message-passing for communication.
This feature implies the absence of the common physical clock
Geographical separation: The farther apart the processors are geographically, the more representative the system is of a distributed system.
WAN
NOW/COW(network/cluster of workstations)--- eg, Google search engine
Autonomy and heterogeneity: The processors are "loosely coupled" in that they have different speeds and each can be running a different operating system; they cooperate with one another by offering services or by solving a problem jointly.
5
Relation to computer system
components
Each computer has a memory-processing unit and the computers are connected by a
communication network
6
Relationships of the software components that run on each of the
computers and use the local operating system and network protocol
stack for functioning
7
●The distributed software is also termed middleware.
●A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal. An execution is also sometimes termed a computation or a run.
●The distributed system uses a layered architecture to break down the complexity
of system design.
●The middleware is the distributed software that drives the distributed system,
while providing transparency of heterogeneity at the platform level
8
●The middleware layer does not contain the traditional application layer functions
of the network protocol stack, such as http, mail, ftp, and telnet.
●Various primitives and calls to functions defined in various libraries of the
middleware layer are embedded in the user program code.
●There exist several libraries to choose from to invoke primitives for the more
common functions – such as reliable and ordered multicasting – of the
middleware layer
●There are several standards such as Object Management Group’s (OMG)
common object request broker architecture (CORBA) and the remote procedure
call (RPC) mechanism.
9
Motivation/Benefits of DS
1. Inherently distributed computations: money transfer in banking, or reaching consensus among parties that are geographically distant - the computation is inherently distributed.
2. Resource sharing: e.g., distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability
3. Access to geographically remote data and resources: data cannot be replicated at every site
participating in the distributed execution because it may be too large or too sensitive to be replicated
4. Enhanced reliability:
◦ Availability :- The resource should be accessible at all times.
◦ Integrity:- the value/state of the resource should be correct, in the face of concurrent access from
multiple processors, as per the semantics expected by the application.
◦ Fault-tolerance :- The ability to recover from system failures.
10
Motivation/Benefits of DS
5. Increased performance/cost ratio: By resource sharing and accessing geographically remote data
and resources, the performance/cost ratio is increased. Any task can be partitioned across the various
computers in the distributed system.
6. Scalability
7. Modularity and incremental expandability
11
Distributed Vs Parallel computing
12
13
Primitives for distributed
communication
Blocking/non-blocking, synchronous/asynchronous primitives
Processor synchrony
Libraries and standards
14
Blocking/non-blocking,
synchronous/asynchronous primitives
Message send and message receive communication primitives are denoted Send() and
Receive(), respectively.
15
There are two ways of sending data when the Send primitive is
invoked :
◦ Buffered option :
◦ The buffered option, which is the standard option, copies the data from the user buffer to the kernel buffer.
◦ The data later gets copied from the kernel buffer onto the network.
◦ Unbuffered option :
◦ In the unbuffered option, the data gets copied directly from the user buffer onto the
network.
For the Receive primitive, the buffered option is usually required because the data may already
have arrived when the primitive is invoked, and needs a storage place in the kernel.
16
Blocking/non-blocking, synchronous/asynchronous
primitives - shortcut
17
Let’s Understand with Example - Synchronous
18
Synchronous primitives
● A Send or a Receive primitive is synchronous if both
the Send() and Receive() handshake with each other.
● The processing for the Send primitive completes
only after the invoking processor learns that the
other corresponding Receive primitive has also been
invoked and that the receive operation has been
completed.
● The processing for the Receive primitive completes
when the data to be received is copied into the
receiver’s user buffer.
19
Let’s Understand with Example- Asynchronous
20
Asynchronous primitives
A Send primitive is said to be asynchronous if
control returns back to the invoking process
after the data item to be sent has been copied
out of the user-specified buffer.
It does not make sense to define asynchronous
Receive primitives.
21
Let’s Understand with Example- Blocking
22
Blocking primitives
A primitive is blocking if control returns to the invoking process after
the processing for the primitive (whether in synchronous or
asynchronous mode) completes.
23
Let’s Understand with Example- Non Blocking
24
Non-blocking primitives
A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed.
For a non-blocking Send, control returns to the process even before the data is
copied out of the user buffer.
For a non-blocking Receive, control returns to the process even before the data
may have arrived from the sender.
25
For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can be later used to check the status of
completion of the call.
The process can check for the completion of the call in two ways.
1. First, it can keep checking (in a loop or periodically) if the handle has been
flagged or posted.
2. Second, it can issue a Wait with a list of handles as parameters.
26
The Wait call usually blocks until one of the parameter handles is posted.
Presumably after issuing the primitive in non-blocking mode, the process has
done whatever actions it could and now needs to know the status of completion
of the call, therefore using a blocking Wait() call is usual programming practice.
27
● If at the time that Wait() is issued, the processing for the primitive has
completed, the Wait() returns immediately
● The completion of the processing of the primitive is detectable by checking the
value of handleK .
● If the processing of the primitive has not completed, the Wait blocks and waits
for a signal to wake it up.
● When the processing for the primitive completes, the communication subsystem
software sets the value of handleK and wakes up (signals) any process with a
Wait call blocked on this handleK .
● This is called posting the completion of the operation.
28
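As an illustration of the two ways of checking a non-blocking handle, the following minimal Python sketch (not part of the original slides) uses local threads and a concurrent.futures Future as a stand-in for a communication handle, contrasting polling the handle with a blocking Wait:

```python
# Analogy only: a Future stands in for the system-generated handle, and a worker
# thread stands in for the communication subsystem that eventually posts it.
from concurrent.futures import ThreadPoolExecutor, wait
import time

def slow_send(data):
    time.sleep(0.1)              # the "communication subsystem" doing its work
    return len(data)             # completion gets posted on the handle

with ThreadPoolExecutor() as pool:
    handle1 = pool.submit(slow_send, b"hello")   # non-blocking call returns a handle
    # Way 1: keep checking (in a loop or periodically) whether the handle is posted.
    while not handle1.done():
        pass                                     # the process could do other useful work here

    handle2 = pool.submit(slow_send, b"world")
    # Way 2: issue a blocking Wait with a list of handles as parameters.
    done, not_done = wait([handle2])             # returns once the handle(s) are posted
    print(handle1.result(), handle2.result())
```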
Versions of the Send and Receive
primitive
Four versions of the Send primitive:
1. Blocking synchronous Send
2. Non-blocking synchronous Send
3. Blocking asynchronous Send
4. Non-blocking asynchronous Send
Two versions of the Receive primitive:
1. Blocking (synchronous) Receive
2. Non-blocking (synchronous) Receive
29
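The four Send versions and two Receive versions correspond closely to primitives in existing message-passing libraries. As one possible mapping (an illustration, not something stated in the slides), the sketch below uses MPI through the mpi4py package, where Ssend/Issend/Send/Isend correspond to the blocking synchronous, non-blocking synchronous, blocking asynchronous, and non-blocking asynchronous Send, respectively; it assumes an MPI runtime and mpi4py are installed and the script is launched with two ranks.

```python
# Sketch: the Send/Receive versions expressed with MPI primitives via mpi4py.
# Assumes an MPI runtime and mpi4py; run with: mpiexec -n 2 python send_versions.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.arange(10, dtype='i')

if rank == 0:
    comm.Ssend([buf, MPI.INT], dest=1, tag=0)          # blocking synchronous Send
    req = comm.Issend([buf, MPI.INT], dest=1, tag=1)   # non-blocking synchronous Send
    req.Wait()                                         # blocking Wait on the returned handle
    comm.Send([buf, MPI.INT], dest=1, tag=2)           # blocking asynchronous Send
    req = comm.Isend([buf, MPI.INT], dest=1, tag=3)    # non-blocking asynchronous Send
    while not req.Test():                              # poll the handle until it is posted
        pass                                           # other useful work could go here
elif rank == 1:
    out = np.empty(10, dtype='i')
    for tag in (0, 1, 2):
        comm.Recv([out, MPI.INT], source=0, tag=tag)   # blocking Receive
    req = comm.Irecv([out, MPI.INT], source=0, tag=3)  # non-blocking Receive returns a handle
    req.Wait()                                         # complete it with a blocking Wait
```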
30
Blocking synchronous Send
● The data gets copied from the user
buffer to the kernel buffer and is then
sent over the network.
● After the data is copied to the
receiver’s system buffer and a
Receive call has been issued, an
acknowledgement back to the sender
causes control to return to the
process that invoked the Send
operation and completes the Send
31
Blocking Receive
● The Receive call blocks until the
data expected arrives and is
written in the specified user buffer.
● Then control is returned to the user
process.
32
Non-blocking Synchronous Send
●Control returns back to the invoking process as soon as the
copy of data from the user buffer to the kernel buffer is
initiated.
●A parameter in the non-blocking call also gets set with the
handle of a location that the user process can later check for
the completion of the synchronous send operation.
●The location gets posted after an acknowledgement returns
from the receiver
●The user process can keep checking for the completion of
the non-blocking synchronous Send by testing the returned
handle, or it can invoke the blocking Wait operation on the
returned handle
33
Non-blocking Receive
●The Receive call will cause the kernel to register
the call and return the handle of a location that the
user process can later check for the completion of
the non-blocking Receive operation.
●This location gets posted by the kernel after the
expected data arrives and is copied to the user-
specified buffer.
●The user process can check for the completion of
the non-blocking Receive by invoking the Wait
operation on the returned handle.
34
35
Blocking asynchronous Send
●The user process that invokes the Send is
blocked until the data is copied from the
user’s buffer to the kernel buffer.
●For the unbuffered option, the user process
that invokes the Send is blocked until the
data is copied from the user’s buffer to the
network.
36
Non-blocking asynchronous Send
●The user process that invokes the Send is blocked
until the transfer of the data from the user’s buffer to
the kernel buffer is initiated.
●Control returns to the user process as soon as this
transfer is initiated, and a handle is given back.
●The asynchronous Send completes when the data has
been copied out of the user’s buffer.
●The checking for the completion may be necessary if
the user wants to reuse the buffer from which the data
was sent.
37
38
●A synchronous Send is easier to use from a programmer’s perspective because the
handshake between the Send and the Receive makes the communication appear
instantaneous, thereby simplifying the program logic.
●The Receive may not get issued until much after the data arrives at Pj, in which case
the data arrived would have to be buffered in the system buffer at Pj and not in the
user buffer. At the same time, the sender would remain blocked. Thus, a synchronous
Send lowers the efficiency within process Pi.
●The non-blocking asynchronous Send is useful when a large data item is being sent because
it allows the process to perform other instructions in parallel with the completion of the Send.
●The non-blocking synchronous Send also avoids the potentially large delays for
handshaking, particularly when the receiver has not yet issued the Receive call.
39
●The non-blocking Receive is useful when a large data item is being received and/or when
the sender has not yet issued the Send call,
○because it allows the process to perform other instructions in parallel with the completion
of the Receive.
○If the data has already arrived, it is stored in the kernel buffer, and it may take a while to
copy it to the user buffer specified in the Receive call.
●For non-blocking calls, however, the burden on the programmer increases because he or she
has to keep track of the completion of such operations in order to meaningfully reuse (write
to or read from) the user buffers. Thus, conceptually, blocking primitives are easier to use.
40
Processor synchrony
Processor synchrony indicates that all the processors execute in lock-step with
their clocks synchronized.
As this synchrony is not attainable in a distributed system, what is more generally indicated is that for a large granularity of code, usually termed a step, the processors are synchronized.
This abstraction is implemented using some form of barrier synchronization to
ensure that no processor begins executing the next step of code until all the
processors have completed executing the previous steps of code assigned to
each of the processors.
41
Processor synchrony
1. Processor Synchrony : This means that all the computers or processors in a system
work together perfectly, like synchronized dancers following the same rhythm. Their
internal clocks are all perfectly in sync.
2. In Distributed systems: In reality, achieving perfect synchrony among all processors in
a distributed system is very difficult or impossible. So, what we do instead is
synchronize them in a different way.
3. Synchronization at a higher level: Instead of making every little action perfectly
synchronized, we group many actions into larger chunks called steps. Think of these
steps like dance routines.
4. Barrier synchronization: To make sure these steps are performed in sync, we use a
mechanism called barrier synchronization. It’s like a checkpoint in a dance routine. No
dancer can move to the next step until everyone has completed the current one.
Similarly, in a distributed system, no processor can move on to the next step of their
work until all processors have finished their current step.
42
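A minimal sketch of the barrier idea, using Python threads as stand-ins for processors (an illustrative assumption; real processor synchrony would use a distributed barrier rather than threads on one machine):

```python
# Each "processor" must reach the barrier before any of them starts the next step.
import threading

N = 4
barrier = threading.Barrier(N)

def processor(pid):
    for step in range(3):
        # ... execute the code assigned to this processor for this step ...
        print(f"processor {pid} finished step {step}")
        barrier.wait()   # blocks until all N processors have completed the current step

threads = [threading.Thread(target=processor, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```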
Design issues and challenges
We describe design issues and challenges after categorizing them as
1. having a greater component related to systems design and operating systems design (from a system perspective)
2. having a greater component related to algorithm design (algorithmic challenges)
3. emerging from recent technology advances and/or driven by new applications (application or technology driven)
43
Distributed systems challenges from a system
perspective
The following functions must be addressed when designing and building a distributed system:
1. Communication
2. Processes
3. Naming
4. Synchronization
5. Data storage and access
6. Consistency and replication
7. Fault tolerance
8. Security
9. Applications Programming Interface (API) and transparency
10. Scalability and modularity
44
Algorithmic challenges in distributed computing
❑Designing useful execution models and frameworks
❑Dynamic distributed graph algorithms and distributed routing algorithms
❑Time and global state in a distributed system
❑Synchronization/coordination mechanisms
❑Group communication, multicast, and ordered message delivery
❑Monitoring distributed events and predicates
❑Distributed program design and verification tools
❑Debugging distributed programs
❑Data replication, consistency models, and caching
50
Designing useful execution models and frameworks
●The interleaving model and partial order model are two widely
adopted models of distributed system executions.
●They have proved to be particularly useful for operational reasoning
and the design of distributed algorithms.
●The input/output automata model and the TLA (temporal logic of
actions) are two other examples of models that provide different
degrees of infrastructure for reasoning more formally with and
proving the correctness of distributed programs
51
Dynamic distributed graph algorithms and distributed
routing algorithms
● The distributed system is modeled as a distributed graph, and the graph
algorithms form the building blocks for a large number of higher level
communication, data dissemination, object location, and object search
functions.
● The algorithms need to deal with dynamically changing graph
characteristics, such as to model varying link loads in a routing
algorithm.
● The efficiency of these algorithms impacts not only the user-perceived
latency but also the traffic and hence the load or congestion in the
network.
● Hence, the design of efficient distributed graph algorithms is of
paramount importance
52
Time and global state in a distributed system
●The challenges pertain to providing accurate physical time, and to providing a
variant of time, called logical time.
●Logical time is relative time, and eliminates the overheads of providing physical
time for applications where physical time is not required. More importantly,
logical time can
○ capture the logic and inter-process dependencies within the distributed
program, and also
○ track the relative progress at each process.
It is not possible for any one process to directly observe a meaningful global state
across all the processes, without using extra state-gathering effort which needs to
be done in a coordinated manner
53
Synchronization/coordination mechanisms
●The processes must be allowed to execute concurrently, except when they
need to synchronize to exchange information, i.e., communicate about shared
data.
●Synchronization is essential for the distributed processes to overcome the
limited observation of the system state from the viewpoint of any one process.
●Overcoming this limited observation is necessary for taking any actions that
would impact other processes.
●The synchronization mechanisms can also be viewed as resource
management and concurrency management mechanisms to streamline the
behavior of the processes that would otherwise act independently.
54
Examples of Problems Requiring Synchronization
● Physical clock synchronization
● Leader election
● Mutual exclusion
● Deadlock detection and resolution
● Termination detection
● Garbage collection
55
56
Group communication, multicast, and ordered
message delivery
● A group is a collection of processes that share a common context and
collaborate on a common task within an application domain.
● Specific algorithms need to be designed to enable efficient group
communication and group management wherein processes can join and
leave groups dynamically, or even fail.
● When multiple processes send messages concurrently, different
recipients may receive the messages in different orders, possibly
violating the semantics of the distributed program.
● Hence, formal specifications of the semantics of ordered delivery need to
be formulated, and then implemented.
57
Monitoring distributed events and predicates
● Predicates defined on program variables that are local to different
processes are used for specifying conditions on the global system state,
and are useful for applications such as debugging, sensing the
environment, and in industrial process control.
● On-line algorithms for monitoring such predicates are hence important.
● An important paradigm for monitoring distributed events is that of event
streaming, wherein streams of relevant events reported from different
processes are examined collectively to detect predicates.
● Typically, the specification of such predicates uses physical or logical
time relationships.
58
Distributed program design and verification tools
● Methodically designed and verifiably correct programs can greatly
reduce the overhead of software design, debugging, and engineering.
● Designing mechanisms to achieve these design and verification goals
is a challenge.
59
Debugging distributed programs
● Debugging sequential programs is hard; debugging distributed
programs is that much harder because of the concurrency in actions
and the ensuing uncertainty due to the large number of possible
executions defined by the interleaved concurrent actions.
● Adequate debugging mechanisms and tools need to be designed to
meet this challenge.
60
Data replication, consistency models, and caching
● Fast access to data and other resources requires them to be
replicated in the distributed system.
● Managing such replicas in the face of updates introduces the
problems of ensuring consistency among the replicas and cached
copies.
● Additionally, placement of the replicas in the systems is also a
challenge because resources usually cannot be freely replicated.
61
World Wide Web design – caching, searching,
scheduling
● Minimizing response time to minimize user-perceived latencies is an
important challenge.
● Object search and navigation on the web are important functions in the
operation of the web, and are very resource-intensive.
● Designing mechanisms to do this efficiently and accurately is a great
challenge.
62
Distributed shared memory abstraction
● A shared memory abstraction simplifies the task of the programmer
because he or she has to deal only with read and write operations, and
no message communication primitives.
● However, under the covers in the middleware layer, the abstraction of a
shared address space has to be implemented by using message-passing.
● Hence, in terms of overheads, the shared memory abstraction is not less
expensive.
63
64
Reliable and fault-tolerant distributed systems
A reliable and fault-tolerant environment has multiple requirements and aspects,
and these can be addressed using various strategies:
● Consensus algorithms
● Replication and replica management
● Voting and quorum systems
● Distributed databases and distributed commit
● Self-stabilizing systems
● Checkpointing and recovery algorithms
● Failure detectors
65
Load balancing
● The goal of load balancing is to gain higher throughput, and reduce the user-perceived latency.
● Load balancing may be necessary because of a variety of factors such as high network traffic or
high request rate causing the network connection to be a bottleneck, or high computational load.
● A common situation where load balancing is used is in server farms, where the objective is to
service incoming client requests with the least turnaround time.
The following are some forms of load balancing:
● Data migration: The ability to move data (which may be replicated) around in the system, based
on the access pattern of the users.
● Computation migration: The ability to relocate processes in order to perform a redistribution of
the workload.
● Distributed scheduling: This achieves a better turnaround time for the users by using idle
processing power in the system more efficiently.
68
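As a small illustration of distributed scheduling in a server farm (a hypothetical example, not from the slides), the sketch below dispatches each incoming request to the currently least-loaded server:

```python
# Least-loaded dispatch: one simple load-balancing policy for a server farm.
import heapq

servers = [(0, f"server{i}") for i in range(3)]   # (outstanding requests, server name)
heapq.heapify(servers)

def dispatch(request):
    load, name = heapq.heappop(servers)           # pick the least-loaded server
    heapq.heappush(servers, (load + 1, name))     # it now carries one more request
    return name

print([dispatch(r) for r in range(6)])            # requests spread evenly across the farm
```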
Applications of distributed computing and newer
challenges
1. Mobile systems
2. Sensor networks
3. Ubiquitous or pervasive computing
4. Peer-to-peer computing
5. Publish-subscribe, content distribution, and multimedia
6. Distributed agents
7. Distributed data mining
8. Grid computing
9. Security in distributed systems
69
1. Mobile systems
70
2. Sensor networks &
3. Ubiquitous or pervasive computing
A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters, such as temperature, velocity, pressure, humidity,
and chemicals. Sensors may be mobile or static;
71
4. Peer-to-peer computing &
5. Publish-subscribe, content distribution, and multimedia
72
6. Distributed agents
● Agents collect and process information, and can exchange such
information with other agents.
● Often, the agents cooperate as in an ant colony, but they can also have
friendly competition, as in a free market economy.
● Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software
design and interfaces.
● Research in agents is interdisciplinary: spanning artificial intelligence,
mobile computing, economic market models, software engineering, and
distributed computing.
73
7. Distributed data mining
● The data is necessarily distributed and cannot be collected in a
single repository, as in banking applications where the data is
private and sensitive,
● or in atmospheric weather prediction where the data sets are far too
massive to collect and process at a single repository in real-time.
74
8. Grid computing
○Grid computing is a subset of distributed computing, in which a virtual supercomputer is formed from machines connected by a network, typically a local Ethernet network or sometimes the Internet.
○It can also be seen as a form of parallel computing in which, instead of many CPU cores on a single machine, the cores are spread across machines at various locations.
○Many challenges in making grid computing a reality include:
○scheduling jobs in such a distributed environment,
○a framework for implementing quality of service and real-time guarantees,
○Security of individual machines as well as of jobs being executed in this
setting.
75
9. Security in distributed systems
● The traditional challenges of security in a distributed setting include:
◦ Confidentiality (ensuring that only authorized processes can access
certain information),
◦ Authentication (ensuring the source of received information and the
identity of the sending process), and
◦ Availability (maintaining allowed access to services despite malicious
actions).
● The goal is to meet these challenges with efficient and scalable
solutions.
● These basic challenges have been addressed in traditional distributed
settings.
76
A model of distributed computations
❑A distributed system consists of a set of processors that are
connected by a communication network.
❑The communication network provides the facility of information
exchange among processors.
❑The communication delay is finite but unpredictable.
❑The processors do not share a common global memory and
communicate solely by passing messages over the communication
network.
77
78
❑There is no physical global clock in the system to which processes have
instantaneous access.
❑The communication medium may deliver messages out of order, messages may
be lost, garbled, or duplicated due to timeout and retransmission, processors
may fail, and communication links may go down.
❑The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.
❑A distributed application runs as a collection of processes on a distributed
system
A distributed program
A distributed program is composed of a set of n asynchronous processes
p1, p2,... , pi,... , pn that communicate by message passing over the
communication network.
◦ we assume that each process is running on a different processor
The processes do not share a global memory and communicate solely by
passing messages.
❑Cij: denote the channel from process pi to process pj
❑mij: denote a message sent by pi to pj
79
◆The communication delay is finite and unpredictable.
◆Also, these processes do not share a global clock that is
instantaneously accessible to these processes
◆Process execution and message transfer are asynchronous
◆ a process may execute an action spontaneously and a process sending a
message does not wait for the delivery of the message to be complete.
80
❖The global state of a distributed computation is composed of
the states of the processes and the communication channels
● The state of a process is characterized by the state of its local
memory and depends upon the context.
● The state of a channel is characterized by the set of messages
in transit in the channel.
81
A model of distributed executions
● The execution of a process consists of a sequential execution
of its actions.
● The actions are atomic and the actions of a process are
modeled as three types of events:
● Internal events
● Message send events, and
● Message receive events
82
Space–Time Diagram of a Distributed Execution involving Three
Processes
85
1. A horizontal line represents the progress of the process.
2. A dot indicates an event.
3. A slant arrow indicates a message transfer.
Causal precedence relation
86
87
The causal precedence (happens-before) relation, denoted →, is defined on the events of the execution: ei → ej if (i) ei and ej occur on the same process and ei occurs before ej, or (ii) ei is the send event of a message and ej is the corresponding receive event, or (iii) there exists an event ek such that ei → ek and ek → ej (transitivity).
If ei → ej, then event ej is directly or transitively dependent on event ei, and all the information available at ei can be made accessible at ej.
For example, in Figure 2.1, event e2^6 has knowledge of all the other events shown in the figure.
88
For any two events ei and ej, ei ↛ ej denotes that:
◦ Event ej is not directly or transitively dependent on event ei;
◦ i.e., event ei does not causally affect event ej;
◦ Event ej is not aware of the execution of ei, or of any event executed after ei on the same process.
Note the following two rules:
◦ ei → ej implies ej ↛ ei;
◦ ei ↛ ej does not imply ej → ei.
For any two events ei and ej, if ei ↛ ej and ej ↛ ei, then events ei and ej are said to be concurrent, and the relation is denoted ei || ej.
Note that the relation || is not transitive, i.e., (ei || ej) ∧ (ej || ek) does not imply ei || ek.
Note that for any two events ei and ej in a distributed execution, either ei → ej, or ej → ei, or ei || ej.
89
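The sketch below (a toy, hypothetical event log, not part of the slides) builds the causal precedence relation exactly as defined above, as the transitive closure of process order plus send-receive edges, and uses it to test concurrency:

```python
# Events are (process, index); -> is the transitive closure of process order
# plus the send->receive edges.
from itertools import product

events = [("p1", 1), ("p1", 2), ("p2", 1), ("p2", 2), ("p2", 3)]
messages = [(("p1", 1), ("p2", 2))]               # (send event, corresponding receive event)

edges = set(messages)
for (p, x), (q, y) in product(events, events):
    if p == q and y == x + 1:                     # process order: ei^x -> ei^(x+1)
        edges.add(((p, x), (q, y)))

closure = set(edges)                              # transitive closure (Floyd-Warshall style)
for k, i, j in product(events, repeat=3):
    if (i, k) in closure and (k, j) in closure:
        closure.add((i, j))

def happens_before(a, b):
    return (a, b) in closure

def concurrent(a, b):
    return a != b and not happens_before(a, b) and not happens_before(b, a)

print(happens_before(("p1", 1), ("p2", 3)))       # True: via the message, then process order
print(concurrent(("p1", 2), ("p2", 1)))           # True: neither causally affects the other
```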
Logical vs. Physical Concurrency
In a distributed computation, two events are logically concurrent if and only if they do not
causally affect each other.
Physical concurrency, on the other hand, means that the events occur at the same instant in physical time.
Two or more events may be logically concurrent even though they do not occur at the same instant in physical time.
However, if the processor speeds and message delays had been different, the execution of these events could very well have coincided in physical time.
Whether a set of logically concurrent events coincide in the physical time or not, does not
change the outcome of the computation.
Therefore, even though a set of logically concurrent events may not have occurred at the same
instant in physical time, we can assume that these events occurred at the same instant in
physical time.
90
Models of communication networks
There are several models of the service provided by communication
networks, namely,
◦ FIFO (first-in, first-out): each channel acts as a first-in first-out message queue and
thus, message ordering is preserved by a channel
◦ non-FIFO: a channel acts like a set in which the sender process adds messages and
the receiver process removes messages from it in a random order
◦ causal ordering: is based on Lamport's "happens before" relation. A system that supports the causal ordering model satisfies the following property:
◦ CO: for any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
◦ This property ensures that causally related messages destined to the same destination are delivered in an order that is consistent with their causality relation.
◦ Causally ordered delivery of messages implies FIFO message delivery. Furthermore, note that CO ⊂ FIFO ⊂ Non-FIFO.
91
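A minimal sketch (illustrative only, not part of the slides) contrasting the FIFO and non-FIFO channel models described above:

```python
# FIFO channel: a queue preserves send order. Non-FIFO channel: a "set" from
# which the receiver removes messages in an arbitrary order.
import random
from collections import deque

class FIFOChannel:
    def __init__(self):
        self.q = deque()
    def send(self, m):
        self.q.append(m)           # messages kept in the order they were sent
    def deliver(self):
        return self.q.popleft()    # ... and delivered in that same order

class NonFIFOChannel:
    def __init__(self):
        self.bag = []
    def send(self, m):
        self.bag.append(m)
    def deliver(self):
        return self.bag.pop(random.randrange(len(self.bag)))  # arbitrary order

c = FIFOChannel()
for m in "abc":
    c.send(m)
print([c.deliver() for _ in range(3)])   # always ['a', 'b', 'c']

d = NonFIFOChannel()
for m in "abc":
    d.send(m)
print([d.deliver() for _ in range(3)])   # some permutation of ['a', 'b', 'c']
```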
● Causal ordering model is useful in developing distributed algorithms.
● Generally, it considerably simplifies the design of distributed algorithms
because it provides a built-in synchronization.
● For example, in replicated database systems, it is important that every process
responsible for updating a replica receives the updates in the same order to
maintain database consistency.
● Without causal ordering, each update must be checked to ensure that database
consistency is not being violated. Causal ordering eliminates the need for such
checks.
92
Global state of a distributed system
● The global state of a distributed system is a collection of the local states of
its components, namely, the processes and the communication channels
● The state of a process at any time is defined by the contents of processor
registers, stacks, local memory, etc. and depends on the local context of the
distributed application.
● The state of a channel is given by the set of messages in transit in the
channel.
● The occurrence of events changes the states of respective processes and
channels, thus causing transitions in global system state. For eg,
⮚an internal event changes the state of the process at which it occurs.
⮚A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received)
93
94
Let ei^x denote the xth event at process pi.
Let LSi^x denote the state of process pi after the occurrence of event ei^x and before the event ei^(x+1).
LSi^0 denotes the initial state of process pi.
LSi^x is a result of the execution of all the events executed by process pi up to ei^x.
Let send(m) ≤ LSi^x denote the fact that ∃y : 1 ≤ y ≤ x such that ei^y = send(m); likewise, let rec(m) ≰ LSj^y denote the fact that ∀z : 1 ≤ z ≤ y, ej^z ≠ rec(m).
The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects.
Thus, the channel state SCij^(x,y) denotes all messages that pi sent up to event ei^x and which process pj had not received until event ej^y:
SCij^(x,y) = { mij | send(mij) ≤ LSi^x ∧ rec(mij) ≰ LSj^y }
95
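Using the notation just defined, here is a toy sketch (with hypothetical send/receive logs, not from the slides) of how the channel state SCij at a pair of events (ei^x, ej^y) can be computed:

```python
# Channel state SCij^(x,y): messages pi sent up to its x-th event that pj has
# not yet received up to its y-th event.
sends_i    = {1: "m1", 3: "m2", 5: "m3"}   # hypothetical: event index at pi -> message sent
receives_j = {2: "m1", 6: "m2"}            # hypothetical: event index at pj -> message received

def channel_state(x, y):
    sent     = {m for ev, m in sends_i.items() if ev <= x}
    received = {m for ev, m in receives_j.items() if ev <= y}
    return sent - received                 # messages still in transit on Cij

print(channel_state(5, 2))   # {'m2', 'm3'}: sent by ei^5 but not received by ej^2
print(channel_state(3, 6))   # set(): everything sent by ei^3 was received by ej^6
```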
Global state
96
● For a global snapshot to be meaningful, the states of all the components of the distributed system must be recorded at the same instant. This would be possible only if the local clocks at the processes were perfectly synchronized, or if there were a global system clock that could be instantaneously read by the processes. However, both are impossible.
● Even if the state of all the components in a distributed system has not been recorded at
the same instant, such a state will be meaningful provided every message that is
recorded as received is also recorded as sent.
● Basic idea is that an effect should not be present without its cause.
Notationally, the global state GS is the collection of the process states and the channel states: GS = { ∪i LSi^(xi) , ∪j,k SCjk^(yj,zk) }
• A message cannot be received if it was not sent; that is, the state should not
violate causality.
• Such states are called consistent global states and are meaningful global
states.
• Inconsistent global states are not meaningful in the sense that a distributed
system can never be in an inconsistent state.
97
98
Global state
A global state GS1, consisting of particular local states in the execution of Figure 2.2, is inconsistent because
◦ the state of p2 has recorded the receipt of message m12,
◦ however, the state of p1 has not recorded its send.
On the contrary, a global state GS2, consisting of a later set of local states, is consistent:
◦ all the channels are empty except C21, which contains message m21
99
The space–time diagram of a distributed execution.( Fig.2.2)
100
Cuts of a distributed computation
● In the space–time diagram of a distributed computation, a zigzag line
joining one arbitrary point on each process line is termed a cut in the
computation.
● Such a line slices the space–time diagram, and thus the set of events in the
distributed computation, into a PAST and a FUTURE.
● The PAST contains all the events to the left of the cut and the FUTURE
contains all the events to the right of the cut.
● For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the
PAST and FUTURE of C, respectively.
● Every cut corresponds to a global state and every global state can be
graphically represented as a cut in the computation’s space–time diagram
101
A consistent global state corresponds to a cut in which every
message received in the PAST of the cut was sent in the PAST
of that cut.
Such a cut is known as a consistent cut.
All messages that cross the cut from the PAST to the FUTURE
are in transit in the corresponding consistent global state.
A cut is inconsistent if a message crosses the cut from the
FUTURE to the PAST
102
103
C1 is an inconsistent cut, whereas C2 is a consistent
cut.
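A minimal sketch (with a hypothetical execution, not the one in Figure 2.2) of the consistency test for a cut: every message received in the PAST must also have been sent in the PAST:

```python
# A cut assigns to each process the index of its last event included in the PAST.
messages = [                    # hypothetical execution: (sender, send index, receiver, receive index)
    ("p1", 2, "p2", 3),
    ("p2", 1, "p1", 3),
]

def consistent(cut):
    for s, sx, r, ry in messages:
        if ry <= cut[r] and sx > cut[s]:   # received in the PAST but sent in the FUTURE
            return False
    return True

print(consistent({"p1": 3, "p2": 3}))   # True: a consistent cut
print(consistent({"p1": 1, "p2": 3}))   # False: a message crosses from FUTURE to PAST
```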
Past and future cones of an event
An event ej could have been affected only by events ei such that ei → ej, and all the information available at ei could be made accessible at ej.
All such events ei belong to the past of ej.
Let Past(ej) denote all events in the past of ej in a computation (H, →). Then
Past(ej) = { ei | ∀ei ∈ H, ei → ej }.
Similarly, the future of ej, Future(ej), is the set of all events ei such that ej → ei, i.e., the events that can be causally affected by ej.
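A self-contained toy sketch (using a hypothetical, already transitively closed happens-before relation) showing how Past(e) and Future(e) are read directly off the relation:

```python
# hb is a toy happens-before relation given as (earlier, later) pairs.
hb = {("a", "b"), ("a", "c"), ("b", "c")}
events = {"a", "b", "c", "d"}

def past(e):
    return {x for x in events if (x, e) in hb}    # everything that could have affected e

def future(e):
    return {x for x in events if (e, x) in hb}    # everything e can causally affect

print(past("c"))     # {'a', 'b'}
print(future("a"))   # {'b', 'c'}
print(past("d"))     # set(): d is concurrent with all other events here
```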
104
105
Models of process communications
There are two basic models of process communications
◦ synchronous
◦ asynchronous.
106
The synchronous communication model is a blocking type where on
a message send, the sender process blocks until the message has
been received by the receiver process.
◦ The sender process resumes execution only after it learns that the receiver
process has accepted the message.
◦ Thus, the sender and the receiver processes must synchronize to exchange a
message.
107
Asynchronous communication model is a non-blocking type where the sender and the receiver do
not synchronize to exchange a message.
After having sent a message, the sender process does not wait for the message to be delivered to
the receiver process.
The message is buffered by the system and is delivered to the receiver process when it is ready to
accept the message.
A buffer overflow may occur if a process sends a large number of messages in a burst to another process.
Asynchronous communication provides higher parallelism because the sender process can execute
while the message is in transit to the receiver
Due to the higher degree of parallelism and non-determinism, it is much more difficult to design, verify, and implement distributed algorithms for asynchronous communication.
108
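A minimal sketch of the two process communication models, using Python threads and a queue as stand-ins for processes and the system buffer (an illustrative assumption, not a real distributed implementation):

```python
# Asynchronous send: the message is buffered and the sender continues at once.
# Synchronous send: the sender additionally blocks until the receiver accepts it.
import threading, queue, time

def receiver(buf, ack):
    time.sleep(0.1)          # the receiver is not ready immediately
    msg = buf.get()          # accepts the message when ready
    ack.set()                # acknowledgement used by the synchronous sender

buf, ack = queue.Queue(), threading.Event()
threading.Thread(target=receiver, args=(buf, ack)).start()

buf.put("m")                                   # asynchronous send: returns immediately
print("asynchronous send returned immediately")

ack.wait()                                     # a synchronous send would block here
print("a synchronous send would return only now")
```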
More Related Content

PPTX
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
PPTX
DC-Unit-1-Part1.pptx-distributed computing notes
PPTX
DS PPT NEW FOR DATA SCCIENCE FROM CSE DEPT CMR
PPT
distcomp.ppt
PPT
distcomp.ppt
PPT
distcomp.ppt
PDF
CS9222 ADVANCED OPERATING SYSTEMS
PDF
Inter-Process Communication in distributed systems
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
DC-Unit-1-Part1.pptx-distributed computing notes
DS PPT NEW FOR DATA SCCIENCE FROM CSE DEPT CMR
distcomp.ppt
distcomp.ppt
distcomp.ppt
CS9222 ADVANCED OPERATING SYSTEMS
Inter-Process Communication in distributed systems

Similar to CST 402 Distributed Computing Module 1 Notes (20)

PPTX
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
PPT
2.communcation in distributed system
PPT
UNIT-1 Introduction to Distributed SystemPPT.ppt
PPTX
Processbejdndnnnnnjsnsgsvvdvvvguigv.pptx
PPTX
Processprehsjsjsjskakwkwkejjdbdbdjj.pptx
PPTX
UNIT I DIS.pptx
PPTX
UNIT 1 NOTES DC FINAL COPY TO BE PREPARED
PPTX
DC - UNIT 1 - INTRODUCTION FINAL COPY TO STUDY
PDF
Client Server Model and Distributed Computing
PDF
20CS2021 Distributed Computing
PDF
20CS2021-Distributed Computing module 2
PPT
Chapter 6 os
PDF
02 Models of Distribution Systems.pdf
PPT
Chapter 4 communication2
 
PPTX
Message Passing Systems
PPTX
unit 1 intoductionDistributed computing(1).pptx
PDF
18CS3040 Distributed System
PDF
18CS3040_Distributed Systems
PDF
Cs556 section2
PPTX
Intro to Distributed Systems (By Lasmon Kapota).pptx
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
2.communcation in distributed system
UNIT-1 Introduction to Distributed SystemPPT.ppt
Processbejdndnnnnnjsnsgsvvdvvvguigv.pptx
Processprehsjsjsjskakwkwkejjdbdbdjj.pptx
UNIT I DIS.pptx
UNIT 1 NOTES DC FINAL COPY TO BE PREPARED
DC - UNIT 1 - INTRODUCTION FINAL COPY TO STUDY
Client Server Model and Distributed Computing
20CS2021 Distributed Computing
20CS2021-Distributed Computing module 2
Chapter 6 os
02 Models of Distribution Systems.pdf
Chapter 4 communication2
 
Message Passing Systems
unit 1 intoductionDistributed computing(1).pptx
18CS3040 Distributed System
18CS3040_Distributed Systems
Cs556 section2
Intro to Distributed Systems (By Lasmon Kapota).pptx
Ad

Recently uploaded (20)

PPTX
“Next-Gen AI: Trends Reshaping Our World”
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
composite construction of structures.pdf
PDF
Geotechnical Engineering, Soil mechanics- Soil Testing.pdf
PPTX
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
PPTX
Road Safety tips for School Kids by a k maurya.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPT
Drone Technology Electronics components_1
PDF
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
PPTX
Geodesy 1.pptx...............................................
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
web development for engineering and engineering
PPT
Project quality management in manufacturing
PDF
Monitoring Global Terrestrial Surface Water Height using Remote Sensing - ARS...
PPTX
Unit 5 BSP.pptxytrrftyyydfyujfttyczcgvcd
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT
Chapter 6 Design in software Engineeing.ppt
“Next-Gen AI: Trends Reshaping Our World”
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
CH1 Production IntroductoryConcepts.pptx
composite construction of structures.pdf
Geotechnical Engineering, Soil mechanics- Soil Testing.pdf
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
Road Safety tips for School Kids by a k maurya.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Drone Technology Electronics components_1
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
Geodesy 1.pptx...............................................
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
web development for engineering and engineering
Project quality management in manufacturing
Monitoring Global Terrestrial Surface Water Height using Remote Sensing - ARS...
Unit 5 BSP.pptxytrrftyyydfyujfttyczcgvcd
Embodied AI: Ushering in the Next Era of Intelligent Systems
Chapter 6 Design in software Engineeing.ppt
Ad

CST 402 Distributed Computing Module 1 Notes

  • 2. Syllabus- Distributed systems basics and Computation model Distributed System – Definition, Relation to computer system components, Motivation, Primitives for distributed communication, Design issues, Challenges and applications. A model of distributed computations – Distributed program, Model of distributed executions, Models of communication networks, Global state of a distributed system, Cuts of a distributed computation, Past and future cones of an event, Models of process communications. 2
  • 3. Distributed System A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network 3
  • 4. Distributed system has been characterized in one of several ways 1. You know you are using one when the crash of a computer you have never heard of prevents you from doing work--- prevents losing data in a computer crash 2. A collection of computers that do not share common memory or a common physical clock, that communicate by a messages passing over a communication network, and where each computer has its own memory and runs its own operating system 3. A collection of independent computers that appears to the users of the system as a single coherent computer 4. A term that describes a wide range of computers, from weakly coupled systems such as wide-area networks, to strongly coupled systems such as local area networks, to very strongly coupled systems such as multiprocessor systems 4
  • 5. Features of DS No common physical clock: This is an important assumption because it introduces the element of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors. No shared memory : This is a key feature that requires message-passing for communication. This feature implies the absence of the common physical clock Geographical separation: The geographically wider apart that the processors are, the more representative is the system of a distributed system. WAN NOW/COW(network/cluster of workstations)--- eg, Google search engine Autonomy and heterogeneity: The processors are “loosely coupled” in that they have different speeds and each can be running a different operating system, cooperate with one another by offering services or solving a problem jointly. 5
  • 6. Relation to computer system components Each computer has a memory-processing unit and the computers are connected by a communication network 6
  • 7. Relationships of the software components that run on each of the computers and use the local operating system and network protocol stack for functioning 7
  • 8. ●The distributed software is also termed as middleware. ●A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal.An execution is also sometimes termed a computation or a run. ●The distributed system uses a layered architecture to break down the complexity of system design. ●The middleware is the distributed software that drives the distributed system, while providing transparency of heterogeneity at the platform level 8
  • 9. ●The middleware layer does not contain the traditional application layer functions of the network protocol stack, such as http, mail, ftp, and telnet. ●Various primitives and calls to functions defined in various libraries of the middleware layer are embedded in the user program code. ●There exist several libraries to choose from to invoke primitives for the more common functions – such as reliable and ordered multicasting – of the middleware layer ●There are several standards such as Object Management Group’s (OMG) common object request broker architecture (CORBA) and the remote procedure call (RPC) mechanism. 9
  • 10. Motivation/Benefits of DS 1. Inherently distributed computations: money transfer in banking, or reaching consensus among parties that are geographically distant- computation is inherently distributed. 2. Resource sharing: eg-distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability 3. Access to geographically remote data and resources: data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated 4. Enhanced reliability: ◦ Availability :- The resource should be accessible at all times. ◦ Integrity:- the value/state of the resource should be correct, in the face of concurrent access from multiple processors, as per the semantics expected by the application. ◦ Fault-tolerance :- The ability to recover from system failures. 10
  • 11. Motivation/Benefits of DS 5. Increased performance/cost ratio: By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is increased. Any task can be partitioned across the various computers in the distributed system. 6. Scalability 7. Modularity and incremental expandability 11
  • 12. Distributed Vs Parallel computing 12
  • 13. 13
  • 14. Primitives for distributed communication Blocking/non-blocking, synchronous/asynchronous primitives Processor synchrony Libraries and standards 14
  • 15. Blocking/non-blocking, synchronous/asynchronous primitives Message send and message receive communication primitives are denoted Send() and Receive(), respectively. 15
  • 16. There are two ways of sending data when the Send primitive is invoked : ◦ Buffered option : ◦ The buffered option which is the standard option copies the data from the user buffer to the kernel buffer. ◦ The data later gets copied from the kernel buffer onto the network. ◦ Unbuffered option : ◦ In the unbuffered option, the data gets copied directly from the user buffer onto the network. For the Receive primitive, the buffered option is usually required because the data may already have arrived when the primitive is invoked, and needs a storage place in the kernel. 16
  • 18. Let’s Understand with Example - Synchronous 18
  • 19. Synchronous primitives ● A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake with each other. ● The processing for the Send primitive completes only after the invoking processor learns that the other corresponding Receive primitive has also been invoked and that the receive operation has been completed. ● The processing for the Receive primitive completes when the data to be received is copied into the receiver’s user buffer. 19
  • 20. Let’s Understand with Example- Asynchronous 20
  • 21. Asynchronous primitives A Send primitive is said to be asynchronous if control returns back to the invoking process after the data item to be sent has been copied out of the user-specified buffer. It does not make sense to define asynchronous Receive primitives. 21
  • 22. Let’s Understand with Example- Blocking 22
  • 23. Blocking primitives A primitive is blocking if control returns to the invoking process after the processing for the primitive (whether in synchronous or asynchronous mode) completes. 23
  • 24. Let’s Understand with Example- Non Blocking 24
  • 25. Non-blocking primitives A primitive is non-blocking if control returns back to the invoking process immediately after invocation, even though the operation has not completed. For a non-blocking Send, control returns to the process even before the data is copied out of the user buffer. For a non-blocking Receive, control returns to the process even before the data may have arrived from the sender. 25
  • 26. For non-blocking primitives, a return parameter on the primitive call returns a system-generated handle which can be later used to check the status of completion of the call. The process can check for the completion of the call in two ways. 1. First, it can keep checking (in a loop or periodically) if the handle has been flagged or posted. 2. Second, it can issue a Wait with a list of handles as parameters. 26
  • 27. The Wait call usually blocks until one of the parameter handles is posted. Presumably after issuing the primitive in non-blocking mode, the process has done whatever actions it could and now needs to know the status of completion of the call, therefore using a blocking Wait() call is usual programming practice. 27
  • 28. ● If at the time that Wait() is issued, the processing for the primitive has completed, the Wait() returns immediately ● The completion of the processing of the primitive is detectable by checking the value of handleK . ● If the processing of the primitive has not completed, the Wait blocks and waits for a signal to wake it up. ● When the processing for the primitive completes, the communication subsystem software sets the value of handleK and wakes up (signals) any process with a Wait call blocked on this handleK . ● This is called posting the completion of the operation. 28
  • 29. Versions of the Send and Receive primitive Four versions of Send primitive : 1. Blocking synchronous Send 2. Non-blocking synchronous Send 3. Blocking asynchronous Send 4. Non-blocking asynchronous Send Two versions of Receive primitive : 5. Blocking (synchronous)Receive 6. Non-blocking (synchronous)Receive 29
  • 30. 30
  • 31. Blocking synchronous Send ● The data gets copied from the user buffer to the kernel buffer and is then sent over the network. ● After the data is copied to the receiver’s system buffer and a Receive call has been issued, an acknowledgement back to the sender causes control to return to the process that invoked the Send operation and completes the Send 31
  • 32. Blocking Receive ● The Receive call blocks until the data expected arrives and is written in the specified user buffer. ● Then control is returned to the user process. 32
  • 33. Non-blocking Synchronous Send ●Control returns back to the invoking process as soon as the copy of data from the user buffer to the kernel buffer is initiated. ●A parameter in the non-blocking call also gets set with the handle of a location that the user process can later check for the completion of the synchronous send operation. ●The location gets posted after an acknowledgement returns from the receiver ●The user process can keep checking for the completion of the non-blocking synchronous Send by testing the returned handle, or it can invoke the blocking Wait operation on the returned handle 33
  • 34. Non-blocking Receive ●The Receive call will cause the kernel to register the call and return the handle of a location that the user process can later check for the completion of the non-blocking Receive operation. ●This location gets posted by the kernel after the expected data arrives and is copied to the user- specified buffer. ●The user process can check for the completion of the non-blocking Receive by invoking the Wait operation on the returned handle. 34
  • 35. 35
  • 36. Blocking asynchronous Send ●The user process that invokes the Send is blocked until the data is copied from the user’s buffer to the kernel buffer. ●For the unbuffered option, the user process that invokes the Send is blocked until the data is copied from the user’s buffer to the network. 36
  • 37. Non-blocking asynchronous Send ●The user process that invokes the Send is blocked until the transfer of the data from the user’s buffer to the kernel buffer is initiated. ●Control returns to the user process as soon as this transfer is initiated, and a handle is given back. ●The asynchronous Send completes when the data has been copied out of the user’s buffer. ●The checking for the completion may be necessary if the user wants to reuse the buffer from which the data was sent. 37
  • 38. 38
  • 39. ●A synchronous Send is easier to use from a programmer’s perspective because the handshake between the Send and the Receive makes the communication appear instantaneous, thereby simplifying the program logic. ●The Receive may not get issued until much after the data arrives at Pj, in which case the data arrived would have to be buffered in the system buffer at Pj and not in the user buffer. At the same time, the sender would remain blocked. Thus, a synchronous Send lowers the efficiency within process Pi. ●The non-blocking asynchronous Send is useful when a large data item is being sent because it allows the process to perform other instructions in parallel with the completion of the Send. ●The non-blocking synchronous Send also avoids the potentially large delays for handshaking, particularly when the receiver has not yet issued the Receive call. 39
  • 40. ●The non-blocking Receive is useful when a large data item is being received and/or when the sender has not yet issued the Send call, ○because it allows the process to perform other instructions in parallel with the completion of the Receive. ○If the data has already arrived, it is stored in the kernel buffer, and it may take a while to copy it to the user buffer specified in the Receive call. ●For non-blocking calls, however, the burden on the programmer increases because he or she has to keep track of the completion of such operations in order to meaningfully reuse (write to or read from) the user buffers. Thus, conceptually, blocking primitives are easier to use. 40
  • 41. Processor synchrony Processor synchrony indicates that all the processors execute in lock-step with their clocks synchronized. As this synchrony is not attainable in a distributed system, what is more generally indicated is that for a large granularity of code, usually termed as a step, the processors are synchronized. This abstraction is implemented using some form of barrier synchronization to ensure that no processor begins executing the next step of code until all the processors have completed executing the previous steps of code assigned to each of the processors. 41
  • 42. Processor synchrony 1. Processor Synchrony : This means that all the computers or processors in a system work together perfectly, like synchronized dancers following the same rhythm. Their internal clocks are all perfectly in sync. 2. In Distributed systems: In reality, achieving perfect synchrony among all processors in a distributed system is very difficult or impossible. So, what we do instead is synchronize them in a different way. 3. Synchronization at a higher level: Instead of making every little action perfectly synchronized, we group many actions into larger chunks called steps. Think of these steps like dance routines. 4. Barrier synchronization: To make sure these steps are performed in sync, we use a mechanism called barrier synchronization. It’s like a checkpoint in a dance routine. No dancer can move to the next step until everyone has completed the current one. Similarly, in a distributed system, no processor can move on to the next step of their work until all processors have finished their current step. 42
  • 43. Design issues and challenges We describe design issues and challenges after categorizing them as 1. having a greater component related to systems design and operating systems design ( from system perspective) 2. having a greater component related to algorithm design ( algorithmic challenges) 3. emerging from recent technology advances and/or driven by new applications (application or technology driven) 43
  • 44. Distributed systems challenges from a system perspective The following functions must be addressed when designing and building a distributed system: 1. Communication 2. Processes 3. Naming 4. Synchronization 5. Data storage and access 6. Consistency and replication 7. Fault tolerance 8. Security 9. Applications Programming Interface (API) and transparency 10. Scalability and modularity 44
  • 50. Algorithmic challenges in distributed computing ❑Designing useful execution models and frameworks ❑Dynamic distributed graph algorithms and distributed routing algorithms ❑Time and global state in a distributed system ❑Synchronization/coordination mechanisms ❑Group communication, multicast, and ordered message delivery ❑Monitoring distributed events and predicates ❑Distributed program design and verification tools ❑Debugging distributed programs ❑Data replication, consistency models, and caching 50
  • 51. Designing useful execution models and frameworks ●The interleaving model and partial order model are two widely adopted models of distributed system executions. ●They have proved to be particularly useful for operational reasoning and the design of distributed algorithms. ●The input/output automata model and the TLA (temporal logic of actions) are two other examples of models that provide different degrees of infrastructure for reasoning more formally with and proving the correctness of distributed programs 51
  • 52. Dynamic distributed graph algorithms and distributed routing algorithms ● The distributed system is modeled as a distributed graph, and the graph algorithms form the building blocks for a large number of higher level communication, data dissemination, object location, and object search functions. ● The algorithms need to deal with dynamically changing graph characteristics, such as to model varying link loads in a routing algorithm. ● The efficiency of these algorithms impacts not only the user-perceived latency but also the traffic and hence the load or congestion in the network. ● Hence, the design of efficient distributed graph algorithms is of paramount importance 52
  • 53. Time and global state in a distributed system ●The challenges pertain to providing accurate physical time, and to providing a variant of time, called logical time. ●Logical time is relative time, and eliminates the overheads of providing physical time for applications where physical time is not required. More importantly, logical time can ○ capture the logic and inter-process dependencies within the distributed program, and also ○ track the relative progress at each process. It is not possible for any one process to directly observe a meaningful global state across all the processes, without using extra state-gathering effort which needs to be done in a coordinated manner 53
  • 54. Synchronization/coordination mechanisms ●The processes must be allowed to execute concurrently, except when they need to synchronize to exchange information, i.e., communicate about shared data. ●Synchronization is essential for the distributed processes to overcome the limited observation of the system state from the viewpoint of any one process. ●Overcoming this limited observation is necessary for taking any actions that would impact other processes. ●The synchronization mechanisms can also be viewed as resource management and concurrency management mechanisms to streamline the behavior of the processes that would otherwise act independently. 54
  • 55. Examples of Problems Requiring Synchronization ● Physical clock synchronization ● Leader election ● Mutual exclusion ● Deadlock detection and resolution ● Termination detection ● Garbage collection 55
  • 57. Group communication, multicast, and ordered message delivery ● A group is a collection of processes that share a common context and collaborate on a common task within an application domain. ● Specific algorithms need to be designed to enable efficient group communication and group management wherein processes can join and leave groups dynamically, or even fail. ● When multiple processes send messages concurrently, different recipients may receive the messages in different orders, possibly violating the semantics of the distributed program. ● Hence, formal specifications of the semantics of ordered delivery need to be formulated, and then implemented. 57
  • 58. Monitoring distributed events and predicates ● Predicates defined on program variables that are local to different processes are used for specifying conditions on the global system state, and are useful for applications such as debugging, sensing the environment, and in industrial process control. ● On-line algorithms for monitoring such predicates are hence important. ● An important paradigm for monitoring distributed events is that of event streaming, wherein streams of relevant events reported from different processes are examined collectively to detect predicates. ● Typically, the specification of such predicates uses physical or logical time relationships. 58
  • 59. Distributed program design and verification tools ● Methodically designed and verifiably correct programs can greatly reduce the overhead of software design, debugging, and engineering. ● Designing mechanisms to achieve these design and verification goals is a challenge. 59
  • 60. Debugging distributed programs ● Debugging sequential programs is hard; debugging distributed programs is that much harder because of the concurrency in actions and the ensuing uncertainty due to the large number of possible executions defined by the interleaved concurrent actions. ● Adequate debugging mechanisms and tools need to be designed to meet this challenge. 60
  • 61. Data replication, consistency models, and caching ● Fast access to data and other resources requires them to be replicated in the distributed system. ● Managing such replicas in the face of updates introduces the problems of ensuring consistency among the replicas and cached copies. ● Additionally, placement of the replicas in the systems is also a challenge because resources usually cannot be freely replicated. 61
  • 62. World Wide Web design – caching, searching, scheduling ● Minimizing response time to minimize user-perceived latencies is an important challenge. ● Object search and navigation on the web are important functions in the operation of the web, and are very resource-intensive. ● Designing mechanisms to do this efficiently and accurately is a great challenge. 62
  • 63. Distributed shared memory abstraction ● A shared memory abstraction simplifies the task of the programmer because he or she has to deal only with read and write operations, and no message communication primitives. ● However, under the covers in the middleware layer, the abstraction of a shared address space has to be implemented by using message-passing. ● Hence, in terms of overheads, the shared memory abstraction is not less expensive. 63
  • 65. Reliable and fault-tolerant distributed systems A reliable and fault-tolerant environment has multiple requirements and aspects, and these can be addressed using various strategies: ● Consensus algorithms ● Replication and replica management ● Voting and quorum systems ● Distributed databases and distributed commit ● Self-stabilizing systems ● Checkpointing and recovery algorithms ● Failure detectors 65
  • 68. Load balancing ● The goal of load balancing is to gain higher throughput, and reduce the user-perceived latency. ● Load balancing may be necessary because of a variety of factors such as high network traffic or high request rate causing the network connection to be a bottleneck, or high computational load. ● A common situation where load balancing is used is in server farms, where the objective is to service incoming client requests with the least turnaround time. The following are some forms of load balancing: ● Data migration: The ability to move data (which may be replicated) around in the system, based on the access pattern of the users. ● Computation migration: The ability to relocate processes in order to perform a redistribution of the workload. ● Distributed scheduling: This achieves a better turnaround time for the users by using idle processing power in the system more efficiently. 68
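As a toy illustration of the server-farm scenario, the sketch below dispatches each incoming request to the currently least-loaded server. The server names and the load metric (number of outstanding requests) are invented; real load balancers also account for data locality, health checks, and data/computation migration.

```python
# Toy least-loaded dispatcher for a server farm (illustrative only).
servers = {"s1": 0, "s2": 0, "s3": 0}        # server -> outstanding requests

def dispatch(request_id):
    target = min(servers, key=servers.get)   # pick the least-loaded server
    servers[target] += 1                     # record the extra load
    return target

for r in range(7):
    print(f"request {r} -> {dispatch(r)}")
```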
  • 69. Applications of distributed computing and newer challenges 1. Mobile systems 2. Sensor networks 3. Ubiquitous or pervasive computing 4. Peer-to-peer computing 5. Publish-subscribe, content distribution, and multimedia 6. Distributed agents 7. Distributed data mining 8. Grid computing 9. Security in distributed systems 69
  • 71. 2. Sensor networks & 3. Ubiquitous or pervasive computing A sensor is a processor with an electro-mechanical interface that is capable of sensing physical parameters such as temperature, velocity, pressure, humidity, and chemicals. Sensors may be mobile or static. 71
  • 72. 4. Peer-to-peer computing & 5. Publish-subscribe, content distribution, and multimedia 72
  • 73. 6. Distributed agents ● Agents collect and process information, and can exchange such information with other agents. ● Often, the agents cooperate as in an ant colony, but they can also have friendly competition, as in a free market economy. ● Challenges in distributed agent systems include coordination mechanisms among the agents, controlling the mobility of the agents, and their software design and interfaces. ● Research in agents is interdisciplinary: spanning artificial intelligence, mobile computing, economic market models, software engineering, and distributed computing. 73
  • 74. 7. Distributed data mining ● The data is necessarily distributed and cannot be collected in a single repository, as in banking applications where the data is private and sensitive, ● or in atmospheric weather prediction where the data sets are far too massive to collect and process at a single repository in real-time. 74
  • 75. 8. Grid computing ○Grid computing is a subset of distributed computing in which a virtual supercomputer is composed of machines on a network, connected mostly by Ethernet or sometimes by the Internet. ○It can also be seen as a form of parallel computing where, instead of many CPU cores on a single machine, the cores are spread across various locations. ○Many challenges remain in making grid computing a reality, including: ○scheduling jobs in such a distributed environment, ○a framework for implementing quality of service and real-time guarantees, and ○security of individual machines as well as of the jobs being executed in this setting. 75
  • 76. 9. Security in distributed systems ● The traditional challenges of security in a distributed setting include: ◦ Confidentiality (ensuring that only authorized processes can access certain information), ◦ Authentication (ensuring the source of received information and the identity of the sending process), and ◦ Availability (maintaining allowed access to services despite malicious actions). ● The goal is to meet these challenges with efficient and scalable solutions. ● These basic challenges have been addressed in traditional distributed settings. 76
  • 77. A model of distributed computations ❑A distributed system consists of a set of processors that are connected by a communication network. ❑The communication network provides the facility of information exchange among processors. ❑The communication delay is finite but unpredictable. ❑The processors do not share a common global memory and communicate solely by passing messages over the communication network. 77
  • 78. 78 ❑There is no physical global clock in the system to which processes have instantaneous access. ❑The communication medium may deliver messages out of order, messages may be lost, garbled, or duplicated due to timeout and retransmission, processors may fail, and communication links may go down. ❑The system can be modeled as a directed graph in which vertices represent the processes and edges represent unidirectional communication channels. ❑A distributed application runs as a collection of processes on a distributed system
  • 79. A distributed program A distributed program is composed of a set of n asynchronous processes p1, p2,... , pi,... , pn that communicate by message passing over the communication network. ◦ we assume that each process is running on a different processor The processes do not share a global memory and communicate solely by passing messages. ❑Cij: denote the channel from process pi to process pj ❑mij: denote a message sent by pi to pj 79
  • 80. ◆The communication delay is finite and unpredictable. ◆Also, these processes do not share a global clock that is instantaneously accessible to these processes ◆Process execution and message transfer are asynchronous ◆ a process may execute an action spontaneously and a process sending a message does not wait for the delivery of the message to be complete. 80
  • 81. ❖The global state of a distributed computation is composed of the states of the processes and the communication channels ● The state of a process is characterized by the state of its local memory and depends upon the context. ● The state of a channel is characterized by the set of messages in transit in the channel. 81
  • 82. A model of distributed executions ● The execution of a process consists of a sequential execution of its actions. ● The actions are atomic and the actions of a process are modeled as three types of events: ● Internal events ● Message send events, and ● Message receive events 82
  • 85. Space–Time Diagram of a Distributed Execution involving Three Processes 85 1. A horizontal line represents the progress of the process. 2. A dot indicates an event. 3. A slant arrow indicates a message transfer.
  • 87. 87 For example, in Figure 2.1, event e_2^6 (the sixth event at process p2) has knowledge of all the other events shown in the figure.
  • 88. 88 For any two events ei and ej, ei ↛ ej denotes that: ◦ Event ej does not directly or transitively depend on event ei. ◦ i.e., event ei does not causally affect event ej. ◦ Event ej is not aware of the execution of ei or of any event executed after ei on the same process. For example, in Figure 2.1
  • 89. Note the following two rules: For any two events ei and ej, if ei ↛ ej and ej ↛ ei, then events ei and ej are said to be concurrent and the relation is denoted as ei || ej. Note that the relation || is not transitive, i.e., (ei || ej) and (ej || ek) does not imply ei || ek. For example, in Figure 2.1 there are events ei, ej, ek with ei || ej and ej || ek; however, ei || ek does not hold. Note that for any two events ei and ej in a distributed execution, either ei → ej, or ej → ei, or ei || ej 89
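The following sketch makes the relations →, ↛, and || concrete for a small invented execution (not the one in Figure 2.1): events are identified as (process, index), direct dependencies are the process order plus send→receive message edges, and causal precedence is computed as the transitive closure of those edges.

```python
# Illustrative execution: 3 processes with 3 events each, identified as (pid, x).
events = [(p, x) for p in (1, 2, 3) for x in (1, 2, 3)]

# Direct dependencies: process order plus (send -> receive) message edges.
edges = set()
for p, x in events:
    if x > 1:
        edges.add(((p, x - 1), (p, x)))      # earlier event on the same process
edges.add(((1, 2), (2, 2)))                  # p1's 2nd event sends to p2's 2nd event
edges.add(((2, 3), (3, 3)))                  # p2's 3rd event sends to p3's 3rd event

def happens_before(ei, ej):
    """ei -> ej iff there is a dependency path from ei to ej (transitive closure)."""
    frontier, seen = {ei}, set()
    while frontier:
        e = frontier.pop()
        seen.add(e)
        for (a, b) in edges:
            if a == e and b not in seen:
                if b == ej:
                    return True
                frontier.add(b)
    return False

def concurrent(ei, ej):
    return not happens_before(ei, ej) and not happens_before(ej, ei)

print(happens_before((1, 2), (3, 3)))   # True: reachable via the two message edges
print(concurrent((1, 1), (2, 1)))       # True: neither event causally affects the other
```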
  • 90. Logical vs. Physical Concurrency In a distributed computation, two events are logically concurrent if and only if they do not causally affect each other. Physical concurrency, on the other hand, means that the events occur at the same instant in physical time. Two or more events may be logically concurrent even though they do not occur at the same instant in physical time. However, had the processor speeds and message delays been different, the execution of these events could well have coincided in physical time. Whether or not a set of logically concurrent events coincides in physical time does not change the outcome of the computation. Therefore, even though a set of logically concurrent events may not have occurred at the same instant in physical time, we can assume that these events occurred at the same instant in physical time. 90
  • 91. Models of communication networks There are several models of the service provided by communication networks, namely: ◦ FIFO (first-in, first-out): each channel acts as a first-in, first-out message queue and thus message ordering is preserved by a channel. ◦ Non-FIFO: a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order. ◦ Causal ordering (CO): based on Lamport’s “happens before” relation. A system that supports the causal ordering model satisfies the following property: ◦ CO: for any two messages m_ij and m_kj, if send(m_ij) → send(m_kj), then rec(m_ij) → rec(m_kj). ◦ This property ensures that causally related messages destined to the same destination are delivered in an order that is consistent with their causality relation. ◦ Causally ordered delivery of messages implies FIFO message delivery. Furthermore, note that CO ⊆ FIFO ⊆ Non-FIFO. 91
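A tiny sketch contrasting the FIFO and non-FIFO channel models listed above: the FIFO channel delivers in send order, while the non-FIFO channel hands messages to the receiver in an arbitrary order. The class names are invented, and the causal-ordering model (which would additionally enforce the CO property) is not modeled here.

```python
import random
from collections import deque

class FIFOChannel:
    """Delivery order equals send order."""
    def __init__(self):
        self.buf = deque()
    def send(self, m):
        self.buf.append(m)
    def receive(self):
        return self.buf.popleft()

class NonFIFOChannel:
    """The receiver removes messages in an arbitrary (here: random) order."""
    def __init__(self):
        self.buf = []
    def send(self, m):
        self.buf.append(m)
    def receive(self):
        return self.buf.pop(random.randrange(len(self.buf)))

fifo, nonfifo = FIFOChannel(), NonFIFOChannel()
for m in ("m1", "m2", "m3"):
    fifo.send(m)
    nonfifo.send(m)
print([fifo.receive() for _ in range(3)])      # always ['m1', 'm2', 'm3']
print([nonfifo.receive() for _ in range(3)])   # some permutation of the three
```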
  • 92. ● Causal ordering model is useful in developing distributed algorithms. ● Generally, it considerably simplifies the design of distributed algorithms because it provides a built-in synchronization. ● For example, in replicated database systems, it is important that every process responsible for updating a replica receives the updates in the same order to maintain database consistency. ● Without causal ordering, each update must be checked to ensure that database consistency is not being violated. Causal ordering eliminates the need for such checks. 92
  • 93. Global state of a distributed system ● The global state of a distributed system is a collection of the local states of its components, namely, the processes and the communication channels. ● The state of a process at any time is defined by the contents of processor registers, stacks, local memory, etc., and depends on the local context of the distributed application. ● The state of a channel is given by the set of messages in transit in the channel. ● The occurrence of events changes the states of the respective processes and channels, thus causing transitions in the global system state. For example, ⮚an internal event changes the state of the process at which it occurs. ⮚A send event (or a receive event) changes the state of the process that sends (or receives) the message and the state of the channel on which the message is sent (or received). 93
  • 94. 94 Let e_i^x denote the xth event at process pi. Let LS_i^x denote the state of process pi after the occurrence of event e_i^x and before the event e_i^(x+1). LS_i^0 denotes the initial state of process pi. LS_i^x is a result of the execution of all the events executed by process pi till e_i^x. Let send(m) ≤ LS_i^x denote the fact that ∃y, 1 ≤ y ≤ x, such that e_i^y = send(m), and let rec(m) ≰ LS_i^x denote the fact that ∀y, 1 ≤ y ≤ x, e_i^y ≠ rec(m).
  • 95. The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects. Thus, the channel state SC_ij^(x,y) denotes all messages that pi sent up to event e_i^x and which process pj had not received until event e_j^y: SC_ij^(x,y) = { m_ij | send(m_ij) ≤ LS_i^x ∧ rec(m_ij) ≰ LS_j^y }. 95
  • 96. Global state 96 ● For a global snapshot to be meaningful, the states of all the components of the distributed system must be recorded at the same instant. This would be possible if the local clocks at the processes were perfectly synchronized or if there were a global system clock that could be instantaneously read by the processes. However, both are impossible. ● Even if the state of all the components in a distributed system has not been recorded at the same instant, such a state will be meaningful provided every message that is recorded as received is also recorded as sent. ● The basic idea is that an effect should not be present without its cause. Notationally, the global state is the collection of the recorded process states and channel states: GS = { ∪_i LS_i^(x_i), ∪_(j,k) SC_jk^(y_j, z_k) }.
  • 97. Global state • A message cannot be received if it was not sent; that is, the state should not violate causality. • Such states are called consistent global states and are meaningful global states. • Inconsistent global states are not meaningful in the sense that a distributed system can never be in an inconsistent state. 97
  • 99. A global state GS1, consisting of one choice of local states in Fig. 2.2, is inconsistent because ◦ the state of p2 has recorded the receipt of message m12, ◦ however, the state of p1 has not recorded its send. On the contrary, a global state GS2, consisting of a different choice of local states, is consistent: ◦ all the channels are empty except C21, which contains message m21. 99 The space–time diagram of a distributed execution (Fig. 2.2).
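The GS1/GS2 situation above can be captured by a small consistency check: record, per process, the messages it has recorded as sent and as received, and accept the global state only if every recorded receive has a matching recorded send. The record format below is invented for illustration.

```python
def is_consistent(recorded_sent, recorded_received):
    """A recorded global state is consistent iff every message recorded as
    received is also recorded as sent (no effect without its cause)."""
    all_sent = set().union(*recorded_sent.values())
    all_received = set().union(*recorded_received.values())
    return all_received <= all_sent

# GS1-like situation: p2 has recorded the receipt of m12, but p1 has not recorded its send.
sent1 = {"p1": set(), "p2": {"m21"}}
recv1 = {"p1": set(), "p2": {"m12"}}
print(is_consistent(sent1, recv1))   # False -> inconsistent global state

# GS2-like situation: every recorded receive has a recorded send; m21 is still in transit.
sent2 = {"p1": {"m12"}, "p2": {"m21"}}
recv2 = {"p1": set(), "p2": {"m12"}}
print(is_consistent(sent2, recv2))   # True -> consistent; m21 sits in channel C21
```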
  • 101. Cuts of a distributed computation ● In the space–time diagram of a distributed computation, a zigzag line joining one arbitrary point on each process line is termed a cut in the computation. ● Such a line slices the space–time diagram, and thus the set of events in the distributed computation, into a PAST and a FUTURE. ● The PAST contains all the events to the left of the cut and the FUTURE contains all the events to the right of the cut. ● For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the PAST and FUTURE of C, respectively. ● Every cut corresponds to a global state and every global state can be graphically represented as a cut in the computation’s space–time diagram 101
  • 102. A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut. All messages that cross the cut from the PAST to the FUTURE are in transit in the corresponding consistent global state. A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST 102
  • 103. 103 C1 is an inconsistent cut, whereas C2 is a consistent cut.
  • 104. Past and future cones of an event An event ej could have been affected only by all events ei such that ei → ej, and all the information available at ei could be made accessible at ej. All such events ei belong to the past of ej. Let Past(ej) denote all events in the past of ej in a computation (H, →). Then, Past(ej) = { ei | ∀ ei ∈ H, ei → ej }. 104
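A short sketch of how Past(ej) can be computed from a recorded execution: walk the direct-dependency edges (process order plus message edges) backwards from ej. The events and edges below are invented, not taken from the figures.

```python
# Compute Past(ej) = { ei | ei -> ej } by walking dependency edges backwards.
# Events are (process, index); edges are direct dependencies (process order + messages).
edges = {
    ((1, 1), (1, 2)), ((2, 1), (2, 2)), ((3, 1), (3, 2)),   # process order (illustrative)
    ((1, 1), (2, 2)),                                        # a message from p1 to p2
}

def past(ej):
    cone, frontier = set(), {ej}
    while frontier:
        e = frontier.pop()
        for (a, b) in edges:
            if b == e and a not in cone:
                cone.add(a)
                frontier.add(a)
    return cone

print(past((2, 2)))   # {(2, 1), (1, 1)}: the past cone of p2's second event
```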
  • 106. Models of process communications There are two basic models of process communications ◦ synchronous ◦ asynchronous. 106
  • 107. The synchronous communication model is a blocking type where on a message send, the sender process blocks until the message has been received by the receiver process. ◦ The sender process resumes execution only after it learns that the receiver process has accepted the message. ◦ Thus, the sender and the receiver processes must synchronize to exchange a message. 107
  • 108. The asynchronous communication model is a non-blocking type in which the sender and the receiver do not synchronize to exchange a message. After having sent a message, the sender process does not wait for the message to be delivered to the receiver process. The message is buffered by the system and is delivered to the receiver process when it is ready to accept the message. A buffer overflow may occur if a process sends a large number of messages in a burst to another process. Asynchronous communication provides higher parallelism because the sender process can execute while the message is in transit to the receiver. However, due to the higher degree of parallelism and non-determinism, it is much more difficult to design, verify, and implement distributed algorithms for asynchronous communication. 108