1
CST402 DISTRIBUTED COMPUTING
MODULE 1
Syllabus- Distributed systems basics and
Computation model
Distributed System – Definition, Relation to computer system
components, Motivation, Primitives for distributed communication,
Design issues, Challenges and applications.
A model of distributed computations – Distributed program, Model of
distributed executions, Models of communication networks, Global
state of a distributed system, Cuts of a distributed computation, Past
and future cones of an event, Models of process communications.
2
Distributed System
A distributed system is a collection of independent entities that
cooperate to solve a problem that cannot be individually solved
A distributed system can be characterized as a collection of mostly
autonomous processors communicating over a communication
network
3
Distributed system has been
characterized in one of several ways
1. You know you are using one when the crash of a computer you have never heard of prevents you from doing any work (conversely, a well-designed distributed system also prevents losing data when a single computer crashes)
2. A collection of computers that do not share common memory or a common physical clock, that communicate by message passing over a communication network, and where each computer has its own memory and runs its own operating system
3. A collection of independent computers that appears to the users of the system as a
single coherent computer
4. A term that describes a wide range of computers, from weakly coupled systems such
as wide-area networks, to strongly coupled systems such as local area networks, to
very strongly coupled systems such as multiprocessor systems
4
Features of DS
No common physical clock: This is an important assumption because it introduces the element
of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors.
No shared memory : This is a key feature that requires message-passing for communication.
This feature implies the absence of the common physical clock
Geographical separation: The farther apart the processors are geographically, the more representative the system is of a distributed system.
WAN
NOW/COW(network/cluster of workstations)--- eg, Google search engine
Autonomy and heterogeneity: The processors are "loosely coupled" in that they have different speeds and each can be running a different operating system; they cooperate with one another by offering services or by solving a problem jointly.
5
Relation to computer system
components
Each computer has a memory-processing unit and the computers are connected by a
communication network
6
Relationships of the software components that run on each of the
computers and use the local operating system and network protocol
stack for functioning
7
●The distributed software is also termed middleware.
●A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal. An execution is also sometimes termed a computation or a run.
●The distributed system uses a layered architecture to break down the complexity
of system design.
●The middleware is the distributed software that drives the distributed system,
while providing transparency of heterogeneity at the platform level
8
●The middleware layer does not contain the traditional application layer functions
of the network protocol stack, such as http, mail, ftp, and telnet.
●Various primitives and calls to functions defined in various libraries of the
middleware layer are embedded in the user program code.
●There exist several libraries to choose from to invoke primitives for the more
common functions – such as reliable and ordered multicasting – of the
middleware layer
●There are several standards such as Object Management Group’s (OMG)
common object request broker architecture (CORBA) and the remote procedure
call (RPC) mechanism.
9
Motivation/Benefits of DS
1. Inherently distributed computations: money transfer in banking, or reaching consensus among parties that are geographically distant - the computation is inherently distributed.
2. Resource sharing: e.g., distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability
3. Access to geographically remote data and resources: data cannot be replicated at every site
participating in the distributed execution because it may be too large or too sensitive to be replicated
4. Enhanced reliability:
◦ Availability :- The resource should be accessible at all times.
◦ Integrity:- the value/state of the resource should be correct, in the face of concurrent access from
multiple processors, as per the semantics expected by the application.
◦ Fault-tolerance :- The ability to recover from system failures.
10
Motivation/Benefits of DS
5. Increased performance/cost ratio: By resource sharing and accessing geographically remote data
and resources, the performance/cost ratio is increased. Any task can be partitioned across the various
computers in the distributed system.
6. Scalability
7. Modularity and incremental expandability
11
Distributed Vs Parallel computing
12
13
Primitives for distributed
communication
Blocking/non-blocking, synchronous/asynchronous primitives
Processor synchrony
Libraries and standards
14
Blocking/non-blocking,
synchronous/asynchronous primitives
Message send and message receive communication primitives are denoted Send() and
Receive(), respectively.
15
There are two ways of sending data when the Send primitive is
invoked :
◦ Buffered option :
◦ The buffered option, which is the standard option, copies the data from the user buffer to the kernel buffer.
◦ The data later gets copied from the kernel buffer onto the network.
◦ Unbuffered option :
◦ In the unbuffered option, the data gets copied directly from the user buffer onto the
network.
For the Receive primitive, the buffered option is usually required because the data may already
have arrived when the primitive is invoked, and needs a storage place in the kernel.
16
Blocking/non-blocking, synchronous/asynchronous
primitives - shortcut
17
Let’s Understand with Example - Synchronous
18
Synchronous primitives
● A Send or a Receive primitive is synchronous if both
the Send() and Receive() handshake with each other.
● The processing for the Send primitive completes
only after the invoking processor learns that the
other corresponding Receive primitive has also been
invoked and that the receive operation has been
completed.
● The processing for the Receive primitive completes
when the data to be received is copied into the
receiver’s user buffer.
19
Let’s Understand with Example- Asynchronous
20
Asynchronous primitives
A Send primitive is said to be asynchronous if
control returns back to the invoking process
after the data item to be sent has been copied
out of the user-specified buffer.
It does not make sense to define asynchronous
Receive primitives.
21
Let’s Understand with Example- Blocking
22
Blocking primitives
A primitive is blocking if control returns to the invoking process after
the processing for the primitive (whether in synchronous or
asynchronous mode) completes.
23
Let’s Understand with Example- Non Blocking
24
Non-blocking primitives
A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed.
For a non-blocking Send, control returns to the process even before the data is
copied out of the user buffer.
For a non-blocking Receive, control returns to the process even before the data
may have arrived from the sender.
25
For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can be later used to check the status of
completion of the call.
The process can check for the completion of the call in two ways.
1. First, it can keep checking (in a loop or periodically) if the handle has been
flagged or posted.
2. Second, it can issue a Wait with a list of handles as parameters.
26
The Wait call usually blocks until one of the parameter handles is posted.
Presumably after issuing the primitive in non-blocking mode, the process has
done whatever actions it could and now needs to know the status of completion
of the call, therefore using a blocking Wait() call is usual programming practice.
27
● If at the time that Wait() is issued, the processing for the primitive has
completed, the Wait() returns immediately
● The completion of the processing of the primitive is detectable by checking the
value of handleK .
● If the processing of the primitive has not completed, the Wait blocks and waits
for a signal to wake it up.
● When the processing for the primitive completes, the communication subsystem
software sets the value of handleK and wakes up (signals) any process with a
Wait call blocked on this handleK .
● This is called posting the completion of the operation.
28
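As an illustration of the two ways of checking a non-blocking handle, the following minimal Python sketch (not part of the original slides) uses local threads and a concurrent.futures Future as a stand-in for a communication handle, contrasting polling the handle with a blocking Wait:

```python
# Analogy only: a Future stands in for the system-generated handle, and a worker
# thread stands in for the communication subsystem that eventually posts it.
from concurrent.futures import ThreadPoolExecutor, wait
import time

def slow_send(data):
    time.sleep(0.1)              # the "communication subsystem" doing its work
    return len(data)             # completion gets posted on the handle

with ThreadPoolExecutor() as pool:
    handle1 = pool.submit(slow_send, b"hello")   # non-blocking call returns a handle
    # Way 1: keep checking (in a loop or periodically) whether the handle is posted.
    while not handle1.done():
        pass                                     # the process could do other useful work here

    handle2 = pool.submit(slow_send, b"world")
    # Way 2: issue a blocking Wait with a list of handles as parameters.
    done, not_done = wait([handle2])             # returns once the handle(s) are posted
    print(handle1.result(), handle2.result())
```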
Versions of the Send and Receive
primitive
Four versions of the Send primitive:
1. Blocking synchronous Send
2. Non-blocking synchronous Send
3. Blocking asynchronous Send
4. Non-blocking asynchronous Send
Two versions of the Receive primitive:
1. Blocking (synchronous) Receive
2. Non-blocking (synchronous) Receive
29
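The four Send versions and two Receive versions correspond closely to primitives in existing message-passing libraries. As one possible mapping (an illustration, not something stated in the slides), the sketch below uses MPI through the mpi4py package, where Ssend/Issend/Send/Isend correspond to the blocking synchronous, non-blocking synchronous, blocking asynchronous, and non-blocking asynchronous Send, respectively; it assumes an MPI runtime and mpi4py are installed and the script is launched with two ranks.

```python
# Sketch: the Send/Receive versions expressed with MPI primitives via mpi4py.
# Assumes an MPI runtime and mpi4py; run with: mpiexec -n 2 python send_versions.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.arange(10, dtype='i')

if rank == 0:
    comm.Ssend([buf, MPI.INT], dest=1, tag=0)          # blocking synchronous Send
    req = comm.Issend([buf, MPI.INT], dest=1, tag=1)   # non-blocking synchronous Send
    req.Wait()                                         # blocking Wait on the returned handle
    comm.Send([buf, MPI.INT], dest=1, tag=2)           # blocking asynchronous Send
    req = comm.Isend([buf, MPI.INT], dest=1, tag=3)    # non-blocking asynchronous Send
    while not req.Test():                              # poll the handle until it is posted
        pass                                           # other useful work could go here
elif rank == 1:
    out = np.empty(10, dtype='i')
    for tag in (0, 1, 2):
        comm.Recv([out, MPI.INT], source=0, tag=tag)   # blocking Receive
    req = comm.Irecv([out, MPI.INT], source=0, tag=3)  # non-blocking Receive returns a handle
    req.Wait()                                         # complete it with a blocking Wait
```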
30
Blocking synchronous Send
● The data gets copied from the user
buffer to the kernel buffer and is then
sent over the network.
● After the data is copied to the
receiver’s system buffer and a
Receive call has been issued, an
acknowledgement back to the sender
causes control to return to the
process that invoked the Send
operation and completes the Send
31
Blocking Receive
● The Receive call blocks until the
data expected arrives and is
written in the specified user buffer.
● Then control is returned to the user
process.
32
Non-blocking Synchronous Send
●Control returns back to the invoking process as soon as the
copy of data from the user buffer to the kernel buffer is
initiated.
●A parameter in the non-blocking call also gets set with the
handle of a location that the user process can later check for
the completion of the synchronous send operation.
●The location gets posted after an acknowledgement returns
from the receiver
●The user process can keep checking for the completion of
the non-blocking synchronous Send by testing the returned
handle, or it can invoke the blocking Wait operation on the
returned handle
33
Non-blocking Receive
●The Receive call will cause the kernel to register
the call and return the handle of a location that the
user process can later check for the completion of
the non-blocking Receive operation.
●This location gets posted by the kernel after the
expected data arrives and is copied to the user-
specified buffer.
●The user process can check for the completion of
the non-blocking Receive by invoking the Wait
operation on the returned handle.
34
35
Blocking asynchronous Send
●The user process that invokes the Send is
blocked until the data is copied from the
user’s buffer to the kernel buffer.
●For the unbuffered option, the user process
that invokes the Send is blocked until the
data is copied from the user’s buffer to the
network.
36
Non-blocking asynchronous Send
●The user process that invokes the Send is blocked
until the transfer of the data from the user’s buffer to
the kernel buffer is initiated.
●Control returns to the user process as soon as this
transfer is initiated, and a handle is given back.
●The asynchronous Send completes when the data has
been copied out of the user’s buffer.
●The checking for the completion may be necessary if
the user wants to reuse the buffer from which the data
was sent.
37
38
●A synchronous Send is easier to use from a programmer’s perspective because the
handshake between the Send and the Receive makes the communication appear
instantaneous, thereby simplifying the program logic.
●The Receive may not get issued until much after the data arrives at Pj, in which case
the data arrived would have to be buffered in the system buffer at Pj and not in the
user buffer. At the same time, the sender would remain blocked. Thus, a synchronous
Send lowers the efficiency within process Pi.
●The non-blocking asynchronous Send is useful when a large data item is being sent because
it allows the process to perform other instructions in parallel with the completion of the Send.
●The non-blocking synchronous Send also avoids the potentially large delays for
handshaking, particularly when the receiver has not yet issued the Receive call.
39
●The non-blocking Receive is useful when a large data item is being received and/or when
the sender has not yet issued the Send call,
○because it allows the process to perform other instructions in parallel with the completion
of the Receive.
○If the data has already arrived, it is stored in the kernel buffer, and it may take a while to
copy it to the user buffer specified in the Receive call.
●For non-blocking calls, however, the burden on the programmer increases because he or she
has to keep track of the completion of such operations in order to meaningfully reuse (write
to or read from) the user buffers. Thus, conceptually, blocking primitives are easier to use.
40
Processor synchrony
Processor synchrony indicates that all the processors execute in lock-step with
their clocks synchronized.
As this synchrony is not attainable in a distributed system, what is more generally indicated is that for a large granularity of code, usually termed a step, the processors are synchronized.
This abstraction is implemented using some form of barrier synchronization to
ensure that no processor begins executing the next step of code until all the
processors have completed executing the previous steps of code assigned to
each of the processors.
41
Processor synchrony
1. Processor Synchrony : This means that all the computers or processors in a system
work together perfectly, like synchronized dancers following the same rhythm. Their
internal clocks are all perfectly in sync.
2. In Distributed systems: In reality, achieving perfect synchrony among all processors in
a distributed system is very difficult or impossible. So, what we do instead is
synchronize them in a different way.
3. Synchronization at a higher level: Instead of making every little action perfectly
synchronized, we group many actions into larger chunks called steps. Think of these
steps like dance routines.
4. Barrier synchronization: To make sure these steps are performed in sync, we use a
mechanism called barrier synchronization. It’s like a checkpoint in a dance routine. No
dancer can move to the next step until everyone has completed the current one.
Similarly, in a distributed system, no processor can move on to the next step of their
work until all processors have finished their current step.
42
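A minimal sketch of the barrier idea, using Python threads as stand-ins for processors (an illustrative assumption; real processor synchrony would use a distributed barrier rather than threads on one machine):

```python
# Each "processor" must reach the barrier before any of them starts the next step.
import threading

N = 4
barrier = threading.Barrier(N)

def processor(pid):
    for step in range(3):
        # ... execute the code assigned to this processor for this step ...
        print(f"processor {pid} finished step {step}")
        barrier.wait()   # blocks until all N processors have completed the current step

threads = [threading.Thread(target=processor, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```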
Design issues and challenges
We describe design issues and challenges after categorizing them as
1. having a greater component related to systems design and operating systems design (from a system perspective)
2. having a greater component related to algorithm design (algorithmic challenges)
3. emerging from recent technology advances and/or driven by new applications (application or technology driven)
43
Distributed systems challenges from a system
perspective
The following functions must be addressed when designing and building a distributed system:
1. Communication
2. Processes
3. Naming
4. Synchronization
5. Data storage and access
6. Consistency and replication
7. Fault tolerance
8. Security
9. Applications Programming Interface (API) and transparency
10. Scalability and modularity
44
Algorithmic challenges in distributed computing
❑Designing useful execution models and frameworks
❑Dynamic distributed graph algorithms and distributed routing algorithms
❑Time and global state in a distributed system
❑Synchronization/coordination mechanisms
❑Group communication, multicast, and ordered message delivery
❑Monitoring distributed events and predicates
❑Distributed program design and verification tools
❑Debugging distributed programs
❑Data replication, consistency models, and caching
50
Designing useful execution models and frameworks
●The interleaving model and partial order model are two widely
adopted models of distributed system executions.
●They have proved to be particularly useful for operational reasoning
and the design of distributed algorithms.
●The input/output automata model and the TLA (temporal logic of
actions) are two other examples of models that provide different
degrees of infrastructure for reasoning more formally with and
proving the correctness of distributed programs
51
Dynamic distributed graph algorithms and distributed
routing algorithms
● The distributed system is modeled as a distributed graph, and the graph
algorithms form the building blocks for a large number of higher level
communication, data dissemination, object location, and object search
functions.
● The algorithms need to deal with dynamically changing graph
characteristics, such as to model varying link loads in a routing
algorithm.
● The efficiency of these algorithms impacts not only the user-perceived
latency but also the traffic and hence the load or congestion in the
network.
● Hence, the design of efficient distributed graph algorithms is of
paramount importance
52
Time and global state in a distributed system
●The challenges pertain to providing accurate physical time, and to providing a
variant of time, called logical time.
●Logical time is relative time, and eliminates the overheads of providing physical
time for applications where physical time is not required. More importantly,
logical time can
○ capture the logic and inter-process dependencies within the distributed
program, and also
○ track the relative progress at each process.
It is not possible for any one process to directly observe a meaningful global state
across all the processes, without using extra state-gathering effort which needs to
be done in a coordinated manner
53
Synchronization/coordination mechanisms
●The processes must be allowed to execute concurrently, except when they
need to synchronize to exchange information, i.e., communicate about shared
data.
●Synchronization is essential for the distributed processes to overcome the
limited observation of the system state from the viewpoint of any one process.
●Overcoming this limited observation is necessary for taking any actions that
would impact other processes.
●The synchronization mechanisms can also be viewed as resource
management and concurrency management mechanisms to streamline the
behavior of the processes that would otherwise act independently.
54
Examples of Problems Requiring Synchronization
● Physical clock synchronization
● Leader election
● Mutual exclusion
● Deadlock detection and resolution
● Termination detection
● Garbage collection
55
56
Group communication, multicast, and ordered
message delivery
● A group is a collection of processes that share a common context and
collaborate on a common task within an application domain.
● Specific algorithms need to be designed to enable efficient group
communication and group management wherein processes can join and
leave groups dynamically, or even fail.
● When multiple processes send messages concurrently, different
recipients may receive the messages in different orders, possibly
violating the semantics of the distributed program.
● Hence, formal specifications of the semantics of ordered delivery need to
be formulated, and then implemented.
57
Monitoring distributed events and predicates
● Predicates defined on program variables that are local to different
processes are used for specifying conditions on the global system state,
and are useful for applications such as debugging, sensing the
environment, and in industrial process control.
● On-line algorithms for monitoring such predicates are hence important.
● An important paradigm for monitoring distributed events is that of event
streaming, wherein streams of relevant events reported from different
processes are examined collectively to detect predicates.
● Typically, the specification of such predicates uses physical or logical
time relationships.
58
Distributed program design and verification tools
● Methodically designed and verifiably correct programs can greatly
reduce the overhead of software design, debugging, and engineering.
● Designing mechanisms to achieve these design and verification goals
is a challenge.
59
Debugging distributed programs
● Debugging sequential programs is hard; debugging distributed
programs is that much harder because of the concurrency in actions
and the ensuing uncertainty due to the large number of possible
executions defined by the interleaved concurrent actions.
● Adequate debugging mechanisms and tools need to be designed to
meet this challenge.
60
Data replication, consistency models, and caching
● Fast access to data and other resources requires them to be
replicated in the distributed system.
● Managing such replicas in the face of updates introduces the
problems of ensuring consistency among the replicas and cached
copies.
● Additionally, placement of the replicas in the systems is also a
challenge because resources usually cannot be freely replicated.
61
World Wide Web design – caching, searching,
scheduling
● Minimizing response time to minimize user-perceived latencies is an
important challenge.
● Object search and navigation on the web are important functions in the
operation of the web, and are very resource-intensive.
● Designing mechanisms to do this efficiently and accurately is a great
challenge.
62
Distributed shared memory abstraction
● A shared memory abstraction simplifies the task of the programmer
because he or she has to deal only with read and write operations, and
no message communication primitives.
● However, under the covers in the middleware layer, the abstraction of a
shared address space has to be implemented by using message-passing.
● Hence, in terms of overheads, the shared memory abstraction is not less
expensive.
63
64
Reliable and fault-tolerant distributed systems
A reliable and fault-tolerant environment has multiple requirements and aspects,
and these can be addressed using various strategies:
● Consensus algorithms
● Replication and replica management
● Voting and quorum systems
● Distributed databases and distributed commit
● Self-stabilizing systems
● Checkpointing and recovery algorithms
● Failure detectors
65
Load balancing
● The goal of load balancing is to gain higher throughput, and reduce the user-perceived latency.
● Load balancing may be necessary because of a variety of factors such as high network traffic or
high request rate causing the network connection to be a bottleneck, or high computational load.
● A common situation where load balancing is used is in server farms, where the objective is to
service incoming client requests with the least turnaround time.
The following are some forms of load balancing:
● Data migration: The ability to move data (which may be replicated) around in the system, based
on the access pattern of the users.
● Computation migration: The ability to relocate processes in order to perform a redistribution of
the workload.
● Distributed scheduling: This achieves a better turnaround time for the users by using idle
processing power in the system more efficiently.
68
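As a small illustration of distributed scheduling in a server farm (a hypothetical example, not from the slides), the sketch below dispatches each incoming request to the currently least-loaded server:

```python
# Least-loaded dispatch: one simple load-balancing policy for a server farm.
import heapq

servers = [(0, f"server{i}") for i in range(3)]   # (outstanding requests, server name)
heapq.heapify(servers)

def dispatch(request):
    load, name = heapq.heappop(servers)           # pick the least-loaded server
    heapq.heappush(servers, (load + 1, name))     # it now carries one more request
    return name

print([dispatch(r) for r in range(6)])            # requests spread evenly across the farm
```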
Applications of distributed computing and newer
challenges
1. Mobile systems
2. Sensor networks
3. Ubiquitous or pervasive computing
4. Peer-to-peer computing
5. Publish-subscribe, content distribution, and multimedia
6. Distributed agents
7. Distributed data mining
8. Grid computing
9. Security in distributed systems
69
1. Mobile systems
70
2. Sensor networks &
3. Ubiquitous or pervasive computing
A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters, such as temperature, velocity, pressure, humidity,
and chemicals. Sensors may be mobile or static;
71
4. Peer-to-peer computing &
5. Publish-subscribe, content distribution, and multimedia
72
6. Distributed agents
● Agents collect and process information, and can exchange such
information with other agents.
● Often, the agents cooperate as in an ant colony, but they can also have
friendly competition, as in a free market economy.
● Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software
design and interfaces.
● Research in agents is interdisciplinary: spanning artificial intelligence,
mobile computing, economic market models, software engineering, and
distributed computing.
73
7. Distributed data mining
● The data is necessarily distributed and cannot be collected in a
single repository, as in banking applications where the data is
private and sensitive,
● or in atmospheric weather prediction where the data sets are far too
massive to collect and process at a single repository in real-time.
74
8. Grid computing
○Grid computing is a subset of distributed computing, in which a virtual supercomputer is formed from machines connected by a network, typically a local Ethernet network or sometimes the Internet.
○It can also be seen as a form of parallel computing in which, instead of many CPU cores on a single machine, the cores are spread across machines at various locations.
○Many challenges in making grid computing a reality include:
○scheduling jobs in such a distributed environment,
○a framework for implementing quality of service and real-time guarantees,
○Security of individual machines as well as of jobs being executed in this
setting.
75
9. Security in distributed systems
● The traditional challenges of security in a distributed setting include:
◦ Confidentiality (ensuring that only authorized processes can access
certain information),
◦ Authentication (ensuring the source of received information and the
identity of the sending process), and
◦ Availability (maintaining allowed access to services despite malicious
actions).
● The goal is to meet these challenges with efficient and scalable
solutions.
● These basic challenges have been addressed in traditional distributed
settings.
76
A model of distributed computations
❑A distributed system consists of a set of processors that are
connected by a communication network.
❑The communication network provides the facility of information
exchange among processors.
❑The communication delay is finite but unpredictable.
❑The processors do not share a common global memory and
communicate solely by passing messages over the communication
network.
77
78
❑There is no physical global clock in the system to which processes have
instantaneous access.
❑The communication medium may deliver messages out of order, messages may
be lost, garbled, or duplicated due to timeout and retransmission, processors
may fail, and communication links may go down.
❑The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.
❑A distributed application runs as a collection of processes on a distributed
system
A distributed program
A distributed program is composed of a set of n asynchronous processes
p1, p2,... , pi,... , pn that communicate by message passing over the
communication network.
◦ we assume that each process is running on a different processor
The processes do not share a global memory and communicate solely by
passing messages.
❑Cij: denote the channel from process pi to process pj
❑mij: denote a message sent by pi to pj
79
◆The communication delay is finite and unpredictable.
◆Also, these processes do not share a global clock that is
instantaneously accessible to these processes
◆Process execution and message transfer are asynchronous
◆ a process may execute an action spontaneously and a process sending a
message does not wait for the delivery of the message to be complete.
80
❖The global state of a distributed computation is composed of
the states of the processes and the communication channels
● The state of a process is characterized by the state of its local
memory and depends upon the context.
● The state of a channel is characterized by the set of messages
in transit in the channel.
81
A model of distributed executions
● The execution of a process consists of a sequential execution
of its actions.
● The actions are atomic and the actions of a process are
modeled as three types of events:
● Internal events
● Message send events, and
● Message receive events
82
Space–Time Diagram of a Distributed Execution involving Three
Processes
85
1. A horizontal line represents the progress of the process.
2. A dot indicates an event.
3. A slant arrow indicates a message transfer.
Causal precedence relation
86
87
The causal precedence (happens-before) relation, denoted →, is defined on the events of the execution: ei → ej if (i) ei and ej occur on the same process and ei occurs before ej, or (ii) ei is the send event of a message and ej is the corresponding receive event, or (iii) there exists an event ek such that ei → ek and ek → ej (transitivity).
If ei → ej, then event ej is directly or transitively dependent on event ei, and all the information available at ei can be made accessible at ej.
For example, in Figure 2.1, event e2^6 has knowledge of all the other events shown in the figure.
88
For any two events ei and ej, ei ↛ ej denotes that:
◦ Event ej is not directly or transitively dependent on event ei;
◦ i.e., event ei does not causally affect event ej;
◦ Event ej is not aware of the execution of ei, or of any event executed after ei on the same process.
Note the following two rules:
◦ ei → ej implies ej ↛ ei;
◦ ei ↛ ej does not imply ej → ei.
For any two events ei and ej, if ei ↛ ej and ej ↛ ei, then events ei and ej are said to be concurrent, and the relation is denoted ei || ej.
Note that the relation || is not transitive, i.e., (ei || ej) ∧ (ej || ek) does not imply ei || ek.
Note that for any two events ei and ej in a distributed execution, either ei → ej, or ej → ei, or ei || ej.
89
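The sketch below (a toy, hypothetical event log, not part of the slides) builds the causal precedence relation exactly as defined above, as the transitive closure of process order plus send-receive edges, and uses it to test concurrency:

```python
# Events are (process, index); -> is the transitive closure of process order
# plus the send->receive edges.
from itertools import product

events = [("p1", 1), ("p1", 2), ("p2", 1), ("p2", 2), ("p2", 3)]
messages = [(("p1", 1), ("p2", 2))]               # (send event, corresponding receive event)

edges = set(messages)
for (p, x), (q, y) in product(events, events):
    if p == q and y == x + 1:                     # process order: ei^x -> ei^(x+1)
        edges.add(((p, x), (q, y)))

closure = set(edges)                              # transitive closure (Floyd-Warshall style)
for k, i, j in product(events, repeat=3):
    if (i, k) in closure and (k, j) in closure:
        closure.add((i, j))

def happens_before(a, b):
    return (a, b) in closure

def concurrent(a, b):
    return a != b and not happens_before(a, b) and not happens_before(b, a)

print(happens_before(("p1", 1), ("p2", 3)))       # True: via the message, then process order
print(concurrent(("p1", 2), ("p2", 1)))           # True: neither causally affects the other
```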
Logical vs. Physical Concurrency
In a distributed computation, two events are logically concurrent if and only if they do not
causally affect each other.
Physical concurrency, on the other hand, means that the events occur at the same instant in physical time.
Two or more events may be logically concurrent even though they do not occur at the same instant in physical time.
However, if the processor speeds and message delays had been different, the execution of these events could very well have coincided in physical time.
Whether a set of logically concurrent events coincide in the physical time or not, does not
change the outcome of the computation.
Therefore, even though a set of logically concurrent events may not have occurred at the same
instant in physical time, we can assume that these events occurred at the same instant in
physical time.
90
Models of communication networks
There are several models of the service provided by communication
networks, namely,
◦ FIFO (first-in, first-out): each channel acts as a first-in first-out message queue and
thus, message ordering is preserved by a channel
◦ non-FIFO: a channel acts like a set in which the sender process adds messages and
the receiver process removes messages from it in a random order
◦ causal ordering: is based on Lamport's "happens before" relation. A system that supports the causal ordering model satisfies the following property:
◦ CO: for any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).
◦ This property ensures that causally related messages destined to the same destination are delivered in an order that is consistent with their causality relation.
◦ Causally ordered delivery of messages implies FIFO message delivery. Furthermore, note that CO ⊂ FIFO ⊂ Non-FIFO.
91
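A minimal sketch (illustrative only, not part of the slides) contrasting the FIFO and non-FIFO channel models described above:

```python
# FIFO channel: a queue preserves send order. Non-FIFO channel: a "set" from
# which the receiver removes messages in an arbitrary order.
import random
from collections import deque

class FIFOChannel:
    def __init__(self):
        self.q = deque()
    def send(self, m):
        self.q.append(m)           # messages kept in the order they were sent
    def deliver(self):
        return self.q.popleft()    # ... and delivered in that same order

class NonFIFOChannel:
    def __init__(self):
        self.bag = []
    def send(self, m):
        self.bag.append(m)
    def deliver(self):
        return self.bag.pop(random.randrange(len(self.bag)))  # arbitrary order

c = FIFOChannel()
for m in "abc":
    c.send(m)
print([c.deliver() for _ in range(3)])   # always ['a', 'b', 'c']

d = NonFIFOChannel()
for m in "abc":
    d.send(m)
print([d.deliver() for _ in range(3)])   # some permutation of ['a', 'b', 'c']
```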
● Causal ordering model is useful in developing distributed algorithms.
● Generally, it considerably simplifies the design of distributed algorithms
because it provides a built-in synchronization.
● For example, in replicated database systems, it is important that every process
responsible for updating a replica receives the updates in the same order to
maintain database consistency.
● Without causal ordering, each update must be checked to ensure that database
consistency is not being violated. Causal ordering eliminates the need for such
checks.
92
Global state of a distributed system
● The global state of a distributed system is a collection of the local states of
its components, namely, the processes and the communication channels
● The state of a process at any time is defined by the contents of processor
registers, stacks, local memory, etc. and depends on the local context of the
distributed application.
● The state of a channel is given by the set of messages in transit in the
channel.
● The occurrence of events changes the states of respective processes and
channels, thus causing transitions in global system state. For eg,
⮚an internal event changes the state of the process at which it occurs.
⮚A send event (or a receive event) changes the state of the process that sends
(or receives) the message and the state of the channel on which the message
is sent (or received)
93
94
Let ei^x denote the xth event at process pi.
Let LSi^x denote the state of process pi after the occurrence of event ei^x and before the event ei^(x+1).
LSi^0 denotes the initial state of process pi.
LSi^x is a result of the execution of all the events executed by process pi up to ei^x.
Let send(m) ≤ LSi^x denote the fact that ∃y : 1 ≤ y ≤ x such that ei^y = send(m); likewise, let rec(m) ≰ LSj^y denote the fact that ∀z : 1 ≤ z ≤ y, ej^z ≠ rec(m).
The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects.
Thus, the channel state SCij^(x,y) denotes all messages that pi sent up to event ei^x and which process pj had not received until event ej^y:
SCij^(x,y) = { mij | send(mij) ≤ LSi^x ∧ rec(mij) ≰ LSj^y }
95
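Using the notation just defined, here is a toy sketch (with hypothetical send/receive logs, not from the slides) of how the channel state SCij at a pair of events (ei^x, ej^y) can be computed:

```python
# Channel state SCij^(x,y): messages pi sent up to its x-th event that pj has
# not yet received up to its y-th event.
sends_i    = {1: "m1", 3: "m2", 5: "m3"}   # hypothetical: event index at pi -> message sent
receives_j = {2: "m1", 6: "m2"}            # hypothetical: event index at pj -> message received

def channel_state(x, y):
    sent     = {m for ev, m in sends_i.items() if ev <= x}
    received = {m for ev, m in receives_j.items() if ev <= y}
    return sent - received                 # messages still in transit on Cij

print(channel_state(5, 2))   # {'m2', 'm3'}: sent by ei^5 but not received by ej^2
print(channel_state(3, 6))   # set(): everything sent by ei^3 was received by ej^6
```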
Global state
96
● For a global snapshot to be meaningful, the states of all the components of the distributed system must be recorded at the same instant. This would be possible only if the local clocks at the processes were perfectly synchronized, or if there were a global system clock that could be instantaneously read by the processes. However, both are impossible.
● Even if the state of all the components in a distributed system has not been recorded at
the same instant, such a state will be meaningful provided every message that is
recorded as received is also recorded as sent.
● Basic idea is that an effect should not be present without its cause.
Notationally, the global state GS is the collection of the process states and the channel states: GS = { ∪i LSi^(xi) , ∪j,k SCjk^(yj,zk) }
• A message cannot be received if it was not sent; that is, the state should not
violate causality.
• Such states are called consistent global states and are meaningful global
states.
• Inconsistent global states are not meaningful in the sense that a distributed
system can never be in an inconsistent state.
97
98
Global state
A global state GS1, consisting of particular local states in the execution of Figure 2.2, is inconsistent because
◦ the state of p2 has recorded the receipt of message m12,
◦ however, the state of p1 has not recorded its send.
On the contrary, a global state GS2, consisting of a later set of local states, is consistent:
◦ all the channels are empty except C21, which contains message m21
99
The space–time diagram of a distributed execution.( Fig.2.2)
100
Cuts of a distributed computation
● In the space–time diagram of a distributed computation, a zigzag line
joining one arbitrary point on each process line is termed a cut in the
computation.
● Such a line slices the space–time diagram, and thus the set of events in the
distributed computation, into a PAST and a FUTURE.
● The PAST contains all the events to the left of the cut and the FUTURE
contains all the events to the right of the cut.
● For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the
PAST and FUTURE of C, respectively.
● Every cut corresponds to a global state and every global state can be
graphically represented as a cut in the computation’s space–time diagram
101
A consistent global state corresponds to a cut in which every
message received in the PAST of the cut was sent in the PAST
of that cut.
Such a cut is known as a consistent cut.
All messages that cross the cut from the PAST to the FUTURE
are in transit in the corresponding consistent global state.
A cut is inconsistent if a message crosses the cut from the
FUTURE to the PAST
102
103
C1 is an inconsistent cut, whereas C2 is a consistent
cut.
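A minimal sketch (with a hypothetical execution, not the one in Figure 2.2) of the consistency test for a cut: every message received in the PAST must also have been sent in the PAST:

```python
# A cut assigns to each process the index of its last event included in the PAST.
messages = [                    # hypothetical execution: (sender, send index, receiver, receive index)
    ("p1", 2, "p2", 3),
    ("p2", 1, "p1", 3),
]

def consistent(cut):
    for s, sx, r, ry in messages:
        if ry <= cut[r] and sx > cut[s]:   # received in the PAST but sent in the FUTURE
            return False
    return True

print(consistent({"p1": 3, "p2": 3}))   # True: a consistent cut
print(consistent({"p1": 1, "p2": 3}))   # False: a message crosses from FUTURE to PAST
```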
Past and future cones of an event
An event ej could have been affected only by events ei such that ei → ej, and all the information available at ei could be made accessible at ej.
All such events ei belong to the past of ej.
Let Past(ej) denote all events in the past of ej in a computation (H, →). Then
Past(ej) = { ei | ∀ei ∈ H, ei → ej }.
Similarly, the future of ej, Future(ej), is the set of all events ei such that ej → ei, i.e., the events that can be causally affected by ej.
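A self-contained toy sketch (using a hypothetical, already transitively closed happens-before relation) showing how Past(e) and Future(e) are read directly off the relation:

```python
# hb is a toy happens-before relation given as (earlier, later) pairs.
hb = {("a", "b"), ("a", "c"), ("b", "c")}
events = {"a", "b", "c", "d"}

def past(e):
    return {x for x in events if (x, e) in hb}    # everything that could have affected e

def future(e):
    return {x for x in events if (e, x) in hb}    # everything e can causally affect

print(past("c"))     # {'a', 'b'}
print(future("a"))   # {'b', 'c'}
print(past("d"))     # set(): d is concurrent with all other events here
```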
104
105
Models of process communications
There are two basic models of process communications
◦ synchronous
◦ asynchronous.
106
The synchronous communication model is a blocking type where on
a message send, the sender process blocks until the message has
been received by the receiver process.
◦ The sender process resumes execution only after it learns that the receiver
process has accepted the message.
◦ Thus, the sender and the receiver processes must synchronize to exchange a
message.
107
Asynchronous communication model is a non-blocking type where the sender and the receiver do
not synchronize to exchange a message.
After having sent a message, the sender process does not wait for the message to be delivered to
the receiver process.
The message is buffered by the system and is delivered to the receiver process when it is ready to
accept the message.
A buffer overflow may occur if a process sends a large number of messages in a burst to another process.
Asynchronous communication provides higher parallelism because the sender process can execute
while the message is in transit to the receiver
Due to the higher degree of parallelism and non-determinism, it is much more difficult to design, verify, and implement distributed algorithms for asynchronous communication.
108
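A minimal sketch of the two process communication models, using Python threads and a queue as stand-ins for processes and the system buffer (an illustrative assumption, not a real distributed implementation):

```python
# Asynchronous send: the message is buffered and the sender continues at once.
# Synchronous send: the sender additionally blocks until the receiver accepts it.
import threading, queue, time

def receiver(buf, ack):
    time.sleep(0.1)          # the receiver is not ready immediately
    msg = buf.get()          # accepts the message when ready
    ack.set()                # acknowledgement used by the synchronous sender

buf, ack = queue.Queue(), threading.Event()
threading.Thread(target=receiver, args=(buf, ack)).start()

buf.put("m")                                   # asynchronous send: returns immediately
print("asynchronous send returned immediately")

ack.wait()                                     # a synchronous send would block here
print("a synchronous send would return only now")
```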
More Related Content

PPTX
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
PPTX
DC-Unit-1-Part1.pptx-distributed computing notes
PPTX
DS PPT NEW FOR DATA SCCIENCE FROM CSE DEPT CMR
PPT
distcomp.ppt
PPT
distcomp.ppt
PPT
distcomp.ppt
PDF
CS9222 ADVANCED OPERATING SYSTEMS
PDF
Inter-Process Communication in distributed systems
DC UNIT 1 cs 3551 DISTRIBUTED COMPUTING.pptx
DC-Unit-1-Part1.pptx-distributed computing notes
DS PPT NEW FOR DATA SCCIENCE FROM CSE DEPT CMR
distcomp.ppt
distcomp.ppt
distcomp.ppt
CS9222 ADVANCED OPERATING SYSTEMS
Inter-Process Communication in distributed systems

Similar to CST 402 Distributed Computing Module 1 Notes (20)

PPTX
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
PPT
2.communcation in distributed system
PPT
UNIT-1 Introduction to Distributed SystemPPT.ppt
PPTX
Processbejdndnnnnnjsnsgsvvdvvvguigv.pptx
PPTX
Processprehsjsjsjskakwkwkejjdbdbdjj.pptx
PPTX
UNIT I DIS.pptx
PPTX
UNIT 1 NOTES DC FINAL COPY TO BE PREPARED
PPTX
DC - UNIT 1 - INTRODUCTION FINAL COPY TO STUDY
PDF
Client Server Model and Distributed Computing
PDF
20CS2021 Distributed Computing
PDF
20CS2021-Distributed Computing module 2
PPT
Chapter 6 os
PDF
02 Models of Distribution Systems.pdf
PPT
Chapter 4 communication2
 
PPTX
Message Passing Systems
PPTX
unit 1 intoductionDistributed computing(1).pptx
PDF
18CS3040 Distributed System
PDF
18CS3040_Distributed Systems
PDF
Cs556 section2
PPTX
Intro to Distributed Systems (By Lasmon Kapota).pptx
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
2.communcation in distributed system
UNIT-1 Introduction to Distributed SystemPPT.ppt
Processbejdndnnnnnjsnsgsvvdvvvguigv.pptx
Processprehsjsjsjskakwkwkejjdbdbdjj.pptx
UNIT I DIS.pptx
UNIT 1 NOTES DC FINAL COPY TO BE PREPARED
DC - UNIT 1 - INTRODUCTION FINAL COPY TO STUDY
Client Server Model and Distributed Computing
20CS2021 Distributed Computing
20CS2021-Distributed Computing module 2
Chapter 6 os
02 Models of Distribution Systems.pdf
Chapter 4 communication2
 
Message Passing Systems
unit 1 intoductionDistributed computing(1).pptx
18CS3040 Distributed System
18CS3040_Distributed Systems
Cs556 section2
Intro to Distributed Systems (By Lasmon Kapota).pptx
Ad

Recently uploaded (20)

PPTX
“Next-Gen AI: Trends Reshaping Our World”
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
composite construction of structures.pdf
PDF
Geotechnical Engineering, Soil mechanics- Soil Testing.pdf
PPTX
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
PPTX
Road Safety tips for School Kids by a k maurya.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPT
Drone Technology Electronics components_1
PDF
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
PPTX
Geodesy 1.pptx...............................................
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
web development for engineering and engineering
PPT
Project quality management in manufacturing
PDF
Monitoring Global Terrestrial Surface Water Height using Remote Sensing - ARS...
PPTX
Unit 5 BSP.pptxytrrftyyydfyujfttyczcgvcd
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPT
Chapter 6 Design in software Engineeing.ppt
“Next-Gen AI: Trends Reshaping Our World”
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
CH1 Production IntroductoryConcepts.pptx
composite construction of structures.pdf
Geotechnical Engineering, Soil mechanics- Soil Testing.pdf
Fluid Mechanics, Module 3: Basics of Fluid Mechanics
Road Safety tips for School Kids by a k maurya.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Drone Technology Electronics components_1
BRKDCN-2613.pdf Cisco AI DC NVIDIA presentation
Geodesy 1.pptx...............................................
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
web development for engineering and engineering
Project quality management in manufacturing
Monitoring Global Terrestrial Surface Water Height using Remote Sensing - ARS...
Unit 5 BSP.pptxytrrftyyydfyujfttyczcgvcd
Embodied AI: Ushering in the Next Era of Intelligent Systems
Chapter 6 Design in software Engineeing.ppt
Ad

CST 402 Distributed Computing Module 1 Notes

  • 2. Syllabus- Distributed systems basics and Computation model Distributed System – Definition, Relation to computer system components, Motivation, Primitives for distributed communication, Design issues, Challenges and applications. A model of distributed computations – Distributed program, Model of distributed executions, Models of communication networks, Global state of a distributed system, Cuts of a distributed computation, Past and future cones of an event, Models of process communications. 2
  • 3. Distributed System A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network 3
  • 4. Distributed system has been characterized in one of several ways 1. You know you are using one when the crash of a computer you have never heard of prevents you from doing work--- prevents losing data in a computer crash 2. A collection of computers that do not share common memory or a common physical clock, that communicate by a messages passing over a communication network, and where each computer has its own memory and runs its own operating system 3. A collection of independent computers that appears to the users of the system as a single coherent computer 4. A term that describes a wide range of computers, from weakly coupled systems such as wide-area networks, to strongly coupled systems such as local area networks, to very strongly coupled systems such as multiprocessor systems 4
  • 5. Features of DS No common physical clock: This is an important assumption because it introduces the element of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors. No shared memory : This is a key feature that requires message-passing for communication. This feature implies the absence of the common physical clock Geographical separation: The geographically wider apart that the processors are, the more representative is the system of a distributed system. WAN NOW/COW(network/cluster of workstations)--- eg, Google search engine Autonomy and heterogeneity: The processors are “loosely coupled” in that they have different speeds and each can be running a different operating system, cooperate with one another by offering services or solving a problem jointly. 5
  • 6. Relation to computer system components Each computer has a memory-processing unit and the computers are connected by a communication network 6
  • 7. Relationships of the software components that run on each of the computers and use the local operating system and network protocol stack for functioning 7
  • 8. ●The distributed software is also termed as middleware. ●A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal.An execution is also sometimes termed a computation or a run. ●The distributed system uses a layered architecture to break down the complexity of system design. ●The middleware is the distributed software that drives the distributed system, while providing transparency of heterogeneity at the platform level 8
  • 9. ●The middleware layer does not contain the traditional application layer functions of the network protocol stack, such as http, mail, ftp, and telnet. ●Various primitives and calls to functions defined in various libraries of the middleware layer are embedded in the user program code. ●There exist several libraries to choose from to invoke primitives for the more common functions – such as reliable and ordered multicasting – of the middleware layer ●There are several standards such as Object Management Group’s (OMG) common object request broker architecture (CORBA) and the remote procedure call (RPC) mechanism. 9
  • 10. Motivation/Benefits of DS 1. Inherently distributed computations: money transfer in banking, or reaching consensus among parties that are geographically distant- computation is inherently distributed. 2. Resource sharing: eg-distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability 3. Access to geographically remote data and resources: data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated 4. Enhanced reliability: ◦ Availability :- The resource should be accessible at all times. ◦ Integrity:- the value/state of the resource should be correct, in the face of concurrent access from multiple processors, as per the semantics expected by the application. ◦ Fault-tolerance :- The ability to recover from system failures. 10
  • 11. Motivation/Benefits of DS 5. Increased performance/cost ratio: By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is increased. Any task can be partitioned across the various computers in the distributed system. 6. Scalability 7. Modularity and incremental expandability 11
  • 12. Distributed Vs Parallel computing 12
  • 13. 13
  • 14. Primitives for distributed communication Blocking/non-blocking, synchronous/asynchronous primitives Processor synchrony Libraries and standards 14
  • 15. Blocking/non-blocking, synchronous/asynchronous primitives Message send and message receive communication primitives are denoted Send() and Receive(), respectively. 15
  • 16. There are two ways of sending data when the Send primitive is invoked : ◦ Buffered option : ◦ The buffered option which is the standard option copies the data from the user buffer to the kernel buffer. ◦ The data later gets copied from the kernel buffer onto the network. ◦ Unbuffered option : ◦ In the unbuffered option, the data gets copied directly from the user buffer onto the network. For the Receive primitive, the buffered option is usually required because the data may already have arrived when the primitive is invoked, and needs a storage place in the kernel. 16
  • 18. Let’s Understand with Example - Synchronous 18
  • 19. Synchronous primitives ● A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake with each other. ● The processing for the Send primitive completes only after the invoking processor learns that the other corresponding Receive primitive has also been invoked and that the receive operation has been completed. ● The processing for the Receive primitive completes when the data to be received is copied into the receiver’s user buffer. 19
  • 20. Let’s Understand with Example- Asynchronous 20
  • 21. Asynchronous primitives A Send primitive is said to be asynchronous if control returns back to the invoking process after the data item to be sent has been copied out of the user-specified buffer. It does not make sense to define asynchronous Receive primitives. 21
  • 22. Let’s Understand with Example- Blocking 22
  • 23. Blocking primitives A primitive is blocking if control returns to the invoking process after the processing for the primitive (whether in synchronous or asynchronous mode) completes. 23
  • 24. Let’s Understand with Example- Non Blocking 24
  • 25. Non-blocking primitives A primitive is non-blocking if control returns back to the invoking process immediately after invocation, even though the operation has not completed. For a non-blocking Send, control returns to the process even before the data is copied out of the user buffer. For a non-blocking Receive, control returns to the process even before the data may have arrived from the sender. 25
  • 26. For non-blocking primitives, a return parameter on the primitive call returns a system-generated handle which can be later used to check the status of completion of the call. The process can check for the completion of the call in two ways. 1. First, it can keep checking (in a loop or periodically) if the handle has been flagged or posted. 2. Second, it can issue a Wait with a list of handles as parameters. 26
  • 27. The Wait call usually blocks until one of the parameter handles is posted. Presumably after issuing the primitive in non-blocking mode, the process has done whatever actions it could and now needs to know the status of completion of the call, therefore using a blocking Wait() call is usual programming practice. 27
  • 28. ● If at the time that Wait() is issued, the processing for the primitive has completed, the Wait() returns immediately ● The completion of the processing of the primitive is detectable by checking the value of handleK . ● If the processing of the primitive has not completed, the Wait blocks and waits for a signal to wake it up. ● When the processing for the primitive completes, the communication subsystem software sets the value of handleK and wakes up (signals) any process with a Wait call blocked on this handleK . ● This is called posting the completion of the operation. 28
  • 29. Versions of the Send and Receive primitive Four versions of Send primitive : 1. Blocking synchronous Send 2. Non-blocking synchronous Send 3. Blocking asynchronous Send 4. Non-blocking asynchronous Send Two versions of Receive primitive : 5. Blocking (synchronous)Receive 6. Non-blocking (synchronous)Receive 29
  • 30. 30
  • 31. Blocking synchronous Send ● The data gets copied from the user buffer to the kernel buffer and is then sent over the network. ● After the data is copied to the receiver’s system buffer and a Receive call has been issued, an acknowledgement back to the sender causes control to return to the process that invoked the Send operation and completes the Send 31
  • 32. Blocking Receive ● The Receive call blocks until the data expected arrives and is written in the specified user buffer. ● Then control is returned to the user process. 32
  • 33. Non-blocking Synchronous Send ●Control returns back to the invoking process as soon as the copy of data from the user buffer to the kernel buffer is initiated. ●A parameter in the non-blocking call also gets set with the handle of a location that the user process can later check for the completion of the synchronous send operation. ●The location gets posted after an acknowledgement returns from the receiver ●The user process can keep checking for the completion of the non-blocking synchronous Send by testing the returned handle, or it can invoke the blocking Wait operation on the returned handle 33
  • 34. Non-blocking Receive ●The Receive call will cause the kernel to register the call and return the handle of a location that the user process can later check for the completion of the non-blocking Receive operation. ●This location gets posted by the kernel after the expected data arrives and is copied to the user- specified buffer. ●The user process can check for the completion of the non-blocking Receive by invoking the Wait operation on the returned handle. 34
  • 35. 35
  • 36. Blocking asynchronous Send ●The user process that invokes the Send is blocked until the data is copied from the user’s buffer to the kernel buffer. ●For the unbuffered option, the user process that invokes the Send is blocked until the data is copied from the user’s buffer to the network. 36
  • 37. Non-blocking asynchronous Send ●The user process that invokes the Send is blocked until the transfer of the data from the user’s buffer to the kernel buffer is initiated. ●Control returns to the user process as soon as this transfer is initiated, and a handle is given back. ●The asynchronous Send completes when the data has been copied out of the user’s buffer. ●The checking for the completion may be necessary if the user wants to reuse the buffer from which the data was sent. 37
  • 38. 38
  • 39. ●A synchronous Send is easier to use from a programmer’s perspective because the handshake between the Send and the Receive makes the communication appear instantaneous, thereby simplifying the program logic. ●The Receive may not get issued until much after the data arrives at Pj, in which case the data arrived would have to be buffered in the system buffer at Pj and not in the user buffer. At the same time, the sender would remain blocked. Thus, a synchronous Send lowers the efficiency within process Pi. ●The non-blocking asynchronous Send is useful when a large data item is being sent because it allows the process to perform other instructions in parallel with the completion of the Send. ●The non-blocking synchronous Send also avoids the potentially large delays for handshaking, particularly when the receiver has not yet issued the Receive call. 39
  • 40. ●The non-blocking Receive is useful when a large data item is being received and/or when the sender has not yet issued the Send call, ○because it allows the process to perform other instructions in parallel with the completion of the Receive. ○If the data has already arrived, it is stored in the kernel buffer, and it may take a while to copy it to the user buffer specified in the Receive call. ●For non-blocking calls, however, the burden on the programmer increases because he or she has to keep track of the completion of such operations in order to meaningfully reuse (write to or read from) the user buffers. Thus, conceptually, blocking primitives are easier to use. 40
  • 41. Processor synchrony Processor synchrony indicates that all the processors execute in lock-step with their clocks synchronized. As this synchrony is not attainable in a distributed system, what is more generally indicated is that for a large granularity of code, usually termed as a step, the processors are synchronized. This abstraction is implemented using some form of barrier synchronization to ensure that no processor begins executing the next step of code until all the processors have completed executing the previous steps of code assigned to each of the processors. 41
  • 42. Processor synchrony 1. Processor Synchrony : This means that all the computers or processors in a system work together perfectly, like synchronized dancers following the same rhythm. Their internal clocks are all perfectly in sync. 2. In Distributed systems: In reality, achieving perfect synchrony among all processors in a distributed system is very difficult or impossible. So, what we do instead is synchronize them in a different way. 3. Synchronization at a higher level: Instead of making every little action perfectly synchronized, we group many actions into larger chunks called steps. Think of these steps like dance routines. 4. Barrier synchronization: To make sure these steps are performed in sync, we use a mechanism called barrier synchronization. It’s like a checkpoint in a dance routine. No dancer can move to the next step until everyone has completed the current one. Similarly, in a distributed system, no processor can move on to the next step of their work until all processors have finished their current step. 42
  • 43. Design issues and challenges We describe design issues and challenges after categorizing them as 1. having a greater component related to systems design and operating systems design ( from system perspective) 2. having a greater component related to algorithm design ( algorithmic challenges) 3. emerging from recent technology advances and/or driven by new applications (application or technology driven) 43
  • 44. Distributed systems challenges from a system perspective The following functions must be addressed when designing and building a distributed system: 1. Communication 2. Processes 3. Naming 4. Synchronization 5. Data storage and access 6. Consistency and replication 7. Fault tolerance 8. Security 9. Applications Programming Interface (API) and transparency 10. Scalability and modularity 44
  • 50. Algorithmic challenges in distributed computing ❑Designing useful execution models and frameworks ❑Dynamic distributed graph algorithms and distributed routing algorithms ❑Time and global state in a distributed system ❑Synchronization/coordination mechanisms ❑Group communication, multicast, and ordered message delivery ❑Monitoring distributed events and predicates ❑Distributed program design and verification tools ❑Debugging distributed programs ❑Data replication, consistency models, and caching 50
  • 51. Designing useful execution models and frameworks ●The interleaving model and partial order model are two widely adopted models of distributed system executions. ●They have proved to be particularly useful for operational reasoning and the design of distributed algorithms. ●The input/output automata model and the TLA (temporal logic of actions) are two other examples of models that provide different degrees of infrastructure for reasoning more formally with and proving the correctness of distributed programs 51
  • 52. Dynamic distributed graph algorithms and distributed routing algorithms ● The distributed system is modeled as a distributed graph, and the graph algorithms form the building blocks for a large number of higher level communication, data dissemination, object location, and object search functions. ● The algorithms need to deal with dynamically changing graph characteristics, such as to model varying link loads in a routing algorithm. ● The efficiency of these algorithms impacts not only the user-perceived latency but also the traffic and hence the load or congestion in the network. ● Hence, the design of efficient distributed graph algorithms is of paramount importance 52
  • 53. Time and global state in a distributed system ●The challenges pertain to providing accurate physical time, and to providing a variant of time, called logical time. ●Logical time is relative time, and eliminates the overheads of providing physical time for applications where physical time is not required. More importantly, logical time can ○ capture the logic and inter-process dependencies within the distributed program, and also ○ track the relative progress at each process. It is not possible for any one process to directly observe a meaningful global state across all the processes, without using extra state-gathering effort which needs to be done in a coordinated manner 53
  • 54. Synchronization/coordination mechanisms ●The processes must be allowed to execute concurrently, except when they need to synchronize to exchange information, i.e., communicate about shared data. ●Synchronization is essential for the distributed processes to overcome the limited observation of the system state from the viewpoint of any one process. ●Overcoming this limited observation is necessary for taking any actions that would impact other processes. ●The synchronization mechanisms can also be viewed as resource management and concurrency management mechanisms to streamline the behavior of the processes that would otherwise act independently. 54
  • 55. Examples of Problems Requiring Synchronization ● Physical clock synchronization ● Leader election ● Mutual exclusion ● Deadlock detection and resolution ● Termination detection ● Garbage collection 55
  • 57. Group communication, multicast, and ordered message delivery ● A group is a collection of processes that share a common context and collaborate on a common task within an application domain. ● Specific algorithms need to be designed to enable efficient group communication and group management wherein processes can join and leave groups dynamically, or even fail. ● When multiple processes send messages concurrently, different recipients may receive the messages in different orders, possibly violating the semantics of the distributed program. ● Hence, formal specifications of the semantics of ordered delivery need to be formulated, and then implemented. 57
  • 58. Monitoring distributed events and predicates ● Predicates defined on program variables that are local to different processes are used for specifying conditions on the global system state, and are useful for applications such as debugging, sensing the environment, and in industrial process control. ● On-line algorithms for monitoring such predicates are hence important. ● An important paradigm for monitoring distributed events is that of event streaming, wherein streams of relevant events reported from different processes are examined collectively to detect predicates. ● Typically, the specification of such predicates uses physical or logical time relationships. 58
  • 59. Distributed program design and verification tools ● Methodically designed and verifiably correct programs can greatly reduce the overhead of software design, debugging, and engineering. ● Designing mechanisms to achieve these design and verification goals is a challenge. 59
  • 60. Debugging distributed programs ● Debugging sequential programs is hard; debugging distributed programs is that much harder because of the concurrency in actions and the ensuing uncertainty due to the large number of possible executions defined by the interleaved concurrent actions. ● Adequate debugging mechanisms and tools need to be designed to meet this challenge. 60
  • 61. Data replication, consistency models, and caching ● Fast access to data and other resources requires them to be replicated in the distributed system. ● Managing such replicas in the face of updates introduces the problems of ensuring consistency among the replicas and cached copies. ● Additionally, placement of the replicas in the systems is also a challenge because resources usually cannot be freely replicated. 61
  • 62. World Wide Web design – caching, searching, scheduling ● Minimizing response time to minimize user-perceived latencies is an important challenge. ● Object search and navigation on the web are important functions in the operation of the web, and are very resource-intensive. ● Designing mechanisms to do this efficiently and accurately is a great challenge. 62
  • 63. Distributed shared memory abstraction ● A shared memory abstraction simplifies the task of the programmer because he or she has to deal only with read and write operations, and no message communication primitives. ● However, under the covers in the middleware layer, the abstraction of a shared address space has to be implemented by using message-passing. ● Hence, in terms of overheads, the shared memory abstraction is not less expensive. 63
  • 65. Reliable and fault-tolerant distributed systems A reliable and fault-tolerant environment has multiple requirements and aspects, and these can be addressed using various strategies: ● Consensus algorithms ● Replication and replica management ● Voting and quorum systems ● Distributed databases and distributed commit ● Self-stabilizing systems ● Checkpointing and recovery algorithms ● Failure detectors 65
  • 68. Load balancing ● The goal of load balancing is to gain higher throughput, and reduce the user-perceived latency. ● Load balancing may be necessary because of a variety of factors such as high network traffic or high request rate causing the network connection to be a bottleneck, or high computational load. ● A common situation where load balancing is used is in server farms, where the objective is to service incoming client requests with the least turnaround time. The following are some forms of load balancing: ● Data migration: The ability to move data (which may be replicated) around in the system, based on the access pattern of the users. ● Computation migration: The ability to relocate processes in order to perform a redistribution of the workload. ● Distributed scheduling: This achieves a better turnaround time for the users by using idle processing power in the system more efficiently. 68
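As a toy illustration of the server-farm scenario, the sketch below dispatches each incoming request to the currently least-loaded server. The server names and the load metric (number of outstanding requests) are invented; real load balancers also account for data locality, health checks, and data/computation migration.

```python
# Toy least-loaded dispatcher for a server farm (illustrative only).
servers = {"s1": 0, "s2": 0, "s3": 0}        # server -> outstanding requests

def dispatch(request_id):
    target = min(servers, key=servers.get)   # pick the least-loaded server
    servers[target] += 1                     # record the extra load
    return target

for r in range(7):
    print(f"request {r} -> {dispatch(r)}")
```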
  • 69. Applications of distributed computing and newer challenges 1. Mobile systems 2. Sensor networks 3. Ubiquitous or pervasive computing 4. Peer-to-peer computing 5. Publish-subscribe, content distribution, and multimedia 6. Distributed agents 7. Distributed data mining 8. Grid computing 9. Security in distributed systems 69
  • 71. 2. Sensor networks & 3. Ubiquitous or pervasive computing A sensor is a processor with an electro-mechanical interface that is capable of sensing physical parameters such as temperature, velocity, pressure, humidity, and chemicals. Sensors may be mobile or static. 71
  • 72. 4. Peer-to-peer computing & 5. Publish-subscribe, content distribution, and multimedia 72
  • 73. 6. Distributed agents ● Agents collect and process information, and can exchange such information with other agents. ● Often, the agents cooperate as in an ant colony, but they can also have friendly competition, as in a free market economy. ● Challenges in distributed agent systems include coordination mechanisms among the agents, controlling the mobility of the agents, and their software design and interfaces. ● Research in agents is interdisciplinary: spanning artificial intelligence, mobile computing, economic market models, software engineering, and distributed computing. 73
  • 74. 7. Distributed data mining ● The data is necessarily distributed and cannot be collected in a single repository, as in banking applications where the data is private and sensitive, ● or in atmospheric weather prediction where the data sets are far too massive to collect and process at a single repository in real-time. 74
  • 75. 8. Grid computing ○Grid computing is a subset of distributed computing in which a virtual supercomputer is composed of machines on a network, connected mostly by Ethernet or sometimes by the Internet. ○It can also be seen as a form of parallel computing where, instead of many CPU cores on a single machine, the cores are spread across various locations. ○Many challenges remain in making grid computing a reality, including: ○scheduling jobs in such a distributed environment, ○a framework for implementing quality of service and real-time guarantees, and ○security of individual machines as well as of the jobs being executed in this setting. 75
  • 76. 9. Security in distributed systems ● The traditional challenges of security in a distributed setting include: ◦ Confidentiality (ensuring that only authorized processes can access certain information), ◦ Authentication (ensuring the source of received information and the identity of the sending process), and ◦ Availability (maintaining allowed access to services despite malicious actions). ● The goal is to meet these challenges with efficient and scalable solutions. ● These basic challenges have been addressed in traditional distributed settings. 76
  • 77. A model of distributed computations ❑A distributed system consists of a set of processors that are connected by a communication network. ❑The communication network provides the facility of information exchange among processors. ❑The communication delay is finite but unpredictable. ❑The processors do not share a common global memory and communicate solely by passing messages over the communication network. 77
  • 78. 78 ❑There is no physical global clock in the system to which processes have instantaneous access. ❑The communication medium may deliver messages out of order, messages may be lost, garbled, or duplicated due to timeout and retransmission, processors may fail, and communication links may go down. ❑The system can be modeled as a directed graph in which vertices represent the processes and edges represent unidirectional communication channels. ❑A distributed application runs as a collection of processes on a distributed system
  • 79. A distributed program A distributed program is composed of a set of n asynchronous processes p1, p2,... , pi,... , pn that communicate by message passing over the communication network. ◦ we assume that each process is running on a different processor The processes do not share a global memory and communicate solely by passing messages. ❑Cij: denote the channel from process pi to process pj ❑mij: denote a message sent by pi to pj 79
  • 80. ◆The communication delay is finite and unpredictable. ◆Also, these processes do not share a global clock that is instantaneously accessible to these processes ◆Process execution and message transfer are asynchronous ◆ a process may execute an action spontaneously and a process sending a message does not wait for the delivery of the message to be complete. 80
  • 81. ❖The global state of a distributed computation is composed of the states of the processes and the communication channels ● The state of a process is characterized by the state of its local memory and depends upon the context. ● The state of a channel is characterized by the set of messages in transit in the channel. 81
  • 82. A model of distributed executions ● The execution of a process consists of a sequential execution of its actions. ● The actions are atomic and the actions of a process are modeled as three types of events: ● Internal events ● Message send events, and ● Message receive events 82
  • 85. Space–Time Diagram of a Distributed Execution involving Three Processes 85 1. A horizontal line represents the progress of the process. 2. A dot indicates an event. 3. A slant arrow indicates a message transfer.
  • 87. 87 For example, in Figure 2.1, event e_2^6 (the sixth event at process p2) has knowledge of all the other events shown in the figure.
  • 88. 88 For any two events ei and ej, ei ↛ ej denotes that: ◦ Event ej does not directly or transitively depend on event ei. ◦ i.e., event ei does not causally affect event ej. ◦ Event ej is not aware of the execution of ei or of any event executed after ei on the same process. For example, in Figure 2.1
  • 89. Note the following two rules: For any two events ei and ej, if ei ↛ ej and ej ↛ ei, then events ei and ej are said to be concurrent and the relation is denoted as ei || ej. Note that the relation || is not transitive, i.e., (ei || ej) and (ej || ek) does not imply ei || ek. For example, in Figure 2.1 there are events ei, ej, ek with ei || ej and ej || ek; however, ei || ek does not hold. Note that for any two events ei and ej in a distributed execution, either ei → ej, or ej → ei, or ei || ej 89
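The following sketch makes the relations →, ↛, and || concrete for a small invented execution (not the one in Figure 2.1): events are identified as (process, index), direct dependencies are the process order plus send→receive message edges, and causal precedence is computed as the transitive closure of those edges.

```python
# Illustrative execution: 3 processes with 3 events each, identified as (pid, x).
events = [(p, x) for p in (1, 2, 3) for x in (1, 2, 3)]

# Direct dependencies: process order plus (send -> receive) message edges.
edges = set()
for p, x in events:
    if x > 1:
        edges.add(((p, x - 1), (p, x)))      # earlier event on the same process
edges.add(((1, 2), (2, 2)))                  # p1's 2nd event sends to p2's 2nd event
edges.add(((2, 3), (3, 3)))                  # p2's 3rd event sends to p3's 3rd event

def happens_before(ei, ej):
    """ei -> ej iff there is a dependency path from ei to ej (transitive closure)."""
    frontier, seen = {ei}, set()
    while frontier:
        e = frontier.pop()
        seen.add(e)
        for (a, b) in edges:
            if a == e and b not in seen:
                if b == ej:
                    return True
                frontier.add(b)
    return False

def concurrent(ei, ej):
    return not happens_before(ei, ej) and not happens_before(ej, ei)

print(happens_before((1, 2), (3, 3)))   # True: reachable via the two message edges
print(concurrent((1, 1), (2, 1)))       # True: neither event causally affects the other
```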
  • 90. Logical vs. Physical Concurrency In a distributed computation, two events are logically concurrent if and only if they do not causally affect each other. Physical concurrency, on the other hand, means that the events occur at the same instant in physical time. Two or more events may be logically concurrent even though they do not occur at the same instant in physical time. However, had the processor speeds and message delays been different, the execution of these events could well have coincided in physical time. Whether or not a set of logically concurrent events coincides in physical time does not change the outcome of the computation. Therefore, even though a set of logically concurrent events may not have occurred at the same instant in physical time, we can assume that these events occurred at the same instant in physical time. 90
  • 91. Models of communication networks There are several models of the service provided by communication networks, namely: ◦ FIFO (first-in, first-out): each channel acts as a first-in, first-out message queue and thus message ordering is preserved by a channel. ◦ Non-FIFO: a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order. ◦ Causal ordering (CO): based on Lamport’s “happens before” relation. A system that supports the causal ordering model satisfies the following property: ◦ CO: for any two messages m_ij and m_kj, if send(m_ij) → send(m_kj), then rec(m_ij) → rec(m_kj). ◦ This property ensures that causally related messages destined to the same destination are delivered in an order that is consistent with their causality relation. ◦ Causally ordered delivery of messages implies FIFO message delivery. Furthermore, note that CO ⊆ FIFO ⊆ Non-FIFO. 91
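A tiny sketch contrasting the FIFO and non-FIFO channel models listed above: the FIFO channel delivers in send order, while the non-FIFO channel hands messages to the receiver in an arbitrary order. The class names are invented, and the causal-ordering model (which would additionally enforce the CO property) is not modeled here.

```python
import random
from collections import deque

class FIFOChannel:
    """Delivery order equals send order."""
    def __init__(self):
        self.buf = deque()
    def send(self, m):
        self.buf.append(m)
    def receive(self):
        return self.buf.popleft()

class NonFIFOChannel:
    """The receiver removes messages in an arbitrary (here: random) order."""
    def __init__(self):
        self.buf = []
    def send(self, m):
        self.buf.append(m)
    def receive(self):
        return self.buf.pop(random.randrange(len(self.buf)))

fifo, nonfifo = FIFOChannel(), NonFIFOChannel()
for m in ("m1", "m2", "m3"):
    fifo.send(m)
    nonfifo.send(m)
print([fifo.receive() for _ in range(3)])      # always ['m1', 'm2', 'm3']
print([nonfifo.receive() for _ in range(3)])   # some permutation of the three
```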
  • 92. ● Causal ordering model is useful in developing distributed algorithms. ● Generally, it considerably simplifies the design of distributed algorithms because it provides a built-in synchronization. ● For example, in replicated database systems, it is important that every process responsible for updating a replica receives the updates in the same order to maintain database consistency. ● Without causal ordering, each update must be checked to ensure that database consistency is not being violated. Causal ordering eliminates the need for such checks. 92
  • 93. Global state of a distributed system ● The global state of a distributed system is a collection of the local states of its components, namely, the processes and the communication channels. ● The state of a process at any time is defined by the contents of processor registers, stacks, local memory, etc., and depends on the local context of the distributed application. ● The state of a channel is given by the set of messages in transit in the channel. ● The occurrence of events changes the states of the respective processes and channels, thus causing transitions in the global system state. For example, ⮚an internal event changes the state of the process at which it occurs. ⮚A send event (or a receive event) changes the state of the process that sends (or receives) the message and the state of the channel on which the message is sent (or received). 93
  • 94. 94 Let e_i^x denote the xth event at process pi. Let LS_i^x denote the state of process pi after the occurrence of event e_i^x and before the event e_i^(x+1). LS_i^0 denotes the initial state of process pi. LS_i^x is a result of the execution of all the events executed by process pi till e_i^x. Let send(m) ≤ LS_i^x denote the fact that ∃y, 1 ≤ y ≤ x, such that e_i^y = send(m), and let rec(m) ≰ LS_i^x denote the fact that ∀y, 1 ≤ y ≤ x, e_i^y ≠ rec(m).
  • 95. The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects. Thus, the channel state SC_ij^(x,y) denotes all messages that pi sent up to event e_i^x and which process pj had not received until event e_j^y: SC_ij^(x,y) = { m_ij | send(m_ij) ≤ LS_i^x ∧ rec(m_ij) ≰ LS_j^y }. 95
  • 96. Global state 96 ● For a global snapshot to be meaningful, the states of all the components of the distributed system must be recorded at the same instant. This would be possible if the local clocks at the processes were perfectly synchronized or if there were a global system clock that could be instantaneously read by the processes. However, both are impossible. ● Even if the state of all the components in a distributed system has not been recorded at the same instant, such a state will be meaningful provided every message that is recorded as received is also recorded as sent. ● The basic idea is that an effect should not be present without its cause. Notationally, the global state is the collection of the recorded process states and channel states: GS = { ∪_i LS_i^(x_i), ∪_(j,k) SC_jk^(y_j, z_k) }.
  • 97. Global state • A message cannot be received if it was not sent; that is, the state should not violate causality. • Such states are called consistent global states and are meaningful global states. • Inconsistent global states are not meaningful in the sense that a distributed system can never be in an inconsistent state. 97
  • 99. A global state GS1, consisting of one choice of local states in Fig. 2.2, is inconsistent because ◦ the state of p2 has recorded the receipt of message m12, ◦ however, the state of p1 has not recorded its send. On the contrary, a global state GS2, consisting of a different choice of local states, is consistent: ◦ all the channels are empty except C21, which contains message m21. 99 The space–time diagram of a distributed execution (Fig. 2.2).
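The GS1/GS2 situation above can be captured by a small consistency check: record, per process, the messages it has recorded as sent and as received, and accept the global state only if every recorded receive has a matching recorded send. The record format below is invented for illustration.

```python
def is_consistent(recorded_sent, recorded_received):
    """A recorded global state is consistent iff every message recorded as
    received is also recorded as sent (no effect without its cause)."""
    all_sent = set().union(*recorded_sent.values())
    all_received = set().union(*recorded_received.values())
    return all_received <= all_sent

# GS1-like situation: p2 has recorded the receipt of m12, but p1 has not recorded its send.
sent1 = {"p1": set(), "p2": {"m21"}}
recv1 = {"p1": set(), "p2": {"m12"}}
print(is_consistent(sent1, recv1))   # False -> inconsistent global state

# GS2-like situation: every recorded receive has a recorded send; m21 is still in transit.
sent2 = {"p1": {"m12"}, "p2": {"m21"}}
recv2 = {"p1": set(), "p2": {"m12"}}
print(is_consistent(sent2, recv2))   # True -> consistent; m21 sits in channel C21
```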
  • 101. Cuts of a distributed computation ● In the space–time diagram of a distributed computation, a zigzag line joining one arbitrary point on each process line is termed a cut in the computation. ● Such a line slices the space–time diagram, and thus the set of events in the distributed computation, into a PAST and a FUTURE. ● The PAST contains all the events to the left of the cut and the FUTURE contains all the events to the right of the cut. ● For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the PAST and FUTURE of C, respectively. ● Every cut corresponds to a global state and every global state can be graphically represented as a cut in the computation’s space–time diagram 101
  • 102. A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut. All messages that cross the cut from the PAST to the FUTURE are in transit in the corresponding consistent global state. A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST 102
  • 103. 103 C1 is an inconsistent cut, whereas C2 is a consistent cut.
  • 104. Past and future cones of an event An event ej could have been affected only by all events ei such that ei → ej, and all the information available at ei could be made accessible at ej. All such events ei belong to the past of ej. Let Past(ej) denote all events in the past of ej in a computation (H, →). Then, Past(ej) = { ei | ∀ ei ∈ H, ei → ej }. 104
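A short sketch of how Past(ej) can be computed from a recorded execution: walk the direct-dependency edges (process order plus message edges) backwards from ej. The events and edges below are invented, not taken from the figures.

```python
# Compute Past(ej) = { ei | ei -> ej } by walking dependency edges backwards.
# Events are (process, index); edges are direct dependencies (process order + messages).
edges = {
    ((1, 1), (1, 2)), ((2, 1), (2, 2)), ((3, 1), (3, 2)),   # process order (illustrative)
    ((1, 1), (2, 2)),                                        # a message from p1 to p2
}

def past(ej):
    cone, frontier = set(), {ej}
    while frontier:
        e = frontier.pop()
        for (a, b) in edges:
            if b == e and a not in cone:
                cone.add(a)
                frontier.add(a)
    return cone

print(past((2, 2)))   # {(2, 1), (1, 1)}: the past cone of p2's second event
```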
  • 106. Models of process communications There are two basic models of process communications ◦ synchronous ◦ asynchronous. 106
  • 107. The synchronous communication model is a blocking type where on a message send, the sender process blocks until the message has been received by the receiver process. ◦ The sender process resumes execution only after it learns that the receiver process has accepted the message. ◦ Thus, the sender and the receiver processes must synchronize to exchange a message. 107
  • 108. The asynchronous communication model is a non-blocking type in which the sender and the receiver do not synchronize to exchange a message. After having sent a message, the sender process does not wait for the message to be delivered to the receiver process. The message is buffered by the system and is delivered to the receiver process when it is ready to accept the message. A buffer overflow may occur if a process sends a large number of messages in a burst to another process. Asynchronous communication provides higher parallelism because the sender process can execute while the message is in transit to the receiver. However, due to the higher degree of parallelism and non-determinism, it is much more difficult to design, verify, and implement distributed algorithms for asynchronous communication. 108