Programming using MPI & OpenMP
HIGH PERFORMANCE COMPUTING
MODULE 4
DIVYA TIWARI
MEIT
TERNA ENGINEERING COLLEGE
INTRODUCTION
• Parallel computing has made a tremendous impact on a variety of areas from computational
simulations for scientific and engineering applications to commercial applications in data
mining and transaction processing.
• Numerous programming languages and libraries have been developed for explicit parallel
programming.
• They differ in:
- their view of the address space that they make available to the programmer
- the degree of synchronization imposed on concurrent activities
- the multiplicity of programs.
• The message-passing programming model is one of the oldest and most widely used
approaches for programming parallel computers.
Principles
Key Attributes of MPI:
1. Assumes a partitioned address space.
2. Supports only explicit parallelization.
• Implications of the partitioned address space:
1. Each piece of data must belong to one of the partitions of the space.
2. Interaction requires the cooperation of two processes: the process that has the data and
the process that wants to access the data.
• Explicit parallelization:
The programmer is responsible for analysing the underlying serial algorithm/application and
identifying ways to decompose the computations and extract concurrency.
Structure of Message Passing Program
• Message-passing programs are often written using one of two paradigms:
1. Asynchronous
2. Loosely synchronous
• In its most general form, the message-passing paradigm supports execution of a different
program on each of the p processes.
• Most message-passing programs are written using the single program multiple data (SPMD)
approach.
Building Blocks: Send and Receive Operations
• Communication between the processes is accomplished by sending and receiving
messages.
• The basic operations in a message-passing program are send and receive.
• Example:
send(void *sendbuf, int nelems, int dest)
receive(void *recvbuf, int nelems, int source)
sendbuf - points to a buffer that stores the data to be sent.
recvbuf - points to a buffer that stores the data to be received.
nelems - is the number of data units to be sent and received.
dest - is the identifier of the process that receives the data.
source - is the identifier of the process that sends the data.
• Example: A process sending a piece of data to another process
send(void *sendbuf, int nelems, int dest)
receive(void *recvbuf, int nelems, int source)
    P0                             P1
    a = 100;                       receive(&a, 1, 0);
    send(&a, 1, 1);                printf("%d\n", a);
    a = 0;
• For this code to have the intended meaning, P1 must receive the value 100 rather than 0; the
blocking and buffered protocols discussed next are designed to provide such semantics.
• Message-passing operations are of two types:
1. Blocking Message Passing Operations.
i. Blocking Non-Buffered Send/Receive
ii. Blocking Buffered Send/Receive
2. Non-Blocking Message Passing Operations.
Blocking Message Passing Operations
i. Blocking Non-Buffered Send/Receive
ii. Blocking Buffered Send/Receive
a. In the presence of communication hardware, with buffers at the send and
receive ends.
b. In the absence of communication hardware, the sender interrupts the receiver and
deposits the data in a buffer at the receiving end.
Non-Blocking Message Passing Operations
Space of possible protocols for send and receive operations.
Non-blocking non-buffered send and receive operations
(a) in the absence of communication hardware
(b) in the presence of communication hardware
MPI: the Message Passing Interface
• Message-passing architectures are used in parallel computers because of their lower cost relative to
shared-address-space architectures.
• Message-passing is the natural programming paradigm for these machines, which led to the
development of many different message-passing libraries.
• These libraries worked well on a vendor's own hardware but were incompatible
with parallel computers offered by other vendors.
• The differences between libraries were mostly syntactic, but some serious semantic
differences required significant re-engineering to port a message-passing program
from one library to another.
• The Message Passing Interface, or MPI as it is commonly known, was created essentially to
solve this problem.
• The MPI library contains over 125 routines, but the number of key concepts is much smaller.
• It is possible to write fully functional message-passing programs using only the six
routines shown below (a minimal example follows the list):
1. MPI_Init : Initializes MPI.
2. MPI_Finalize : Terminates MPI.
3. MPI_Comm_size : Determines the number of processes.
4. MPI_Comm_rank : Determines the label of the calling process.
5. MPI_Send : Sends a message.
6. MPI_Recv : Receives a message.
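A minimal sketch using only these six routines (assuming a C compiler and an MPI implementation,
compiled with mpicc and launched with at least two processes, e.g. mpirun -np 2 ./a.out): it re-creates
the earlier example, with process 0 sending the value 100 to process 1.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, a = 0;

    MPI_Init(&argc, &argv);                    /* initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* label (rank) of the calling process */

    if (rank == 0) {
        a = 100;
        MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);        /* send one int to process 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* receive one int from process 0 */
        printf("Process 1 received a = %d\n", a);              /* prints 100 */
    }

    MPI_Finalize();                            /* terminate MPI */
    return 0;
}

Note the SPMD structure: every process runs the same program, and the rank returned by
MPI_Comm_rank selects the branch each process actually executes.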
Overlapping Communication with Computation
• The MPI programs developed so far used blocking send and receive operations whenever
they needed to perform point-to-point communication.
• For Example:
• Consider Cannon's matrix-matrix multiplication program.
• During each iteration of its main computational loop (lines 47–57), it first computes the matrix
multiplication of the sub-matrices stored in a and b, and then shifts the blocks of a and b using
MPI_Sendrecv_replace, which blocks until the specified matrix block has been sent and received by the
corresponding processes.
• In each iteration, each process spends O(n^3/p^1.5) time performing the matrix-matrix multiplication
and O(n^2/p) time shifting the blocks of matrices A and B.
• Since the blocks of matrices A and B do not change as they are shifted among the processors, it would be
preferable to overlap the transmission of these blocks with the computation for the matrix-matrix
multiplication, especially as many distributed-memory parallel computers have dedicated
communication controllers that can transmit messages without interrupting the CPUs.
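For reference, the blocking shift described above uses a call of roughly the following shape (a hedged
sketch: the buffer, count, and neighbour-rank variables are illustrative and assume an already-initialized
MPI program; this is not the actual code of lines 47–57):

/* Shift the local block of A one step along the ring, in place:
   send the current block to left_rank and receive the replacement from right_rank. */
MPI_Sendrecv_replace(a_block, block_elems, MPI_DOUBLE,
                     left_rank, 1, right_rank, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* The call returns only after the block has been both sent and received,
   so no computation can be overlapped with this communication. */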
Non-Blocking Communication Operations
• MPI provides pairs of functions for performing non-blocking send and receive operations so that
communication can be overlapped with computation.
• These functions are:
• MPI_Isend:
MPI_Isend starts a send operation but does not complete it; that is, it returns before the data is copied out
of the buffer.
• MPI_Irecv:
MPI_Irecv starts a receive operation but returns before the data has been received and copied into the
buffer.
• MPI_Test:
MPI_Test tests whether or not a non-blocking operation has finished.
• MPI_Wait:
MPI_Wait waits (i.e., blocks) until a non-blocking operation actually finishes.
• With appropriate hardware support, the transmission and reception of messages can
proceed concurrently with the computations the program performs after these functions
return.
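A hedged sketch of such an overlap (the buffer sizes, the ring pattern, and the dummy computation are
illustrative, not taken from the slides): each process starts a non-blocking shift of a buffer around a ring,
performs independent local work while the message is in transit, and calls MPI_Wait only when the
transferred data is actually needed.

#include <stdio.h>
#include <mpi.h>

#define N 1024

int main(int argc, char *argv[])
{
    int rank, size;
    double sendbuf[N], recvbuf[N], sum = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;              /* ring neighbours */
    int left  = (rank - 1 + size) % size;
    for (int i = 0; i < N; i++) sendbuf[i] = rank;

    /* start the shift; both calls return immediately */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* independent local computation overlapped with the communication */
    for (int i = 1; i <= 100000; i++)
        sum += 1.0 / i;

    /* block only when the transferred data is actually required */
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);
    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);

    printf("Process %d: sum = %f, first received element = %f\n",
           rank, sum, recvbuf[0]);

    MPI_Finalize();
    return 0;
}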
Collective Communication and Computation Operations
• MPI provides an extensive set of functions for performing many commonly used collective
communication operations.
• All of the collective communication functions provided by MPI take as an argument a
communicator that defines the group of processes that participate in the collective
operation.
• All the processes that belong to this communicator participate in the operation, and all of
them must call the collective communication function.
• Even though collective communication operations do not act like barriers (i.e., it is possible
for a processor to go past its call for the collective communication operation even before
other processes have reached it), it acts like a virtual synchronization step in the following
sense: the parallel program should be written such that it behaves correctly even if a global
synchronization is performed before and after the collective call.
• Since the operations are virtually synchronous, they do not require tags. In some of the
collective functions data is required to be sent from a single process (source-process) or to
be received by a single process (target-process).
• In these functions, the source- or target-process is one of the arguments supplied to the
routines. All the processes in the group (i.e., communicator) must specify the same source-
or target-process.
• For most collective communication operations, MPI provides two different variants. The
first transfers equal-size data to or from each process, and the second transfers data that can
be of different sizes. A short sketch exercising several of these routines follows the list below.
1. Barrier
• The barrier synchronization operation is performed in MPI using the MPI_Barrier function.
int MPI_Barrier(MPI_Comm comm)
2. Broadcast
• The one-to-all broadcast operation is performed in MPI using the MPI_Bcast function.
int MPI_Bcast(void *buf, int count, MPI_Datatype datatype,
int source, MPI_Comm comm)
3. Reduction
• The all-to-one reduction operation is performed in MPI using the MPI_Reduce function.
int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, int target,
MPI_Comm comm)
4. Prefix
• The prefix-sum operation is performed in MPI using the MPI_Scan function.
int MPI_Scan(void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
5. Gather
• The gather operation is performed in MPI using the MPI_Gather function.
int MPI_Gather(void *sendbuf, int sendcount,
MPI_Datatype senddatatype, void *recvbuf, int recvcount,
MPI_Datatype recvdatatype, int target, MPI_Comm comm)
6. Scatter
• The scatter operation is performed in MPI using the MPI_Scatter function.
int MPI_Scatter(void *sendbuf, int sendcount,
MPI_Datatype senddatatype, void *recvbuf, int recvcount,
MPI_Datatype recvdatatype, int source, MPI_Comm comm)
7. All-to-All
• The all-to-all personalized communication operation described in Section 4.5 is performed
in MPI by using the MPI_Alltoall function.
int MPI_Alltoall(void *sendbuf, int sendcount,
MPI_Datatype senddatatype, void *recvbuf, int recvcount,
MPI_Datatype recvdatatype, MPI_Comm comm)
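A hedged sketch combining several of these collectives in one program (the values and buffer names are
illustrative; it assumes the program is run on at most 16 processes so that the fixed-size buffers suffice):

#include <stdio.h>
#include <mpi.h>

#define MAXP 16   /* assumed upper bound on the number of processes for this sketch */

int main(int argc, char *argv[])
{
    int rank, size, n = 0, sum = 0, piece, doubled;
    int sendbuf[MAXP], recvbuf[MAXP];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* broadcast: after the call every process has n == 42 */
    if (rank == 0) n = 42;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* all-to-one reduction: the sum of all ranks ends up on process 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* scatter: process i receives element i of process 0's sendbuf */
    if (rank == 0)
        for (int i = 0; i < size; i++) sendbuf[i] = 10 * i;
    MPI_Scatter(sendbuf, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* gather: process 0 collects one value from every process */
    doubled = 2 * piece;
    MPI_Gather(&doubled, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* all-to-all: element j of sendbuf goes to process j,
       element i of recvbuf arrives from process i */
    for (int j = 0; j < size; j++) sendbuf[j] = rank * 100 + j;
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    /* barrier synchronization before printing */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("n = %d, sum of ranks = %d, recvbuf[0] after all-to-all = %d\n",
               n, sum, recvbuf[0]);

    MPI_Finalize();
    return 0;
}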
OpenMP Parallel Programming Model
• OpenMP is an API that can be used with FORTRAN, C, and C++ for programming shared
address space machines.
• OpenMP directives provide support for concurrency, synchronization and data handling
while obviating the need for explicitly setting up mutexes, condition variables, data scope,
and initialization.
• OpenMP directives in C and C++ are based on the #pragma compiler directives. The
directive itself consists of a directive name followed by clauses.
#pragma omp directive [clause list]
• OpenMP programs execute serially until they encounter the parallel directive.
• The parallel directive is responsible for creating a group of threads.
• The main thread that encounters the parallel directive becomes the master of this group of
threads and is assigned the thread id 0 within the group.
• The parallel directive has the following prototype:
#pragma omp parallel [clause list]
/* structured block */
• Each thread created by this directive executes the structured block specified by the parallel
directive, as in the minimal sketch below.
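A minimal sketch of a parallel region (assuming an OpenMP-capable compiler, e.g. gcc -fopenmp; the
printed message is illustrative): each thread in the created group executes the structured block once.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel                          /* fork a group of threads */
    {
        int id = omp_get_thread_num();            /* 0 for the master thread */
        int nthreads = omp_get_num_threads();     /* size of the group */
        printf("Hello from thread %d of %d\n", id, nthreads);
    }                                             /* implicit join at the end of the block */
    return 0;
}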
• The clause list is used to specify conditional parallelization, number of threads, and data
handling.
• Conditional Parallelization: The clause if (scalar expression) determines whether the
parallel construct results in creation of threads. Only one if clause can be used with a
parallel directive.
• Degree of Concurrency: The clause num_threads (integer expression) specifies the
number of threads that are created by the parallel directive.
• Data Handling: The clause private (variable list) indicates that the set of variables
specified is local to each thread – i.e., each thread has its own copy of each variable in the
list.
• The clause firstprivate (variable list) is similar to the private clause, except that the variables
are initialized, on entry to the threads, to the values they held just before the parallel
directive.
• The clause shared (variable list) indicates that all variables in the list are shared across all
the threads, i.e., there is only one copy. Special care must be taken when threads handle these
variables to ensure serializability, as in the sketch below.
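A hedged sketch combining the clauses above on a single parallel directive (the variable names and
values are illustrative): a is firstprivate, so each thread starts with the value 10; b is private, so each
thread gets its own uninitialized copy; and total is shared, so its update is placed in a critical section
(the critical directive is described below).

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 10, b = 0, total = 0;

    #pragma omp parallel if (1) num_threads(4) firstprivate(a) private(b) shared(total)
    {
        b = a + omp_get_thread_num();   /* each thread has its own a (== 10) and its own b */
        #pragma omp critical            /* serialize updates to the shared variable */
        total += b;
    }

    printf("total = %d\n", total);      /* with 4 threads: (10+0)+(10+1)+(10+2)+(10+3) = 46 */
    return 0;
}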
A sample OpenMP program along with its Pthreads translation that might be performed
by an OpenMP compiler.
• Given below is a powerful set of OpenMP compiler directives (a short sketch combining several of
them follows the list):
1. parallel: which precedes a block of code to be executed in parallel by multiple threads.
2. for: which precedes a for loop with independent iterations that may be divided among
threads executing in parallel.
3. parallel for: a combination of the parallel and for directives.
4. sections: which precedes a series of blocks that may be executed in parallel.
5. parallel sections: a combination of the parallel and sections directives.
6. critical: which precedes a critical section.
7. single: which precedes a code block to be executed by a single thread.
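A short hedged sketch combining several of these directives (the loop bounds and the array are
illustrative): parallel for divides the iterations of an independent loop among threads, sections runs two
independent blocks concurrently, and critical protects a shared counter.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    int i, marked[N], count = 0;

    #pragma omp parallel for            /* iterations divided among the threads */
    for (i = 0; i < N; i++)
        marked[i] = (i % 3 == 0);

    #pragma omp parallel
    {
        #pragma omp sections            /* each section executed by one thread */
        {
            #pragma omp section
            printf("section A run by thread %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section B run by thread %d\n", omp_get_thread_num());
        }

        #pragma omp for                 /* a for directive inside an existing parallel region */
        for (i = 0; i < N; i++)
            if (marked[i])
            {
                #pragma omp critical    /* critical section around the shared update */
                count++;
            }
    }

    printf("count = %d\n", count);      /* 334 multiples of 3 in 0..999 */
    return 0;
}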
Shared Memory Model
• Processors interact and synchronize with each other through shared variables.
Fork/Join Parallelism
• Initially only master thread is active.
• Master thread executes sequential code.
• Fork: Master thread creates or awakens additional threads to execute parallel code.
• Join: At the end of the parallel code, the created threads die or are suspended.
Parallel for Loops
• C programs often express data-parallel operations as for loops
for (i = first; i < size; i += prime)
marked[i] = 1;
• OpenMP makes it easy to indicate when the iterations of a loop may execute in
parallel.
• The compiler takes care of generating code that forks/joins threads and allocates the
iterations to threads.
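For instance, under the assumption that the iterations are independent, a single directive parallelizes the
marking loop shown above; the values of first, prime, and the array size below are illustrative, and the
loop index i is automatically made private to each thread.

#include <stdio.h>

#define SIZE 100

int main(void)
{
    int i, first = 4, prime = 2;
    int marked[SIZE] = {0};

    /* the compiler forks threads, divides the iterations among them, and joins at the end */
    #pragma omp parallel for
    for (i = first; i < SIZE; i += prime)
        marked[i] = 1;

    printf("marked[10] = %d\n", marked[10]);   /* 1: every second index from 4 upward is marked */
    return 0;
}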
Shared and Private Variables
• Shared variable: has same address in execution context of every thread.
• Private variable: has different address in execution context of every thread.
• A thread cannot access the private variables of another thread.
Function omp_get_num_procs
• Returns the number of physical processors available for use by the parallel program.
int omp_get_num_procs (void)
Function omp_set_num_threads
• Uses the parameter value to set the number of threads to be active in parallel
sections of code.
• May be called at multiple points in a program.
void omp_set_num_threads (int t)
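A small hedged sketch combining the two library functions (using one thread per processor is an
illustrative choice): the number of physical processors is queried and then used to set the thread count
for subsequent parallel regions.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int p = omp_get_num_procs();     /* physical processors available to the program */
    omp_set_num_threads(p);          /* request one thread per processor for later parallel regions */

    #pragma omp parallel
    {
        #pragma omp single           /* executed by a single thread of the group */
        printf("Using %d threads on %d processors\n", omp_get_num_threads(), p);
    }
    return 0;
}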
MU Exam Questions
May 2018
• Explain the concept of shared memory programming. 5 marks
• Explain in brief about Performance bottleneck, Data Race and Determinism, Data Race
Avoidance and Deadlock Avoidance. 10 marks
Dec 2018
• Discuss the term collective communication in MPI. 5 marks
• Differentiate between buffered blocking and non-buffered blocking message passing
operation in MPI. 10 marks
May 2019
• Discuss the term collective communication in MPI. 5 marks
• Differentiate between buffered blocking and non-buffered blocking message passing
operation in MPI. 10 marks
• Write a small program demonstrating library functions and compiler directives in the OpenMP
and MPI paradigms.
  • 30. MU Exam Questions May 2018 • Explain the concept of shared memory programming. 5 marks • Explain in brief about Performance bottleneck, Data Race and Determinism, Data Race Avoidance and Deadlock Avoidance. 10 marks Dec 2018 • Discuss the term collective communication in MPI. 5 marks • Differentiate between buffered blocking and non-buffered blocking message passing operation in MPI. 10 marks May 2019 • Discuss the term collective communication in MPI. 5 marks • Differentiate between buffered blocking and non-buffered blocking message passing operation in MPI. 10 marks • Write a small program demonstrating functional and compiler directives in OpenMP paradigm and MP paradigm.