1. Distributed Memory Programming with MPI
Slides extended from An Introduction to Parallel Programming by Peter Pacheco
Dilum Bandara
Dilum.Bandara@uom.lk
2. Distributed Memory Systems
• We discuss developing programs for these systems using MPI
• MPI – Message Passing Interface
• A set of library functions that can be called from C, C++, & Fortran
Copyright © 2010, Elsevier Inc. All rights Reserved
3. Why MPI?
• Standardized & portable message-passing system
• One of the oldest libraries
• Widespread adoption
• Minimal requirements on underlying hardware
• Explicit parallelization
• Achieves high performance
• Scales to a large number of processors
• Intellectually demanding
4. Our First MPI Program
5. Compilation

    mpicc -g -Wall -o mpi_hello mpi_hello.c

• mpicc – wrapper script to compile
• -g – produce debugging information
• -Wall – turns on all warnings
• -o mpi_hello – create this executable file name (as opposed to default a.out)
• mpi_hello.c – source file
6. Execution

    mpiexec -n <no of processes> <executable>

    mpiexec -n 1 ./mpi_hello     (run with 1 process)
    mpiexec -n 4 ./mpi_hello     (run with 4 processes)
7. Execution

    mpiexec -n 1 ./mpi_hello
    Greetings from process 0 of 1 !

    mpiexec -n 4 ./mpi_hello
    Greetings from process 0 of 4 !
    Greetings from process 1 of 4 !
    Greetings from process 2 of 4 !
    Greetings from process 3 of 4 !
8. MPI Programs
• Need to include the mpi.h header file
• Identifiers defined by MPI start with "MPI_"
• First letter following the underscore is uppercase
• Applies to function names & MPI-defined types
• Helps to avoid confusion
9. 6 Golden MPI Functions
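The slide's table is an image; the six functions usually meant, all used later in this deck, are:

• MPI_Init – initialize the MPI environment
• MPI_Finalize – shut it down
• MPI_Comm_size – number of processes in a communicator
• MPI_Comm_rank – rank of the calling process
• MPI_Send – send a message
• MPI_Recv – receive a message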
10. MPI Components
• MPI_Init
• Tells MPI to do all necessary setup, e.g., allocate storage for message buffers, decide rank of a process
• argc_p & argv_p are pointers to the argc & argv arguments of main()
• Function returns an error code
11. MPI Components (Cont.)
• MPI_Finalize
• Tells MPI we're done, so clean up anything allocated for this program
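Their prototypes, as defined by the MPI standard:

    int MPI_Init(int* argc_p, char*** argv_p);  /* pass &argc, &argv — or NULL, NULL */
    int MPI_Finalize(void);

Every other MPI call must happen between these two.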
12. Communicators
• Collection of processes that can send messages to each other
• Messages from other communicators are ignored
• MPI_Init defines a communicator that consists of all processes created when the program is started
• Called MPI_COMM_WORLD
13. Communicators (Cont.)
• MPI_Comm_rank returns my rank (i.e., that of the process making the call)
• MPI_Comm_size returns the no of processes in the communicator
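The prototypes, from the MPI standard:

    int MPI_Comm_size(MPI_Comm comm, int* comm_sz_p);  /* out: no of processes  */
    int MPI_Comm_rank(MPI_Comm comm, int* my_rank_p);  /* out: rank of caller   */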
14. Single-Program Multiple-Data (SPMD)
• We compile 1 program
• Process 0 does something different
• Receives messages & prints them while the other processes do the work
• An if-else construct makes our program SPMD
• We can run this program on any no of processes
• e.g., 4, 8, 32, 1000, …
15. Communication
• MPI_Send's msg_buf_p, msg_size, & msg_type arguments determine the content of the message
• dest – destination process's rank
• tag – used to distinguish messages that are identical in content
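The full parameter list (shown only as an image on the slide), using the deck's own argument names:

    int MPI_Send(
        void*        msg_buf_p,    /* in: data to send            */
        int          msg_size,     /* in: number of elements      */
        MPI_Datatype msg_type,     /* in: type of each element    */
        int          dest,         /* in: destination rank        */
        int          tag,          /* in: message label           */
        MPI_Comm     communicator);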
16. Data Types
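The slide's table is an image; some of the predefined MPI datatypes and the C types they correspond to:

    MPI datatype       C type
    MPI_CHAR           signed char
    MPI_INT            signed int
    MPI_LONG           signed long int
    MPI_UNSIGNED       unsigned int
    MPI_FLOAT          float
    MPI_DOUBLE         double
    MPI_LONG_DOUBLE    long double
    MPI_BYTE           (raw bytes, no C equivalent)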
17. Communication (Cont.)
• Pass MPI_ANY_SOURCE to MPI_Recv to receive messages from any source in their order of arrival
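MPI_Recv's parameter list, from the MPI standard:

    int MPI_Recv(
        void*        msg_buf_p,    /* out: buffer for received data          */
        int          buf_size,     /* in: capacity of the buffer             */
        MPI_Datatype buf_type,
        int          source,       /* in: sender's rank, or MPI_ANY_SOURCE   */
        int          tag,          /* in: or MPI_ANY_TAG                     */
        MPI_Comm     communicator,
        MPI_Status*  status_p);    /* out: or MPI_STATUS_IGNORE              */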
18. Message Matching
• A message sent by process q with dest = r is matched by a receive posted on process r with src = q, provided the communicators & tags also agree
19. Receiving Messages
• Receiver can get a message without knowing
• Amount of data in the message
• Sender of the message
• Tag of the message
• How can those be found out?
20. How Much Data am I Receiving?
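The answer is the MPI_Status object filled in by MPI_Recv. A sketch (buf & comm are assumed to be declared elsewhere):

    MPI_Status status;
    MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);

    int count;
    MPI_Get_count(&status, MPI_DOUBLE, &count);  /* amount of data in the message */
    int sender = status.MPI_SOURCE;              /* sender of the message         */
    int tag    = status.MPI_TAG;                 /* tag of the message            */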
21. Issues With Send & Receive
• Exact behavior is determined by the MPI implementation
• MPI_Send may behave differently with regard to buffer size, cutoffs, & blocking
• Cutoff
• if message size < cutoff → buffer
• if message size ≥ cutoff → MPI_Send will block
• MPI_Recv always blocks until a matching message is received
• Message ordering from a given sender is preserved
• Know your implementation
• Don't make assumptions!
22. Trapezoidal Rule
23. Trapezoidal Rule (Cont.)
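The slides' figures aren't reproduced here; the rule they illustrate: with n trapezoids of width h = (b − a)/n and endpoints x_i = a + i·h,

    integral from a to b of f(x) dx ≈ h · [ f(x_0)/2 + f(x_1) + … + f(x_{n−1}) + f(x_n)/2 ]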
24. Serial Pseudo-code
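A serial sketch of what the slide shows; the integrand f and the interval are our example assumptions:

    #include <stdio.h>

    double f(double x) { return x * x; }   /* example integrand (assumption) */

    double Trap(double a, double b, int n, double h) {
        /* h * [f(x0)/2 + f(x1) + ... + f(x_{n-1}) + f(xn)/2] */
        double approx = (f(a) + f(b)) / 2.0;
        for (int i = 1; i <= n - 1; i++)
            approx += f(a + i * h);
        return h * approx;
    }

    int main(void) {
        double a = 0.0, b = 3.0;            /* example interval (assumption) */
        int n = 1024;
        double h = (b - a) / n;
        printf("Estimate: %.15f\n", Trap(a, b, n, h));
        return 0;
    }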
25. Parallel Pseudo-Code
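A reconstruction in the spirit of the book's pseudo-code: each process integrates its own subinterval, then the partial results are summed on process 0.

    local_n = n / comm_sz                    /* trapezoids per process      */
    local_a = a + my_rank * local_n * h      /* left end of my subinterval  */
    local_b = local_a + local_n * h
    local_integral = Trap(local_a, local_b, local_n, h)
    if my_rank != 0:
        send local_integral to process 0
    else:
        total = local_integral
        for q = 1 to comm_sz - 1:
            receive local_integral from q
            total += local_integral
        print total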
26. Tasks & Communications for Trapezoidal Rule
27–29. First Version
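The first version's code spans three slide images. A hedged reconstruction, reusing f and Trap from the serial sketch above (interval & n values are assumptions):

    #include <stdio.h>
    #include <mpi.h>

    double f(double x) { return x * x; }

    double Trap(double a, double b, int n, double h) {
        double approx = (f(a) + f(b)) / 2.0;
        for (int i = 1; i <= n - 1; i++) approx += f(a + i * h);
        return h * approx;
    }

    int main(void) {
        int    my_rank, comm_sz, n = 1024, local_n;
        double a = 0.0, b = 3.0, h, local_a, local_b, local_int, total_int;

        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

        h = (b - a) / n;          /* same on every process          */
        local_n = n / comm_sz;    /* assumes comm_sz evenly divides n */
        local_a = a + my_rank * local_n * h;
        local_b = local_a + local_n * h;
        local_int = Trap(local_a, local_b, local_n, h);

        if (my_rank != 0) {
            MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        } else {
            total_int = local_int;
            for (int source = 1; source < comm_sz; source++) {
                MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                total_int += local_int;
            }
            printf("With n = %d trapezoids, integral from %f to %f = %.15e\n",
                   n, a, b, total_int);
        }

        MPI_Finalize();
        return 0;
    }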
30. Collective Communication
31. Collective Communication
(Figure: a tree-structured global sum)
32. Alternative Tree-Structured Global Sum
• Which is optimal?
• Can we do better?
33. MPI_Reduce
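The prototype (shown as an image on the slide), with the deck's argument names:

    int MPI_Reduce(
        void*        input_data_p,   /* in: local operand(s)                        */
        void*        output_data_p,  /* out: result, significant only on dest_process */
        int          count,
        MPI_Datatype datatype,
        MPI_Op       operator,       /* e.g., MPI_SUM                               */
        int          dest_process,   /* rank that receives the result               */
        MPI_Comm     comm);

    /* e.g., global sum of each process's local_int onto process 0: */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);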
34. Predefined Reduction Operators
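The slide's table is an image; the standard predefined operators are:

    Operator                        Meaning
    MPI_SUM / MPI_PROD              sum / product
    MPI_MAX / MPI_MIN               maximum / minimum
    MPI_LAND / MPI_LOR / MPI_LXOR   logical and / or / exclusive or
    MPI_BAND / MPI_BOR / MPI_BXOR   bitwise and / or / exclusive or
    MPI_MAXLOC / MPI_MINLOC         max/min value and its location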
35. Collective vs. Point-to-Point Communications
• All processes in the communicator must call the same collective function
• e.g., a program that attempts to match a call to MPI_Reduce on 1 process with a call to MPI_Recv on another process is erroneous
• Program will hang or crash
• Arguments passed by each process to an MPI collective communication must be "compatible"
• e.g., if 1 process passes in 0 as dest_process & another passes in 1, then the outcome of a call to MPI_Reduce is erroneous
• Program is likely to hang or crash
36. Collective vs. P-to-P Communications (Cont.)
• output_data_p argument is only used on dest_process
• However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it's just NULL
• Point-to-point communications are matched on the basis of tags & communicators
• Collective communications don't use tags
• Matched solely on the basis of the communicator & the order in which they're called
37. MPI_Allreduce
• Useful when all processes need the result of a global sum to complete some larger computation
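Its prototype is MPI_Reduce's without the dest_process argument, because every process gets the result:

    int MPI_Allreduce(
        void*        input_data_p,
        void*        output_data_p,  /* out: result available on EVERY process */
        int          count,
        MPI_Datatype datatype,
        MPI_Op       operator,
        MPI_Comm     comm);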
38. MPI_Allreduce (Cont.)
(Figure: a global sum followed by distribution of the result)
39. Butterfly-Structured Global Sum
(Figure: processes exchange partial results)
40. Broadcast
• Data belonging to a single process is sent to all of the processes in the communicator
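The prototype, using Pacheco's argument names:

    int MPI_Bcast(
        void*        data_p,        /* in on source_proc, out on all others */
        int          count,
        MPI_Datatype datatype,
        int          source_proc,   /* rank whose data is broadcast         */
        MPI_Comm     comm);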
41. Tree-Structured Broadcast
42. Data Distributions – Compute a Vector Sum
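The slide's serial implementation is an image; an equivalent sketch (z = x + y, componentwise):

    void Vector_sum(double x[], double y[], double z[], int n) {
        for (int i = 0; i < n; i++)
            z[i] = x[i] + y[i];
    }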
43. Partitioning Options
• Block partitioning – assign blocks of consecutive components to each process
• Cyclic partitioning – assign components in a round-robin fashion
• Block-cyclic partitioning – use a cyclic distribution of blocks of components
(See the worked example below.)
44. Parallel Implementation
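A sketch of the slide's parallel version: with block partitioning, each process holds local_n = n / comm_sz components and simply adds its own block; no communication is needed.

    void Parallel_vector_sum(double local_x[], double local_y[],
                             double local_z[], int local_n) {
        for (int local_i = 0; local_i < local_n; local_i++)
            local_z[local_i] = local_x[local_i] + local_y[local_i];
    }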
45. MPI_Scatter
• Can be used in a function that reads in an entire vector on process 0 but only sends the needed components to each of the other processes
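The prototype, using Pacheco's argument names:

    int MPI_Scatter(
        void*        send_buf_p,   /* significant only on src_proc            */
        int          send_count,   /* no of elements sent to EACH process     */
        MPI_Datatype send_type,
        void*        recv_buf_p,
        int          recv_count,
        MPI_Datatype recv_type,
        int          src_proc,     /* rank that holds the whole vector        */
        MPI_Comm     comm);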
46. Reading & Distributing a Vector
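The slide's function is an image. A minimal sketch in its spirit, assuming block partitioning, that comm_sz evenly divides n, and the headers stdio.h, stdlib.h, & mpi.h:

    void Read_vector(double local_a[], int local_n, int n,
                     int my_rank, MPI_Comm comm) {
        double* a = NULL;
        if (my_rank == 0) {
            a = malloc(n * sizeof(double));
            printf("Enter the vector\n");
            for (int i = 0; i < n; i++) scanf("%lf", &a[i]);
        }
        /* a is ignored on non-root ranks, so NULL is fine there */
        MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, comm);
        if (my_rank == 0) free(a);
    }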
47. MPI_Gather
• Collects all components of a vector onto process 0
• Then process 0 can process all of the components
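The prototype, MPI_Scatter's mirror image:

    int MPI_Gather(
        void*        send_buf_p,
        int          send_count,
        MPI_Datatype send_type,
        void*        recv_buf_p,   /* significant only on dest_proc          */
        int          recv_count,   /* no of elements received from EACH rank */
        MPI_Datatype recv_type,
        int          dest_proc,    /* rank that collects the whole vector    */
        MPI_Comm     comm);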
48. MPI_Allgather
• Concatenates the contents of each process' send_buf_p & stores this in each process' recv_buf_p
• recv_count is the amount of data being received from each process
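The prototype is MPI_Gather's without a dest_proc, since every process receives the concatenated result:

    int MPI_Allgather(
        void*        send_buf_p,
        int          send_count,
        MPI_Datatype send_type,
        void*        recv_buf_p,   /* filled on EVERY process        */
        int          recv_count,   /* from each process              */
        MPI_Datatype recv_type,
        MPI_Comm     comm);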
49. Summary
Source: https://computing.llnl.gov/tutorials/mpi/
Editor's Notes
• #2: 8 January 2024
• #3: Began in Supercomputing '92
• #16: Tag – of 2 messages with identical content, 1 should be printed while the other is used for calculation
• #34: MPI_Reduce arguments:
    sendbuf – address of send buffer (choice)
    count – no of elements in send buffer (integer)
    datatype – data type of elements of send buffer (handle)
    op – reduce operation (handle)
    root – rank of root process (integer)
    comm – communicator (handle)