1
Shared Memory
Programming with
Pthreads & OpenMP
Dilum Bandara
Dilum.Bandara@uom.lk
Slides extended from
An Introduction to Parallel Programming by
Peter Pacheco
2
Shared Memory System
Copyright © 2010, Elsevier Inc. All rights reserved
3
POSIX® Threads
• Also known as Pthreads
• Standard for Unix-like operating systems
• Library that can be linked with C programs
• Specifies an API for multi-threaded programming
4
Hello World!
#include <pthread.h> declares various Pthreads functions, constants, types, etc.
5
Hello World! (Cont.)
6
Hello World! (Cont.)
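The code on the Hello World slides did not survive extraction. A minimal sketch of what such a program could look like follows; the names hello, run_hello, and thread_count are assumptions chosen to match the sample output in this deck, not necessarily the original listing.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static int thread_count;  /* shared: read by every thread */

/* Thread start routine: the rank arrives packed into the void* argument. */
static void *hello(void *rank) {
    long my_rank = (long)rank;
    printf("Hello from thread %ld of %d\n", my_rank, thread_count);
    return NULL;
}

/* Fork n threads, greet from the main thread, then join them all.
   Returns the number of threads joined, or -1 if allocation fails. */
int run_hello(int n) {
    pthread_t *handles = malloc((size_t)n * sizeof *handles);
    int started = 0, joined = 0;

    if (handles == NULL) return -1;
    thread_count = n;
    for (long t = 0; t < n; t++) {
        if (pthread_create(&handles[t], NULL, hello, (void *)t) != 0) break;
        started++;
    }
    printf("Hello from the main thread\n");
    for (int t = 0; t < started; t++)
        if (pthread_join(handles[t], NULL) == 0) joined++;
    free(handles);
    return joined;
}
```

A main function would typically read the thread count from argv[1] (for example with strtol) and call run_hello with it. The order of the greetings is not fixed, because the threads are scheduled independently.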
7
Compiling a Pthreads Program
gcc -g -Wall -o pth_hello pth_hello.c -lpthread
-lpthread links the Pthreads library
8
Running a Pthreads Program
./pth_hello <number of threads>

./pth_hello 1
Hello from the main thread
Hello from thread 0 of 1

./pth_hello 4
Hello from the main thread
Hello from thread 0 of 4
Hello from thread 3 of 4
Hello from thread 2 of 4
Hello from thread 1 of 4
9
Running the Threads
Main thread forks & joins 2 threads
10
Global Variables
• Can introduce subtle & confusing bugs!
• Use them only when they are essential
  • Shared variables
11
Starting Threads
Declared in pthread.h. One pthread_t object is needed for each thread.

int pthread_create(
    pthread_t*             thread_p,                  /* out */
    const pthread_attr_t*  attr_p,                    /* in  */
    void*                  (*start_routine)(void*),   /* in  */
    void*                  arg_p);                    /* in  */

We ignore the return value from pthread_create
12
Function Started by pthread_create
• A function started by pthread_create should have the following prototype
  void* thread_function(void* args_p);
• void* can be cast to any pointer type in C
  • So args_p can point to a list containing one or more values needed by thread_function
  • Similarly, the return value of thread_function can point to a list of one or more values
13
Stopping Threads
• A single call to pthread_join waits for the thread associated with the pthread_t object to complete
  • It suspends execution of the calling thread until the target thread terminates, unless it has already terminated
• Call pthread_join once for each thread

int pthread_join(
    pthread_t  thread,       /* in  */
    void**     ret_val_p);   /* out */
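To make the void* conventions concrete, here is a small sketch in which each thread receives a pointer to a struct and hands back a heap-allocated result through pthread_join. The struct slice type and both function names are hypothetical, introduced only for this illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument bundle: a slice of an array to be summed. */
struct slice {
    const int *data;
    int        first;   /* inclusive */
    int        last;    /* exclusive */
};

/* Start routine: unpack the arguments, return a heap-allocated result. */
static void *sum_slice(void *args_p) {
    const struct slice *s = args_p;   /* void* converts implicitly in C */
    long *result = malloc(sizeof *result);

    if (result == NULL) return NULL;
    *result = 0;
    for (int i = s->first; i < s->last; i++) *result += s->data[i];
    return result;                    /* collected by pthread_join */
}

/* Sum data[0..n) with two threads. Returns -1 if a thread cannot be
   started or cannot allocate its result. */
long sum_with_two_threads(const int *data, int n) {
    struct slice args[2] = { { data, 0, n / 2 }, { data, n / 2, n } };
    pthread_t    tid[2];
    long         total = 0;
    int          started = 0, ok = 1;

    for (int t = 0; t < 2; t++) {
        if (pthread_create(&tid[t], NULL, sum_slice, &args[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        void *ret = NULL;
        if (pthread_join(tid[t], &ret) != 0 || ret == NULL) { ok = 0; continue; }
        total += *(long *)ret;
        free(ret);                    /* the joiner owns the returned block */
    }
    return (ok && started == 2) ? total : -1;
}
```

The argument structs live on the caller's stack, which is safe here only because the caller joins both threads before returning.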
14
Matrix-Vector Multiplication in
Pthreads
15
Serial Pseudo-code
16
Using 3 Pthreads
• Assign a block of rows to each thread
• Suppose a 6x6 matrix & 3 threads: each thread computes 2 components of y
• Thread 0 computes y[0] & y[1]
• General case: y[i] = sum over j of A[i][j] * x[j]
17
Pthreads Matrix-Vector Multiplication
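The listing for this slide was an image and is not recoverable. A sketch of the block-of-rows scheme, under the assumption that the matrix, vectors, and sizes are globals and that the row count divides evenly by the thread count, might look like this (mat_vect_demo and its sample data are hypothetical):

```c
#include <pthread.h>

#define M 6         /* rows; assumed divisible by THREADS */
#define N 6         /* columns */
#define THREADS 3

/* Shared data, kept global in the style used by the slides. */
static double A[M][N], x[N], y[M];

/* Thread `rank` computes the block of rows it owns. */
static void *pth_mat_vect(void *rank) {
    long my_rank   = (long)rank;
    int  local_m   = M / THREADS;
    int  first_row = (int)my_rank * local_m;
    int  last_row  = first_row + local_m - 1;

    for (int i = first_row; i <= last_row; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
    }
    return NULL;
}

/* Fill A with 2*I and x with 1..N, run the threads, return y[row]. */
double mat_vect_demo(int row) {
    pthread_t tid[THREADS];
    int started = 0;

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) A[i][j] = (i == j) ? 2.0 : 0.0;
    for (int j = 0; j < N; j++) x[j] = j + 1.0;

    for (long t = 0; t < THREADS; t++) {
        if (pthread_create(&tid[t], NULL, pth_mat_vect, (void *)t) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    return y[row];
}
```

No locking is needed in this sketch because each thread writes a disjoint set of y entries and only reads A and x.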
18
Estimating π
19
Thread Function for Computing π
20
Using a Dual-Core Processor
• As we increase n, the estimate with 1 thread gets better & better
• The 2-thread case produces different answers in different runs
• Why?
21
Pthreads Global Sum with Busy-Waiting
Shared variable
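A rough sketch of a busy-waiting global sum follows. It assumes the series π = 4(1 − 1/3 + 1/5 − ...) and accumulates a private sum per thread before entering the critical section. One deliberate departure from a plain int flag: the flag here is a C11 atomic_int, so that the spin loop is well defined and the update of sum is ordered with respect to the flag. All names are assumptions.

```c
#include <pthread.h>
#include <stdatomic.h>

#define THREADS 2

static long long  n_terms;   /* number of series terms; assumed divisible by THREADS */
static double     sum;       /* shared accumulator */
static atomic_int flag;      /* rank of the thread currently allowed to update sum */

static void *thread_sum(void *rank) {
    long      my_rank = (long)rank;
    long long my_n    = n_terms / THREADS;
    long long first   = my_n * my_rank;
    long long last    = first + my_n;
    double    factor  = (first % 2 == 0) ? 1.0 : -1.0;
    double    my_sum  = 0.0;

    for (long long i = first; i < last; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    while (atomic_load(&flag) != (int)my_rank)
        ;                                   /* busy-wait: spin until it is our turn */
    sum += my_sum;                          /* critical section */
    atomic_store(&flag, (int)((my_rank + 1) % THREADS));   /* pass the turn on */
    return NULL;
}

double estimate_pi_busy_wait(long long n) {
    pthread_t tid[THREADS];
    int started = 0;

    n_terms = n;
    sum = 0.0;
    atomic_store(&flag, 0);
    for (long t = 0; t < THREADS; t++) {
        if (pthread_create(&tid[t], NULL, thread_sum, (void *)t) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    return 4.0 * sum;
}
```

The spinning thread does no useful work while it waits, and the threads are forced to enter the critical section in rank order, which is the main cost of this approach.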
22
Mutexes
• Make sure only 1 thread is in a critical region at a time
• Pthreads standard includes a special type for mutexes: pthread_mutex_t
23
Mutexes
• Lock (pthread_mutex_lock)
  • To gain access to a critical section
• Unlock (pthread_mutex_unlock)
  • When a thread is finished executing code in a critical section
• Termination (pthread_mutex_destroy)
  • When a program finishes using a mutex
24
Global Sum Function Using a Mutex
25
Global Sum Function Using a Mutex (Cont.)
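The mutex-based listing is likewise missing from the extracted text. The sketch below shows one way the same π sum could be protected with pthread_mutex_t, covering initialization, lock, unlock, and destruction; function and variable names are assumptions.

```c
#include <pthread.h>

#define THREADS 4

static long long       n_terms;   /* assumed divisible by THREADS */
static double          sum;       /* shared accumulator */
static pthread_mutex_t mutex;     /* protects sum */

static void *thread_sum(void *rank) {
    long      my_rank = (long)rank;
    long long my_n    = n_terms / THREADS;
    long long first   = my_n * my_rank;
    long long last    = first + my_n;
    double    factor  = (first % 2 == 0) ? 1.0 : -1.0;
    double    my_sum  = 0.0;

    for (long long i = first; i < last; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    pthread_mutex_lock(&mutex);     /* enter the critical section */
    sum += my_sum;
    pthread_mutex_unlock(&mutex);   /* leave the critical section */
    return NULL;
}

double estimate_pi_mutex(long long n) {
    pthread_t tid[THREADS];
    int started = 0;

    n_terms = n;
    sum = 0.0;
    pthread_mutex_init(&mutex, NULL);
    for (long t = 0; t < THREADS; t++) {
        if (pthread_create(&tid[t], NULL, thread_sum, (void *)t) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&mutex);  /* the program is done with the mutex */
    return 4.0 * sum;
}
```

Unlike busy-waiting, a thread that cannot get the lock is blocked rather than spinning, and the threads may enter the critical section in any order.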
26
Busy-Waiting vs. Mutex
Run-times (in seconds) of π programs using n = 10^8 terms on a system with two 4-core processors
27
Semaphores
Semaphores are not part of Pthreads; you need to add #include <semaphore.h>
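As an illustration of how semaphores can order threads, the sketch below makes four threads run a step strictly in rank order: each thread waits on its own semaphore and then posts the next one. It assumes a platform with unnamed POSIX semaphores (sem_init), such as Linux; the names take_turn and run_in_rank_order are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

#define THREADS 4

static sem_t sems[THREADS];    /* sems[i] is posted when it is thread i's turn */
static int   order[THREADS];   /* order[k] = rank of the k-th thread to run */
static int   next_slot;

static void *take_turn(void *rank) {
    long my_rank = (long)rank;

    sem_wait(&sems[my_rank]);             /* block until signalled */
    order[next_slot++] = (int)my_rank;    /* only one thread is past its wait at a time */
    if (my_rank + 1 < THREADS)
        sem_post(&sems[my_rank + 1]);     /* wake the next rank */
    return NULL;
}

/* Returns 1 if the threads ran in rank order, 0 if not,
   and -1 if the semaphores could not be initialized. */
int run_in_rank_order(void) {
    pthread_t tid[THREADS];
    int started = 0, in_order = 1;

    next_slot = 0;
    for (int t = 0; t < THREADS; t++)
        if (sem_init(&sems[t], 0, t == 0 ? 1 : 0) != 0)   /* only rank 0 may start */
            return -1;
    for (long t = 0; t < THREADS; t++) {
        if (pthread_create(&tid[t], NULL, take_turn, (void *)t) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    for (int t = 0; t < THREADS; t++) sem_destroy(&sems[t]);

    for (int t = 0; t < started; t++)
        if (order[t] != t) in_order = 0;
    return (started == THREADS) ? in_order : 0;
}
```

A semaphore initialized to 1 and used with one sem_wait and one sem_post around a critical section behaves much like a mutex; the counting and signalling shown here is what a mutex cannot do.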
28
Read-Write Locks
• Used to control access to a large, shared data structure
• Example
  • Suppose the shared data structure is a sorted linked list of ints, & the operations of interest are Member, Insert, & Delete
29
Linked Lists
30
Linked List Membership
31
Inserting New Node Into a List
32
Inserting New Node Into a List (Cont.)
33
Deleting a Node From a Linked List
34
Deleting a Node From a Linked List (Cont.)
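The three list listings were images. A plain single-threaded sketch of Member, Insert, and Delete on a sorted list of ints is given here for reference; the struct layout and the pointer-to-head parameter are assumptions.

```c
#include <stdlib.h>

struct list_node_s {
    int data;
    struct list_node_s *next;
};

/* Returns 1 if value is in the sorted list, 0 otherwise. */
int Member(int value, struct list_node_s *head_p) {
    struct list_node_s *curr_p = head_p;

    while (curr_p != NULL && curr_p->data < value) curr_p = curr_p->next;
    return curr_p != NULL && curr_p->data == value;
}

/* Inserts value in sorted position. Returns 0 if the value is already
   present or memory runs out, 1 on success. */
int Insert(int value, struct list_node_s **head_pp) {
    struct list_node_s *curr_p = *head_pp, *pred_p = NULL, *temp_p;

    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p != NULL && curr_p->data == value) return 0;

    temp_p = malloc(sizeof *temp_p);
    if (temp_p == NULL) return 0;
    temp_p->data = value;
    temp_p->next = curr_p;
    if (pred_p == NULL) *head_pp = temp_p;     /* new first node */
    else                pred_p->next = temp_p;
    return 1;
}

/* Removes value if present. Returns 1 if a node was deleted, 0 otherwise. */
int Delete(int value, struct list_node_s **head_pp) {
    struct list_node_s *curr_p = *head_pp, *pred_p = NULL;

    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p == NULL || curr_p->data != value) return 0;

    if (pred_p == NULL) *head_pp = curr_p->next;   /* deleting the first node */
    else                pred_p->next = curr_p->next;
    free(curr_p);
    return 1;
}
```

Insert and Delete rewire next pointers and may free a node, which is why running them concurrently with Member is unsafe without some form of locking.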
35
Multi-Threaded Linked List
• To share access to the list, we can define head_p to be a global variable
• This will simplify function headers for Member, Insert, & Delete
  • Because we won't need to pass in either head_p or a pointer to head_p: we'll only need to pass in the value of interest
36
Simultaneous Access by 2 Threads
37
Solution #1
• Simply lock the list any time that a thread attempts to access it
• A call to each of the 3 functions can be protected by a mutex
  • In place of calling Member(value), lock the list mutex, call Member(value), then unlock the mutex
38
Issues
• Serializing access to the list
• If the vast majority of our operations are calls to Member
  • We fail to exploit the opportunity for parallelism
• If most of our operations are calls to Insert & Delete
  • This may be the best solution
39
Solution #2
• Instead of locking the entire list, we could try to lock individual nodes
• A "finer-grained" approach
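One way to picture per-node locking is the hand-over-hand traversal sketched below: a thread locks the next node before releasing the current one, so it always holds at least one lock on its position in the list. This is an illustrative version written for this transcript, not the deck's listing; Push_front is a hypothetical helper used only to build a list.

```c
#include <pthread.h>
#include <stdlib.h>

struct list_node_s {
    int data;
    struct list_node_s *next;
    pthread_mutex_t mutex;      /* one lock per node */
};

static struct list_node_s *head_p = NULL;
static pthread_mutex_t head_p_mutex = PTHREAD_MUTEX_INITIALIZER;  /* guards head_p itself */

/* Hypothetical helper: pushes value at the front, so callers must push
   in descending order to keep the list sorted. */
int Push_front(int value) {
    struct list_node_s *node_p = malloc(sizeof *node_p);

    if (node_p == NULL) return 0;
    node_p->data = value;
    pthread_mutex_init(&node_p->mutex, NULL);
    pthread_mutex_lock(&head_p_mutex);
    node_p->next = head_p;
    head_p = node_p;
    pthread_mutex_unlock(&head_p_mutex);
    return 1;
}

/* Returns 1 if value is in the list, 0 otherwise. */
int Member(int value) {
    struct list_node_s *curr_p;
    int found;

    pthread_mutex_lock(&head_p_mutex);
    curr_p = head_p;
    if (curr_p != NULL) pthread_mutex_lock(&curr_p->mutex);
    pthread_mutex_unlock(&head_p_mutex);

    while (curr_p != NULL && curr_p->data < value) {
        struct list_node_s *next_p = curr_p->next;
        if (next_p != NULL) pthread_mutex_lock(&next_p->mutex);  /* lock next first */
        pthread_mutex_unlock(&curr_p->mutex);                    /* then release current */
        curr_p = next_p;
    }
    found = (curr_p != NULL && curr_p->data == value);
    if (curr_p != NULL) pthread_mutex_unlock(&curr_p->mutex);
    return found;
}
```

Every step of the traversal costs a lock and an unlock, and every node carries a pthread_mutex_t, which is where the extra time and memory of this approach come from.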
40
Issues
• Much more complex than the original Member function
• Much slower
  • Because each time a node is accessed, a mutex must be locked & unlocked
• Addition of a mutex field to each node substantially increases the memory needed for the list
41
Pthreads Read-Write Locks
• Neither multi-threaded linked list exploits the potential for simultaneous access to any node by threads that are executing Member
  • 1st solution only allows 1 thread to access the entire list at any instant
  • 2nd only allows 1 thread to access any given node at any instant
• A read-write lock is somewhat like a mutex except that it provides 2 lock functions
  • 1st locks the read-write lock for reading
  • 2nd locks it for writing
42
Pthreads Read-Write Locks (Cont.)
• Multiple threads can simultaneously obtain the lock by calling the read-lock function
• While only 1 thread can obtain the lock by calling the write-lock function
• Thus
  • If any thread owns the lock for reading, any thread that wants to obtain the lock for writing will be blocked
  • If any thread owns the lock for writing, any threads that want to obtain the lock for reading or writing will be blocked
43
Protecting Our Linked List Functions
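A sketch of how the three list functions might be protected with a single pthread_rwlock_t is shown below: Member takes the lock for reading, while Insert and Delete take it for writing. The global head_p follows the earlier slide; the traversal through a pointer-to-link (link_pp) is a stylistic choice made here, not something taken from the deck.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_rwlock_t under strict -std= modes */
#include <pthread.h>
#include <stdlib.h>

struct list_node_s {
    int data;
    struct list_node_s *next;
};

static struct list_node_s *head_p = NULL;                      /* shared list */
static pthread_rwlock_t    rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Reader: many threads may hold the read lock at once. */
int Member(int value) {
    pthread_rwlock_rdlock(&rwlock);
    struct list_node_s *curr_p = head_p;
    while (curr_p != NULL && curr_p->data < value) curr_p = curr_p->next;
    int found = (curr_p != NULL && curr_p->data == value);
    pthread_rwlock_unlock(&rwlock);
    return found;
}

/* Writer: the write lock excludes all readers and other writers. */
int Insert(int value) {
    int inserted = 0;
    pthread_rwlock_wrlock(&rwlock);
    struct list_node_s **link_pp = &head_p;   /* the link that may be rewired */
    while (*link_pp != NULL && (*link_pp)->data < value)
        link_pp = &(*link_pp)->next;
    if (*link_pp == NULL || (*link_pp)->data != value) {
        struct list_node_s *node_p = malloc(sizeof *node_p);
        if (node_p != NULL) {
            node_p->data = value;
            node_p->next = *link_pp;
            *link_pp = node_p;
            inserted = 1;
        }
    }
    pthread_rwlock_unlock(&rwlock);
    return inserted;
}

/* Writer: unlink and free the node holding value, if any. */
int Delete(int value) {
    int deleted = 0;
    pthread_rwlock_wrlock(&rwlock);
    struct list_node_s **link_pp = &head_p;
    while (*link_pp != NULL && (*link_pp)->data < value)
        link_pp = &(*link_pp)->next;
    if (*link_pp != NULL && (*link_pp)->data == value) {
        struct list_node_s *victim_p = *link_pp;
        *link_pp = victim_p->next;
        free(victim_p);
        deleted = 1;
    }
    pthread_rwlock_unlock(&rwlock);
    return deleted;
}
```

Both lock modes are released with the same pthread_rwlock_unlock call. A lock created at run time would instead use pthread_rwlock_init and pthread_rwlock_destroy.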
44
Linked List Performance
Workload 1: 100,000 ops/thread (99.9% Member, 0.05% Insert, 0.05% Delete)
Workload 2: 100,000 ops/thread (80% Member, 10% Insert, 10% Delete)
45
OpenMP
46
OpenMP
• High-level API for shared-memory parallel programming
• MP = multiprocessing
• Uses pragmas
  • Special preprocessor instructions
  • #pragma
  • Typically added to support behaviors that aren't part of the basic C specification
  • Compilers that don't support pragmas ignore them
47
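This slide held the OpenMP hello program as an image. A possible reconstruction is sketched here; run_omp_hello and the greeted counter are assumptions added so the function can report how many threads took part. The _OPENMP guard lets the same file build, single-threaded, when the compiler is not given an OpenMP flag.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread in the team prints a greeting. */
static void hello(void) {
#ifdef _OPENMP
    int my_rank      = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
#else
    int my_rank      = 0;   /* compiled without OpenMP: one thread only */
    int thread_count = 1;
#endif
    printf("Hello from thread %d of %d\n", my_rank, thread_count);
}

/* Runs hello() on a team of at most thread_count threads and returns
   how many threads actually took part. */
int run_omp_hello(int thread_count) {
    int greeted = 0;

#   pragma omp parallel num_threads(thread_count)
    {
        hello();
#       pragma omp critical
        greeted++;          /* shared counter, so update it one thread at a time */
    }
    return greeted;
}
```

A main function would typically read the thread count from argv[1] and pass it to run_omp_hello. There is an implicit join at the end of the parallel block, so greeted is final by the time it is returned.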
48
Compiling & Running
Compiling:
gcc -g -Wall -fopenmp -o omp_hello omp_hello.c

Running with 4 threads:
./omp_hello 4

Possible outcomes:

Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4

Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
Hello from thread 3 of 4

Hello from thread 3 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 0 of 4
49
OpenMP Pragmas
• # pragma omp parallel
  • Most basic parallel directive
  • Original thread is called master
  • Additional threads are called slaves
  • Original thread & new threads are called a team
50
Clause
• Text that modifies a directive
• num_threads clause can be added to a parallel directive
• Allows programmer to specify the number of threads that should execute the following block

# pragma omp parallel num_threads(thread_count)
51
Be Aware…
• There may be system-defined limitations on the number of threads that a program can start
• OpenMP standard doesn't guarantee that the num_threads clause will actually start thread_count threads
• Most current systems can start hundreds or even 1,000s of threads
• Unless we're trying to start a lot of threads, we will almost always get the desired number of threads
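Since the requested thread count is only a request, a program that depends on the team size can query it inside the parallel region. A small sketch, with the hypothetical name actual_team_size:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the size of the team actually created when `requested`
   threads are asked for (1 if compiled without OpenMP). */
int actual_team_size(int requested) {
    int team_size = 1;

#   pragma omp parallel num_threads(requested)
    {
#ifdef _OPENMP
        if (omp_get_thread_num() == 0)          /* one thread records the size */
            team_size = omp_get_num_threads();
#endif
    }
    return team_size;
}
```

Code that divides work by thread count is safer when it uses the value reported inside the region rather than the value it asked for.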
52
Mutual Exclusion
# pragma omp critical
{
    global_result += my_result;
}
Only 1 thread can execute the following structured block at a time
53
Trapezoidal Rule
Serial algorithm
54
Assignment of Trapezoids to Threads
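The trapezoidal-rule listings on these slides were images. The sketch below shows a serial version and one way to split the trapezoids among OpenMP threads, with each thread adding its partial result inside a critical section. The integrand f(x) = x*x and the function names are assumptions, and n is assumed to be divisible by the team size.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

static double f(double x) { return x * x; }   /* sample integrand (assumption) */

/* Serial trapezoidal rule on [a, b] with n trapezoids. */
static double trap_serial(double a, double b, int n) {
    double h = (b - a) / n;
    double approx = (f(a) + f(b)) / 2.0;

    for (int i = 1; i <= n - 1; i++) approx += f(a + i * h);
    return h * approx;
}

/* Each thread integrates its own contiguous block of trapezoids and adds
   its partial result to the shared total. */
double trap_parallel(double a, double b, int n, int thread_count) {
    double global_result = 0.0;

#   pragma omp parallel num_threads(thread_count)
    {
#ifdef _OPENMP
        int my_rank = omp_get_thread_num();
        int team    = omp_get_num_threads();
#else
        int my_rank = 0, team = 1;             /* built without OpenMP */
#endif
        double h         = (b - a) / n;
        int    local_n   = n / team;           /* trapezoids per thread */
        double local_a   = a + my_rank * local_n * h;
        double local_b   = local_a + local_n * h;
        double my_result = trap_serial(local_a, local_b, local_n);

#       pragma omp critical
        global_result += my_result;            /* one thread at a time */
    }
    return global_result;
}
```

For example, trap_parallel(0.0, 3.0, 1200, 4) approximates the integral of x*x over [0, 3], whose exact value is 9. The block sizes are computed from the team size reported at run time, so the result stays correct even if fewer threads are started than requested.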



Editor's Notes

  • #2: 8 January 2024
  • #12: pthread_t object; thread attributes; function that the thread is to run; pointer to arguments passed to the function
  • #23: Actual implementation uses a semaphore
  • #53: Can put brackets