Copyright © 2010, Elsevier Inc. All rights Reserved
Chapter 4
Shared Memory Programming
with Pthreads
An Introduction to Parallel Programming
Peter Pacheco
Roadmap
 Problems programming shared memory
systems.
 Controlling access to a critical section.
 Thread synchronization.
 Programming with POSIX threads.
 Mutexes.
 Producer-consumer synchronization and
semaphores.
 Barriers and condition variables.
 Read-write locks.
 Thread safety.
A Shared Memory System
Processes and Threads
 A process is an instance of a running (or
suspended) program.
 Threads are analogous to a “light-weight”
process.
 In a shared memory program a single
process may have multiple threads of
control.
POSIX® Threads
 Also known as Pthreads.
 A standard for Unix-like operating systems.
 A library that can be linked with C
programs.
 Specifies an application programming
interface (API) for multi-threaded
programming.
Caveat
 The Pthreads API is only available on
POSIX® systems: Linux, Mac OS X,
Solaris, HP-UX, …
Hello World! (1)
The pthread.h header declares the various Pthreads
functions, constants, types, etc.
Hello World! (2)
Hello World! (3)
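A minimal sketch of what such a hello program might look like, assuming the output format shown on the "Running a Pthreads program" slide. The names Hello, thread_count, and run_hello are illustrative assumptions, and passing the rank by casting a long to void* relies on common but implementation-defined behavior.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static int thread_count;  /* shared by all threads */

/* Thread function: the argument carries the thread's rank. */
static void* Hello(void* rank) {
    long my_rank = (long) rank;
    printf("Hello from thread %ld of %d\n", my_rank, thread_count);
    return NULL;
}

/* Starts n threads, greets from the main thread, then joins them all.
   Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int run_hello(int n) {
    pthread_t* handles = malloc(n * sizeof(pthread_t));
    if (handles == NULL) return -1;
    thread_count = n;

    long t, started = 0;
    for (t = 0; t < n; t++) {
        if (pthread_create(&handles[t], NULL, Hello, (void*) t) != 0) break;
        started++;
    }
    printf("Hello from the main thread\n");

    /* Wait for every thread that was actually started. */
    for (t = 0; t < started; t++)
        pthread_join(handles[t], NULL);
    free(handles);
    return started == n ? 0 : -1;
}
```

A main function would typically read the thread count from the command line, for example with strtol(argv[1], NULL, 10), and pass it to run_hello. The order of the greetings is not guaranteed, since the threads run concurrently.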
Compiling a Pthreads program
gcc -g -Wall -o pth_hello pth_hello.c -lpthread
The -lpthread option links in the Pthreads library.
Running a Pthreads program
./pth_hello <number of threads>
./pth_hello 1
Hello from the main thread
Hello from thread 0 of 1
./pth_hello 4
Hello from the main thread
Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4
Global variables
 Can introduce subtle and confusing bugs!
 Limit use of global variables to situations in
which they’re really needed.
 Shared variables.
Starting the Threads
 Processes in MPI are usually started by a
script.
 In Pthreads the threads are started by the
program executable.
Starting the Threads
Declared in pthread.h:
int pthread_create(
      pthread_t*             thread_p                  /* out */,
      const pthread_attr_t*  attr_p                    /* in  */,
      void*                  (*start_routine)(void*)   /* in  */,
      void*                  arg_p                     /* in  */);
thread_p points to a pthread_t: one object for each thread.
pthread_t objects
 Opaque
 The actual data that they store is system-
specific.
 Their data members aren’t directly accessible
to user code.
 However, the Pthreads standard guarantees
that a pthread_t object does store enough
information to uniquely identify the thread with
which it’s associated.
A closer look (1)
int pthread_create(
      pthread_t*             thread_p                  /* out */,
      const pthread_attr_t*  attr_p                    /* in  */,
      void*                  (*start_routine)(void*)   /* in  */,
      void*                  arg_p                     /* in  */);
attr_p: we won’t be using thread attributes, so we just pass NULL.
thread_p: allocate the pthread_t object before calling.
A closer look (2)
int pthread_create(
      pthread_t*             thread_p                  /* out */,
      const pthread_attr_t*  attr_p                    /* in  */,
      void*                  (*start_routine)(void*)   /* in  */,
      void*                  arg_p                     /* in  */);
start_routine: the function that the thread is to run.
arg_p: pointer to the argument that should
be passed to the function start_routine.
Function started by pthread_create
 Prototype:
void* thread_function ( void* args_p ) ;
 A void* can be cast to any pointer type in C.
 So args_p can point to a list containing one or
more values needed by thread_function.
 Similarly, the return value of thread_function can
point to a list of one or more values.
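As a hedged illustration of both points, the sketch below bundles two inputs into a struct reached through args_p and hands a result back through the thread function's return value, which the caller collects with pthread_join. The names range_args, sum_range, and threaded_sum are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument record: several inputs behind one pointer. */
struct range_args {
    int first;
    int last;
};

/* Sums the integers first..last and returns the result in heap storage
   that the joining thread must free. */
static void* sum_range(void* args_p) {
    struct range_args* args = (struct range_args*) args_p;
    long* result = malloc(sizeof(long));
    if (result == NULL) return NULL;
    *result = 0;
    for (int i = args->first; i <= args->last; i++)
        *result += i;
    return result;
}

/* Runs sum_range in a new thread and collects its return value.
   Returns -1 if the thread could not be created or allocation failed. */
long threaded_sum(int first, int last) {
    struct range_args args = { first, last };
    pthread_t thread;
    void* ret = NULL;

    if (pthread_create(&thread, NULL, sum_range, &args) != 0) return -1;
    pthread_join(thread, &ret);   /* args stays valid until the join returns */
    if (ret == NULL) return -1;

    long sum = *(long*) ret;
    free(ret);
    return sum;
}
```

The argument struct can live on the caller's stack here only because the caller joins the thread before returning; a thread that outlives its creator's stack frame would need heap or static storage instead.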
Running the Threads
Main thread forks and joins two threads.
Stopping the Threads
 We call the function pthread_join once for
each thread.
 A single call to pthread_join will wait for the
thread associated with the pthread_t object
to complete.
MATRIX-VECTOR
MULTIPLICATION IN PTHREADS
Serial pseudo-code
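The serial computation can be sketched as follows, assuming A is an m x n matrix stored row by row in a one-dimensional array. The function name and the storage layout are assumptions made for illustration.

```c
#include <stddef.h>

/* y = A * x, where A is m x n and stored row by row in a 1-D array. */
void mat_vect_mult(const double A[], const double x[], double y[],
                   size_t m, size_t n) {
    for (size_t i = 0; i < m; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
}
```

Each y[i] depends only on row i of A and on x, which is what makes the outer loop a natural candidate for division among threads.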
Using 3 Pthreads
[Figure: code for thread 0 and for the general case]
Pthreads matrix-vector multiplication
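One way the work might be divided among threads, assuming m is evenly divisible by the thread count and each thread handles a contiguous block of rows. The shared variables are globals here for brevity, and par_mat_vect is a hypothetical driver.

```c
#include <pthread.h>
#include <stdlib.h>

/* Shared data, set up by the driver before the threads start. */
static int thread_count, m, n;
static const double* A;   /* m x n, row-major */
static const double* x;   /* length n */
static double* y;         /* length m */

/* Each thread computes a contiguous block of m / thread_count rows of y. */
static void* Pth_mat_vect(void* rank) {
    long my_rank = (long) rank;
    int local_m = m / thread_count;
    int my_first_row = (int) my_rank * local_m;
    int my_last_row = my_first_row + local_m - 1;

    for (int i = my_first_row; i <= my_last_row; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
    return NULL;
}

/* Assumes rows is evenly divisible by threads. Returns 0 on success. */
int par_mat_vect(const double* A_in, const double* x_in, double* y_out,
                 int rows, int cols, int threads) {
    pthread_t* handles = malloc(threads * sizeof(pthread_t));
    if (handles == NULL) return -1;
    A = A_in; x = x_in; y = y_out;
    m = rows; n = cols; thread_count = threads;

    long t, started = 0;
    for (t = 0; t < threads; t++) {
        if (pthread_create(&handles[t], NULL, Pth_mat_vect, (void*) t) != 0) break;
        started++;
    }
    for (t = 0; t < started; t++)
        pthread_join(handles[t], NULL);
    free(handles);
    return started == threads ? 0 : -1;
}
```

No locking is needed in this sketch because the threads write disjoint elements of y and only read A and x.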
CRITICAL SECTIONS
Estimating π
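A serial version can be sketched as below, assuming the estimate is based on the series π = 4(1 - 1/3 + 1/5 - 1/7 + ...). The function name is illustrative.

```c
/* Estimates pi from the first n terms of 4 * (1 - 1/3 + 1/5 - 1/7 + ...). */
double estimate_pi(long long n) {
    double factor = 1.0;   /* sign of the current term */
    double sum = 0.0;
    for (long long i = 0; i < n; i++, factor = -factor)
        sum += factor / (2 * i + 1);
    return 4.0 * sum;
}
```

The series converges slowly, so a large n is needed for a good estimate, which is what makes it a convenient workload for timing experiments.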
Using a dual core processor
Note that as we increase n, the estimate
with one thread gets better and better.
A thread function for computing π
Possible race condition
Busy-Waiting
 A thread repeatedly tests a condition, but,
effectively, does no useful work until the
condition has the appropriate value.
 Beware of optimizing compilers, though!
flag initialized to 0 by main thread
Pthreads global sum with busy-waiting
Global sum function with critical section after loop (1)
Global sum function with critical section after loop (2)
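A sketch of this variant, in which each thread accumulates a private partial sum and busy-waits for its turn only once, after the loop. It swaps the plain int flag for a C11 atomic_int (a substitution made here, not something the slides show) so that an optimizing compiler cannot hoist or reorder the flag accesses. The driver name pi_busy_wait is hypothetical, and the term count is assumed to be evenly divisible by the thread count.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static int thread_count;
static long long n;        /* total number of terms */
static double sum;         /* shared result, updated only in the critical section */
static atomic_int flag;    /* rank of the thread whose turn it is */

static void* Thread_sum(void* rank) {
    long my_rank = (long) rank;
    long long my_n = n / thread_count;
    long long my_first_i = my_n * my_rank;
    long long my_last_i = my_first_i + my_n;
    double factor = (my_first_i % 2 == 0) ? 1.0 : -1.0;
    double my_sum = 0.0;

    /* Private partial sum: no synchronization needed inside the loop. */
    for (long long i = my_first_i; i < my_last_i; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    while (atomic_load(&flag) != my_rank)
        ;                                  /* busy-wait for our turn */
    sum += my_sum;                         /* critical section */
    atomic_store(&flag, (int) ((my_rank + 1) % thread_count));
    return NULL;
}

/* Returns the estimate of pi, or -1.0 if the threads could not all start. */
double pi_busy_wait(long long terms, int threads) {
    pthread_t* handles = malloc(threads * sizeof(pthread_t));
    if (handles == NULL) return -1.0;
    n = terms;
    thread_count = threads;
    sum = 0.0;
    atomic_store(&flag, 0);                /* thread 0 goes first */

    long t, started = 0;
    for (t = 0; t < threads; t++) {
        if (pthread_create(&handles[t], NULL, Thread_sum, (void*) t) != 0) break;
        started++;
    }
    for (t = 0; t < started; t++)
        pthread_join(handles[t], NULL);
    free(handles);
    return started == threads ? 4.0 * sum : -1.0;
}
```

Because the threads must enter the critical section in rank order, a waiting thread still burns CPU time until every lower-ranked thread has finished.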
Mutexes
 A thread that is busy-waiting may
continually use the CPU accomplishing
nothing.
 Mutex (mutual exclusion) is a special type
of variable that can be used to restrict
access to a critical section to a single
thread at a time.
Mutexes
 Used to guarantee that one thread
“excludes” all other threads while it
executes the critical section.
 The Pthreads standard includes a special
type for mutexes: pthread_mutex_t.
Mutexes
 When a Pthreads program finishes using a
mutex, it should call pthread_mutex_destroy.
 In order to gain access to a critical section
a thread calls pthread_mutex_lock.
Mutexes
 When a thread is finished executing the
code in a critical section, it should call
pthread_mutex_unlock.
Global sum function that uses a mutex (1)
Global sum function that uses a mutex (2)
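A sketch of how the global sum might look with a mutex guarding the update of the shared total, assuming the same series as the π estimate and a term count evenly divisible by the thread count. The driver name pi_mutex is hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

static int thread_count;
static long long n;              /* total number of terms */
static double sum;               /* shared result, protected by mutex */
static pthread_mutex_t mutex;

static void* Thread_sum(void* rank) {
    long my_rank = (long) rank;
    long long my_n = n / thread_count;
    long long my_first_i = my_n * my_rank;
    long long my_last_i = my_first_i + my_n;
    double factor = (my_first_i % 2 == 0) ? 1.0 : -1.0;
    double my_sum = 0.0;

    for (long long i = my_first_i; i < my_last_i; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    pthread_mutex_lock(&mutex);      /* enter the critical section */
    sum += my_sum;
    pthread_mutex_unlock(&mutex);    /* leave the critical section */
    return NULL;
}

/* Returns the estimate of pi, or -1.0 on failure. */
double pi_mutex(long long terms, int threads) {
    pthread_t* handles = malloc(threads * sizeof(pthread_t));
    if (handles == NULL) return -1.0;
    if (pthread_mutex_init(&mutex, NULL) != 0) {
        free(handles);
        return -1.0;
    }
    n = terms;
    thread_count = threads;
    sum = 0.0;

    long t, started = 0;
    for (t = 0; t < threads; t++) {
        if (pthread_create(&handles[t], NULL, Thread_sum, (void*) t) != 0) break;
        started++;
    }
    for (t = 0; t < started; t++)
        pthread_join(handles[t], NULL);

    pthread_mutex_destroy(&mutex);   /* done with the mutex */
    free(handles);
    return started == threads ? 4.0 * sum : -1.0;
}
```

Unlike the busy-waiting version, the order in which threads add their partial sums is not fixed, and a thread that cannot get the lock is blocked rather than spinning.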
Run-times (in seconds) of π programs using n = 10^8
terms on a system with two four-core processors.
Possible sequence of events with busy-waiting
and more threads than cores.
PRODUCER-CONSUMER
SYNCHRONIZATION AND
SEMAPHORES
Issues
 Busy-waiting enforces the order in which
threads access a critical section.
 Using mutexes, the order is left to chance
and the system.
 There are applications where we need to
control the order in which threads access the
critical section.
Problems with a mutex solution
A first attempt at sending messages using Pthreads
Syntax of the various semaphore functions
Semaphores are not part of Pthreads;
you need to include semaphore.h.
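A hedged sketch of producer-consumer synchronization with the POSIX calls sem_init, sem_post, sem_wait, and sem_destroy: each thread leaves a message for its neighbor and then blocks until its own message has arrived. The names Send_msg and run_messages, the message format, and MSG_MAX are assumptions. Unnamed semaphores created with sem_init are assumed to be available, which holds on Linux but not on macOS.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_MAX 100

static int thread_count;
static char** messages;     /* messages[i]: message addressed to thread i */
static sem_t* semaphores;   /* semaphores[i]: posted once messages[i] is ready */

static void* Send_msg(void* rank) {
    long my_rank = (long) rank;
    long dest = (my_rank + 1) % thread_count;
    char* my_msg = malloc(MSG_MAX);

    if (my_msg != NULL)
        snprintf(my_msg, MSG_MAX, "Hello to %ld from %ld", dest, my_rank);
    messages[dest] = my_msg;
    sem_post(&semaphores[dest]);        /* tell dest its message is ready */

    sem_wait(&semaphores[my_rank]);     /* block until our message is ready */
    if (messages[my_rank] != NULL)
        printf("Thread %ld > %s\n", my_rank, messages[my_rank]);
    return NULL;
}

/* Returns the number of messages delivered, or -1 if setup fails. */
int run_messages(int threads) {
    pthread_t* handles = malloc(threads * sizeof(pthread_t));
    messages = calloc(threads, sizeof(char*));
    semaphores = malloc(threads * sizeof(sem_t));
    if (handles == NULL || messages == NULL || semaphores == NULL) {
        free(handles); free(messages); free(semaphores);
        return -1;
    }
    thread_count = threads;
    for (int i = 0; i < threads; i++)
        sem_init(&semaphores[i], 0, 0);  /* shared between threads, initial value 0 */

    for (long t = 0; t < threads; t++) {
        if (pthread_create(&handles[t], NULL, Send_msg, (void*) t) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(EXIT_FAILURE);  /* a missing sender would leave its receiver blocked */
        }
    }
    for (long t = 0; t < threads; t++)
        pthread_join(handles[t], NULL);

    int delivered = 0;
    for (int i = 0; i < threads; i++) {
        if (messages[i] != NULL) delivered++;
        free(messages[i]);
        sem_destroy(&semaphores[i]);
    }
    free(handles); free(messages); free(semaphores);
    return delivered;
}
```

Initializing every semaphore to 0 is what forces a receiver to wait: sem_wait blocks until the matching sem_post has run, so a thread never reads its slot before the sender has filled it.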
BARRIERS AND CONDITION
VARIABLES
Barriers
 Synchronizing the threads to make sure
that they all are at the same point in a
program is called a barrier.
 No thread can cross the barrier until all the
threads have reached it.
Using barriers to time the slowest thread
Using barriers for debugging
Busy-waiting and a Mutex
 Implementing a barrier using busy-waiting
and a mutex is straightforward.
 We use a shared counter protected by the
mutex.
 When the counter indicates that every
thread has entered the critical section,
threads can leave the critical section.
Busy-waiting and a Mutex
We need one counter variable for each instance of the barrier;
otherwise problems are likely to occur.
Implementing a barrier with semaphores
Condition Variables
 A condition variable is a data object that
allows a thread to suspend execution until
a certain event or condition occurs.
 When the event or condition occurs
another thread can signal the thread to
“wake up.”
 A condition variable is always associated
with a mutex.
Condition Variables
Implementing a barrier with condition variables
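One possible shape for such a barrier, sketched with a mutex, a condition variable, and a shared counter. The barrier_t type, its functions, and the demo code are hypothetical, and the round field is an addition made here so that the barrier can be reused and so that waiters can re-check their condition after a spurious wakeup.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond_var;
    int thread_count;   /* threads that must arrive */
    int counter;        /* threads that have arrived in this round */
    unsigned round;     /* incremented each time the barrier opens */
} barrier_t;

int barrier_init(barrier_t* b, int thread_count) {
    b->thread_count = thread_count;
    b->counter = 0;
    b->round = 0;
    if (pthread_mutex_init(&b->mutex, NULL) != 0) return -1;
    if (pthread_cond_init(&b->cond_var, NULL) != 0) {
        pthread_mutex_destroy(&b->mutex);
        return -1;
    }
    return 0;
}

void barrier_wait(barrier_t* b) {
    pthread_mutex_lock(&b->mutex);
    unsigned my_round = b->round;
    b->counter++;
    if (b->counter == b->thread_count) {
        b->counter = 0;                 /* last arrival resets and opens the barrier */
        b->round++;
        pthread_cond_broadcast(&b->cond_var);
    } else {
        while (my_round == b->round)    /* loop guards against spurious wakeups */
            pthread_cond_wait(&b->cond_var, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
}

void barrier_destroy(barrier_t* b) {
    pthread_cond_destroy(&b->cond_var);
    pthread_mutex_destroy(&b->mutex);
}

/* Demo: every thread bumps arrivals, waits at the barrier, then checks
   that it can see all of the increments. */
#define DEMO_THREADS 4
static barrier_t demo_barrier;
static pthread_mutex_t arrivals_mutex = PTHREAD_MUTEX_INITIALIZER;
static int arrivals;
static int saw_all[DEMO_THREADS];

static void* demo_worker(void* rank) {
    long my_rank = (long) rank;
    pthread_mutex_lock(&arrivals_mutex);
    arrivals++;
    pthread_mutex_unlock(&arrivals_mutex);
    barrier_wait(&demo_barrier);
    saw_all[my_rank] = (arrivals == DEMO_THREADS);
    return NULL;
}

/* Returns 1 if every thread saw all arrivals after the barrier. */
int barrier_demo(void) {
    pthread_t handles[DEMO_THREADS];
    arrivals = 0;
    if (barrier_init(&demo_barrier, DEMO_THREADS) != 0) return 0;
    for (long t = 0; t < DEMO_THREADS; t++)
        if (pthread_create(&handles[t], NULL, demo_worker, (void*) t) != 0)
            exit(EXIT_FAILURE);         /* the others would wait forever */
    int ok = 1;
    for (long t = 0; t < DEMO_THREADS; t++) {
        pthread_join(handles[t], NULL);
        ok = ok && saw_all[t];
    }
    barrier_destroy(&demo_barrier);
    return ok;
}
```

pthread_cond_wait releases the mutex while the thread sleeps and reacquires it before returning, which is why the condition variable is always paired with a mutex.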
READ-WRITE LOCKS
Controlling access to a large,
shared data structure
 Let’s look at an example.
 Suppose the shared data structure is a
sorted linked list of ints, and the operations
of interest are Member, Insert, and Delete.
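A serial sketch of the three operations, assuming a singly linked list kept in increasing order with no duplicate values. The node layout and the convention of returning 1 on success and 0 otherwise are assumptions.

```c
#include <stdlib.h>

struct list_node_s {
    int data;
    struct list_node_s* next;
};

/* Returns 1 if value is in the sorted list, 0 otherwise. */
int Member(int value, struct list_node_s* head_p) {
    struct list_node_s* curr_p = head_p;
    while (curr_p != NULL && curr_p->data < value)
        curr_p = curr_p->next;
    return curr_p != NULL && curr_p->data == value;
}

/* Inserts value in sorted position. Returns 1 on success,
   0 if value is already present or allocation fails. */
int Insert(int value, struct list_node_s** head_pp) {
    struct list_node_s* curr_p = *head_pp;
    struct list_node_s* pred_p = NULL;
    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p != NULL && curr_p->data == value) return 0;

    struct list_node_s* temp_p = malloc(sizeof(struct list_node_s));
    if (temp_p == NULL) return 0;
    temp_p->data = value;
    temp_p->next = curr_p;
    if (pred_p == NULL)
        *head_pp = temp_p;          /* new first node */
    else
        pred_p->next = temp_p;
    return 1;
}

/* Removes value from the list. Returns 1 if it was found, 0 otherwise. */
int Delete(int value, struct list_node_s** head_pp) {
    struct list_node_s* curr_p = *head_pp;
    struct list_node_s* pred_p = NULL;
    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p == NULL || curr_p->data != value) return 0;

    if (pred_p == NULL)
        *head_pp = curr_p->next;    /* deleting the first node */
    else
        pred_p->next = curr_p->next;
    free(curr_p);
    return 1;
}
```

Member only reads the list, while Insert and Delete rewrite next pointers and may change the head, which is the distinction that matters once several threads share the list.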
Linked Lists
Linked List Membership
Inserting a new node into a list
Inserting a new node into a list
Deleting a node from a linked list
Deleting a node from a linked list
A Multi-Threaded Linked List
 Let’s try to use these functions in a
Pthreads program.
 In order to share access to the list, we can
define head_p to be a global variable.
 This will simplify the function headers for
Member, Insert, and Delete, since we
won’t need to pass in either head_p or a
pointer to head_p: we’ll only need to pass
in the value of interest.
Simultaneous access by two threads
Solution #1
 An obvious solution is to simply lock the
list any time that a thread attempts to
access it.
 A call to each of the three functions can be
protected by a mutex.
In place of calling Member(value).
Issues
 We’re serializing access to the list.
 If the vast majority of our operations are
calls to Member, we’ll fail to exploit this
opportunity for parallelism.
 On the other hand, if most of our
operations are calls to Insert and Delete,
then this may be the best solution since
we’ll need to serialize access to the list for
most of the operations, and this solution
will certainly be easy to implement.
Solution #2
 Instead of locking the entire list, we could
try to lock individual nodes.
 A “finer-grained” approach.
Issues
 This is much more complex than the
original Member function.
 It is also much slower, since, in general,
each time a node is accessed, a mutex
must be locked and unlocked.
 The addition of a mutex field to each node
will substantially increase the amount of
storage needed for the list.
Implementation of Member with one mutex per list node (1)
Implementation of Member with one mutex per list node (2)
Pthreads Read-Write Locks
 Neither of our multi-threaded linked lists
exploits the potential for simultaneous
access to any node by threads that are
executing Member.
 The first solution only allows one thread to
access the entire list at any instant.
 The second only allows one thread to
access any given node at any instant.
Pthreads Read-Write Locks
 A read-write lock is somewhat like a mutex
except that it provides two lock functions.
 The first lock function locks the read-write
lock for reading, while the second locks it
for writing.
Pthreads Read-Write Locks
 So multiple threads can simultaneously
obtain the lock by calling the read-lock
function, while only one thread can obtain
the lock by calling the write-lock function.
 Thus, if any threads own the lock for
reading, any threads that want to obtain
the lock for writing will block in the call to
the write-lock function.
Pthreads Read-Write Locks
 If any thread owns the lock for writing, any
threads that want to obtain the lock for
reading or writing will block in their
respective locking functions.
Protecting our linked list functions
Linked List Performance
100,000 ops/thread
99.9% Member
0.05% Insert
0.05% Delete
Linked List Performance
100,000 ops/thread
80% Member
10% Insert
10% Delete
Caches, Cache-Coherence, and
False Sharing
 Recall that chip designers have added
blocks of relatively fast memory to
processors called cache memory.
 The use of cache memory can have a
huge impact on shared-memory programs.
 A write-miss occurs when a core tries to
update a variable that’s not in cache, and it
has to access main memory.
Pthreads matrix-vector multiplication
Run-times and efficiencies of
matrix-vector multiplication
(times are in seconds)
THREAD-SAFETY
Thread-Safety
 A block of code is thread-safe if it can be
simultaneously executed by multiple
threads without causing problems.
Example
 Suppose we want to use multiple threads
to “tokenize” a file that consists of ordinary
English text.
 The tokens are just contiguous sequences
of characters separated from the rest of
the text by white-space — a space, a tab,
or a newline.
Simple approach
 Divide the input file into lines of text and
assign the lines to the threads in a round-
robin fashion.
 The first line goes to thread 0, the second
goes to thread 1, . . . , the tth goes to
thread t-1, the (t+1)st goes to thread 0, etc.
Simple approach
 We can serialize access to the lines of
input using semaphores.
 After a thread has read a single line of
input, it can tokenize the line using the
strtok function.
The strtok function
 The first time it’s called the string argument
should be the text to be tokenized.
 Our line of input.
 For subsequent calls, the first argument
should be NULL.
The strtok function
 The idea is that in the first call, strtok
caches a pointer to string, and for
subsequent calls it returns successive
tokens taken from the cached copy.
Multi-threaded tokenizer (1)
Multi-threaded tokenizer (2)
Running with one thread
 It correctly tokenizes the input stream.
Pease porridge hot.
Pease porridge cold.
Pease porridge in the pot
Nine days old.
Running with two threads
Oops!
What happened?
 strtok caches the input line by declaring a
variable to have static storage class.
 This causes the value stored in this
variable to persist from one call to the next.
 Unfortunately for us, this cached string is
shared, not private.
What happened?
 Thus, thread 0’s call to strtok with the third
line of the input has apparently overwritten
the contents of thread 1’s call with the
second line.
 So the strtok function is not thread-safe.
If multiple threads call it simultaneously, the
output may not be correct.
Other unsafe C library functions
 Regrettably, it’s not uncommon for C
library functions to fail to be thread-safe.
 The random number generator random in
stdlib.h.
 The time conversion function localtime in
time.h.
“Re-entrant” (thread-safe) functions
 In some cases, a standard specifies an
alternate, thread-safe version of a function;
for example, POSIX specifies strtok_r as a
re-entrant alternative to strtok.
Concluding Remarks (1)
 A thread in shared-memory programming
is analogous to a process in distributed
memory programming.
 However, a thread is often lighter-weight
than a full-fledged process.
 In Pthreads programs, all the threads have
access to global variables, while local
variables usually are private to the thread
running the function.
Concluding Remarks (2)
 When indeterminacy results from multiple
threads attempting to access a shared
resource such as a shared variable or a
shared file, at least one of the accesses is
an update, and the accesses can result in
an error, we have a race condition.
Concluding Remarks (3)
 A critical section is a block of code that
updates a shared resource that can only
be updated by one thread at a time.
 So the code in a critical section should,
effectively, be executed as serial code.
Concluding Remarks (4)
 Busy-waiting can be used to avoid
conflicting access to critical sections with a
flag variable and a while-loop with an
empty body.
 It can be very wasteful of CPU cycles.
 It can also be unreliable if compiler
optimization is turned on.
Concluding Remarks (5)
 A mutex can be used to avoid conflicting
access to critical sections as well.
 Think of it as a lock on a critical section,
since mutexes arrange for mutually
exclusive access to a critical section.
Concluding Remarks (6)
 A semaphore is the third way to avoid
conflicting access to critical sections.
 It is an unsigned int together with two
operations: sem_wait and sem_post.
 Semaphores are more powerful than
mutexes since they can be initialized to
any nonnegative value.
Concluding Remarks (7)
 A barrier is a point in a program at which
the threads block until all of the threads
have reached it.
 A read-write lock is used when it’s safe for
multiple threads to simultaneously read a
data structure, but if a thread needs to
modify or write to the data structure, then
only that thread can access the data
structure during the modification.
Concluding Remarks (8)
 Some C functions cache data between
calls by declaring variables to be static,
causing errors when multiple threads call
the function.
 This type of function is not thread-safe.
Copyright © 2010, Elsevier Inc. All rights Reserved

More Related Content

Similar to Shared Memory Programming with Pthreads (1).ppt (20)

PPT
Introto netthreads-090906214344-phpapp01
Aravindharamanan S
 
PDF
Pthread Library
Khemraj Dhondge
 
PDF
ORTE - OCERA Real Time ethernet
Alexandre Chatiron
 
PDF
CMSIS_RTOS_Tutorial.pdf
chau44
 
PPT
Md09 multithreading
Rakesh Madugula
 
PPT
Multithreading
backdoor
 
PDF
20100730 phpstudy
Yusuke Ando
 
PPT
Multithreading Presentation
Neeraj Kaushik
 
PDF
MultiThreading in Python
SRINIVAS KOLAPARTHI
 
PPTX
multithread in multiprocessor architecture
myjuni04
 
PPTX
Distributed Memory Programming with MPI
Dilum Bandara
 
PPT
Chap7 slides
BaliThorat1
 
PDF
Vx works RTOS
Sai Malleswar
 
PPT
ESD_Unit_V_Task synchronization techniques
NagarajuNalluri1
 
PDF
Multithreading Introduction and Lifecyle of thread
Kartik Dube
 
PPTX
Threads
Sameer Shaik
 
PPTX
ch 7 POSIX.pptx
sibokac
 
PPT
Operating System Chapter 4 Multithreaded programming
guesta40f80
 
PDF
Beyond the RTOS: A Better Way to Design Real-Time Embedded Software
Quantum Leaps, LLC
 
PPT
multithreading
Rajkattamuri
 
Introto netthreads-090906214344-phpapp01
Aravindharamanan S
 
Pthread Library
Khemraj Dhondge
 
ORTE - OCERA Real Time ethernet
Alexandre Chatiron
 
CMSIS_RTOS_Tutorial.pdf
chau44
 
Md09 multithreading
Rakesh Madugula
 
Multithreading
backdoor
 
20100730 phpstudy
Yusuke Ando
 
Multithreading Presentation
Neeraj Kaushik
 
MultiThreading in Python
SRINIVAS KOLAPARTHI
 
multithread in multiprocessor architecture
myjuni04
 
Distributed Memory Programming with MPI
Dilum Bandara
 
Chap7 slides
BaliThorat1
 
Vx works RTOS
Sai Malleswar
 
ESD_Unit_V_Task synchronization techniques
NagarajuNalluri1
 
Multithreading Introduction and Lifecyle of thread
Kartik Dube
 
Threads
Sameer Shaik
 
ch 7 POSIX.pptx
sibokac
 
Operating System Chapter 4 Multithreaded programming
guesta40f80
 
Beyond the RTOS: A Better Way to Design Real-Time Embedded Software
Quantum Leaps, LLC
 
multithreading
Rajkattamuri
 

Recently uploaded (20)

PDF
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PDF
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
PDF
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
PPT
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
DOCX
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
PDF
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
PDF
Bayesian Learning - Naive Bayes Algorithm
Sharmila Chidaravalli
 
PDF
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
PPTX
Computer network Computer network Computer network Computer network
Shrikant317689
 
PDF
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
PPTX
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
PPTX
CST413 KTU S7 CSE Machine Learning Introduction Parameter Estimation MLE MAP ...
resming1
 
PDF
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
PDF
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
PPTX
CST413 KTU S7 CSE Machine Learning Neural Networks and Support Vector Machine...
resming1
 
PPTX
Introduction to File Transfer Protocol with commands in FTP
BeulahS2
 
PDF
Authentication Devices in Fog-mobile Edge Computing Environments through a Wi...
ijujournal
 
PDF
PRIZ Academy - Process functional modelling
PRIZ Guru
 
PDF
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
SF 9_Unit 1.ppt software engineering ppt
AmarrKannthh
 
Engineering Geology Field Report to Malekhu .docx
justprashant567
 
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
Bayesian Learning - Naive Bayes Algorithm
Sharmila Chidaravalli
 
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
Computer network Computer network Computer network Computer network
Shrikant317689
 
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
FSE_LLM4SE1_A Tool for In-depth Analysis of Code Execution Reasoning of Large...
cl144
 
CST413 KTU S7 CSE Machine Learning Introduction Parameter Estimation MLE MAP ...
resming1
 
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
CST413 KTU S7 CSE Machine Learning Neural Networks and Support Vector Machine...
resming1
 
Introduction to File Transfer Protocol with commands in FTP
BeulahS2
 
Authentication Devices in Fog-mobile Edge Computing Environments through a Wi...
ijujournal
 
PRIZ Academy - Process functional modelling
PRIZ Guru
 
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
Ad

Shared Memory Programming with Pthreads (1).ppt

  • 1. 1 Copyright © 2010, Elsevier Inc. All rights Reserved Chapter 4 Shared Memory Programming with Pthreads An Introduction to Parallel Programming Peter Pacheco
  • 2. 2 Copyright © 2010, Elsevier Inc. All rights Reserved Roadmap  Problems programming shared memory systems.  Controlling access to a critical section.  Thread synchronization.  Programming with POSIX threads.  Mutexes.  Producer-consumer synchronization and semaphores.  Barriers and condition variables.  Read-write locks.  Thread safety. # Chapter Subtitle
  • 3. 3 A Shared Memory System Copyright © 2010, Elsevier Inc. All rights Reserved
  • 4. 4 Processes and Threads  A process is an instance of a running (or suspended) program.  Threads are analogous to a “light-weight” process.  In a shared memory program a single process may have multiple threads of control. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 5. 5 POSIX®Threads  Also known as Pthreads.  A standard for Unix-like operating systems.  A library that can be linked with C programs.  Specifies an application programming interface (API) for multi-threaded programming. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 6. 6 Caveat  The Pthreads API is only available on POSIXR systems — Linux, MacOS X, Solaris, HPUX, … Copyright © 2010, Elsevier Inc. All rights Reserved
  • 7. 7 Hello World! (1) Copyright © 2010, Elsevier Inc. All rights Reserved declares the various Pthreads functions, constants, types, etc.
  • 8. 8 Hello World! (2) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 9. 9 Hello World! (3) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 10. 10 Compiling a Pthread program Copyright © 2010, Elsevier Inc. All rights Reserved gcc −g −Wall −o pth_hello pth_hello . c −lpthread link in the Pthreads library
  • 11. 11 Running a Pthreads program Copyright © 2010, Elsevier Inc. All rights Reserved . / pth_hello <number of threads> . / pth_hello 1 Hello from the main thread Hello from thread 0 of 1 . / pth_hello 4 Hello from the main thread Hello from thread 0 of 4 Hello from thread 1 of 4 Hello from thread 2 of 4 Hello from thread 3 of 4
  • 12. 12 Global variables  Can introduce subtle and confusing bugs!  Limit use of global variables to situations in which they’re really needed.  Shared variables. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 13. 13 Starting the Threads  Processes in MPI are usually started by a script.  In Pthreads the threads are started by the program executable. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 14. 14 Starting the Threads Copyright © 2010, Elsevier Inc. All rights Reserved pthread.h pthread_t int pthread_create ( pthread_t* thread_p /* out */ , const pthread_attr_t* attr_p /* in */ , void* (*start_routine ) ( void ) /* in */ , void* arg_p /* in */ ) ; One object for each thread.
  • 15. 15 pthread_t objects  Opaque  The actual data that they store is system- specific.  Their data members aren’t directly accessible to user code.  However, the Pthreads standard guarantees that a pthread_t object does store enough information to uniquely identify the thread with which it’s associated. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 16. 16 A closer look (1) Copyright © 2010, Elsevier Inc. All rights Reserved int pthread_create ( pthread_t* thread_p /* out */ , const pthread_attr_t* attr_p /* in */ , void* (*start_routine ) ( void ) /* in */ , void* arg_p /* in */ ) ; We won’t be using, so we just pass NULL. Allocate before calling.
  • 17. 17 A closer look (2) Copyright © 2010, Elsevier Inc. All rights Reserved int pthread_create ( pthread_t* thread_p /* out */ , const pthread_attr_t* attr_p /* in */ , void* (*start_routine ) ( void ) /* in */ , void* arg_p /* in */ ) ; The function that the thread is to run. Pointer to the argument that should be passed to the function start_routine.
  • 18. 18 Function started by pthread_create  Prototype: void* thread_function ( void* args_p ) ;  Void* can be cast to any pointer type in C.  So args_p can point to a list containing one or more values needed by thread_function.  Similarly, the return value of thread_function can point to a list of one or more values. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 19. 19 Running the Threads Copyright © 2010, Elsevier Inc. All rights Reserved Main thread forks and joins two threads.
  • 20. 20 Stopping the Threads  We call the function pthread_join once for each thread.  A single call to pthread_join will wait for the thread associated with the pthread_t object to complete. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 21. 21 MATRIX-VECTOR MULTIPLICATION IN PTHREADS Copyright © 2010, Elsevier Inc. All rights Reserved
  • 22. 22 Serial pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved
  • 23. 23 Using 3 Pthreads Copyright © 2010, Elsevier Inc. All rights Reserved thread 0 general case
  • 24. 24 Pthreads matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved
  • 25. 25 CRITICAL SECTIONS Copyright © 2010, Elsevier Inc. All rights Reserved
  • 26. 26 Estimating π Copyright © 2010, Elsevier Inc. All rights Reserved
  • 27. 27 Using a dual core processor Copyright © 2010, Elsevier Inc. All rights Reserved Note that as we increase n, the estimate with one thread gets better and better.
  • 28. 28 A thread function for computing π Copyright © 2010, Elsevier Inc. All rights Reserved
  • 29. 29 Possible race condition Copyright © 2010, Elsevier Inc. All rights Reserved
  • 30. 30 Busy-Waiting  A thread repeatedly tests a condition, but, effectively, does no useful work until the condition has the appropriate value.  Beware of optimizing compilers, though! Copyright © 2010, Elsevier Inc. All rights Reserved flag initialized to 0 by main thread
  • 31. Pthreads global sum with busy-waiting
  • 32. Global sum function with critical section after loop (1)
  • 33. Global sum function with critical section after loop (2)
  • 34. Mutexes: A thread that is busy-waiting may continually use the CPU while accomplishing nothing. A mutex (mutual exclusion) is a special type of variable that can be used to restrict access to a critical section to a single thread at a time.
  • 35. Mutexes: Used to guarantee that one thread “excludes” all other threads while it executes the critical section. The Pthreads standard includes a special type for mutexes: pthread_mutex_t.
  • 36. Mutexes: When a Pthreads program finishes using a mutex, it should call [function shown in the slide's code]. In order to gain access to a critical section, a thread calls [function shown in the slide's code].
  • 37. Mutexes: When a thread is finished executing the code in a critical section, it should call [function shown in the slide's code].
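
The function names on the mutex slides are not preserved in the transcript. The standard Pthreads calls for this life cycle are `pthread_mutex_init`, `pthread_mutex_lock`, `pthread_mutex_unlock`, and `pthread_mutex_destroy`. The sketch below uses them to protect a global sum for the π estimate, with the critical section placed after the loop. It assumes the alternating series 4(1 − 1/3 + 1/5 − …); `TERMS`, `THREADS`, `partial_sum`, and `estimate_pi` are hypothetical names.

```c
#include <pthread.h>

#define THREADS 4
#define TERMS 1000000L            /* assumed divisible by THREADS */

static double sum;                /* shared global sum */
static pthread_mutex_t sum_mutex;

static void *partial_sum(void *arg) {
    long my_rank = *(long *)arg;
    long my_n = TERMS / THREADS;
    long first = my_n * my_rank;
    long last = first + my_n;
    double factor = (first % 2 == 0) ? 1.0 : -1.0;
    double my_sum = 0.0;

    /* Accumulate privately so the lock is taken once per thread. */
    for (long i = first; i < last; i++, factor = -factor)
        my_sum += factor / (double)(2 * i + 1);

    pthread_mutex_lock(&sum_mutex);     /* enter critical section */
    sum += my_sum;
    pthread_mutex_unlock(&sum_mutex);   /* leave critical section */
    return NULL;
}

double estimate_pi(void) {
    pthread_t handles[THREADS];
    long ranks[THREADS];

    sum = 0.0;
    pthread_mutex_init(&sum_mutex, NULL);
    for (long t = 0; t < THREADS; t++) {
        ranks[t] = t;
        pthread_create(&handles[t], NULL, partial_sum, &ranks[t]);
    }
    for (long t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    pthread_mutex_destroy(&sum_mutex);  /* done with the mutex */
    return 4.0 * sum;
}
```

Keeping the lock outside the loop means each thread contends for the mutex once rather than once per term, which is the point of moving the critical section after the loop.
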
  • 38. Global sum function that uses a mutex (1)
  • 39. Global sum function that uses a mutex (2)
  • 40. Run-times (in seconds) of π programs using n = 10^8 terms on a system with two four-core processors.
  • 41. Possible sequence of events with busy-waiting and more threads than cores.
  • 42. PRODUCER-CONSUMER SYNCHRONIZATION AND SEMAPHORES
  • 43. Issues: Busy-waiting enforces the order in which threads access a critical section. Using mutexes, the order is left to chance and the system. There are applications where we need to control the order in which threads access the critical section.
  • 44. Problems with a mutex solution
  • 45. A first attempt at sending messages using Pthreads
  • 46. Syntax of the various semaphore functions (code annotation: Semaphores are not part of Pthreads; you need to add this.)
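
The semaphore listing itself is not in the transcript. The sketch below uses the POSIX unnamed-semaphore calls `sem_init`, `sem_post`, `sem_wait`, and `sem_destroy` from `<semaphore.h>` for a simple producer-consumer exchange in which each thread sends its rank to the next thread and then waits for its own message. The message format and the names `inbox`, `ready`, `seen`, and `semaphore_msg_demo` are assumptions made for illustration; unnamed semaphores are assumed to be available (they are on Linux, but not on every POSIX system).

```c
#include <pthread.h>
#include <semaphore.h>

#define THREADS 4

static long inbox[THREADS];     /* inbox[r] holds the rank of r's sender */
static sem_t ready[THREADS];    /* ready[r] is posted once inbox[r] is set */
static long seen[THREADS];      /* what each receiver actually read */

static void *send_and_receive(void *arg) {
    long my_rank = *(long *)arg;
    long dest = (my_rank + 1) % THREADS;

    inbox[dest] = my_rank;          /* produce the message */
    sem_post(&ready[dest]);         /* tell the consumer it is available */
    sem_wait(&ready[my_rank]);      /* block until our own message arrives */
    seen[my_rank] = inbox[my_rank]; /* consume it */
    return NULL;
}

/* Returns 1 if every thread received the rank of its predecessor. */
int semaphore_msg_demo(void) {
    pthread_t handles[THREADS];
    long ranks[THREADS];
    int all_ok = 1;

    for (long t = 0; t < THREADS; t++) {
        sem_init(&ready[t], 0, 0);  /* shared between threads, starts at 0 */
        seen[t] = -1;
    }
    for (long t = 0; t < THREADS; t++) {
        ranks[t] = t;
        pthread_create(&handles[t], NULL, send_and_receive, &ranks[t]);
    }
    for (long t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    for (long t = 0; t < THREADS; t++) {
        sem_destroy(&ready[t]);
        if (seen[t] != (t + THREADS - 1) % THREADS)
            all_ok = 0;
    }
    return all_ok;
}
```

Unlike a mutex, the semaphore here is posted by one thread and waited on by another, which is what lets it express "the message is ready" rather than just "nobody else is in this section".
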
  • 47. BARRIERS AND CONDITION VARIABLES
  • 48. Barriers: Synchronizing the threads to make sure that they all are at the same point in a program is called a barrier. No thread can cross the barrier until all the threads have reached it.
  • 49. Using barriers to time the slowest thread
  • 50. Using barriers for debugging
  • 51. Busy-waiting and a Mutex: Implementing a barrier using busy-waiting and a mutex is straightforward. We use a shared counter protected by the mutex. When the counter indicates that every thread has entered the critical section, threads can leave the critical section.
  • 52. Busy-waiting and a Mutex: We need one counter variable for each instance of the barrier; otherwise problems are likely to occur.
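
A sketch of the counter-based barrier described on these two slides is given below. As a deliberate variation, the spin loop re-reads the counter while holding the mutex instead of reading it bare, so the read is not a data race; the slide's own listing is not available to compare against. The names `read_counter`, `before`, `saw_all`, and `busy_barrier_demo` are hypothetical.

```c
#include <pthread.h>

#define THREADS 4

static int counter;                   /* threads that have arrived */
static pthread_mutex_t barrier_mutex = PTHREAD_MUTEX_INITIALIZER;
static int before[THREADS];           /* set by each thread before the barrier */
static int saw_all[THREADS];          /* 1 if the thread saw every flag after it */

/* Reads the shared counter under the mutex. */
static int read_counter(void) {
    pthread_mutex_lock(&barrier_mutex);
    int value = counter;
    pthread_mutex_unlock(&barrier_mutex);
    return value;
}

static void *worker(void *arg) {
    long my_rank = *(long *)arg;
    int all = 1;

    before[my_rank] = 1;              /* work done before the barrier */

    pthread_mutex_lock(&barrier_mutex);
    counter++;                        /* announce arrival */
    pthread_mutex_unlock(&barrier_mutex);
    while (read_counter() < THREADS)
        ;                             /* busy-wait until everyone has arrived */

    for (int t = 0; t < THREADS; t++)
        if (!before[t])
            all = 0;
    saw_all[my_rank] = all;
    return NULL;
}

/* Returns 1 if no thread passed the barrier before all had reached it. */
int busy_barrier_demo(void) {
    pthread_t handles[THREADS];
    long ranks[THREADS];
    int ok = 1;

    counter = 0;                      /* fresh counter for this barrier */
    for (int t = 0; t < THREADS; t++)
        before[t] = saw_all[t] = 0;
    for (long t = 0; t < THREADS; t++) {
        ranks[t] = t;
        pthread_create(&handles[t], NULL, worker, &ranks[t]);
    }
    for (long t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    for (int t = 0; t < THREADS; t++)
        if (!saw_all[t])
            ok = 0;
    return ok;
}
```

The counter is never reset by the threads themselves, so a second barrier in the same run would need its own counter, which is the limitation the second slide points out.
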
  • 53. Implementing a barrier with semaphores
  • 54. Condition Variables: A condition variable is a data object that allows a thread to suspend execution until a certain event or condition occurs. When the event or condition occurs, another thread can signal the thread to “wake up.” A condition variable is always associated with a mutex.
  • 55. Condition Variables
  • 56. Implementing a barrier with condition variables
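
One way to build a reusable barrier from a mutex and a condition variable is sketched below, using `pthread_cond_wait` and `pthread_cond_broadcast`. The `generation` counter that guards against spurious wakeups is an addition made for this sketch and is not taken from the slides; `barrier_wait`, `marks`, and `cond_barrier_demo` are hypothetical names.

```c
#include <pthread.h>

#define THREADS 4
#define PHASES 2

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_var = PTHREAD_COND_INITIALIZER;
static int counter;                 /* threads waiting at the current barrier */
static unsigned generation;         /* incremented each time the barrier opens */
static int marks[PHASES][THREADS];  /* per-phase "I got here" flags */
static int ok[THREADS];

static void barrier_wait(void) {
    pthread_mutex_lock(&mutex);
    unsigned my_generation = generation;
    counter++;
    if (counter == THREADS) {
        counter = 0;                /* last arrival resets and releases */
        generation++;
        pthread_cond_broadcast(&cond_var);
    } else {
        /* pthread_cond_wait releases the mutex while blocked and
           reacquires it before returning; loop to ignore spurious wakeups. */
        while (my_generation == generation)
            pthread_cond_wait(&cond_var, &mutex);
    }
    pthread_mutex_unlock(&mutex);
}

static void *worker(void *arg) {
    long my_rank = *(long *)arg;

    ok[my_rank] = 1;
    for (int p = 0; p < PHASES; p++) {
        marks[p][my_rank] = 1;
        barrier_wait();
        for (int t = 0; t < THREADS; t++)
            if (!marks[p][t])
                ok[my_rank] = 0;    /* someone had not reached the barrier */
    }
    return NULL;
}

/* Returns 1 if, in every phase, all threads arrived before any left. */
int cond_barrier_demo(void) {
    pthread_t handles[THREADS];
    long ranks[THREADS];
    int all_ok = 1;

    counter = 0;
    for (int p = 0; p < PHASES; p++)
        for (int t = 0; t < THREADS; t++)
            marks[p][t] = 0;
    for (long t = 0; t < THREADS; t++) {
        ranks[t] = t;
        pthread_create(&handles[t], NULL, worker, &ranks[t]);
    }
    for (long t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    for (int t = 0; t < THREADS; t++)
        if (!ok[t])
            all_ok = 0;
    return all_ok;
}
```

Waiting threads sleep inside `pthread_cond_wait` instead of spinning, and because the last arrival resets the counter, the same barrier can be crossed once per phase.
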
  • 57. READ-WRITE LOCKS
  • 58. Controlling access to a large, shared data structure: Let’s look at an example. Suppose the shared data structure is a sorted linked list of ints, and the operations of interest are Member, Insert, and Delete.
  • 59. Linked Lists
  • 60. Linked List Membership
  • 61. Inserting a new node into a list
  • 62. Inserting a new node into a list
  • 63. Deleting a node from a linked list
  • 64. Deleting a node from a linked list
  • 65. A Multi-Threaded Linked List: Let’s try to use these functions in a Pthreads program. In order to share access to the list, we can define head_p to be a global variable. This will simplify the function headers for Member, Insert, and Delete, since we won’t need to pass in either head_p or a pointer to head_p: we’ll only need to pass in the value of interest.
  • 66. Simultaneous access by two threads
  • 67. Solution #1: An obvious solution is to simply lock the list any time that a thread attempts to access it. A call to each of the three functions can be protected by a mutex. (Code annotation: in place of calling Member(value).)
  • 68. Issues: We’re serializing access to the list. If the vast majority of our operations are calls to Member, we’ll fail to exploit this opportunity for parallelism. On the other hand, if most of our operations are calls to Insert and Delete, then this may be the best solution, since we’ll need to serialize access to the list for most of the operations, and this solution will certainly be easy to implement.
  • 69. Solution #2: Instead of locking the entire list, we could try to lock individual nodes. A “finer-grained” approach.
  • 70. Issues: This is much more complex than the original Member function. It is also much slower, since, in general, each time a node is accessed, a mutex must be locked and unlocked. The addition of a mutex field to each node will substantially increase the amount of storage needed for the list.
  • 71. Implementation of Member with one mutex per list node (1)
  • 72. Implementation of Member with one mutex per list node (2)
  • 73. Pthreads Read-Write Locks: Neither of our multi-threaded linked lists exploits the potential for simultaneous access to any node by threads that are executing Member. The first solution only allows one thread to access the entire list at any instant. The second only allows one thread to access any given node at any instant.
  • 74. Pthreads Read-Write Locks: A read-write lock is somewhat like a mutex except that it provides two lock functions. The first lock function locks the read-write lock for reading, while the second locks it for writing.
  • 75. Pthreads Read-Write Locks: So multiple threads can simultaneously obtain the lock by calling the read-lock function, while only one thread can obtain the lock by calling the write-lock function. Thus, if any threads own the lock for reading, any threads that want to obtain the lock for writing will block in the call to the write-lock function.
  • 76. Pthreads Read-Write Locks: If any thread owns the lock for writing, any threads that want to obtain the lock for reading or writing will block in their respective locking functions.
  • 77. Protecting our linked list functions
  • 78. Linked List Performance: 100,000 ops/thread; 99.9% Member, 0.05% Insert, 0.05% Delete
  • 79. Linked List Performance: 100,000 ops/thread; 80% Member, 10% Insert, 10% Delete
  • 80. Caches, Cache-Coherence, and False Sharing: Recall that chip designers have added blocks of relatively fast memory, called cache memory, to processors. The use of cache memory can have a huge impact on shared-memory programs. A write-miss occurs when a core tries to update a variable that’s not in cache, and it has to access main memory.
  • 81. Pthreads matrix-vector multiplication
  • 82. Run-times and efficiencies of matrix-vector multiplication (times are in seconds)
  • 83. THREAD-SAFETY
  • 84. Thread-Safety: A block of code is thread-safe if it can be simultaneously executed by multiple threads without causing problems.
  • 85. Example: Suppose we want to use multiple threads to “tokenize” a file that consists of ordinary English text. The tokens are just contiguous sequences of characters separated from the rest of the text by white-space (a space, a tab, or a newline).
  • 86. Simple approach: Divide the input file into lines of text and assign the lines to the threads in a round-robin fashion. With t threads, the first line goes to thread 0, the second goes to thread 1, . . . , the tth goes to thread t − 1, the (t + 1)st goes to thread 0, etc.
  • 87. Simple approach: We can serialize access to the lines of input using semaphores. After a thread has read a single line of input, it can tokenize the line using the strtok function.
  • 88. The strtok function: The first time it’s called, the string argument should be the text to be tokenized (our line of input). For subsequent calls, the first argument should be NULL.
  • 89. The strtok function: The idea is that in the first call, strtok caches a pointer to string, and for subsequent calls it returns successive tokens taken from the cached copy.
  • 90. Multi-threaded tokenizer (1)
  • 91. Multi-threaded tokenizer (2)
  • 92. Running with one thread: It correctly tokenizes the input stream. Input: Pease porridge hot. Pease porridge cold. Pease porridge in the pot Nine days old.
  • 93. Running with two threads: Oops!
  • 94. What happened? strtok caches the input line by declaring a variable to have static storage class. This causes the value stored in this variable to persist from one call to the next. Unfortunately for us, this cached string is shared, not private.
  • 95. What happened? Thus, thread 0’s call to strtok with the third line of the input has apparently overwritten the contents of thread 1’s call with the second line. So the strtok function is not thread-safe. If multiple threads call it simultaneously, the output may not be correct.
  • 96. Other unsafe C library functions: Regrettably, it’s not uncommon for C library functions to fail to be thread-safe. Examples: the random number generator random in stdlib.h, and the time conversion function localtime in time.h.
  • 97. “Re-entrant” (thread-safe) functions: In some cases, a standard specifies an alternate, thread-safe version of a function; for example, POSIX specifies strtok_r as a re-entrant counterpart of strtok.
  • 98. Concluding Remarks (1): A thread in shared-memory programming is analogous to a process in distributed memory programming. However, a thread is often lighter-weight than a full-fledged process. In Pthreads programs, all the threads have access to global variables, while local variables usually are private to the thread running the function.
  • 99. Concluding Remarks (2): When indeterminacy results from multiple threads attempting to access a shared resource such as a shared variable or a shared file, at least one of the accesses is an update, and the accesses can result in an error, we have a race condition.
  • 100. Concluding Remarks (3): A critical section is a block of code that updates a shared resource that can only be updated by one thread at a time. So the code in a critical section should, effectively, be executed as serial code.
  • 101. Concluding Remarks (4): Busy-waiting can be used to avoid conflicting access to critical sections with a flag variable and a while-loop with an empty body. It can be very wasteful of CPU cycles. It can also be unreliable if compiler optimization is turned on.
  • 102. Concluding Remarks (5): A mutex can be used to avoid conflicting access to critical sections as well. Think of it as a lock on a critical section, since mutexes arrange for mutually exclusive access to a critical section.
  • 103. Concluding Remarks (6): A semaphore is the third way to avoid conflicting access to critical sections. It is an unsigned int together with two operations: sem_wait and sem_post. Semaphores are more powerful than mutexes since they can be initialized to any nonnegative value.
  • 104. Concluding Remarks (7): A barrier is a point in a program at which the threads block until all of the threads have reached it. A read-write lock is used when it’s safe for multiple threads to simultaneously read a data structure, but if a thread needs to modify or write to the data structure, then only that thread can access the data structure during the modification.
  • 105. Concluding Remarks (8): Some C functions cache data between calls by declaring variables to be static, causing errors when multiple threads call the function. This type of function is not thread-safe.
