Shared Memory Programming with Pthreads (1).ppt

1
Copyright © 2010, Elsevier Inc. All rights Reserved
Chapter 4
Shared Memory Programming
with Pthreads
An Introduction to Parallel Programming
Peter Pacheco

2
Roadmap
 Problems programming shared memory
systems.
 Controlling access to a critical section.
 Thread synchronization.
 Programming with POSIX threads.
 Mutexes.
 Producer-consumer synchronization and
semaphores.
 Barriers and condition variables.
 Read-write locks.
 Thread safety.

3
A Shared Memory System

4
Processes and Threads
 A process is an instance of a running (or
suspended) program.
 Threads are analogous to “light-weight”
processes.
 In a shared memory program a single
process may have multiple threads of
control.

5
POSIX® Threads
 Also known as Pthreads.
 A standard for Unix-like operating systems.
 A library that can be linked with C
programs.
 Specifies an application programming
interface (API) for multi-threaded
programming.

6
Caveat
 The Pthreads API is only available on
POSIX® systems: Linux, Mac OS X,
Solaris, HP-UX, …

7
Hello World! (1)
pthread.h declares the various Pthreads
functions, constants, types, etc.

8
Hello World! (2)

9
Hello World! (3)

10
Compiling a Pthread program
gcc -g -Wall -o pth_hello pth_hello.c -lpthread
link in the Pthreads library

11
Running a Pthreads program
./pth_hello <number of threads>
./pth_hello 1
Hello from the main thread
Hello from thread 0 of 1
./pth_hello 4
Hello from the main thread

12
Global variables
 Can introduce subtle and confusing bugs!
 Limit use of global variables to situations in
which they’re really needed.
 Shared variables.

13
Starting the Threads
 Processes in MPI are usually started by a
script.
 In Pthreads the threads are started by the
program executable.

14
Starting the Threads
pthread.h
pthread_t
int pthread_create (
pthread_t* thread_p /* out */ ,
const pthread_attr_t* attr_p /* in */ ,
void* (*start_routine ) ( void* ) /* in */ ,
void* arg_p /* in */ ) ;
One pthread_t object for each thread.

15
pthread_t objects
 Opaque
 The actual data that they store is
system-specific.
 Their data members aren’t directly accessible
to user code.
 However, the Pthreads standard guarantees
that a pthread_t object does store enough
information to uniquely identify the thread with
which it’s associated.

16
A closer look (1)
attr_p: we won’t be using it, so we just pass NULL.
thread_p: allocate before calling.

17
A closer look (2)
start_routine: the function that the thread is to run.
arg_p: pointer to the argument that should
be passed to the function start_routine.

18
Function started by pthread_create
 Prototype:
void* thread_function ( void* args_p ) ;
 A void* can be converted to any object pointer type in C.
 So args_p can point to a list containing one or
more values needed by thread_function.
 Similarly, the return value of thread_function can
point to a list of one or more values.
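
As a minimal sketch of this idea, the code below packs two values into a struct, passes its address through the void* argument, and hands a result back through the thread function's return value. The names range_args, sum_range and sum_in_thread are hypothetical, chosen only for illustration, and returning a heap-allocated long is just one possible convention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument bundle: the names are illustrative only. */
struct range_args {
    int first;
    int last;
};

/* Thread function: sums the integers first..last and returns a
   heap-allocated result through the void* return value. */
void* sum_range(void* args_p) {
    struct range_args* args = (struct range_args*) args_p;
    long* result_p = malloc(sizeof *result_p);
    if (result_p == NULL) return NULL;
    *result_p = 0;
    for (int i = args->first; i <= args->last; i++)
        *result_p += i;
    return result_p;
}

/* Starts one thread, waits for it, and unpacks its return value. */
long sum_in_thread(int first, int last) {
    struct range_args args = { first, last };
    pthread_t thread;
    void* ret_p = NULL;

    int rc = pthread_create(&thread, NULL, sum_range, &args);
    assert(rc == 0);
    rc = pthread_join(thread, &ret_p);   /* ret_p receives sum_range's return */
    assert(rc == 0 && ret_p != NULL);

    long total = *(long*) ret_p;
    free(ret_p);
    return total;
}
```

Passing the address of a local struct is safe here only because sum_in_thread joins the thread before args goes out of scope.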

19
Running the Threads
Main thread forks and joins two threads.

20
Stopping the Threads
 We call the function pthread_join once for
each thread.
 A single call to pthread_join will wait for the
thread associated with the pthread_t object
to complete.
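
The create-then-join pattern can be sketched as follows. This is a sketch in the spirit of the hello program, not a copy of it: run_hello is a hypothetical wrapper, and packing the rank into the void* argument with a cast relies on long fitting in a pointer, which holds on common platforms.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static int thread_count;   /* shared: set once before the threads start */

/* Each thread receives its rank packed into the void* argument. */
static void* hello(void* rank_p) {
    long my_rank = (long) rank_p;
    printf("Hello from thread %ld of %d\n", my_rank, thread_count);
    return NULL;
}

/* Creates `count` threads (count >= 1), then calls pthread_join once
   for each. Returns the number of threads successfully joined. */
int run_hello(int count) {
    thread_count = count;
    pthread_t* handles = malloc(count * sizeof *handles);
    assert(handles != NULL);

    for (long rank = 0; rank < count; rank++) {
        int rc = pthread_create(&handles[rank], NULL, hello, (void*) rank);
        assert(rc == 0);
    }
    printf("Hello from the main thread\n");

    int joined = 0;
    for (long rank = 0; rank < count; rank++)
        if (pthread_join(handles[rank], NULL) == 0)
            joined++;

    free(handles);
    return joined;
}
```

The order of the output lines is not fixed: the threads and the main thread run concurrently, so the greetings may appear in any order.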

21
MATRIX-VECTOR
MULTIPLICATION IN PTHREADS

22
Serial pseudo-code

23
Using 3 Pthreads
thread 0
general case

24
Pthreads matrix-vector multiplication
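
A minimal sketch of a block-partitioned matrix-vector product is given below, assuming the number of rows is evenly divisible by the number of threads. The sizes M, N and THREADS and the driver mat_vect_row are hypothetical. No critical section is needed because each thread writes a disjoint block of y.

```c
#include <assert.h>
#include <pthread.h>

#define M 6            /* rows (illustrative size) */
#define N 4            /* columns */
#define THREADS 3      /* assumed to divide M evenly */

static double A[M][N], x[N], y[M];   /* shared by all threads */

/* Thread `rank` computes its own block of M/THREADS consecutive rows. */
static void* pth_mat_vect(void* rank_p) {
    long my_rank = (long) rank_p;
    int local_m = M / THREADS;
    int my_first_row = (int) my_rank * local_m;
    int my_last_row = my_first_row + local_m - 1;

    for (int i = my_first_row; i <= my_last_row; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
    return NULL;
}

/* Fills A and x with simple values, runs the threads, returns y[row]. */
double mat_vect_row(int row) {
    pthread_t handles[THREADS];

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = i + 1;          /* every entry in row i is i + 1 */
    for (int j = 0; j < N; j++)
        x[j] = 1.0;

    for (long t = 0; t < THREADS; t++) {
        int rc = pthread_create(&handles[t], NULL, pth_mat_vect, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    return y[row];
}
```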

25
CRITICAL SECTIONS

26
Estimating π
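
One series that can be used to estimate π (assumed here to be the one the chapter has in mind) is the alternating series:

\[
\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)
    = 4\sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}
\]

Truncating the sum after n terms gives the estimate, and each thread can add up its own block of terms.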

27
Using a dual core processor
Note that as we increase n, the estimate
with one thread gets better and better.

28
A thread function for computing π

29
Possible race condition

30
Busy-Waiting
 A thread repeatedly tests a condition, but,
effectively, does no useful work until the
condition has the appropriate value.
 Beware of optimizing compilers, though!
flag initialized to 0 by main thread

31
Pthreads global sum with busy-waiting

32
Global sum function with critical section after loop (1)

33
Global sum function with critical section after loop (2)
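
The following sketch shows one way to place the critical section after the loop: each thread accumulates a private my_sum and busy-waits only once, just before adding it to the shared total. It departs from a plain int flag by using a C11 atomic_int, an assumption made so that an optimizing compiler cannot hoist or remove the flag test. THREADS and estimate_pi_busy_wait are hypothetical names, and n is assumed to be divisible by THREADS.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define THREADS 4                /* assumed to divide n evenly */

static long long n_terms;        /* number of series terms to add */
static double sum;               /* shared running total */
static atomic_int flag;          /* rank of the thread whose turn it is */

static void* thread_sum(void* rank_p) {
    long my_rank = (long) rank_p;
    long long my_n = n_terms / THREADS;
    long long my_first_i = my_n * my_rank;
    long long my_last_i = my_first_i + my_n;
    double factor = (my_first_i % 2 == 0) ? 1.0 : -1.0;
    double my_sum = 0.0;

    /* Private partial sum: no shared data is touched inside the loop. */
    for (long long i = my_first_i; i < my_last_i; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    while (atomic_load(&flag) != my_rank)
        ;                                   /* busy-wait for our turn */
    sum += my_sum;                          /* critical section */
    atomic_store(&flag, (int) ((my_rank + 1) % THREADS));
    return NULL;
}

double estimate_pi_busy_wait(long long n) {
    pthread_t handles[THREADS];
    n_terms = n;
    sum = 0.0;
    atomic_store(&flag, 0);                 /* thread 0 goes first */

    for (long t = 0; t < THREADS; t++) {
        int rc = pthread_create(&handles[t], NULL, thread_sum, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);
    return 4.0 * sum;
}
```

Each thread now enters the critical section once rather than once per term, so far less time is spent spinning.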

34
Mutexes
 A thread that is busy-waiting may
continually use the CPU accomplishing
nothing.
 Mutex (mutual exclusion) is a special type
of variable that can be used to restrict
access to a critical section to a single
thread at a time.

35
Mutexes
 Used to guarantee that one thread
“excludes” all other threads while it
executes the critical section.
 The Pthreads standard includes a special
type for mutexes: pthread_mutex_t.

36
Mutexes
 When a Pthreads program finishes using a
mutex, it should call pthread_mutex_destroy.
 In order to gain access to a critical section
a thread calls pthread_mutex_lock.

37
Mutexes
 When a thread is finished executing the
code in a critical section, it should call
pthread_mutex_unlock.

38
Global sum function that uses a mutex (1)

39
Global sum function that uses a mutex (2)
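
A sketch of the same global sum protected by a mutex is shown below. THREADS and estimate_pi_mutex are hypothetical names, and n is assumed to be divisible by THREADS. The calls used are the standard pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock and pthread_mutex_destroy.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 4                /* assumed to divide n evenly */

static long long n_terms;        /* number of series terms to add */
static double sum;               /* shared running total */
static pthread_mutex_t mutex;    /* protects sum */

static void* thread_sum(void* rank_p) {
    long my_rank = (long) rank_p;
    long long my_n = n_terms / THREADS;
    long long my_first_i = my_n * my_rank;
    long long my_last_i = my_first_i + my_n;
    double factor = (my_first_i % 2 == 0) ? 1.0 : -1.0;
    double my_sum = 0.0;

    for (long long i = my_first_i; i < my_last_i; i++, factor = -factor)
        my_sum += factor / (2 * i + 1);

    pthread_mutex_lock(&mutex);     /* blocks until the mutex is free */
    sum += my_sum;                  /* critical section */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

double estimate_pi_mutex(long long n) {
    pthread_t handles[THREADS];
    n_terms = n;
    sum = 0.0;
    int rc = pthread_mutex_init(&mutex, NULL);   /* default attributes */
    assert(rc == 0);

    for (long t = 0; t < THREADS; t++) {
        rc = pthread_create(&handles[t], NULL, thread_sum, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);

    pthread_mutex_destroy(&mutex);
    return 4.0 * sum;
}
```

Unlike the busy-waiting version, the order in which the threads add their partial sums is not fixed, and a waiting thread blocks instead of spinning.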

40
Run-times (in seconds) of π programs using n = 10^8
terms on a system with two four-core processors.

41
Possible sequence of events with busy-waiting
and more threads than cores.

42
PRODUCER-CONSUMER
SYNCHRONIZATION AND
SEMAPHORES

43
Issues
 Busy-waiting enforces the order in which
threads access a critical section.
 Using mutexes, the order is left to chance
and the system.
 There are applications where we need to
control the order in which threads access
the critical section.

44
Problems with a mutex solution

45
A first attempt at sending messages using pthreads

46
Syntax of the various semaphore functions
Semaphores are not part of Pthreads;
you need to add #include <semaphore.h>.
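
As a sketch of producer-consumer synchronization with semaphores, each thread below writes a message for its neighbor, posts the neighbor's semaphore, and then waits on its own semaphore before reading. The array names and run_messages are hypothetical. The sketch uses unnamed semaphores created with sem_init, which some systems (macOS, for instance) do not support.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define THREADS 4
#define MSG_MAX 64

static char messages[THREADS][MSG_MAX];  /* messages[d] is written for thread d */
static sem_t ready[THREADS];             /* ready[d] is posted when messages[d] is set */
static int received_from[THREADS];       /* sender rank parsed by each thread */

static void* send_msg(void* rank_p) {
    long my_rank = (long) rank_p;
    long dest = (my_rank + 1) % THREADS;

    snprintf(messages[dest], MSG_MAX, "Hello to %ld from %ld", dest, my_rank);
    sem_post(&ready[dest]);       /* producer: dest's message now exists */

    sem_wait(&ready[my_rank]);    /* consumer: block until ours arrives */
    long to, from;
    if (sscanf(messages[my_rank], "Hello to %ld from %ld", &to, &from) == 2)
        received_from[my_rank] = (int) from;
    return NULL;
}

/* Returns 1 if every thread received the message from its predecessor. */
int run_messages(void) {
    pthread_t handles[THREADS];

    for (int t = 0; t < THREADS; t++) {
        int rc = sem_init(&ready[t], 0, 0);   /* shared among threads, value 0 */
        assert(rc == 0);
        received_from[t] = -1;
    }
    for (long t = 0; t < THREADS; t++) {
        int rc = pthread_create(&handles[t], NULL, send_msg, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);

    int ok = 1;
    for (int t = 0; t < THREADS; t++) {
        if (received_from[t] != (t + THREADS - 1) % THREADS)
            ok = 0;
        sem_destroy(&ready[t]);
    }
    return ok;
}
```

Because each semaphore starts at 0, a thread that reaches sem_wait early simply blocks until its sender has posted, regardless of how the threads are scheduled.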

47
BARRIERS AND CONDITION
VARIABLES

48
Barriers
 Synchronizing the threads to make sure
that they all are at the same point in a
program is called a barrier.
 No thread can cross the barrier until all the
threads have reached it.

49
Using barriers to time the slowest thread

50
Using barriers for debugging

51
Busy-waiting and a Mutex
 Implementing a barrier using busy-waiting
and a mutex is straightforward.
 We use a shared counter protected by the
mutex.
 When the counter indicates that every
thread has entered the critical section,
threads can leave the critical section.

52
Busy-waiting and a Mutex
We need one counter
variable for each
instance of the barrier,
otherwise problems
are likely to occur.

53
Implementing a barrier with semaphores

54
Condition Variables
 A condition variable is a data object that
allows a thread to suspend execution until
a certain event or condition occurs.
 When the event or condition occurs
another thread can signal the thread to
“wake up.”
 A condition variable is always associated
with a mutex.

55
Condition Variables

56
Implementing a barrier with condition variables
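
One possible barrier built from a mutex and a condition variable is sketched below. The generation counter is an addition not taken from the slides: it lets waiting threads detect spurious wakeups and makes the barrier reusable. The names barrier, work and run_barrier are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 4

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_var = PTHREAD_COND_INITIALIZER;
static int counter = 0;          /* threads that have reached the barrier */
static int generation = 0;       /* incremented each time the barrier opens */
static int arrived_phase1 = 0;   /* threads that finished phase 1 */
static int saw_all[THREADS];     /* per-thread result of the phase 2 check */

static void barrier(void) {
    pthread_mutex_lock(&mutex);
    int my_generation = generation;
    counter++;
    if (counter == THREADS) {            /* last thread to arrive */
        counter = 0;
        generation++;
        pthread_cond_broadcast(&cond_var);
    } else {
        /* pthread_cond_wait releases the mutex while blocked and
           reacquires it on wakeup; the loop guards against spurious wakeups. */
        while (my_generation == generation)
            pthread_cond_wait(&cond_var, &mutex);
    }
    pthread_mutex_unlock(&mutex);
}

static void* work(void* rank_p) {
    long my_rank = (long) rank_p;

    pthread_mutex_lock(&mutex);
    arrived_phase1++;                    /* phase 1 */
    pthread_mutex_unlock(&mutex);

    barrier();

    pthread_mutex_lock(&mutex);          /* phase 2: everyone must be done */
    saw_all[my_rank] = (arrived_phase1 == THREADS);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Returns 1 if no thread got past the barrier before all had arrived. */
int run_barrier(void) {
    pthread_t handles[THREADS];
    arrived_phase1 = 0;

    for (long t = 0; t < THREADS; t++) {
        int rc = pthread_create(&handles[t], NULL, work, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);

    int all_ok = 1;
    for (int t = 0; t < THREADS; t++)
        if (!saw_all[t]) all_ok = 0;
    return all_ok;
}
```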

57
READ-WRITE LOCKS

58
Controlling access to a large,
shared data structure
 Let’s look at an example.
 Suppose the shared data structure is a
sorted linked list of ints, and the operations
of interest are Member, Insert, and Delete.

59
Linked Lists

60
Linked List Membership

61
Inserting a new node into a list

62
Inserting a new node into a list

63
Deleting a node from a linked list

64
Deleting a node from a linked list
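
The three serial operations on a sorted linked list of ints can be sketched as follows. This is an illustrative version with assumed signatures (the head pointer is passed explicitly), not necessarily the code shown on the slides.

```c
#include <assert.h>
#include <stdlib.h>

struct list_node_s {
    int data;
    struct list_node_s* next;
};

/* Returns 1 if value is in the sorted list, 0 otherwise. */
int Member(int value, struct list_node_s* head_p) {
    struct list_node_s* curr_p = head_p;
    while (curr_p != NULL && curr_p->data < value)
        curr_p = curr_p->next;
    return curr_p != NULL && curr_p->data == value;
}

/* Inserts value in sorted position; returns 0 if it is already present. */
int Insert(int value, struct list_node_s** head_pp) {
    struct list_node_s* curr_p = *head_pp;
    struct list_node_s* pred_p = NULL;

    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p != NULL && curr_p->data == value)
        return 0;

    struct list_node_s* temp_p = malloc(sizeof *temp_p);
    assert(temp_p != NULL);
    temp_p->data = value;
    temp_p->next = curr_p;
    if (pred_p == NULL)
        *head_pp = temp_p;           /* new first node */
    else
        pred_p->next = temp_p;
    return 1;
}

/* Removes value if present; returns 1 on success, 0 if not found. */
int Delete(int value, struct list_node_s** head_pp) {
    struct list_node_s* curr_p = *head_pp;
    struct list_node_s* pred_p = NULL;

    while (curr_p != NULL && curr_p->data < value) {
        pred_p = curr_p;
        curr_p = curr_p->next;
    }
    if (curr_p == NULL || curr_p->data != value)
        return 0;

    if (pred_p == NULL)
        *head_pp = curr_p->next;     /* deleting the first node */
    else
        pred_p->next = curr_p->next;
    free(curr_p);
    return 1;
}
```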

65
A Multi-Threaded Linked List
 Let’s try to use these functions in a
Pthreads program.
 In order to share access to the list, we can
define head_p to be a global variable.
 This will simplify the function headers for
Member, Insert, and Delete, since we
won’t need to pass in either head_p or a
pointer to head_p: we’ll only need to pass
in the value of interest.

66
Simultaneous access by two threads

67
Solution #1
 An obvious solution is to simply lock the
list any time that a thread attempts to
access it.
 A call to each of the three functions can be
protected by a mutex.
In place of calling Member(value).

68
Issues
 We’re serializing access to the list.
 If the vast majority of our operations are
calls to Member, we’ll fail to exploit this
opportunity for parallelism.
 On the other hand, if most of our
operations are calls to Insert and Delete,
then this may be the best solution since
we’ll need to serialize access to the list for
most of the operations, and this solution
will certainly be easy to implement.

69
Solution #2
 Instead of locking the entire list, we could
try to lock individual nodes.
 A “finer-grained” approach.

70
Issues
 This is much more complex than the
original Member function.
 It is also much slower, since, in general,
each time a node is accessed, a mutex
must be locked and unlocked.
 The addition of a mutex field to each node
will substantially increase the amount of
storage needed for the list.

71
Implementation of Member with one mutex per list node (1)

72
Implementation of Member with one mutex per list node (2)

73
Pthreads Read-Write Locks
 Neither of our multi-threaded linked lists
exploits the potential for simultaneous
access to any node by threads that are
executing Member.
 The first solution only allows one thread to
access the entire list at any instant.
 The second only allows one thread to
access any given node at any instant.

74
 A read-write lock is somewhat like a mutex
except that it provides two lock functions.
 The first lock function locks the read-write
lock for reading, while the second locks it
for writing.

75
 So multiple threads can simultaneously
obtain the lock by calling the read-lock
function, while only one thread can obtain
the lock by calling the write-lock function.
 Thus, if any threads own the lock for
reading, any threads that want to obtain
the lock for writing will block in the call to
the write-lock function.

76
 If any thread owns the lock for writing, any
threads that want to obtain the lock for
reading or writing will block in their
respective locking functions.

77
Protecting our linked list functions
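
A minimal sketch of the read-write lock approach, assuming a global head_p as described earlier: Member takes the lock for reading, so several threads can search at once, while Insert takes it for writing. Delete is omitted for brevity, and worker and run_rwlock_list are hypothetical test scaffolding.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define THREADS 4
#define PER_THREAD 100

struct list_node_s {
    int data;
    struct list_node_s* next;
};

static struct list_node_s* head_p = NULL;   /* shared, sorted list */
static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Readers hold the lock for reading and may run concurrently. */
int Member(int value) {
    pthread_rwlock_rdlock(&rwlock);
    struct list_node_s* curr_p = head_p;
    while (curr_p != NULL && curr_p->data < value)
        curr_p = curr_p->next;
    int found = (curr_p != NULL && curr_p->data == value);
    pthread_rwlock_unlock(&rwlock);
    return found;
}

/* A writer excludes all readers and all other writers. */
int Insert(int value) {
    struct list_node_s* temp_p = malloc(sizeof *temp_p);
    assert(temp_p != NULL);
    temp_p->data = value;

    pthread_rwlock_wrlock(&rwlock);
    struct list_node_s** link_pp = &head_p;   /* link that may be rewritten */
    while (*link_pp != NULL && (*link_pp)->data < value)
        link_pp = &(*link_pp)->next;
    int inserted = (*link_pp == NULL || (*link_pp)->data != value);
    if (inserted) {
        temp_p->next = *link_pp;
        *link_pp = temp_p;
    }
    pthread_rwlock_unlock(&rwlock);

    if (!inserted) free(temp_p);
    return inserted;
}

/* Each thread inserts its own distinct values and looks each one up. */
static void* worker(void* rank_p) {
    long my_rank = (long) rank_p;
    for (int i = 0; i < PER_THREAD; i++) {
        int value = i * THREADS + (int) my_rank;
        Insert(value);
        int found = Member(value);
        assert(found);
    }
    return NULL;
}

/* Returns how many of the values 0..THREADS*PER_THREAD-1 are in the list. */
int run_rwlock_list(void) {
    pthread_t handles[THREADS];
    for (long t = 0; t < THREADS; t++) {
        int rc = pthread_create(&handles[t], NULL, worker, (void*) t);
        assert(rc == 0);
    }
    for (int t = 0; t < THREADS; t++)
        pthread_join(handles[t], NULL);

    int count = 0;
    for (int v = 0; v < THREADS * PER_THREAD; v++)
        count += Member(v);
    return count;
}
```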

78
Linked List Performance
100,000 ops/thread
99.9% Member
0.05% Insert
0.05% Delete

79
Linked List Performance
100,000 ops/thread
80% Member
10% Insert
10% Delete

80
Caches, Cache-Coherence, and
False Sharing
 Recall that chip designers have added
blocks of relatively fast memory to
processors called cache memory.
 The use of cache memory can have a
huge impact on shared-memory programs.
 A write-miss occurs when a core tries to
update a variable that’s not in cache, and it
has to access main memory.

81
Pthreads matrix-vector multiplication

82
Run-times and efficiencies of
matrix-vector multiplication
(times are in seconds)

83
THREAD-SAFETY

84
Thread-Safety
 A block of code is thread-safe if it can be
simultaneously executed by multiple
threads without causing problems.

85
Example
 Suppose we want to use multiple threads
to “tokenize” a file that consists of ordinary
English text.
 The tokens are just contiguous sequences
of characters separated from the rest of
the text by white-space — a space, a tab,
or a newline.

86
Simple approach
 Divide the input file into lines of text and
assign the lines to the threads in a
round-robin fashion.
 With t threads, the first line goes to thread 0,
the second goes to thread 1, . . . , the tth goes to
thread t−1, the (t+1)st goes to thread 0, etc.
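
The round-robin assignment amounts to a modulus. A small sketch, with the hypothetical helper owner_of_line and 1-based line numbers assumed:

```c
#include <assert.h>

/* Returns the rank of the thread that handles the given 1-based line
   number when lines are dealt out round-robin to thread_count threads. */
int owner_of_line(int line_number, int thread_count) {
    assert(line_number >= 1 && thread_count >= 1);
    return (line_number - 1) % thread_count;
}
```

With four threads, lines 1 through 4 go to threads 0 through 3 and line 5 wraps around to thread 0.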

87
Simple approach
 We can serialize access to the lines of
input using semaphores.
 After a thread has read a single line of
input, it can tokenize the line using the
strtok function.

88
The strtok function
 The first time it’s called the string argument
should be the text to be tokenized.
 Our line of input.
 For subsequent calls, the first argument
should be NULL.

89
The strtok function
 The idea is that in the first call, strtok
caches a pointer to string, and for
subsequent calls it returns successive
tokens taken from the cached copy.

90
Multi-threaded tokenizer (1)

91
Multi-threaded tokenizer (2)

92
Running with one thread
 It correctly tokenizes the input stream.
Pease porridge hot.
Pease porridge cold.
Pease porridge in the pot
Nine days old.

93
Running with two threads
Oops!

94
What happened?
 strtok caches the input line by declaring a
variable to have static storage class.
 This causes the value stored in this
variable to persist from one call to the next.
 Unfortunately for us, this cached string is
shared, not private.

95
What happened?
 Thus, thread 0’s call to strtok with the third
line of the input has apparently overwritten
the contents of thread 1’s call with the
second line.
 So the strtok function
is not thread-safe.
If multiple threads call
it simultaneously, the
output may not be
correct.

96
Other unsafe C library functions
 Regrettably, it’s not uncommon for C
library functions to fail to be thread-safe.
 The random number generator random in
stdlib.h.
 The time conversion function localtime in
time.h.

97
“re-entrant” (thread safe) functions
 In some cases, an alternate, thread-safe
version of a function is available; for example,
POSIX specifies strtok_r as a re-entrant strtok.
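
As an illustration, the POSIX function strtok_r takes an extra saveptr argument that replaces strtok's hidden static state, so each thread can keep its own position in its own line. The helper count_tokens below is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Counts white-space separated tokens in `line` using strtok_r.
   The tokenizer state lives in the caller-owned `saveptr`, not in a
   static variable. Note that `line` is modified in place. */
int count_tokens(char* line) {
    assert(line != NULL);
    char* saveptr;
    int count = 0;

    char* token = strtok_r(line, " \t\n", &saveptr);
    while (token != NULL) {
        count++;
        token = strtok_r(NULL, " \t\n", &saveptr);   /* continue same line */
    }
    return count;
}
```

Two threads calling count_tokens on different buffers do not interfere, because nothing is shared between the calls.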

98
Concluding Remarks (1)
 A thread in shared-memory programming
is analogous to a process in distributed
memory programming.
 However, a thread is often lighter-weight
than a full-fledged process.
 In Pthreads programs, all the threads have
access to global variables, while local
variables usually are private to the thread
running the function.

99
 When indeterminacy results from multiple
threads attempting to access a shared
resource such as a shared variable or a
shared file, at least one of the accesses is
an update, and the accesses can result in
an error, we have a race condition.

100
 A critical section is a block of code that
updates a shared resource that can only
be updated by one thread at a time.
 So the execution of code in a critical
section should, effectively, be executed as
serial code.

101
 Busy-waiting can be used to avoid
conflicting access to critical sections with a
flag variable and a while-loop with an
empty body.
 It can be very wasteful of CPU cycles.
 It can also be unreliable if compiler
optimization is turned on.

102
 A mutex can be used to avoid conflicting
access to critical sections as well.
 Think of it as a lock on a critical section,
since mutexes arrange for mutually
exclusive access to a critical section.

103
 A semaphore is the third way to avoid
conflicting access to critical sections.
 It is an unsigned int together with two
operations: sem_wait and sem_post.
 Semaphores are more powerful than
mutexes since they can be initialized to
any nonnegative value.

104
 A barrier is a point in a program at which
the threads block until all of the threads
have reached it.
 A read-write lock is used when it’s safe for
multiple threads to simultaneously read a
data structure, but if a thread needs to
modify or write to the data structure, then
only that thread can access the data
structure during the modification.

105
 Some C functions cache data between
calls by declaring variables to be static,
causing errors when multiple threads call
the function.
 This type of function is not thread-safe.
