Concurrent Data Structures
CS5225 Parallel and Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from “The Art of Multiprocessor Programming”
by Maurice Herlihy & Nir Shavit
Motivation
 Many-/multi-core processors → multiple threads
working on shared data structures
 Shared-memory multiprocessors
 Threads communicate & synchronize through data
structures
 Locks, semaphores, mutexes, & monitors satisfy
Safety & Liveness properties
 But they perform poorly
 They make everything serial
2
Example – getAndIncrement()
 From prime number example with a shared counter
3
public class Counter {
  private long value;
  public long getAndIncrement() {
    return value++;
  }
}
value++ is really three separate steps:
  temp = value;
  value = value + 1;
  return temp;
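The lost-update problem behind this interleaving can be observed directly. The following sketch (a hypothetical `RaceDemo` class, not from the slides) runs two threads against the unsynchronized counter; on a multicore machine the final value is usually less than the expected 200,000:

```java
// Sketch: two threads racing on the unsynchronized counter.
// Because value++ is three steps, increments can interleave
// and be lost, so the final total is often below 200,000.
public class RaceDemo {
    static long value = 0;

    static long getAndIncrement() {
        return value++;   // read-modify-write; not atomic
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) getAndIncrement();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("value = " + value + " (expected 200000)");
    }
}
```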
Example – Solution
4
public class Counter {
  private long value;
  public long getAndIncrement() {
    Lock()
    temp = value;
    value = temp + 1;
    Unlock()
    return temp;
  }
}
Synchronized block
Mutual exclusion
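In actual Java, the Lock()/Unlock() pseudocode corresponds to a synchronized method. A minimal sketch (class name assumed):

```java
// Sketch: the slide's Lock()/Unlock() realized as a Java
// synchronized method. The read, add, and write run as one
// critical section, so no increments are lost.
public class LockedCounter {
    private long value;

    public synchronized long getAndIncrement() {
        long temp = value;
        value = temp + 1;
        return temp;
    }
}
```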
Performance Problems With Locking
 Amdahl's law
 Code between Lock() & Unlock() is sequential
 Even a small sequential fraction reduces performance
 Memory contention
 Every processor needs to access the lock variable(s)
 Low performance on cache-coherent multiprocessors
 Blocking
 A thread holding a lock delays others
 e.g., blocking access to getAndIncrement()
5
Blocking Techniques
 Coarse-grained blocking
 e.g., blocking an entire linked list
 Fine-grained blocking
 e.g., blocking individual elements of a linked list
 What about our shared counter?
 Only 1 element to access
 A possible solution – combining tree
6
Combining Tree
7
Source: Shang Wang, Taolun Chai and Xiaoming Jia Concurrent Counting using Combining Tree
Combining Tree (Cont.)
 Threads climb tree from leaves towards root,
while combining with other concurrent operations
 Every time operations of 2 threads are combined
in an internal node, one thread wins & the other
loses
 Loser – waits at that node until a return value is
delivered to it
 Winner – proceeds towards root carrying sum of all
underlying operations
 Winner that reaches root adds its sum to counter
 Winner descends tree distributing a return value to each
waiting loser
8
Combining Tree (Cont.)
9
Source: http://www.cs.berkeley.edu/~demmel/cs267/lecture14.html
Combining Tree (Cont.)
 Pros
 More parallelism
 Increments are parallel until root is updated
 With p leaves, each increment takes O(log p) time
 Effective speedup O(p / log p)
 Cons
 Parallelism reduces as threads go up the tree
 p is typically predefined
 Requires O(p) state/memory
 Losers just spin on a local variable (a.k.a. local spinning)
to prevent generation of unnecessary memory traffic
that may slow down the winner
10
Combining Tree (Cont.)
 Cons (cont.)
 Threads arriving after the first p need to wait
 Not efficient under low contention
 May lead to deadlocks
 Coordination needed among ascending winners, waiting
losers, & late-arriving threads
11
Nonblocking Techniques
 Delay of a thread doesn’t cause delay of others
 By definition, these algorithms can’t use locks
 Nonblocking progress conditions
1. Wait-freedom
2. Lock-freedom
3. Obstruction-freedom
12
Nonblocking Techniques (Cont.)
 Wait-freedom
 A wait-free operation is guaranteed to complete after
a finite no of its own steps
 Independent of others
 Lock-freedom
 A lock-free operation guarantees that after a finite no
of its own steps, some operation completes
 Not necessarily itself
 Obstruction-freedom
 An obstruction-free operation is guaranteed to
complete within a finite no of its own steps after it
stops encountering interference from other operations
13
Nonblocking Techniques (Cont.)
 Wait-freedom is stronger than Lock-freedom
 Lock-freedom is stronger than Obstruction-
freedom
 Stronger progress conditions are harder to
implement
 Weaker guarantees are generally simpler
 e.g., can compensate for weaker progress conditions by
employing backoff
 A shared counter can’t be implemented Wait-free
or Lock-free without hardware support
 e.g., Compare-and-Swap (CaS)
14
Compare-and-Swap (CaS)
 There exists a wait-free implementation for any
concurrent data structure in a system that
supports CaS
15
bool CaS(L, E, N) {
  atomically {
    if (*L == E) {
      *L = N;
      return true;
    } else
      return false;
  }
}
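Java exposes CaS through the java.util.concurrent.atomic classes; for example, AtomicLong.compareAndSet(expected, new) plays the role of CaS(L, E, N). A small sketch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: AtomicLong.compareAndSet as a concrete CaS.
// The first CaS succeeds (value matches the expected 5);
// the second fails because the value is now 7.
public class CasDemo {
    public static void main(String[] args) {
        AtomicLong l = new AtomicLong(5);
        boolean first = l.compareAndSet(5, 7);   // true
        boolean second = l.compareAndSet(5, 9);  // false: value is 7, not 5
        System.out.println(first + " " + second + " " + l.get()); // true false 7
    }
}
```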
Shared Counter Based on CaS
 Implement getAndIncrement() using a CaS
 Solution is only lock-free
 Backoff is needed under heavy contention
 Sequential bottleneck & high contention for a single location
16
public class Counter {
  private long value;
  public long getAndIncrement() {
    temp = value;
    while (CaS(value, temp, temp + 1) == false)
      temp = value;
    return temp;
  }
}
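The retry loop above can be written against AtomicLong's built-in CaS; a lock-free sketch (class name assumed):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the slide's CaS-based counter using AtomicLong.
// Lock-free: some thread's CaS always succeeds, but a given
// thread may retry if others changed value between its read
// and its CaS.
public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    public long getAndIncrement() {
        while (true) {
            long temp = value.get();
            if (value.compareAndSet(temp, temp + 1))
                return temp;   // our CaS won; return the old value
        }
    }
}
```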
Linked Lists, Queues, & Stacks
 Pool of items
 Linked List
 Insert(), delete(), & member()
 Already looked at locking entire linked list, individual
items, & read-write locks
 Queue
 First-in-first-out (FIFO) order
 enq() & deq()
 Stack
 Last-in-first-out (LIFO) order
 push() & pop()
17
Bounded vs. Unbounded
 Bounded
 Fixed capacity
 Good when resources are an issue
 e.g., bounded buffer, queue
 Unbounded
 Holds any number of objects
18
Blocking vs. Non-Blocking
 What to do when
 Removing from an empty pool?
 Adding to a full (bounded) pool?
 Blocking
 Caller waits until state changes
 Non-blocking
 Method returns immediately or throws an exception
 e.g., returning an error message or just dropping the
request
19
Queues & Stacks
 We’ll look at
 Bounded, blocking, lock-based queue
 Unbounded, non-blocking, lock-free queue
 Objective
 To enable modification of multiple memory locations
atomically
20
Queue – Concurrency
enq(x) y=deq()
enq() & deq() work at
different ends
tail head
21
Concurrency
enq(x)
What if the queue is
empty or full?
y=deq()
22
Bounded Queue
23
head
tail
First actual item
Bounded Queue – Locking 2 Ends
24
head
tail
deqLock
enqLock
At most 1 deq() & enq()
Need to also tell whether
queue is full or empty
Bounded Queue
25
head
tail
deqLock
enqLock
Permission to enqueue 8 items
Free slots
8
Enqueuer
26
head
tail
deqLock
enqLock
Free slots
8
Enqueue Node
Enqueuer (Cont.)
27
head
tail
deqLock
enqLock
Free slots
8
7
getAndDecrement()
Unsuccessful Enqueuer
28
head
tail
deqLock
enqLock
Free slots
0
Uh-oh
Read permits
Dequeuer
29
head
tail
deqLock
enqLock
Free slots
7
Lock deqLock
Dequeuer
30
head
tail
deqLock
enqLock
Free slots
7
Make first node new
sentinel
8
Unsuccessful Dequeuer
31
head
tail
deqLock
enqLock
Free slots
8
Read sentinel’s next
field
uh-oh
Bounded Queue
32
public class BoundedQueue<T> {
  ReentrantLock enqLock, deqLock;
  Condition notEmptyCondition, notFullCondition;
  AtomicInteger freeSlots;
  Node head;
  Node tail;
  int capacity;

  public BoundedQueue() {
    enqLock = new ReentrantLock();
    notFullCondition = enqLock.newCondition();
    deqLock = new ReentrantLock();
    notEmptyCondition = deqLock.newCondition();
  }
}
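The skeleton above omits the Node class, the remaining initialization, and enq()/deq(). A minimal runnable sketch along the lines of the slides' two-lock, two-condition design (the Node class and initialization details are assumptions, not the slides' exact code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the full bounded queue: enqLock guards the tail,
// deqLock guards the head, and freeSlots counts free capacity.
public class BoundedQueue<T> {
    static class Node {
        Object value;
        volatile Node next;   // volatile: written under enqLock, read under deqLock
        Node(Object v) { value = v; }
    }

    final ReentrantLock enqLock = new ReentrantLock(), deqLock = new ReentrantLock();
    final Condition notFullCondition = enqLock.newCondition();
    final Condition notEmptyCondition = deqLock.newCondition();
    final AtomicInteger freeSlots;
    Node head, tail;
    final int capacity;

    public BoundedQueue(int capacity) {
        this.capacity = capacity;
        freeSlots = new AtomicInteger(capacity);
        head = tail = new Node(null);   // shared sentinel
    }

    public void enq(T x) throws InterruptedException {
        boolean mustWakeDequeuers = false;
        enqLock.lock();
        try {
            while (freeSlots.get() == 0)       // full: wait for a dequeuer
                notFullCondition.await();
            Node e = new Node(x);
            tail.next = e;                     // link new node after tail
            tail = e;
            if (freeSlots.getAndDecrement() == capacity)
                mustWakeDequeuers = true;      // queue was empty before
        } finally {
            enqLock.unlock();
        }
        if (mustWakeDequeuers) {
            deqLock.lock();
            try { notEmptyCondition.signalAll(); } finally { deqLock.unlock(); }
        }
    }

    @SuppressWarnings("unchecked")
    public T deq() throws InterruptedException {
        boolean mustWakeEnqueuers = false;
        T result;
        deqLock.lock();
        try {
            while (head.next == null)          // empty: wait for an enqueuer
                notEmptyCondition.await();
            result = (T) head.next.value;
            head = head.next;                  // first node becomes new sentinel
            if (freeSlots.getAndIncrement() == 0)
                mustWakeEnqueuers = true;      // queue was full before
        } finally {
            deqLock.unlock();
        }
        if (mustWakeEnqueuers) {
            enqLock.lock();
            try { notFullCondition.signalAll(); } finally { enqLock.unlock(); }
        }
        return result;
    }
}
```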
Enq() & Deq() Methods
 Share no locks
 That’s good
 But do share an atomic counter
 Accessed on every method call
 That’s not so good, as it’s a sequential bottleneck
 How to alleviate this bottleneck?
33
Solution – Split Counter
 enq() method
 Decrements only
 Cares only if value is zero
 deq() method
 Increments only
 Cares only if value is capacity
 Enqueuer decrements enqSideFreeSlots
 Dequeuer increments deqSideFreeSlots
 Then wake up the other party when 0 or
capacity is reached
34
Lock-Free Unbounded Queue
 Enqueue
35
head
tail
Enq( )
Logical Enqueue
36
head
tail
CaS
Physical Enqueue
37
head
tail
Enqueue Node
CaS
Enqueue
 These 2 steps aren’t atomic
 Tail field refers to either
 Actual last Node (good)
 Penultimate Node (not so good)
 What do you do if you find a trailing tail?
 Stop & help fix it
 If tail node has non-null next field
 CaS the queue’s tail field to tail.next
38
When CaSs Fail
 During logical enqueue
 Abandon & restart
 Solution is still lock-free
 During physical enqueue
 Ignore it, as some other thread has executed it on
behalf of the current thread
39
Dequeuer
40
head
tail
Make first Node new
sentinel
CaS
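The enqueue and dequeue steps in these slides can be sketched with AtomicReference standing in for CaS. This is a simplified, Michael–Scott-style version (the sentinel-based structure follows the slides; exact names are assumptions):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: lock-free unbounded queue. Enqueue is a logical step
// (CaS the last node's next field) followed by a physical step
// (CaS the tail pointer); a lagging tail is fixed by helping.
public class LockFreeQueue<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head, tail;

    public LockFreeQueue() {
        Node<T> sentinel = new Node<>(null);
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    public void enq(T x) {
        Node<T> node = new Node<>(x);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next == null) {
                // Logical enqueue: if this CaS fails, abandon & restart
                if (last.next.compareAndSet(null, node)) {
                    // Physical enqueue: failure is ignored, since another
                    // thread must already have advanced tail on our behalf
                    tail.compareAndSet(last, node);
                    return;
                }
            } else {
                // Trailing tail: stop & help by swinging tail to tail.next
                tail.compareAndSet(last, next);
            }
        }
    }

    public T deq() {
        while (true) {
            Node<T> first = head.get();
            Node<T> next = first.next.get();
            if (next == null)
                throw new IllegalStateException("empty");
            // Make the first actual node the new sentinel
            if (head.compareAndSet(first, next))
                return next.value;
        }
    }
}
```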
Java Concurrency Utilities
 The java.util.concurrent package contains a set of
classes that make it easier to develop concurrent applications
 Lock
 ReadWriteLock
 BlockingQueue
 ArrayBlockingQueue
 DelayQueue
 LinkedBlockingQueue
 PriorityBlockingQueue
 SynchronousQueue
 BlockingDeque 41
BlockingQueue
public class BlockingQueueExample {
  public static void main(String[] args) throws Exception {
    BlockingQueue queue = new ArrayBlockingQueue(1024);
    Producer producer = new Producer(queue);
    Consumer consumer = new Consumer(queue);
    new Thread(producer).start();
    new Thread(consumer).start();
    Thread.sleep(4000);
  }
}
42
BlockingQueue (Cont.)
public class Producer implements Runnable {
  protected BlockingQueue queue = null;

  public Producer(BlockingQueue queue) {
    this.queue = queue;
  }

  public void run() {
    try {
      for (int i = 0; i < 1000; i++) {
        queue.put(i);
        Thread.sleep(1000);
      }
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }
}
43
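The example references a Consumer class that the slides do not show; a minimal sketch of an assumed counterpart to Producer:

```java
import java.util.concurrent.BlockingQueue;

// Sketch of the Consumer used by BlockingQueueExample: take()
// blocks until the producer puts an item; interruption ends the loop.
public class Consumer implements Runnable {
    protected BlockingQueue<Integer> queue;

    public Consumer(BlockingQueue<Integer> queue) {
        this.queue = queue;
    }

    public void run() {
        try {
            while (true) {
                System.out.println("Consumed: " + queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // restore status & exit
        }
    }
}
```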