Parallelising Dynamic Programming
Raphael Reitzig
University of Kaiserslautern
Department of Computer Science
Algorithms and Complexity Group
September 27th, 2012
Vision
Compile dynamic programming recurrences into efficient parallel
code.
Goal 1
Understand what efficiency means in parallel algorithms.
Goal 2
Characterise dynamic programming recurrences in a suitable way.
Goal 3
Find and implement efficient parallel algorithms for DP.
Analysing Parallelism
Complexity theory
Classifies problems
Focuses on inherent parallelism
Answers: How many processors do you need to be really fast
on inputs of a given size?
But...
...p grows with n – no statement about constant p and growing n!
Amdahl’s law
Parallel speedup ≤ 1 / ((1 − γ) + γ/p).
Answers: How many processors can you utilise on given inputs?
But...
...does not capture growth of n!
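The bound above is easy to check numerically. A minimal sketch in Python, where γ (gamma) is the parallelisable fraction of the work and p the processor count:

```python
def amdahl_speedup(gamma, p):
    """Upper bound on parallel speedup per Amdahl's law:
    1 / ((1 - gamma) + gamma / p)."""
    return 1.0 / ((1.0 - gamma) + gamma / p)

# Fully parallel work scales linearly; fully serial work not at all.
print(amdahl_speedup(1.0, 8))   # 8 processors, gamma = 1
print(amdahl_speedup(0.0, 8))   # gamma = 0: no speedup possible
```

Note how for fixed γ < 1 the bound stays finite as p grows (it tends to 1/(1 − γ)), which is exactly the "does not capture growth of n" limitation: γ is treated as a constant rather than a function of the input size.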
Work and depth
Work W = T^A_1 and depth D = T^A_∞
Brent’s Law: A with W/p ≤ T^A_p < W/p + D is possible in a certain
setting.
But...
...has limited applicability and D can be slippery!
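As a quick illustration of the two bounds (a sketch with illustrative numbers, assuming unit-cost tasks under greedy scheduling, which is the "certain setting" Brent's law usually refers to):

```python
def brent_bounds(work, depth, p):
    """Brent's law for unit-cost tasks under greedy scheduling:
    work / p <= T_p <= work / p + depth."""
    return work / p, work / p + depth

# Example: W = 100 unit tasks, critical path length D = 10, p = 4.
lo, hi = brent_bounds(100, 10, 4)
print(lo, hi)  # any greedy 4-processor schedule finishes within [25, 35]
```

The gap between the two bounds is exactly D, which is why a slippery depth estimate makes the law hard to apply in practice.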
Relative runtimes
Speedup S^A_p := T^A_1 / T^A_p
Efficiency E^A_p := T^B / (p · T^A_p)
But...
...what are good values?
Clear: S^A_p ∈ [0, p] and E^A_p ∈ [0, 1] – but we can certainly not always
hit the optima!
Proposal: Asymptotic relative runtimes
Definition
S^A_p(∞) := lim inf_{n→∞} S^A_p(n) ?= p
E^A_p(∞) := lim inf_{n→∞} E^A_p(n) ?= 1
Goal
Find parallel algorithms that are asymptotically as scalable and
efficient as possible for all p.
Disclaimer
This means:
A good parallel algorithm can utilise any number of processors if
the inputs are large enough.
Not:
More processors are always better.
Just as in sequential algorithmics.
Afterthoughts
Machine model
Keep it simple: (P)RAM with p processors and spawn/join.
Which quantities to analyse?
Elementary operations, memory accesses, inter-thread
communication, ...
Implicit interaction – blocking, communication via memory, ... – is
invisible in code!
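The spawn/join model above can be sketched in a few lines. A hedged illustration in Python (names are illustrative, not from the original implementation): p threads are spawned to fill disjoint slices of a shared array, then joined; all interaction happens implicitly through shared memory, exactly the kind of interaction that is invisible in the code itself.

```python
import threading

def parallel_fill(n, p):
    """Spawn p workers over disjoint slices of a shared array, then join."""
    result = [0] * n

    def worker(lo, hi):
        for i in range(lo, hi):
            result[i] = i * i   # independent work, no contention

    chunk = (n + p - 1) // p
    threads = [
        threading.Thread(target=worker,
                         args=(k * chunk, min(n, (k + 1) * chunk)))
        for k in range(p)
    ]
    for t in threads:
        t.start()   # spawn
    for t in threads:
        t.join()    # join
    return result

print(parallel_fill(10, 3))
```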
Attacking Dynamic Programming
Disclaimer
Only two dimensions
Only finite domains
Only rectangular domains
Memoisation-table point-of-view
Reducing to dependencies
e(i, j) :=
  0                     if i = j = 0
  j                     if i = 0 ∧ j > 0
  i                     if i > 0 ∧ j = 0
  min { e(i − 1, j) + 1,
        e(i, j − 1) + 1,
        e(i − 1, j − 1) + [ vi ≠ wj ] }   else
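The recurrence above is the standard edit-distance DP. A direct sequential evaluation (with the Iverson bracket [vi ≠ wj] as the substitution cost) looks like this:

```python
def edit_distance(v, w):
    """Fill the memoisation table e row by row, following the recurrence."""
    n, m = len(v), len(w)
    e = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                e[i][j] = j                      # base cases i = 0
            elif j == 0:
                e[i][j] = i                      # base cases j = 0
            else:
                e[i][j] = min(e[i - 1][j] + 1,   # deletion
                              e[i][j - 1] + 1,   # insertion
                              e[i - 1][j - 1] + (v[i - 1] != w[j - 1]))
    return e[n][m]

print(edit_distance("kitten", "sitting"))  # 3
```

Every cell depends only on its left, upper, and upper-left neighbours; this dependency shape is what the following slides exploit.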
Gold standard
Parallelising Dynamic Programming
[Figures omitted: parallelisation attempts on the memoisation table]
Simplification
Dependency directions relative to the current cell:
UL U UR
L  ·  R
DL D DR
Three cases
Assuming dependencies are area-complete and uniform, there are
only three cases up to symmetry:
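For the edit-distance dependency shape (left, up, up-left), one standard way to parallelise is the wavefront scheme: all cells on the same anti-diagonal i + j = d depend only on the two previous diagonals, so they can be computed concurrently between join points. A hedged sketch, assuming diagonal-by-diagonal scheduling (not necessarily the exact scheme the implementation uses):

```python
from concurrent.futures import ThreadPoolExecutor

def edit_distance_wavefront(v, w, workers=4):
    """Wavefront evaluation: parallelise within each anti-diagonal,
    join before moving to the next one."""
    n, m = len(v), len(w)
    e = [[0] * (m + 1) for _ in range(n + 1)]

    def cell(i):
        j = d - i
        if i == 0:
            e[i][j] = j
        elif j == 0:
            e[i][j] = i
        else:
            e[i][j] = min(e[i - 1][j] + 1,
                          e[i][j - 1] + 1,
                          e[i - 1][j - 1] + (v[i - 1] != w[j - 1]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(n + m + 1):
            lo, hi = max(0, d - m), min(n, d)
            # cells on one diagonal are independent; forcing the map
            # to completion acts as the per-diagonal join
            list(pool.map(cell, range(lo, hi + 1)))
    return e[n][m]

print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

The per-diagonal join is exactly where the synchronisation and contention challenges of the next section bite: diagonals near the corners are too short to keep all processors busy.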
Facing Reality
Challenges
Contention
Method of synchronisation
Metal issues (moving threads, cache sync)
Performance Examples
Edit distance on two-core shared memory machine:
[Two plots: input size up to 1.4·10^5 on the x-axis, values up to 2.5 on the y-axis]
Performance Examples
Edit distance on four-core NUMA machine:
[Two plots: input size up to 4·10^5 on the x-axis, values up to 4 on the y-axis]
Performance Examples
Pseudo-Bellman-Ford on two-core shared memory machine:
[Two plots: input size up to 1.4·10^5 on the x-axis, values up to 2.5 and 4 respectively on the y-axis]
Performance Examples
Pseudo-Bellman-Ford on four-core NUMA machine:
[Two plots: input size up to 4·10^5 on the x-axis, values up to 4 and 8 respectively on the y-axis]
Future Work
Fill gaps in theory (caching and communication).
Generalise theory to more dimensions and interleaved DPs.
Improve and extend implementations.
More experiments (different problems, more diverse machines).
Improve compiler integration (detection, backtracing, result
functions).
Integrate with other tools.