Memory Models

Dr. C.V. Suresh Babu
Memory Consistency Models
Parallelism for the masses!
Shared-memory most common
Memory model = Legal values for reads
20 Years of Memory Models
• Memory model is at the heart of concurrency semantics
– A 20-year journey from confusion to convergence, at last!
– Hard lessons learned
– Implications for future

• Current way to specify concurrency semantics is too hard
– Fundamentally broken

• Must rethink parallel languages and hardware
• Implications for broader CS disciplines
What is a Memory Model?
• Memory model defines what values a read can return
Initially A=B=C=Flag=0
Thread 1                  Thread 2
A = 26                    while (Flag != 1) {;}
B = 90                    r1 = B    // returns 90?
…                         r2 = A    // returns 26 or 0?
Flag = 1                  …
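A minimal C++ sketch of this example, assuming C++11 or later: A and B are ordinary ints, and Flag is declared std::atomic<int>, which C++ treats as a synchronization variable with sequentially consistent ordering by default. Because Flag is the only variable accessed concurrently, the program is data-race-free, so the reads must return r1 = 90 and r2 = 26. If Flag were a plain int, the program would contain a data race, and C++ gives such programs undefined behavior. The function name run_flag_handoff is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Returns {r1, r2} as read by Thread 2.
std::pair<int, int> run_flag_handoff() {
    int A = 0, B = 0;            // ordinary data
    std::atomic<int> Flag{0};    // synchronization variable (seq_cst by default)
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        A = 26;
        B = 90;
        Flag.store(1);           // publishes the writes to A and B
    });
    std::thread t2([&] {
        while (Flag.load() != 1) { /* spin until Thread 1 sets Flag */ }
        r1 = B;                  // must read 90
        r2 = A;                  // must read 26
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```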
Memory Model is Key to Concurrency Semantics
• Interface between a program and the transformers of that program
– Defines what values a read can return
(Figure: C++ program → Compiler → Assembly → Dynamic optimizer → Hardware)

• Weakest system component exposed to the programmer
– Language level model has implications for hardware
• Interface must last beyond trends
Desirable Properties of a Memory Model
• 3 Ps
– Programmability
– Performance
– Portability

• Challenge: hard to satisfy all 3 Ps
– Late 1980s to 1990s: Largely driven by hardware
• Lots of models, little consensus
– 2000 onwards: Largely driven by languages/compilers
• Consensus model for Java, C++ (C, others ongoing)
• Had to deal with mismatches in hardware models

The path to convergence holds lessons for the future
Programmability – SC [Lamport79]
• Programmability: Sequential consistency (SC) most intuitive
– Operations of a single thread in program order
– All operations in a single total order (each executes atomically)

• But Performance?
– Recent (complex) hardware techniques boost performance with SC
– But compiler transformations still inhibited

• But Portability?
– Almost all h/w, compilers violate SC today

⇒SC not practical, but…
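To make "all operations in one total order, each thread in program order" concrete, the hypothetical C++ sketch below enumerates every SC interleaving of a small two-thread litmus test and collects the final register values. The instruction encoding (Op, Program) and the store-buffering test (x = 1; r1 = y in one thread, y = 1; r2 = x in the other, all locations initially 0) are illustrative assumptions; the test has the same shape as the Dekker example later in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One instruction of a litmus test (hypothetical encoding).
struct Op {
    bool is_store;     // true: store `value` to `var`; false: load `var` into `reg`
    std::string var;   // shared memory location
    int value;         // value written (stores only)
    std::string reg;   // destination register (loads only)
};

using Program = std::vector<std::vector<Op>>;  // one instruction list per thread
using Regs = std::map<std::string, int>;       // final register values
using Memory = std::map<std::string, int>;     // absent locations read as 0

// At each step, try every thread's next instruction. Each path through the
// recursion is one interleaving that keeps program order and runs against a
// single shared memory, i.e. one SC execution.
void explore(const Program& prog, const std::vector<std::size_t>& pc,
             const Memory& mem, const Regs& regs, std::set<Regs>& outcomes) {
    bool finished = true;
    for (std::size_t t = 0; t < prog.size(); ++t) {
        if (pc[t] == prog[t].size()) continue;   // thread t has no ops left
        finished = false;
        std::vector<std::size_t> next_pc = pc;
        Memory next_mem = mem;
        Regs next_regs = regs;
        const Op& op = prog[t][next_pc[t]++];
        if (op.is_store) next_mem[op.var] = op.value;
        else next_regs[op.reg] = next_mem[op.var];
        explore(prog, next_pc, next_mem, next_regs, outcomes);
    }
    if (finished) outcomes.insert(regs);
}

std::set<Regs> sc_outcomes(const Program& prog) {
    std::set<Regs> outcomes;
    explore(prog, std::vector<std::size_t>(prog.size(), 0), Memory{}, Regs{}, outcomes);
    return outcomes;
}

// Store buffering, all locations initially 0.
Program store_buffering() {
    return {
        {{true, "x", 1, ""}, {false, "y", 0, "r1"}},  // Thread 1: x = 1; r1 = y
        {{true, "y", 1, ""}, {false, "x", 0, "r2"}},  // Thread 2: y = 1; r2 = x
    };
}
```

sc_outcomes(store_buffering()) yields exactly three outcomes, (r1, r2) = (0, 1), (1, 0) and (1, 1). The outcome (0, 0) never appears, and that is precisely the guarantee most hardware and compilers relax.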
Next Best Thing – SC Almost Always
• Parallel programming too hard even with SC
– Programmers (want to) write well structured code
– Explicit synchronization, no data races
Thread 1            Thread 2
Lock(L)             Lock(L)
Read Data1          Read Data2
Write Data2         Write Data1
…                   …
Unlock(L)           Unlock(L)

– SC for such programs much easier: can reorder data accesses

⇒ Data-race-free model [AdveHill90]
– SC for data-race-free programs
– No guarantees for programs with data races
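The lock-based pattern above can be sketched in C++ with std::mutex. This is a minimal sketch: the function name and the arithmetic performed on Data1 and Data2 are hypothetical. Every access to the shared data happens inside a critical section on L, so the program is data-race-free and the two critical sections appear to execute in one order or the other.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Returns the final {Data1, Data2}.
std::pair<int, int> run_locked_update() {
    int Data1 = 1, Data2 = 2;   // ordinary (non-atomic) shared data
    std::mutex L;

    std::thread t1([&] {
        std::lock_guard<std::mutex> guard(L);   // Lock(L); unlocks at scope exit
        int d1 = Data1;                         // Read Data1
        Data2 = d1 + 10;                        // Write Data2
    });
    std::thread t2([&] {
        std::lock_guard<std::mutex> guard(L);   // Lock(L); unlocks at scope exit
        int d2 = Data2;                         // Read Data2
        Data1 = d2 + 100;                       // Write Data1
    });
    t1.join();
    t2.join();
    return {Data1, Data2};
}
```

The only possible results are (Data1, Data2) = (111, 11) if Thread 1's critical section runs first, or (102, 112) if Thread 2's runs first. Within each critical section, the compiler and hardware remain free to reorder the ordinary data accesses.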
Definition of a Data Race
• Distinguish between data and non-data (synchronization) accesses
• Only need to define for SC executions ⇒ total order
• Two memory accesses form a race if
– From different threads, to same location, at least one is a write
– Occur one after another (adjacent) in that total order
Thread 1            Thread 2
Write, A, 26
Write, B, 90
                    Read, Flag, 0
Write, Flag, 1
                    Read, Flag, 1
                    Read, B, 90
                    Read, A, 26

• A race with a data access is a data race
• Data-race-free program = No data race in any SC execution
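The definition can be checked mechanically on a single SC execution. The following C++ sketch (all names hypothetical) scans adjacent pairs in a total order for conflicting accesses from different threads, and reports a data race only when the location is not declared as synchronization. It assumes each location is either always data or always synchronization, as with Java volatile fields or C++ atomics. The trace encodes the slide's execution as one interleaving. Checking a single trace is not enough to call a program data-race-free, since that requires no data race in any SC execution.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Access {
    int thread;
    bool is_write;
    std::string loc;
    int value;
};

// Returns index pairs (i, i+1) of adjacent accesses that form a data race:
// different threads, same location, at least one write, and the location
// is not a declared synchronization variable.
std::vector<std::pair<std::size_t, std::size_t>> data_races(
        const std::vector<Access>& trace, const std::set<std::string>& sync_locs) {
    std::vector<std::pair<std::size_t, std::size_t>> races;
    for (std::size_t i = 0; i + 1 < trace.size(); ++i) {
        const Access& a = trace[i];
        const Access& b = trace[i + 1];
        bool race = a.thread != b.thread && a.loc == b.loc &&
                    (a.is_write || b.is_write);
        // Races on synchronization variables are allowed.
        if (race && sync_locs.count(a.loc) == 0) races.push_back({i, i + 1});
    }
    return races;
}

// The slide's execution as one SC total order: {thread, is_write, loc, value}.
std::vector<Access> flag_trace() {
    return {
        {1, true, "A", 26},   {1, true, "B", 90},    {2, false, "Flag", 0},
        {1, true, "Flag", 1}, {2, false, "Flag", 1}, {2, false, "B", 90},
        {2, false, "A", 26},
    };
}
```

With Flag declared as synchronization, data_races(flag_trace(), {"Flag"}) is empty. Treating Flag as ordinary data instead reports the two adjacent Flag pairs, at indices 2 and 3 and at indices 3 and 4.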
Data-Race-Free Model
Data-race-free model = SC for data-race-free programs
– Does not preclude races for wait-free constructs, etc.
• Requires races be explicitly identified as synchronization
• E.g., use volatile variables in Java, atomics in C++

– Dekker’s algorithm
Initially Flag1 = Flag2 = 0
volatile Flag1, Flag2
Thread1                  Thread2
Flag1 = 1                Flag2 = 1
if Flag2 == 0            if Flag1 == 0
  //critical section       //critical section

SC prohibits both loads returning 0
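In C++ the volatile flags become std::atomic<int>, whose default memory_order_seq_cst ordering restores the SC guarantee for these racing accesses. The minimal sketch below uses a hypothetical function name and shows only the flag-checking entry test of Dekker's algorithm, without the turn variable a full implementation needs. It repeats the test many times and reports whether both threads ever read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if, in any trial, both threads saw the other's flag as 0
// (i.e. both would have entered the critical section).
bool both_entered_ever(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> Flag1{0}, Flag2{0};   // play the role of the volatiles
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            Flag1.store(1);                    // seq_cst by default
            r1 = Flag2.load();                 // 0 means "enter critical section"
        });
        std::thread t2([&] {
            Flag2.store(1);
            r2 = Flag1.load();
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) return true;   // forbidden under SC
    }
    return false;
}
```

Under the C++ memory model this function must return false. Running many trials can expose a violation if one exists, but cannot prove its absence.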
Data-Race-Free Approach
• Programmer’s model: SC for data-race-free programs
• Programmability
– Simplicity of SC, for data-race-free programs

• Performance
– Specifies minimal constraints (for SC-centric view)

• Portability
– Language must provide way to identify races
– Hardware must provide way to preserve ordering on races
– Compiler must translate correctly
1990s in Practice (The Memory Models Mess)
• Hardware
– Implementation/performance-centric view
– Different vendors had different models – most non-SC
• Alpha, Sun, x86, Itanium, IBM, AMD, HP, Cray, …
– Various ordering guarantees + fences to impose other orders
(Figure: sequences of LD and ST operations, with a Fence enforcing order between them)
– Many ambiguities – due to complexity, by design(?), …

• High-level languages
– Most shared-memory programming with Pthreads, OpenMP
• Incomplete, ambiguous model specs
• Memory model is a property of the language, not a library [Boehm05]
– Java – commercially successful language with threads
• Chapter 17 of Java language spec on memory model
• But hard to interpret, badly broken
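The "ordering guarantees + fences" style survives in C++ as relaxed atomics combined with std::atomic_thread_fence. In the sketch below (function name hypothetical), the relaxed store and load in each thread may be reordered with each other, much as x86 lets a load pass an earlier store to a different location. A seq_cst fence between them forbids the outcome r1 == r2 == 0, matching the ST, Fence, LD pattern on this slide.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store buffering with relaxed accesses ordered only by seq_cst fences.
// Returns true if any trial produced the SC-forbidden r1 == r2 == 0.
bool sb_with_fences_violates_sc(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);                // ST
            std::atomic_thread_fence(std::memory_order_seq_cst);  // Fence
            r1 = y.load(std::memory_order_relaxed);               // LD
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);                // ST
            std::atomic_thread_fence(std::memory_order_seq_cst);  // Fence
            r2 = x.load(std::memory_order_relaxed);               // LD
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) return true;
    }
    return false;
}
```

Removing both fences makes (0, 0) a legal outcome again, which is why code written in this style is hard to port between machines with different default orderings.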
2000 – 2004: Java Memory Model
• ~ 2000: Bill Pugh publicized fatal flaws in Java model
• Lobbied Sun to form expert group to revise Java model
• Open process via mailing list
– Diverse participants
– Took 5 years of intense, spirited debates
– Many competing models
– Final consensus model approved in 2005 for Java 5.0
[MansonPughAdve POPL 2005]
Java Memory Model Highlights
• Quick agreement that SC for data-race-free was required
• Missing piece: Semantics for programs with data races
– Java cannot have undefined semantics for ANY program
– Must ensure safety/security guarantees
– Limit damage from data races in untrusted code

• Goal: Satisfy security/safety, w/ maximum system flexibility
– Problem: “safety/security, limited damage” w/ threads very vague
Java Memory Model Highlights
Initially X=Y=0
Thread 1        Thread 2
r1 = X          r2 = Y
Y = r1          X = r2
Is r1=r2=42 allowed?

Data races can produce a causality loop!
Defining a causality loop precisely was surprisingly hard
Common compiler optimizations seem to violate “causality”
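C++ faces the same question for relaxed atomics. The sketch below (hypothetical function name) writes the slide's program with std::memory_order_relaxed. The C++ standard does not formally rule out r1 = r2 = 42 here; it only recommends that implementations not produce such "out-of-thin-air" values, and mainstream compilers and hardware return 0 for both. That informal wording reflects the same difficulty of defining causality that the Java effort ran into.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Each thread only ever stores a value it has just read; both start at 0.
std::pair<int, int> run_causality_test() {
    std::atomic<int> X{0}, Y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        r1 = X.load(std::memory_order_relaxed);
        Y.store(r1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = Y.load(std::memory_order_relaxed);
        X.store(r2, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```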
Java Memory Model Highlights
• Final model based on consensus, but complex
– Programmers can (must) use “SC for data-race-free”
– But system designers must deal with complexity
– Correctness tools, racy programs, debuggers, …??
– Recent discovery of bugs [SevcikAspinall08]
2005 onward: C++, Microsoft Prism, Multicore
• ~ 2005: Hans Boehm initiated C++ concurrency model
– Prior status: no threads in C++, most concurrency w/ Pthreads

• Microsoft concurrently started its own internal effort
• C++ easier than Java because it is unsafe
– Data-race-free is plausible model

• BUT multicore ⇒ New h/w optimizations, more scrutiny
– Mismatched h/w, programming views became painfully obvious
– Debate over whether SC for data-race-free is inefficient w/ hardware models
C++ Challenges
2006: Pressure to change Java/C++ to remove SC baseline
To accommodate some hardware vendors

• But what is alternative?
– Must allow some hardware optimizations
– But must be teachable to undergrads

• Showed such an alternative (probably) does not exist
C++ Compromise
• Default C++ model is data-race-free
• AMD, Intel, … on board
• But
– Some systems need expensive fence for SC
– Some programmers really want more flexibility
• C++ specifies low-level atomics only for experts
• Complicates spec, but only for experts
• We are not advertising this part
– [BoehmAdve PLDI 2008]
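As an illustration of the expert-level atomics, the Flag handoff from earlier in the talk can be written with an explicit release store and acquire load instead of the seq_cst default (a hypothetical sketch). This is still correct for message passing, and on some hardware it is cheaper: on x86, for example, a seq_cst store is typically compiled to a locked instruction or a store plus a full fence, while a release store is an ordinary store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Returns {r1, r2} as read by the consumer.
std::pair<int, int> run_release_acquire_handoff() {
    int A = 0, B = 0;                 // ordinary data
    std::atomic<int> Flag{0};
    int r1 = -1, r2 = -1;
    std::thread producer([&] {
        A = 26;
        B = 90;
        Flag.store(1, std::memory_order_release);   // publish A and B
    });
    std::thread consumer([&] {
        while (Flag.load(std::memory_order_acquire) != 1) { /* spin */ }
        // The acquire load that reads 1 synchronizes with the release store,
        // so both reads below see the published values.
        r1 = B;
        r2 = A;
    });
    producer.join();
    consumer.join();
    return {r1, r2};
}
```

The catch is that release/acquire is weaker than SC: applied to the Dekker flags, it would permit both threads to read 0. Knowing when that difference matters is exactly the expertise this slide refers to.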
Summary of Current Status
• Convergence to “SC for data-race-free” as baseline
• For programs with data races
– Minimal but complex semantics for safe languages
– No semantics for unsafe languages
Lessons Learned
• Specifying semantics for programs with data races is HARD
– But “no semantics for data races” also has problems
• Not an option for safe languages
• Debugging, correctness checking tools

• Hardware-software mismatch for some code
– “Simple” optimizations have unintended consequences

⇒ State-of-the-art is fundamentally broken
Banish wild shared-memory!
• We need
– Higher-level disciplined models that enforce discipline
– Hardware co-designed with high-level models
Research Agenda for Languages
• Disciplined shared-memory models
– Simple
– Enforceable
– Expressive
– Performance
Key: What discipline?
How to enforce it?
Data-Race-Free
• A near-term discipline: Data-race-free
• Enforcement
– Ideally, language prohibits by design
– e.g., ownership types [Boyapati+02]

– Else, runtime catches races as exceptions
– e.g., Goldilocks [Elmas+07]

• But work still needed for expressivity and/or performance
• But data-race-free still not sufficiently high level
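Goldilocks itself is a precise happens-before race detector. The C++ sketch below instead uses the simpler Eraser-style lockset heuristic, swapped in only to illustrate "runtime catches races as exceptions". The class and its on_access hook are hypothetical; a real tool would instrument every shared access. Once a second thread touches a variable, the checker tracks the set of locks held on every access and throws when that set becomes empty. Unlike Goldilocks, a lockset check can raise false alarms, for example on data handed off through a volatile or atomic flag, and this simplified version does not exempt read-only sharing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

class LocksetChecker {
public:
    // Hypothetical hook: called on every read or write of shared variable
    // `var` by `thread` while it holds the locks in `held`.
    void on_access(int thread, const std::string& var,
                   const std::set<std::string>& held) {
        State& s = vars_[var];
        if (!s.seen) {                       // first access: owned by `thread`
            s.seen = true;
            s.owner = thread;
            return;
        }
        if (!s.shared) {
            if (thread == s.owner) return;   // still private to one thread
            s.shared = true;                 // second thread: start tracking
            s.candidates = held;
        } else {
            std::set<std::string> common;    // locks held on every shared access
            for (const std::string& lock : s.candidates)
                if (held.count(lock)) common.insert(lock);
            s.candidates = common;
        }
        if (s.candidates.empty())
            throw std::runtime_error("possible data race on " + var);
    }

private:
    struct State {
        bool seen = false;
        bool shared = false;
        int owner = -1;
        std::set<std::string> candidates;
    };
    std::map<std::string, State> vars_;
};
```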
Deterministic-by-Default Parallel Programming
• Even data-race-free parallel programs are too hard
– Multiple interleavings due to unordered synchronization (or races)
– Makes reasoning and testing hard

• But many algorithms are deterministic
– Fixed input gives fixed output
– Standard model for sequential programs
– Also holds for many transformative parallel programs
• Parallelism not part of problem specification, only for performance

Why write such an algorithm in non-deterministic style,
then struggle to understand and control its behavior?
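A small example of a transformative computation that stays deterministic while running in parallel: a parallel sum in which each thread writes only its own slot of a partial-results vector, and the slots are combined in a fixed order after all threads join. The function below is a hypothetical C++ sketch, not DPJ code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Requires num_threads >= 1.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    std::vector<long long> partial(num_threads, 0LL);   // one slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t lo = std::min(data.size(), t * chunk);
        const std::size_t hi = std::min(data.size(), lo + chunk);
        // Each worker reads only its own chunk and writes only partial[t],
        // so no two threads ever access the same location.
        workers.emplace_back([&data, &partial, t, lo, hi] {
            partial[t] = std::accumulate(data.begin() + lo, data.begin() + hi, 0LL);
        });
    }
    for (std::thread& w : workers) w.join();
    // Combine in index order, regardless of which thread finished first.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Any thread count gives the same integer answer. Because the final combination order is fixed, the result for a given thread count would also be reproducible with floating-point data, where addition is not associative.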
Deterministic-by-Default Model
• Parallel programs should be deterministic-by-default
– Sequential semantics (easier than SC!)

• If non-determinism is needed
– should be explicitly requested, encapsulated
– should not interfere with guarantees for the rest of the program

• Enforcement:
– Ideally, language prohibits by design
– Else, runtime catches violations as exceptions
State-of-the-art
• Many deterministic languages today
– Functional, pure data parallel, some domain-specific, …
– Much recent work on runtime, library-based approaches
– E.g., Allen09, Devietti09, Olszewski09, …

• Our work: Language approach for modern O-O methods
– Deterministic Parallel Java (DPJ) [V. Adve et al.]
Deterministic Parallel Java (DPJ)
• Object-oriented type and effect system
– Aliasing information: partition the heap into “regions”
– Effect specifications: regions read or written by each method
– Language guarantees determinism through type checking
– Side benefit: regions, effects are valuable documentation

• Implemented as extension to base Java type system
– Initial evaluation for expressivity, performance [Bocchino+09]
– Semi-automatic tool for region annotations [Vakilian+09]
– Recent work on encapsulating frameworks and unchecked code
– Ongoing work on integrating non-determinism
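DPJ performs its determinism check at compile time from region and effect annotations. As a rough run-time analogue (hypothetical C++, neither DPJ syntax nor its actual checker), the sketch below represents each task's effects as sets of region names read and written, and treats two tasks as safe to run in parallel when neither writes a region the other reads or writes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical effect summary: heap regions a task may read and may write.
struct Effects {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

static bool intersects(const std::set<std::string>& a,
                       const std::set<std::string>& b) {
    for (const std::string& region : a)
        if (b.count(region)) return true;
    return false;
}

// Non-interference: no write/write or read/write overlap between the tasks.
// Read/read sharing is harmless and therefore allowed.
bool non_interfering(const Effects& a, const Effects& b) {
    return !intersects(a.writes, b.writes) &&
           !intersects(a.writes, b.reads) &&
           !intersects(a.reads, b.writes);
}
```

Two tasks that both read a region Input but write disjoint regions Left and Right are non-interfering; a task that reads Left conflicts with one that writes it. The region names are illustrative only.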
Implications for Hardware
• Current hardware not matched even to current model
• Near term: ISA changes, speculation
• Long term: Co-design hardware with new software models
– Use disciplined software to make more efficient hardware
– Use hardware to support disciplined software
Illinois DeNovo Project
• How to design hardware from the ground up to
– Exploit disciplined parallelism
• for better performance, power, …

– Support disciplined parallelism
• for better dependability

• Working with DPJ to exploit region, effect information
– Software-assisted coherence, communication, scheduling
– New hardware/software interface
– Opportune time as we determine how to scale multicore
Conclusions
• Current way to specify concurrency semantics fundamentally broken
– Best we can do is SC for data-race-free
• But cannot hide from programs with data races

– Mismatched hardware-software
• Simple optimizations give unintended consequences

• Need
– High-level disciplined models that enforce discipline
– Hardware co-designed with high-level models
• E.g., DPJ, DeNovo

– Implications for many CS communities
