Parallel Programming

       By Roman Okolovich
Overview
   Traditionally, computer software has been written for serial
    computation. To solve a problem, an algorithm is constructed
    and implemented as a serial stream of instructions. These
    instructions are executed on a central processing unit on one
    computer. Only one instruction may execute at a time—after
    that instruction is finished, the next is executed.
   Nowadays a single machine (PC) can have a multi-core and/or
     multi-processor architecture.
   In a symmetric multiprocessing (SMP) architecture, two or more
     identical processors connect to a single shared main memory. Most
     multiprocessor systems today use SMP. In the case of multi-core
     processors, the SMP architecture applies to the cores, treating
     them as separate processors.
Speedup
   The amount of performance gained by
    the use of a multi-core processor is
    strongly dependent on the software
    algorithms and implementation. In
    particular, the possible gains are limited
    by the fraction of the software that can
    be "parallelized" to run on multiple cores
    simultaneously; this effect is described
    by Amdahl's law. In the best case, so-
    called embarrassingly parallel problems
    may realize speedup factors near the
    number of cores. Many typical
    applications, however, do not realize
    such large speedup factors and thus,
     the parallelization of software is a
     significant ongoing topic of research.
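   A minimal sketch of Amdahl's law (the 90% parallel fraction below is
     only an assumed example value, not a measurement): the achievable
     speedup on N cores is 1 / ((1 - P) + P / N), where P is the fraction
     of the run time that can be parallelized.

      /* Amdahl's law sketch: theoretical speedup for an assumed
         parallelizable fraction P = 0.9 on N cores. */
      #include <stdio.h>

      int main(void)
      {
          const double P = 0.9;                 /* assumed parallel fraction */
          for (int N = 1; N <= 16; N *= 2) {
              double speedup = 1.0 / ((1.0 - P) + P / N);
              printf("%2d cores -> %.2fx speedup\n", N, speedup);
          }
          return 0;   /* prints 1.00x, 1.82x, 3.08x, 4.71x, 6.40x */
      }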
Intel Atom
   Nokia Booklet 3G - Intel® Atom™ Z530, 1.6 GHz
   Intel Atom is the brand name for a line of ultra-low-voltage x86
    and x86-64 CPUs (or microprocessors) from Intel, designed in 45
    nm CMOS and used mainly in Netbooks, Nettops and MIDs.
   Intel Atom can execute up to two instructions per cycle. The
     performance of a single-core Atom is roughly half that of an
     equivalent Celeron.
   Hyper-threading (officially termed Hyper-Threading Technology
    or HTT) is an Intel-proprietary technology used to improve
    parallelization of computations (doing multiple tasks at once)
    performed on PC microprocessors.
   A processor with hyper-threading enabled is treated by the
    operating system as two processors instead of one. This means
    that only one processor is physically present but the operating
    system sees two virtual processors, and shares the workload
    between them.
   The advantages claimed for hyper-threading include improved support
     for multi-threaded code, allowing multiple threads to run
     simultaneously, and improved reaction and response times.
Instruction level parallelism
   Instruction-level parallelism (ILP) is a measure of how
    many of the operations in a computer program can be
    performed simultaneously. Consider the following
    program:
   1. e = a + b
    2. f = c + d
    3. g = e * f
   Operation 3 depends on the results of operations 1 and
    2, so it cannot be calculated until both of them are
    completed. However, operations 1 and 2 do not depend
    on any other operation, so they can be calculated
    simultaneously. (See also: Data dependency) If we
    assume that each operation can be completed in one unit
    of time then these three instructions can be completed in
    a total of two units of time, giving an ILP of 3/2.
Qt 4's Multithreading
   Qt provides thread support in the form of platform-independent threading
    classes, a thread-safe way of posting events, and signal-slot connections
    across threads. This makes it easy to develop portable multithreaded Qt
    applications and take advantage of multiprocessor machines.
       QThread provides the means to start a new thread.
       QThreadStorage provides per-thread data storage.
       QThreadPool manages a pool of threads that run QRunnable objects.
       QRunnable is an abstract class representing a runnable object.
       QMutex provides a mutual exclusion lock, or mutex.
       QMutexLocker is a convenience class that automatically locks and unlocks a
        QMutex.
       QReadWriteLock provides a lock that allows simultaneous read access.
       QReadLocker and QWriteLocker are convenience classes that automatically lock
        and unlock a QReadWriteLock.
       QSemaphore provides an integer semaphore (a generalization of a mutex).
       QWaitCondition provides a way for threads to go to sleep until woken up by
        another thread.
       QAtomicInt provides atomic operations on integers.
       QAtomicPointer provides atomic operations on pointers.
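   A minimal sketch of how a few of these classes work together (assumes
     Qt 4; the Counter and Worker classes are hypothetical, not part of Qt):
     two QThread subclasses increment a shared counter whose data is
     protected by QMutexLocker.

      #include <QThread>
      #include <QMutex>
      #include <QMutexLocker>
      #include <QDebug>

      // Shared data guarded by a mutex; QMutexLocker unlocks on scope exit.
      class Counter
      {
      public:
          Counter() : m_value(0) {}
          void increment() { QMutexLocker locker(&m_mutex); ++m_value; }
          int value() const { QMutexLocker locker(&m_mutex); return m_value; }
      private:
          mutable QMutex m_mutex;
          int m_value;
      };

      // Each Worker runs its own thread; run() is the thread's entry point.
      class Worker : public QThread
      {
      public:
          Worker(Counter *counter) : m_counter(counter) {}
      protected:
          void run() { for (int i = 0; i < 100000; ++i) m_counter->increment(); }
      private:
          Counter *m_counter;
      };

      int main()
      {
          Counter counter;
          Worker a(&counter), b(&counter);
          a.start(); b.start();   // start() launches run() in new threads
          a.wait();  b.wait();    // block until both threads finish
          qDebug() << "final value:" << counter.value();   // 200000
          return 0;
      }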
OpenMP
   The OpenMP Application Program Interface (API) supports multi-platform
     shared-memory parallel programming in C/C++ and Fortran on most
     architectures, including Unix and Windows platforms.
   OpenMP is a portable, scalable model that gives shared-memory parallel
    programmers a simple and flexible interface for developing parallel
    applications for platforms ranging from the desktop to the supercomputer.
   The designers of OpenMP wanted to provide an easy method to thread
    applications without requiring that the programmer know how to create,
    synchronize, and destroy threads or even requiring him or her to determine
    how many threads to create. To achieve these ends, the OpenMP designers
    developed a platform-independent set of compiler pragmas, directives,
    function calls, and environment variables that explicitly instruct the compiler
    how and where to insert threads into the application.
   Most loops can be threaded by inserting only one pragma right before the
    loop. Further, by leaving the nitty-gritty details to the compiler and OpenMP,
    you can spend more time determining which loops should be threaded and
    how to best restructure the algorithms for maximum performance.
OpenMP Example

   #include <omp.h>
   #include <stdio.h>
   int main() {
   #pragma omp parallel
     printf("Hello from thread %d, nthreads %d\n",
            omp_get_thread_num(), omp_get_num_threads());
   }

   //-------------------------------------------
   #pragma omp parallel shared(n,a,b)
   {
     #pragma omp for
     for (int i=0; i<n; i++)
     {
       a[i] = i + 1;
       #pragma omp parallel for
       /*-- Okay - This is a parallel region --*/
       for (int j=0; j<n; j++)
         b[i][j] = a[i];
     }
   } /*-- End of parallel region --*/

   //-------------------------------------------
   #pragma omp parallel for
   for (i=0; i < numPixels; i++)
   {
     pGrayScaleBitmap[i] = (unsigned char)
             (pRGBBitmap[i].red * 0.299 +
              pRGBBitmap[i].green * 0.587 +
              pRGBBitmap[i].blue * 0.114);
   }

   OpenMP places the following five restrictions on which loops can be
     threaded:
       The loop variable must be of type signed integer. Unsigned
         integers, such as DWORDs, will not work.
       The comparison operation must be of the form loop_variable <, <=,
         >, or >= loop_invariant_integer.
       The third expression (the increment portion of the for loop) must
         be either integer addition or integer subtraction by a
         loop-invariant value.
       If the comparison operation is < or <=, the loop variable must
         increment on every iteration; conversely, if the comparison
         operation is > or >=, the loop variable must decrement on every
         iteration.
       The loop must be a basic block, meaning no jumps from the inside
         of the loop to the outside are permitted, with the exception of
         the exit statement, which terminates the whole application. If
         goto or break is used, it must jump within the loop, not outside
         it. The same goes for exception handling: exceptions must be
         caught within the loop.
OpenMP and Visual Studio
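   In Visual Studio, OpenMP support is enabled per project with the
     /openmp compiler switch (Project Properties > C/C++ > Language >
     OpenMP Support); when the switch is set, the compiler defines the
     _OPENMP macro.
   A small sketch (an assumed example, not from the original slide) that
     keeps a file buildable whether or not /openmp is enabled:

      /* Guard OpenMP-specific calls so the code also compiles without /openmp. */
      #include <stdio.h>
      #ifdef _OPENMP
      #include <omp.h>
      #endif

      int main(void)
      {
      #ifdef _OPENMP
          printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
      #else
          printf("Compiled without /openmp\n");
      #endif
          return 0;
      }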
Intel Threading Building Blocks (TBB)
   Intel® Threading Building Blocks (Intel® TBB) is an award-winning C++ template
    library that abstracts threads to tasks to create reliable, portable, and scalable
    parallel applications. Just as the C++ Standard Template Library (STL) extends the
    core language, Intel TBB offers C++ users a higher level abstraction for parallelism.
    To implement Intel TBB, developers use familiar C++ templates and coding style,
    leaving low-level threading details to the library. It is also portable between
    architectures and operating systems.
   Intel® TBB for Windows (Linux, Mac OS) costs $299 per seat.


     #include   <iostream>
     #include   <string>
      #include   "tbb/parallel_for.h"
      #include   "tbb/blocked_range.h"
     using namespace tbb;
     using namespace std;
     int main() {
       //...
       parallel_for(blocked_range<size_t>(0, to_scan.size() ),
                    SubStringFinder( to_scan, max, pos ));
       //...
       return 0;
     }
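   The SubStringFinder body class above is defined elsewhere in Intel's
     tutorial. The sketch below shows the general shape of a parallel_for
     body with a simpler, hypothetical ApplySquare functor (assuming
     TBB 2.2 or later, where the task scheduler is initialized
     automatically): TBB splits the index range into blocked_range chunks
     and invokes operator() on each chunk, possibly from different worker
     threads.

      #include <cstdio>
      #include <vector>
      #include "tbb/parallel_for.h"
      #include "tbb/blocked_range.h"

      // Hypothetical body functor: squares every element in its chunk.
      class ApplySquare {
          std::vector<double>& m_data;
      public:
          ApplySquare(std::vector<double>& data) : m_data(data) {}
          void operator()(const tbb::blocked_range<size_t>& r) const {
              for (size_t i = r.begin(); i != r.end(); ++i)
                  m_data[i] *= m_data[i];
          }
      };

      int main() {
          std::vector<double> data(1000, 3.0);
          tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
                            ApplySquare(data));
          std::printf("data[0] = %g\n", data[0]);   // prints 9
          return 0;
      }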
Parallel Pattern Library (PPL)
   The Concurrency Runtime is a concurrent programming framework for C++.
    The Concurrency Runtime simplifies parallel programming and helps you
    write robust, scalable, and responsive parallel applications.
   The features that the Concurrency Runtime provides are unified by a
    common work scheduler. This work scheduler implements a work-stealing
    algorithm that enables your application to scale as the number of available
    processors increases.
   The Concurrency Runtime enables the following programming patterns and
    concepts:
       Imperative data parallelism: Parallel algorithms distribute computations on
        collections or on sets of data across multiple processors.
       Task parallelism: Task objects distribute multiple independent operations across
        processors.
       Declarative data parallelism: Asynchronous agents and message passing enable
        you to declare what computation has to be performed, but not how it is performed.
       Asynchrony: Asynchronous agents make productive use of latency by doing work
        while waiting for data.
   The Concurrency Runtime is provided as part of the C Runtime Library
    (CRT).
   Only Visual Studio 2010 supports PPL
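   A minimal sketch of the imperative data-parallelism pattern with the
     PPL (assumes Visual Studio 2010 and the <ppl.h> header; the vector
     and lambda are only illustrative):

      #include <ppl.h>
      #include <vector>
      #include <cstdio>

      int main() {
          std::vector<double> data(1000, 2.0);
          // parallel_for hands iterations to the runtime's work-stealing
          // scheduler; chunks may execute on different cores.
          Concurrency::parallel_for<size_t>(0, data.size(), [&](size_t i) {
              data[i] *= data[i];
          });
          std::printf("data[0] = %g\n", data[0]);   // prints 4
          return 0;
      }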
Concurrency Runtime Architecture
   The Concurrency Runtime is divided into four components: the
     Parallel Patterns Library (PPL), the Asynchronous Agents Library,
     the work scheduler, and the resource manager. These components
     reside between the operating system and applications.
   The example below uses the Asynchronous Agents Library: a call
     messaging block runs a lambda for each message it receives, and
     asend posts a message to it asynchronously.
   struct LongRunningOperationMsg
   {
       LongRunningOperationMsg(int x, int y)
           : m_x(x), m_y(y) {}
       int m_x;
       int m_y;
   };

   // The call block invokes this lambda for every message sent to it.
   call<LongRunningOperationMsg>* LongRunningOperationCall =
       new call<LongRunningOperationMsg>([](LongRunningOperationMsg msg)
       {
           LongRunningOperation(msg.m_x, msg.m_y);
       });

   void SomeFunction(int x, int y)
   {
       // asend posts the message asynchronously and returns immediately.
       asend(LongRunningOperationCall, LongRunningOperationMsg(x, y));
   }
References
   Parallel computing
   Superscalar
   Simultaneous multithreading
   Hyper-threading
   Thread Support in Qt
   OpenMP
   Intel: Getting Started with OpenMP
   Intel® Threading Building Blocks (Intel® TBB)
   Intel® Threading Building Blocks 2.2 for Open Source
   Concurrency Runtime Library
   Four Ways to Use the Concurrency Runtime in Your C++
    Projects
   Parallel Programming in Native Code blog
