SlideShare a Scribd company logo
Introduction to OpenMP


Presenter: Vengada Karthik Rangaraju

           Fall 2012 Term

       September 13th, 2012
What is openMP?

•   Open Standard for Shared Memory Multiprocessing
•   Goal: Exploit multicore hardware with shared memory
•   Programmer’s view: The openMP API
•   Structure: Three primary API components:
    – Compiler directives,
    – Runtime Library routines and
    – Environment Variables
Shared Memory Architecture in a
    Multi-Core Environment
The key components of the API and its
             functions

• Compiler Directives
   - Spawning parallel regions (threads)
   - Synchronizing
   - Dividing blocks of code among threads
   - Distributing loop iterations
The key components of the API and its
             functions

• Runtime Library Routines
   - Setting & querying no. of threads
   - Nested parallelism
   - Control over locks
   - Thread information
The key components of the API and its
             functions

• Environment Variables
   - Setting no. of threads
   - Specifying how loop iterations are divided
   - Thread processor binding
   - Enabling/Disabling dynamic threads
   - Nested parallelism
Goals
• Standardization
• Ease of Use
• Portability
Paradigm for using openMP
          Write sequential
              program


         Find parallelizable
        portions of program

                                       Insert calls to
               Insert                 runtime library
        directives/pragmas     +   routines and modify
         into existing code            environment
                                    variables, if desired

          Use openMP’s
        extended Compiler
                                      What happens
                                         here?

        Compile and run !
Compiler translation


#pragma omp <directive-type> <directive-clauses></n>
{
……
…..// Block of code executed as per instruction !
}
Basic Example in C
{
… //Sequential
}
 #pragma omp parallel //fork
{
printf(“Hello from thread
   %d.n”,omp_get_thread_num());
} //join
{
… //Sequential
}
What exactly happens when lines of
    code are executed in parallel?


• A team of threads are created
• Each thread can have its own set of private
  variables
• All threads can have shared variables
• Original thread : Master Thread
• Fork-Join Model
• Nested Parallelism
openMP LifeCycle – Petrinet model
Compiler directives – The Multi Core
           Magic Spells !
  <directive type>   Description
  parallel           Each thread will perform
                     same computation as
                     others(replicated
                     computations)
  for / sections     These are called workshare
                     directives. Portions of
                     overall work divided among
                     threads(different
                     computations). They don’t
                     create threads. It has to be
                     enclosed inside a parallel
                     directive for threads to
                     takeover the divided work.
Compiler directives – The Multi Core
             Magic Spells !

• Types of workshare directives

   for                      Countable iteration[static]

   sections                 One or more sequential
                            sections of code, executed
                            by a single thread

   single                   Serializes a section of code
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive


    <directive type>       <directive clause>
    parallel               If(expression)
                           private(var1,var2,…)
                           firstprivate(var1,var2,..)
                           lastprivate(var1,var2,..)
                           shared(var1,var2,..)
                           NUM_THREADS(integer value)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive

   <directive type>       <directive clause>
   for                    schedule(type, chunk)
                          private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          shared(var1,var2,..)
                          collapse(n)
                          nowait
                          Reduction(operator:list)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive



   <directive type>       <directive clause>
   sections               private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          reduction(operator:list)
                          nowait
Matrix Multiplication using loop
                directive
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
Scheduling Parallel Loops
•   Static
•   Dynamic
•   Guided
•   Automatic
•   Runtime
Scheduling Parallel Loops
• Static - Amount of work/iteration - same
         - Set of contiguous chunks in RR fashion
         - 1 Chunk = x iterations
Scheduling Parallel Loops
• Dynamic - Amount of work/iteration - Varies
           - Each thread will grab chunk of
             iterations and return to grab another
             chunk when it has executed them.
• Guided - Same as dynamic, only difference,
         - a good proportion of iterations
            remaining are shared among each
            thread.
Scheduling Parallel Loops
• Runtime - Schedule determined using an
            environment variable. Library
            routine provided !
• Automatic - Implementation chooses any
               schedule
Matrix Multiplication using loop
      directive – with a schedule
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for schedule(static)
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
openMP worshare directive – sections
 int g;
 void foo(int m, int n)
{
      int p,i;
        #pragma omp sections firstprivate(g) nowait
        {
            #pragma omp section
            {
               p=f1(g);
               for(i=0;i<m;i++)
               do_stuff;
            }
            #pragma omp section
            {
               p=f2(g);
               for(i=0;i<n;i++)
               do_other_stuff;
            }
        }
return;
}
Parallelizing when the no.of Iterations
        is unknown[dynamic] !


• openMP has a directive called task
Explicit Tasks
 void processList(Node* list)
{
    #pragma omp parallel
    pragma omp single
    {
       Node *currentNode = list;
       while(currentNode)
        {
           #pragma omp task firstprivate(currentNode)
           doWork(currentNode);
          currentNode=currentNode->next;
        }
     }
}
Explicit Tasks – Petrinet Model
Synchronization
•   Barrier
•   Critical
•   Atomic
•   Flush
Performing Reductions
• A loop containing reduction will always be
  sequential, since each iteration would form a
  result depending on previous iteration.
• openMP allows these loops to be parallelized
  as long as the developer says, loop contains
  reduction and indicates the variable and kind
  of reduction via “Clauses”
Without using reduction
#pragma omp parallel shared(array,sum)
firstprivate(local_sum)
{
    #pragma omp for private(i,j)
    for(i=0;i<max_i;i++)
    {
          for(j=0;j<max_j;++j)
          local_sum+=array[i][j];
    }
}
#pragma omp critical
sum+=local_sum;
}
Using Reductions in openMP
sum=0;
#pragma omp parallel shared(array)
{
  #pragma omp for reduction(+:sum) private(i,j)
  for(i=0;i<max_i;i++)
  {
       for(j=0;j<max_j;++j)
       sum+=array[i][j];
  }
}
Programming for performance
• Use of IF clause before creating parallel
  regions
• Understanding Cache Coherence
• Judicious use of parallel and flush
• Critical and atomic - know the difference !
• Avoid unnecessary computations in critical
  region
• Use of barrier - a starvation alert !
References
• NUMA UMA

   http://vvirtual.wordpress.com/2011/06/13/what-is-numa/

   http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/

• openMP basics

   https://computing.llnl.gov/tutorials/openMP/

• Workshop on openMP SMP, by Tim Mattson from Intel (video)

  http://www.youtube.com/watch?v=TzERa9GA6vY
Interesting links

• openMP official page

   http://openmp.org/wp/

• 32 openMP Traps for C++ Developers

   http://www.viva64.com/en/a/0054/#ID0EMULM

More Related Content

PPT
Clock synchronization in distributed system
PDF
management of distributed transactions
PPTX
Database , 8 Query Optimization
PPTX
Shared Memory Multi Processor
PPTX
Virtual Memory
PPT
System models in distributed system
PDF
Shared-Memory Multiprocessors
Clock synchronization in distributed system
management of distributed transactions
Database , 8 Query Optimization
Shared Memory Multi Processor
Virtual Memory
System models in distributed system
Shared-Memory Multiprocessors

What's hot (20)

PPTX
Distributed shared memory ch 5
PPT
PPTX
Parallel Programming
PPTX
Concurrency
PPTX
Bandwidth utilization
PPTX
Communication model of parallel platforms
PDF
operating system structure
PPTX
Replication Techniques for Distributed Database Design
PPTX
Inter Process Communication
PPTX
Query processing in Distributed Database System
PDF
Ddb 1.6-design issues
PDF
Course outline of parallel and distributed computing
PPTX
Firewall Design and Implementation
PPTX
Fragmentaton
PDF
PPTX
Computer architecture virtual memory
PPTX
Pram model
PPT
Client Centric Consistency Model
PPTX
Lecture 3 threads
PPT
Computer architecture pipelining
Distributed shared memory ch 5
Parallel Programming
Concurrency
Bandwidth utilization
Communication model of parallel platforms
operating system structure
Replication Techniques for Distributed Database Design
Inter Process Communication
Query processing in Distributed Database System
Ddb 1.6-design issues
Course outline of parallel and distributed computing
Firewall Design and Implementation
Fragmentaton
Computer architecture virtual memory
Pram model
Client Centric Consistency Model
Lecture 3 threads
Computer architecture pipelining
Ad

Viewers also liked (14)

PPTX
Intro to OpenMP
PDF
OpenMP Tutorial for Beginners
ODP
OpenMp
KEY
OpenMP
PDF
Open mp intro_01
PDF
Open mp
PDF
Openmp combined
PDF
Wolfgang Lehner Technische Universitat Dresden
PDF
Biref Introduction to OpenMP
PPTX
PDF
Parallel-kmeans
PDF
Deep Learning at Scale
PPTX
PDF
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Intro to OpenMP
OpenMP Tutorial for Beginners
OpenMp
OpenMP
Open mp intro_01
Open mp
Openmp combined
Wolfgang Lehner Technische Universitat Dresden
Biref Introduction to OpenMP
Parallel-kmeans
Deep Learning at Scale
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Ad

Similar to Presentation on Shared Memory Parallel Programming (20)

PPT
Lecture7
PDF
Introduction to OpenMP
PDF
Introduction to OpenMP
PPTX
MPI n OpenMP
PDF
Open MP cheet sheet
PDF
Introduction to OpenMP (Performance)
PPT
Lecture8
PPT
OpenMP-Quinn17_L4bOpen <MP_Open MP_Open MP
PPT
Programming using Open Mp
PDF
Parallel Programming
PPT
Lecture6
PPT
Nbvtalkataitamimageprocessingconf
PDF
Parallel and Distributed Computing Chapter 5
PPT
openmp.ppt
PPT
openmp.ppt
PDF
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
PPT
OPEN MP TO FOR knowing more in the front
PPT
Parllelizaion
PDF
(3) cpp procedural programming
PDF
CUG2011 Introduction to GPU Computing
Lecture7
Introduction to OpenMP
Introduction to OpenMP
MPI n OpenMP
Open MP cheet sheet
Introduction to OpenMP (Performance)
Lecture8
OpenMP-Quinn17_L4bOpen <MP_Open MP_Open MP
Programming using Open Mp
Parallel Programming
Lecture6
Nbvtalkataitamimageprocessingconf
Parallel and Distributed Computing Chapter 5
openmp.ppt
openmp.ppt
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
OPEN MP TO FOR knowing more in the front
Parllelizaion
(3) cpp procedural programming
CUG2011 Introduction to GPU Computing

Recently uploaded (20)

PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
Business Ethics Teaching Materials for college
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
The Final Stretch: How to Release a Game and Not Die in the Process.
PDF
01-Introduction-to-Information-Management.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Electrolyte Disturbances and Fluid Management A clinical and physiological ap...
PPTX
Cardiovascular Pharmacology for pharmacy students.pptx
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
PSYCHOLOGY IN EDUCATION.pdf ( nice pdf ...)
PPTX
Cell Structure & Organelles in detailed.
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
TR - Agricultural Crops Production NC III.pdf
Insiders guide to clinical Medicine.pdf
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Business Ethics Teaching Materials for college
Microbial diseases, their pathogenesis and prophylaxis
102 student loan defaulters named and shamed – Is someone you know on the list?
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Microbial disease of the cardiovascular and lymphatic systems
Renaissance Architecture: A Journey from Faith to Humanism
The Final Stretch: How to Release a Game and Not Die in the Process.
01-Introduction-to-Information-Management.pdf
Anesthesia in Laparoscopic Surgery in India
Electrolyte Disturbances and Fluid Management A clinical and physiological ap...
Cardiovascular Pharmacology for pharmacy students.pptx
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PSYCHOLOGY IN EDUCATION.pdf ( nice pdf ...)
Cell Structure & Organelles in detailed.
O5-L3 Freight Transport Ops (International) V1.pdf

Presentation on Shared Memory Parallel Programming

  • 1. Introduction to OpenMP Presenter: Vengada Karthik Rangaraju Fall 2012 Term September 13th, 2012
  • 2. What is openMP? • Open Standard for Shared Memory Multiprocessing • Goal: Exploit multicore hardware with shared memory • Programmer’s view: The openMP API • Structure: Three primary API components: – Compiler directives, – Runtime Library routines and – Environment Variables
  • 3. Shared Memory Architecture in a Multi-Core Environment
  • 4. The key components of the API and its functions • Compiler Directives - Spawning parallel regions (threads) - Synchronizing - Dividing blocks of code among threads - Distributing loop iterations
  • 5. The key components of the API and its functions • Runtime Library Routines - Setting & querying no. of threads - Nested parallelism - Control over locks - Thread information
  • 6. The key components of the API and its functions • Environment Variables - Setting no. of threads - Specifying how loop iterations are divided - Thread processor binding - Enabling/Disabling dynamic threads - Nested parallelism
  • 7. Goals • Standardization • Ease of Use • Portability
  • 8. Paradigm for using openMP Write sequential program Find parallelizable portions of program Insert calls to Insert runtime library directives/pragmas + routines and modify into existing code environment variables, if desired Use openMP’s extended Compiler What happens here? Compile and run !
  • 9. Compiler translation #pragma omp <directive-type> <directive-clauses></n> { …… …..// Block of code executed as per instruction ! }
  • 10. Basic Example in C { … //Sequential } #pragma omp parallel //fork { printf(“Hello from thread %d.n”,omp_get_thread_num()); } //join { … //Sequential }
  • 11. What exactly happens when lines of code are executed in parallel? • A team of threads are created • Each thread can have its own set of private variables • All threads can have shared variables • Original thread : Master Thread • Fork-Join Model • Nested Parallelism
  • 12. openMP LifeCycle – Petrinet model
  • 13. Compiler directives – The Multi Core Magic Spells ! <directive type> Description parallel Each thread will perform same computation as others(replicated computations) for / sections These are called workshare directives. Portions of overall work divided among threads(different computations). They don’t create threads. It has to be enclosed inside a parallel directive for threads to takeover the divided work.
  • 14. Compiler directives – The Multi Core Magic Spells ! • Types of workshare directives for Countable iteration[static] sections One or more sequential sections of code, executed by a single thread single Serializes a section of code
  • 15. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> parallel If(expression) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) NUM_THREADS(integer value)
  • 16. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> for schedule(type, chunk) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) collapse(n) nowait Reduction(operator:list)
  • 17. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> sections private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) reduction(operator:list) nowait
  • 18. Matrix Multiplication using loop directive #pragma omp parallel private(i,j,k) { #pragma omp for for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 19. Scheduling Parallel Loops • Static • Dynamic • Guided • Automatic • Runtime
  • 20. Scheduling Parallel Loops • Static - Amount of work/iteration - same - Set of contiguous chunks in RR fashion - 1 Chunk = x iterations
  • 21. Scheduling Parallel Loops • Dynamic - Amount of work/iteration - Varies - Each thread will grab chunk of iterations and return to grab another chunk when it has executed them. • Guided - Same as dynamic, only difference, - a good proportion of iterations remaining are shared among each thread.
  • 22. Scheduling Parallel Loops • Runtime - Schedule determined using an environment variable. Library routine provided ! • Automatic - Implementation chooses any schedule
  • 23. Matrix Multiplication using loop directive – with a schedule #pragma omp parallel private(i,j,k) { #pragma omp for schedule(static) for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 24. openMP worshare directive – sections int g; void foo(int m, int n) { int p,i; #pragma omp sections firstprivate(g) nowait { #pragma omp section { p=f1(g); for(i=0;i<m;i++) do_stuff; } #pragma omp section { p=f2(g); for(i=0;i<n;i++) do_other_stuff; } } return; }
  • 25. Parallelizing when the no.of Iterations is unknown[dynamic] ! • openMP has a directive called task
  • 26. Explicit Tasks void processList(Node* list) { #pragma omp parallel pragma omp single { Node *currentNode = list; while(currentNode) { #pragma omp task firstprivate(currentNode) doWork(currentNode); currentNode=currentNode->next; } } }
  • 27. Explicit Tasks – Petrinet Model
  • 28. Synchronization • Barrier • Critical • Atomic • Flush
  • 29. Performing Reductions • A loop containing reduction will always be sequential, since each iteration would form a result depending on previous iteration. • openMP allows these loops to be parallelized as long as the developer says, loop contains reduction and indicates the variable and kind of reduction via “Clauses”
  • 30. Without using reduction #pragma omp parallel shared(array,sum) firstprivate(local_sum) { #pragma omp for private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) local_sum+=array[i][j]; } } #pragma omp critical sum+=local_sum; }
  • 31. Using Reductions in openMP sum=0; #pragma omp parallel shared(array) { #pragma omp for reduction(+:sum) private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) sum+=array[i][j]; } }
  • 32. Programming for performance • Use of IF clause before creating parallel regions • Understanding Cache Coherence • Judicious use of parallel and flush • Critical and atomic - know the difference ! • Avoid unnecessary computations in critical region • Use of barrier - a starvation alert !
  • 33. References • NUMA UMA http://vvirtual.wordpress.com/2011/06/13/what-is-numa/ http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/ • openMP basics https://computing.llnl.gov/tutorials/openMP/ • Workshop on openMP SMP, by Tim Mattson from Intel (video) http://www.youtube.com/watch?v=TzERa9GA6vY
  • 34. Interesting links • openMP official page http://openmp.org/wp/ • 32 openMP Traps for C++ Developers http://www.viva64.com/en/a/0054/#ID0EMULM