Introduction to Parallel Programming with OpenMP in C++
Last Updated: 19 Mar, 2023
Parallel programming is the process of breaking down a large task into smaller sub-tasks that can be executed simultaneously, thus utilizing the available computing resources more efficiently. OpenMP is a widely used API for parallel programming in C++. It allows developers to write parallel code easily and efficiently by adding simple compiler directives to their existing code.
Syntax of OpenMP
OpenMP uses compiler directives to mark the parallel regions of the code. Each directive begins with the "#pragma omp" keyword and takes the form:
#pragma omp <directive> [clause[,clause]...]
Parameters
1. directive: The following are some common OpenMP directives (a short illustrative sketch follows this list):
- "parallel": creates a team of threads that execute the enclosed code block in parallel.
- "for": splits a loop into chunks of iterations that are executed in parallel by different threads.
- "sections": splits the enclosed code block into sections that can be executed in parallel.
- "single": specifies that a code block should be executed by only one thread of the team.
- "critical": specifies that a code block should be executed by only one thread at a time.
- "atomic": specifies that a memory update should be performed atomically.
2. clause: The clauses provide additional information to the directives. For example, the "num_threads" clause specifies the number of threads to be used for a parallel section.
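The following is a minimal illustrative sketch (not part of the examples below) that combines a few of these directives and the "num_threads" clause; the thread count of 4 and the printed messages are arbitrary choices for demonstration.
C++
#include <iostream>
#include <omp.h>

int main()
{
    // "parallel" creates a team of threads; the "num_threads"
    // clause requests four of them.
#pragma omp parallel num_threads(4)
    {
        // "critical": only one thread at a time executes this block,
        // so the output lines do not interleave.
#pragma omp critical
        std::cout << "Hello from thread "
                  << omp_get_thread_num() << std::endl;

        // "single": exactly one thread of the team executes this block.
#pragma omp single
        std::cout << "Team size: "
                  << omp_get_num_threads() << std::endl;
    }
    return 0;
}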
Steps for Parallel Programming
Steps needed to parallelize your program with OpenMP:
1. Include the OpenMP header file:
#include <omp.h>
2. Add the OpenMP directives to the relevant sections of your code.
#pragma omp parallel
{
// Code block to be executed in parallel
}
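3. Compile the program with OpenMP support enabled. The exact option depends on your compiler: GCC and Clang use the -fopenmp flag, while MSVC uses /openmp. The file names below are placeholders for your own source file:
g++ -fopenmp my_program.cpp -o my_program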
Examples of Parallel Programming
Example 1: In this example, we define two functions, "sum_serial" and "sum_parallel", that calculate the sum of the first n natural numbers using a for loop. The "sum_serial" function uses a serial implementation, while the "sum_parallel" function uses OpenMP to parallelize the for loop. We then benchmark the two implementations by calling both functions with n = 100000000 and measuring the time taken to complete the task using the high_resolution_clock class from the chrono library.
Below is the implementation of the above example:
C++
// C++ Program to calculate the sum of the
// first n natural numbers using OpenMP
// parallel programming
#include <chrono>
#include <iostream>
#include <omp.h>

// Serial implementation
long long sum_serial(int n)
{
    // long long avoids overflowing the sum for large n
    long long sum = 0;
    for (int i = 0; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Parallel programming function
long long sum_parallel(int n)
{
    long long sum = 0;
    // split the loop across threads and combine the
    // per-thread partial sums with a reduction
#pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Driver Function
int main()
{
    const int n = 100000000;

    // Time the serial version
    auto start_time = std::chrono::high_resolution_clock::now();
    long long result_serial = sum_serial(n);
    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> serial_duration = end_time - start_time;

    // Time the parallel version
    start_time = std::chrono::high_resolution_clock::now();
    long long result_parallel = sum_parallel(n);
    end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> parallel_duration = end_time - start_time;

    std::cout << "Serial result: " << result_serial << std::endl;
    std::cout << "Parallel result: " << result_parallel << std::endl;
    std::cout << "Serial duration: " << serial_duration.count()
              << " seconds" << std::endl;
    std::cout << "Parallel duration: " << parallel_duration.count()
              << " seconds" << std::endl;
    std::cout << "Speedup: "
              << serial_duration.count() / parallel_duration.count()
              << std::endl;
    return 0;
}
Output
Serial result: 5000000050000000
Parallel result: 5000000050000000
Serial duration: 0.0942459 seconds
Parallel duration: 0.0658899 seconds
Speedup: 1.43035
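The speedup observed above depends on how many threads the OpenMP runtime actually creates, which varies with the machine and the environment (for example, the OMP_NUM_THREADS environment variable). As a minimal sketch, assuming a request for 8 threads purely for illustration, the standard OpenMP runtime calls can be used to set and query the team size:
C++
#include <iostream>
#include <omp.h>

int main()
{
    // Request 8 threads for subsequent parallel regions
    // (8 is an arbitrary example value; the runtime may use fewer).
    omp_set_num_threads(8);

    // Report how many threads a parallel region would use by default.
    std::cout << "Max threads: " << omp_get_max_threads()
              << std::endl;
    return 0;
}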
Example 2: In this example, we compute an approximation of pi by numerically integrating the function 4 / (1 + x^2) over the interval [0, 1], using the fact that
pi = integral from 0 to 1 of 4 / (1 + x^2) dx
We approximate the integral by summing the value of the function at the midpoint of a large number of small steps, and we use OpenMP to parallelize the for loop that does the summation.
The compute_pi_serial function implements this computation serially, using a simple for loop to accumulate the sum. The compute_pi_parallel function is parallelized using OpenMP, with the #pragma omp parallel for reduction(+:sum) directive.
The main function runs both the serial and parallel versions of the code and measures the execution time of each version using the high_resolution_clock class from the chrono library. It also calculates the speedup achieved by the parallel version.
Below is the implementation of the above example:
C++
// C++ Program to approximate pi using
// OpenMP parallel programming
#include <chrono>
#include <iostream>
#include <omp.h>

// Computes the value of pi using a serial computation.
double compute_pi_serial(long num_steps)
{
    double step = 1.0 / num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

// Computes the value of pi using a parallel computation.
double compute_pi_parallel(long num_steps)
{
    double step = 1.0 / num_steps;
    double sum = 0.0;
    // parallelize the loop and reduce the per-thread sums
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

// Driver function
int main()
{
    const long num_steps = 1000000000L;

    // Compute pi using the serial version and time it.
    auto start_time = std::chrono::high_resolution_clock::now();
    double pi_serial = compute_pi_serial(num_steps);
    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> serial_duration = end_time - start_time;

    // Compute pi using the parallel version and time it.
    start_time = std::chrono::high_resolution_clock::now();
    double pi_parallel = compute_pi_parallel(num_steps);
    end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> parallel_duration = end_time - start_time;

    std::cout << "Serial result: " << pi_serial << std::endl;
    std::cout << "Parallel result: " << pi_parallel << std::endl;
    std::cout << "Serial duration: " << serial_duration.count()
              << " seconds" << std::endl;
    std::cout << "Parallel duration: " << parallel_duration.count()
              << " seconds" << std::endl;
    std::cout << "Speedup: "
              << serial_duration.count() / parallel_duration.count()
              << std::endl;
    return 0;
}
Output
Serial result: 3.14159
Parallel result: 3.14159
Serial duration: 1.64776 seconds
Parallel duration: 1.51894 seconds
Speedup: 1.08481