32 OpenMP Traps for C++ Developers

Abstract

As multi-core systems spread rapidly, parallel programming is becoming more and more urgent. Yet even the majority of experienced developers are new to this field. Existing compilers and code analyzers can find some of the bugs that appear during parallel code development, but many errors go undiagnosed. This article describes a number of errors that lead to incorrect behavior in parallel programs created with OpenMP.

Introduction

Parallel programming appeared long ago: the first multiprocessor computers were created in the 1960s. Until recently, however, processor performance gains came from clock frequency increases, and multiprocessor systems were rare. Clock frequency growth has now slowed, and performance gains come instead from adding cores. Multi-core processors are widespread; therefore, the problem of parallel programming becomes more and more urgent. Earlier, it was enough to install a CPU with a higher clock frequency or larger cache memory to increase a program's performance. Now this approach no longer works, and a developer has to modify the program itself to make it faster.

Because parallel programming is only now gaining broad popularity, parallelizing an existing application or creating a new parallel program can be very problematic even for experienced developers, since this field is new to them. Currently existing compilers and code analyzers find only some (very few) potential errors. All other errors remain undetected and may increase debugging and testing time significantly. Besides that, almost all errors of this kind cannot be reproduced reliably. The article concerns the C++ language, because C++ programs are usually expected to be fast. Because Visual Studio 2005 and 2008 support the OpenMP 2.0 standard, we focus on the OpenMP technology. OpenMP allows you to parallelize your code with minimal effort: all you need to do is enable the /openmp compiler option and add to your code the compiler directives that describe how the program's execution flow should be parallelized.

This article describes only some of the potential errors that are not diagnosed by compilers, static code analyzers, and dynamic code analyzers. However, we hope that this paper will help you understand some peculiarities of parallel development and avoid multiple errors.

Also, please note that this paper contains research results that will be used in the VivaMP static analyzer development (http://www.viva64.com/vivamp.php). The static analyzer will be designed to find errors in parallel programs created with OpenMP. We are very interested in receiving feedback on this article and learning more patterns of parallel programming errors.

The errors described in this article are split into logical errors and performance errors, similar to the approach used in [1]. Logical errors cause unexpected results, that is, incorrect program behavior. Performance errors decrease a program's performance.

First of all, you should learn some specific terms that will be used in this article:

Directives are OpenMP directives that define how code is parallelized. All OpenMP directives have the form #pragma omp …

Clauses are auxiliary parts of OpenMP directives. Clauses define how work is shared between threads, the number of threads, the variable access mode, and so forth.

A parallel section is a code fragment to which the #pragma omp parallel directive is applied.

The article is for developers who are familiar with OpenMP and use the technology in their programs. If you are not familiar with OpenMP, we recommend that you take a look at this document [2]. A more detailed description of OpenMP directives, clauses, functions, and environment variables can be found in the OpenMP 2.0 specification [3]. The specification is also duplicated in the MSDN Library, where it is handier to use than the PDF version.

Now, here are the potential errors that standard compilers diagnose poorly or not at all.

Logical Errors

1. Missing /openmp

Start with the simplest error: OpenMP directives will be ignored if OpenMP support is not enabled in the compiler settings. The compiler will not report an error or even a warning; the code simply will not work the way the developer expects.

OpenMP support can be enabled in the “Configuration Properties | C/C++ | Language” section of the project properties dialog.

2. Missing parallel

OpenMP directives have a rather complex format; therefore, we will first consider the simplest errors caused by an incorrect directive format. The following listings show incorrect and correct versions of the same code:

Incorrectly:

#pragma omp for
... //your code

Correctly:

#pragma omp parallel for
... // your code

or:

#pragma omp parallel
{
   #pragma omp for
   ... // your code
}

The first code fragment compiles successfully, but the compiler simply ignores the #pragma omp for directive. Therefore, only a single thread will execute the loop, and it will be rather difficult for a developer to find this out. Besides the #pragma omp parallel for directive, the same error may also occur with the #pragma omp parallel sections directive.

3. Missing omp

A problem similar to the previous one occurs if you omit the omp keyword in an OpenMP directive [4]. Take a look at the following simple example:

Incorrectly:

#pragma omp parallel num_threads(2)
{
   #pragma single
   {
     printf("me\n");
   }
}

Correctly:

#pragma omp parallel num_threads(2)
{
   #pragma omp single
   {
     printf("me\n");
   }
}

The “me” string will be printed twice, not once. The compiler will issue “warning C4068: unknown pragma”. However, warnings can be disabled in the project’s properties or simply ignored by a developer.

4. Missing for

The #pragma omp parallel directive may be applied to a single code line as well as to a code fragment. This fact may cause unexpected behavior of the for loop, as shown below:

#pragma omp parallel num_threads(2)
for (int i = 0; i < 10; i++)
   myFunc();

If the developer wanted to share the loop iterations between two threads, he should have used the #pragma omp parallel for directive; in that case, the loop body would have been executed 10 times in total. However, the code above executes the entire loop once in every thread, so the myFunc function is called 20 times. The correct version of the code is provided below:

#pragma omp parallel for num_threads(2)
for (int i = 0; i < 10; i++)
   myFunc();

5. Unnecessary parallelization

Applying the #pragma omp parallel directive to a large code fragment may cause unexpected behavior in cases similar to the one below:

#pragma omp parallel num_threads(2)
{
   ...    // N code lines
   #pragma omp parallel for
   for (int i = 0; i < 10; i++)
   {
      myFunc();
   }
}

In the code above, a forgetful or inexperienced developer who wanted to share the loop execution between two threads placed the parallel keyword inside a parallel section. The result is similar to the previous example: the myFunc function will be called 20 times, not 10. The correct version of the code should look like this:

#pragma omp parallel num_threads(2)
{
   ...    // N code lines
   #pragma omp for
   for (int i = 0; i < 10; i++)
   {
      myFunc();
   }
}
