C++ Programming: Static Code Analysis and the New Language Standard C++0x

Abstract

The article discusses the new capabilities of the C++ language described in the C++0x standard and supported in Visual Studio 2010. Using PVS-Studio as an example, we will see how the changes in the language influence static code analysis tools.

Introduction

The new C++ language standard is about to come into our lives. It is still called C++0x, although its final name seems to be C++11. The new standard is partially supported by modern C++ compilers, for example, Intel C++ and Visual C++. This support is far from complete, and it is quite clear why. First, the standard has not been accepted yet, and second, it will take some time to implement its specifics in compilers even after it is accepted.

Compiler developers are not the only ones for whom support of the new standard is important. The language innovations must also be promptly supported by static source code analyzers. The new standard promises backward compatibility: existing C++ code is almost guaranteed to compile correctly with new compilers without any modifications. But this does not mean that a program containing no new language constructs can still be processed by a static analyzer that does not support C++0x. We became convinced of this in practice when trying to check a project created in the beta version of Visual Studio 2010 with PVS-Studio. The problem lies in the header files, which already use the new language constructs. For example, the header file “stddef.h” uses the new operator decltype:

namespace std { typedef decltype(__nullptr) nullptr_t; }

Such constructs are naturally treated as syntax errors by an analyzer that does not support C++0x, and they either abort the analysis or lead to incorrect results. It became obvious that we had to provide C++0x support in PVS-Studio by the time Visual Studio 2010 was released, at least to the extent this compiler supports it.

We can say that we have accomplished this task: at the time of writing, the new version PVS-Studio 3.50, which integrates into both Visual Studio 2005/2008 and Visual Studio 2010, is available on our site. Beginning with PVS-Studio 3.50, the tool supports the same part of the C++0x standard as Visual Studio 2010. This support is not perfect (for example, in the case of “right-angle brackets”), but we will continue developing C++0x support in the next versions.

In this article, we will study the new language features supported in the first edition of Visual Studio 2010. We will look at each feature from several viewpoints: what the new ability is about, whether it relates to 64-bit errors, how the new language construct is supported in PVS-Studio, and how its appearance affects the VivaCore library.

Note. VivaCore is a library for code parsing, analysis and transformation. It is an open-source library that supports the languages C and C++. PVS-Studio is based on VivaCore, and other software projects may be built on this library as well.

The article we want to present may be called a report on the investigation and support of the new standard in PVS-Studio. The tool PVS-Studio diagnoses 64-bit and parallel OpenMP errors. But since the topic of moving to 64-bit systems is more relevant at the moment, we will mostly consider examples that show how to detect 64-bit errors with PVS-Studio.

1. auto

As in C, the type of a variable in C++ must be specified explicitly. But with the appearance of template types and template metaprogramming techniques in C++, it has become common for the type of an object to be hard to write out. Even in a rather simple case, iterating over the items of an array, we need to define the type of an iterator in the following way:

for (vector<int>::iterator itr = myvec.begin();
     itr != myvec.end();
     ++itr)

Such constructs are very long and cumbersome. To shorten the notation we may use typedef, but it introduces new entities and does little for convenience.

C++0x offers its own technique to make this issue a bit less complicated. The new standard gives the key word auto a different meaning. Previously auto meant that a variable has automatic storage (it is created on the stack), and it was implied unless you specified otherwise (for example, register); now it is analogous to var in C# 3.0. The type of a variable defined as auto is determined by the compiler from the object that initializes this variable.

We should notice that an auto-variable cannot store values of different types during one instance of program execution. C++ still remains a statically typed language, and by using auto we just tell the compiler to see to defining the type on its own: once the variable is initialized, its type cannot be changed.

Now the iterator can be defined in this way:

for (auto itr = myvec.begin(); itr != myvec.end(); ++itr)
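The following is only a minimal sketch of the point above: the type deduced for auto is fixed at compile time and can be checked with static_assert. The function names SumAll and DemoAutoIterator are hypothetical and do not come from the article or from PVS-Studio.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: sums a vector using an auto-declared iterator.
int SumAll(const std::vector<int>& myvec)
{
  int sum = 0;
  for (auto itr = myvec.begin(); itr != myvec.end(); ++itr)
  {
    // myvec is const, so begin() returns const_iterator and auto
    // deduces exactly that type at compile time.
    static_assert(
      std::is_same<decltype(itr),
                   std::vector<int>::const_iterator>::value,
      "auto takes the type of its initializer");
    sum += *itr;
  }
  return sum;
}

// Small self-check that can be called from main().
void DemoAutoIterator()
{
  std::vector<int> v(3, 2);   // three items, each equal to 2
  assert(SumAll(v) == 6);
}
```

If the static_assert condition did not hold, the code would not compile, which illustrates that auto involves no run-time typing.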

Besides mere convenience of writing the code and its simplification, the key word auto makes the code safer. Let us consider an example where auto will be used to make the code safe from the viewpoint of 64-bit software development:

bool Find_Incorrect(const string *arrStr, size_t n)
{
  for (size_t i = 0; i != n; ++i)
  {
    unsigned n = arrStr[i].find("ABC");
    if (n != string::npos)
      return true;
  }
  return false;
}

This code has a 64-bit error: the function behaves correctly when built as a Win32 program and fails when built in Win64 mode. The error is in using the type unsigned for the variable “n”, although the type string::size_type, which is returned by the function find(), must be used. In the 32-bit program, the types string::size_type and unsigned coincide and we get correct results. In the 64-bit program, string::size_type and unsigned do not coincide any more. When the substring is not found, the function find() returns the value string::npos, which equals 0xFFFFFFFFFFFFFFFFui64. This value is truncated to 0xFFFFFFFFu and placed into a 32-bit variable. As a result, the condition 0xFFFFFFFFu != 0xFFFFFFFFFFFFFFFFui64 is true, and the function Find_Incorrect returns true for any non-empty array.
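The truncation described above can be reproduced on any platform with fixed-width types. This is only an illustrative sketch: std::uint64_t stands in for the 64-bit string::size_type and std::uint32_t for unsigned, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if a 64-bit "not found" value still
// compares equal to itself after a round trip through a 32-bit variable.
bool TruncatedNposStillEqual()
{
  // Stand-in for string::npos in a 64-bit program (all bits set).
  const std::uint64_t npos64 = std::numeric_limits<std::uint64_t>::max();
  // Stand-in for "unsigned n = ...find(...)": the upper 32 bits are lost.
  const std::uint32_t n = static_cast<std::uint32_t>(npos64);
  // n is zero-extended to 64 bits for the comparison, so it no longer
  // matches npos64.
  return n == npos64;
}

// Small self-check that can be called from main().
void DemoNposTruncation()
{
  assert(!TruncatedNposStillEqual());
}
```

Because the two values differ, a check of the “n != npos” kind succeeds even when nothing was found.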

In this example, the error is not so dangerous because it is detected even by the compiler, not to mention the specialized analyzer Viva64 (included in PVS-Studio).

This is how the compiler detects the error:

warning C4267: 'initializing' :
conversion from 'size_t' to 'unsigned int', possible loss of data

This is how Viva64 does it:

V103: Implicit type conversion from memsize to 32-bit type.

Most importantly, this kind of error is quite likely and often occurs in code because of a careless choice of the type that stores the returned value. The error might even appear simply because the programmer is reluctant to use a cumbersome construct of the string::size_type kind.

Now we can easily avoid such errors without cluttering the code. Using auto, we may write the following simple and safe code:

auto n = arrStr[i].find("ABC");
if (n != string::npos)
  return true;

The error disappeared by itself. The code has not become more complicated or less efficient. Here is the conclusion: in many cases it is reasonable to use auto.
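For completeness, here is a sketch of the whole function with this correction applied. The name Find_Correct and the renaming of the inner variable to pos (so that it no longer shadows the parameter n) are our assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Variant of Find_Incorrect from above with the result stored in auto.
bool Find_Correct(const std::string *arrStr, std::size_t n)
{
  assert(arrStr != nullptr || n == 0);
  for (std::size_t i = 0; i != n; ++i)
  {
    // pos gets the type std::string::size_type, so the comparison with
    // npos involves no truncation on either 32-bit or 64-bit builds.
    auto pos = arrStr[i].find("ABC");
    if (pos != std::string::npos)
      return true;
  }
  return false;
}
```

Called on an array where no string contains “ABC”, this version returns false in both Win32 and Win64 builds.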

The key word auto will reduce the number of 64-bit errors or let you eliminate them with more grace. But auto does not in itself guarantee that all 64-bit errors will be eliminated! It is just one more language tool that makes programmers’ life easier; it does not take over all the work of managing types. Consider this example:

void *AllocArray3D(int x, int y, int z,
                   size_t objectSize)
{
  int size = x * y * z * objectSize;
  return malloc(size);
}

The function must calculate the array’s size and allocate the necessary amount of memory. It is logical to expect that in a 64-bit environment this function can allocate memory for a 2000*2000*2000 array of double values. But a call such as “AllocArray3D(2000, 2000, 2000, sizeof(double));” will always return NULL, as if it were impossible to allocate that much memory. The true reason is the overflow in the expression “int size = x * y * z * sizeof(double)”. The variable size takes the value -424509440, so the subsequent call to malloc is pointless. By the way, the compiler will also warn that this expression is unsafe:

warning C4267: 'initializing' :
conversion from 'size_t' to 'int', possible loss of data

Relying on auto, a careless programmer may modify the code in the following way:

void *AllocArray3D(int x, int y, int z,
                   size_t objectSize)
{
  auto size = x * y * z * objectSize;
  return malloc(size);
}

But it will not eliminate the error at all and will only hide it. The compiler will not generate a warning any more but the function AllocArray3D will still return NULL.

The type of the variable size will automatically become size_t. But the overflow occurs earlier, when calculating the subexpression “x * y * z”. This subexpression has the type int, and its already overflowed result is extended to size_t only when it is multiplied by the variable “objectSize”.

Now this hidden error may be found only with the help of Viva64 analyzer:

V104: Implicit type conversion to memsize type in an
arithmetic expression.

The conclusion – you must be attentive even if you use auto.
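One possible way to remove the overflow, given here only as a sketch, is to convert the first operand to size_t so that every multiplication is performed in size_t. The names ArraySize3D and AllocArray3D_Fixed are hypothetical, and the sketch assumes non-negative dimensions; it does not guard against overflow of size_t itself, which remains possible in a 32-bit build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: computes the byte count entirely in size_t.
std::size_t ArraySize3D(int x, int y, int z, std::size_t objectSize)
{
  assert(x >= 0 && y >= 0 && z >= 0);
  // Casting the first operand makes each following multiplication a
  // size_t multiplication, so no intermediate int result can overflow.
  auto size = static_cast<std::size_t>(x) * y * z * objectSize;
  return size;
}

// Variant of AllocArray3D from above that uses the helper.
void *AllocArray3D_Fixed(int x, int y, int z, std::size_t objectSize)
{
  return std::malloc(ArraySize3D(x, y, z, objectSize));
}
```

With a 64-bit size_t, ArraySize3D(2000, 2000, 2000, sizeof(double)) gives 64000000000 bytes when sizeof(double) is 8, instead of a wrapped value.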

Let us now briefly look at how the new key word is supported in the VivaCore library, on which the static analyzer Viva64 is based. The analyzer must be able to understand that the variable AA has the type int in order to warn the programmer (see V101) about the extension of AA to the type size_t:

void Foo(int X, int Y)
{
  auto AA = X * Y;
  size_t BB = AA; //V101
}

First of all, a new table of lexemes was composed that includes the new C++0x key words. This table is stored in the file Lex.cc and has the name tableC0xx. To avoid modifying the old code responsible for processing the lexeme “auto” (tkAUTO), the lexeme was given the name tkAUTOcpp0x in this table.

With the appearance of the new lexeme, the following functions were modified: isTypeToken, optIntegralTypeOrClassSpec. A new class LeafAUTOc0xx appeared. TypeInfoId has a new object class – AutoDecltypeType.

To code the type auto, the letter ‘x’ was chosen and it was reflected in the functions of the classes TypeInfo and Encoding. These are, for example, such functions as IsAutoCpp0x, MakePtree.

These corrections let you parse the code with the key word auto that has a new meaning and save the type of objects in the coded form (letter ‘x’). But this does not let you know what type is actually assigned to the variable. That is, VivaCore lacks the functionality that would let you make sure that the variable AA in the expression “auto AA = X * Y” will have the type int.

This functionality is implemented in the source code of Viva64 and cannot be integrated into the code of the VivaCore library. The implementation principle is an additional type calculation in the TranslateAssignInitializer method. After the right side of the expression is evaluated, the binding (Bind) between the name of the variable and its type is replaced with a new one.

2. decltype

In some cases it is useful to “copy” the type of some object. The key word auto determines the type relying on the expression used to initialize the variable. If the variable is not initialized, you may use the key word decltype to determine the type of the expression during compilation. Here is an example of code where the variable “value” has the type returned by the function Calc():

decltype(Calc()) value;
try {
  value = Calc();
}
catch(...) {
  throw;
}

You may use decltype to define the type:

void f(const vector<int>& a,
       vector<float>& b)
{
  typedef decltype(a[0]*b[0]) Tmp;
  for (int i=0; i<b.size(); ++i)
  {
    Tmp* p = new Tmp(a[i]*b[i]);
    // ...
  }
}

Keep in mind that the type defined with decltype may differ from that defined with auto:

const std::vector<int> v(1);
auto a = v[0];
decltype(v[0]) b = 1;
// type a - int
// type b - const int& (returned value
// std::vector<int>::operator[](size_type) const)
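The difference stated in the comments above can be checked at compile time. The following is a minimal sketch; the function name DemoAutoVsDecltype is hypothetical, and b is initialized from v[0] rather than from the literal 1 so that the reference can be compared with the vector element.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Small self-check that can be called from main().
void DemoAutoVsDecltype()
{
  const std::vector<int> v(1);   // one item, value-initialized to 0
  auto a = v[0];                 // auto drops const and the reference
  decltype(v[0]) b = v[0];       // decltype keeps const int&

  static_assert(std::is_same<decltype(a), int>::value,
                "a is a plain int copy");
  static_assert(std::is_same<decltype(b), const int&>::value,
                "b is a reference to const int");

  a = 5;                         // changes only the copy
  assert(a == 5);
  assert(v[0] == 0 && b == 0);
  assert(&b == &v[0]);           // b refers to the vector's element
}
```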

Let us look at another sample where decltype can be useful from the viewpoint of 64 bits. The function IsPresent searches for an element in a sequence and returns true if it is found:

bool IsPresent(char *array,
               size_t arraySize,
               char key)
{
  for (unsigned i = 0; i < arraySize; i++)
    if (array[i] == key)
      return true;
  return false;
}

This function cannot work on a 64-bit system with large arrays. If the variable arraySize has a value greater than UINT_MAX, the 32-bit counter “i” wraps around to 0 before it can reach arraySize, so the condition “i < arraySize” is always true, and if the key is not found an infinite loop occurs.

If we use the key word auto, it will not change anything:

for (auto i = 0; i < arraySize; i++)
  if (array[i] == key)
    return true;

The variable “i” will have the type int because 0 has int type. The appropriate correction of the error lies in using decltype:

for (decltype(arraySize) i = 0; i < arraySize; i++)
  if (array[i] == key)
    return true;

Now the counter “i” has the type size_t as well as the variable arraySize.
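Here is a sketch of the complete corrected function with a compile-time check of the counter type. Declaring the array parameter as const char * is our assumption; the article's version takes char *.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Variant of IsPresent from above with a decltype-typed counter.
bool IsPresent(const char *array,
               std::size_t arraySize,
               char key)
{
  assert(array != nullptr || arraySize == 0);
  for (decltype(arraySize) i = 0; i < arraySize; i++)
  {
    // The counter has the same type as arraySize, so it can reach
    // every index, including those beyond UINT_MAX.
    static_assert(std::is_same<decltype(i), std::size_t>::value,
                  "the counter follows the type of arraySize");
    if (array[i] == key)
      return true;
  }
  return false;
}
```

If the type of arraySize changes later, the counter follows it automatically and the loop needs no further edits.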

decltype is supported in the VivaCore library in much the same way as auto. A new lexeme tkDECLTYPE was added. The parsing function rDecltype was added to the file Parser.cc. With the appearance of the new lexeme, the function optIntegralTypeOrClassSpec had to be modified. A new class LeafDECLTYPE appeared.

To code the type returned by the operator decltype, the character ‘X’ was chosen (capital ‘X’ unlike lower-case ‘x’ used for auto). Because of this, the functionality of the classes TypeInfo and Encoding changed too: for example, the functions WhatIs, IsDecltype, MakePtree.

The functionality of calculating the types for decltype operator is implemented in the class Environment and included into VivaCore library. The type is calculated while writing a new variable/type into Environment (the functions RecordTypedefName, RecordDeclarator, RecordConstantDeclarator). The function FixIfDecltype is responsible for calculating the type.
