SHARE

BOOST Multi-Index Container and Transaction Storage

Introduction STL provides a wide range of ordered and sorted collections, such as vector, list, set and map. All of these collection are ordered or sorted according to a single criterion. This sorting criterion must be specified at design time. Even if the STL library is flexible when choosing the sorting criteria, the collections are […]

Written By

CodeGuru Staff

Jun 10, 2009

5 minute read

CodeGuru content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

Introduction

STL provides a wide range of ordered and sorted collections, such as vector, list, set and map. All of these collection are ordered or sorted according to a single criterion. This sorting criterion must be specified at design time. Even if the STL library is flexible when choosing the sorting criteria, the collections are still confined to a single sorting order.

Sometimes it is necessary to have more than one searching option for the same collection. Consider the example of a request-reply asynchronous protocol. The pending transactions must be sorted according to a transaction ID. It allows efficient matching between the pending request and the arrived reply by transaction ID. But, it often requirs thepossibility to report all pending transactions for given user. It will require the collection to be sorted according to the user ID as well. To solve the problem, you can use two sorted collections pointing to the same data: one is sorted by transaction ID; another is sorted by user ID. Such approach, however, requires a lot of efforts to keep both collections synchronized.

Fortunately the BOOST library provides very convenient solution for the problem the boost::multi_index_container class. The class has reach interface and it is worth considering commonly used techniques on a modeling example.

Transaction Manager as Multi-Index Container

Our modeling example refers to implementation of a request-reply asynchronous protocol. “Asynchronous” means that an application is not blocked waiting the reply; instead it gets notification when the reply arrived.

The application issues requests to different servers. Each server is identified by its endpoint – host and port pair. Each issued request should be stored as a transaction in a container. When a reply from a server arrives the container will be searched for matching pending transaction. It is assumes (as most protocols do) that the matching is done according to unique transaction id. It is also assumed that the transaction ID is unique in respect to any given server. Actually, only the combination transaction of ID + endpoint is unique.

This is why the container should be sorted by the “transaction ID + endpoint” key.

The best-effort transport model (UDP like) is assumed – when messages are usually delivered to its destination but there is a probability that a message is lost. Hence a pending request can wait for its reply forever. To avoid the container overflow, the pending requests must have limited life time. If a request is not answered during certain period, the request will be re-sent. If such re-sent request is still not answered after certain timeout, it will be dropped as expired.

To implement this timeout function the application needs to scan the transaction collection periodically. Normally the transaction collection contains thousands of pending transactions with only a small fraction of them expired. Hence successive scanning of the collection would be ineffective because the scanning time increases linearly with the collection size. It is a better idea to sort the collection by expiration time and scan only really expired transactions. Note that because a new transaction will be expired later than any already stored transaction, the new transaction should be inserted always into end of the collection. Hence we can use “hinted” insertion that takes constant time. Therefore the collection should be sorted by the expiration time as well.

Also the collection should be ordered by the user ID. The necessity in last sorting index came from the requirement to produce reports sorted by user ID. As a result, a sorted collection with three indices — by expiration time, by user id and by transaction id — is needed. To define such multi-index collection use the boost::multi_index_container class:

/** Stored transaction entry */
struct entry
{
  /** possible states */
  enum state_type {pending, resent, succeeded, failed};

  endpoint_type peer; /** network peer */
  tx_id_type tx_id; /** transaction id */
  boost::system_time expiration; /** expiration time */
  user_info info; /** user specific data */
  mutable state_type state; /** current transaction state */

};

/** comapres 2 entries for using as identity key */
inline bool operator<(const entry& l, const entry& r)
{
  if(l.info.group_id==r.info.group_id)
    return l.info.user_id<r.info.user_id;

  return l.info.group_id<r.info.group_id;
}

/** Multi index storage */
class mi_map_type :
  public boost::multi_index::multi_index_container
<
  // stored class
  entry,
  // all indices enumeration:
  boost::multi_index::indexed_by
  <
    ...// index view definition #0
    ...// index view definition #1
    ...// index view definition #2
  >
>
{};

To define an indexed view, first of all, you should specify its type. The BOOST library supports the following index types:

Ordered (unique and non-unique) indices sort the elements like std::set/std::multiset.

Sequenced indices arrange the elements as if in std::list.

Random access indices provide interface similar to std::vector, without memory contiguity, however

Hashed indices provide fast access to the elements through hashing techniques.

Because our collection has more than one index view, a way is needed to specify exactly what index view are being used. For example, the find() method will return different results depending on selected index view.

The index view can be specified by its cardinal number. As usual the numeration starts from zero. For example:

mi_map_type::nth_index<2>::type::iterator it=
  m_map.get<2>().find();

It means searching by the index view #2 – by transaction id.

As a better option, the indices can be assigned tags. With assigned tags the example above can be re-written in more readable form:

mi_map_type::index<full_tx_id>::type::iterator it=
  m_map.get<full_tx_id >().find();

The examples below demonstrate different ways of indexed view definitions.

The First Indexed View – By Expiration Time

Our first indexed view is ordered by expiration time. As the key the view uses the expiration field. The index is non-unique because two different transactions can have exactly the same expiration time. The index should be sorted in direct order, because you want the oldest transaction be first.

The first index definition looks like:

/** tag structures */
struct expiration_time{};

// 1st index: non unique index by expiration time
// used for time-out checking

boost::multi_index::ordered_non_unique
<
  // tag
  boost::multi_index::tag<expiration_time>,
  // member as the key
  boost::multi_index::member
  <
    entry, // referred structure
    boost::system_time, // member type
    &entry::expiration // pointer to the member
  >,
  // direct order, default predicat can be omitted
  std::less<boost::system_time>
>

The ordered_non_unique template has three parameters. The first parameter is optional and it assigns “tag” to the view. Generally any C++ type can be used as a tag, but it is more convenient to use dedicated structure, struct expiration_time in our case. The second parameter is the key specification and it is described in details below. The third parameter is optional, it specifies associated comparison predicate, which must order the keys in a less-than fashion. The default predicate value is std::less.

To specify a field as sorting key we need use the multi_index::member template. This template in its turn has 3 parameters. The first parameter is the referred structure type, entry in our case. The second parameter is type of the referred structure field, boost::system_time, in out case. The last parameter is pointer of the referred structure field, in our case it is pointer to the entry::expiration field.

// member as the key
boost::multi_index::member
<
  entry, // referred structure
  boost::system_time, // member type
  &entry::expiration // pointer to the member
>

The Second Indexed View – By Full User Id

Our second indexed view is ordered by full user id. “Luckily” the entry structure has operator<() that sorts objects according to the full user id. To define this operator<() as the sorting key we should use the boost::multi_index::identity template as it shown below:

/** tag structures */
struct identity_id {};

// 2nd index: non unuque index using operator<()
// used for sorted reports

boost::multi_index::ordered_non_unique
<
  // tag
  boost::multi_index::tag<identity_id>,
  // object as index
  boost::multi_index::identity<entry>
>