Discover the Network Search Capabilities of the Boost Graph Library

The Boost C++ libraries, or simply Boost, are an unusual beast in the domain of language and library development. First and foremost, Boost provides free peer-reviewed portable C++ source libraries. Unlike application-specific solutions, Boost libraries are intended to be widely useful and usable across a broad spectrum of applications. Boost includes more than just containers, iterators, and algorithms; you also get a powerful Regular Expression (regex), threads, advanced math, I/O, memory management, and parsing libraries—just to name a few. The license encourages both commercial and non-commercial use.

More impressively, Boost has been pushing the C++ standard quite hard in two distinct ways:

  1. It has legitimized some bleeding-edge techniques, such as template metaprogramming.
  2. It has led the way for expanding the scope of the C++ Library like no other project has.

As a result, 10 Boost libraries are already included in the C++ Standards Committee's Library Technical Report #1 (TR1), which is a step toward becoming part of a future C++ Standard, and more Boost libraries are proposed for the upcoming TR2. (This push probably is due in large part to the participation of industry heavyweights such as Dave Abrahams, Nicolai Josuttis, and others who participate in ANSI/ISO C++ committees.)

Covering all the Boost libraries is beyond the scope of any single article. This article concentrates on the Boost Graph Library (BGL), which can offer a quick and easy solution for some graph searching problems you encounter. For Win32 developers, getting started requires only downloading and unzipping the source files. Unlike most traditional libraries, you don't need to link with any .LIB or .SO files. Because Boost uses the STL approach, using it is a matter of including the header files and compiling your own code. (Boost newbies should check out Beyond The C++ Standard Library: An Introduction to Boost by Björn Karlsson, 2005.)

Boost works on almost any modern operating system, including UNIX and Windows variants. In fact, most popular Linux and Unix distributions such as Fedora, Debian, and NetBSD come with pre-built Boost packages.

Boost Graph Library: What's a Graph Anyway?

If you expected to read about graphics when you saw Boost Graph Library, you may be disappointed. Graphs are not graphics, but rather a fundamental area of study in discrete mathematics. Basically, a graph is a set of vertices connected by edges. A vertex (or node) might represent a state of operation ("stopped"), a destination ("Los Angeles"), or whatever your imagination requires. The edges (or arcs) connect the nodes and may be one-way ("directed") or two-way ("undirected"). Edges can even be assigned a weight, which might represent distance, cost, or time. Our old friend the binary-tree is but one special example of a graph.

How does this relate to application development today? Well, if you used MapQuest this week, made a call over a long-distance network, used the Linked In contact manager, or ran a "link-checker" on your Web site, you implicitly entered data into someone's graph solver system. Nowadays, data mining has moved past the obvious connections and into the realm of discovering and optimizing relationships. To enable you to bring your applications into this realm, Boost integrates with your code in the same way that other template library algorithms and containers do.

Sample App: Six Degrees of Kevin Bacon

I asked author Jeremy Siek for help in producing a BGL demo that would both solve a problem that many people recognize and have some application to searching networks. We came up with a compact solution for the Six Degrees of Kevin Bacon game. In a nutshell, the idea is to link actor "X" with a movie that Kevin Bacon starred in using the minimum number of links. Confused? Here's an example showing the relationship between Star Wars' Mark Hamill and Kevin Bacon:

  1. Mark Hamill was in The Little Mermaid (1989) with Jim Cummings.
  2. Jim Cummings was in Balto (1995) with Kevin Bacon.

Therefore, Mark Hamill has a Bacon number of 2.

Aficionados of graph theory immediately recognize this as a classic breadth-first search problem. The breadth-first search (BFS) exhaustively searches a graph and can solve several queries, including finding the shortest path between two unweighted nodes and testing for bipartiteness. BFS leaves no stone unturned, so it is the perfect algorithm for when you require 100 percent coverage of the graph. An important characteristic of BFS is that it discovers vertices with smaller shortest-path distances before vertices with larger distances.

The following code is the demo application:

 1 //=====================================================
 2 // Copyright 2001 Jeremy G. Siek, Andrew Lumsdaine, Lie-Quan Lee,
 3 //
 4 // Distributed under the Boost Software License, Version 1.0.
 5 // (See accompanying file LICENSE_1_0.txt or copy at
 6 // http://www.boost.org/LICENSE_1_0.txt)
 7 //=======================================================
 8 #include <boost/config.hpp>
 9 #include <iostream>
10 #include <fstream>
11 #include <string>
12 #include <boost/tuple/tuple.hpp>
13 #include <boost/graph/adjacency_list.hpp>
14 #include <boost/graph/visitors.hpp>
15 #include <boost/graph/breadth_first_search.hpp>
16 #include <map>
17 #include <boost/graph/adj_list_serialize.hpp>
18 #include <boost/archive/text_iarchive.hpp>
19
20 struct vertex_properties {
21    std::string name;
22
23    template<class Archive>
24    void serialize(Archive & ar, const unsigned int version) {
25       ar & name;
26    }
27 };
28
29 struct edge_properties {
30    std::string name;
31
32    template<class Archive>
33    void serialize(Archive & ar, const unsigned int version) {
34       ar & name;
35    }
36 };
37
38 using namespace boost;
39
40 typedef adjacency_list<vecS, vecS, undirectedS,
41                        vertex_properties,
                          edge_properties> Graph;
42 typedef graph_traits<Graph>::vertex_descriptor Vertex;
43 typedef graph_traits<Graph>::edge_descriptor Edge;
44
45 class bacon_number_recorder : public default_bfs_visitor
46 {
47 public:
48    bacon_number_recorder(int* dist) : d(dist) { }
49
50    void tree_edge(Edge e, const Graph& g) const {
51       Vertex u = source(e, g), v = target(e, g);
52       d[v] = d[u] + 1;
53   }
54 private:
55    int* d;
56 };
57
58 int main()
59 {
60    std::ifstream ifs("./kevin-bacon2.dat");
61    if (!ifs) {
62       std::cerr << "No ./kevin-bacon2.dat file" << std::endl;
63       return EXIT_FAILURE;
64    }
65    archive::text_iarchive ia(ifs);
66    Graph g;
67    ia >> g;
68
69    std::vector<int> bacon_number(num_vertices(g));
70
71    // Get the vertex for Kevin Bacon
72    Vertex src;
73    graph_traits<Graph>::vertex_iterator i, end;
74    for (tie(i, end) = vertices(g); i != end; ++i)
75       if (g[*i].name == "Kevin Bacon")
76          src = *i;
77
78    // Set Kevin's number to zero
79    bacon_number[src] = 0;
80
81    // Perform a breadth first search to compute everyone's
      // Bacon number.
82    breadth_first_search(g, src,
83       visitor(bacon_number_recorder(&bacon_number[0])));
84
85    for (tie(i, end) = vertices(g); i != end; ++i)
86       std::cout << g[*i].name << " has a Bacon number of "
87          << bacon_number[*i] << std::endl;
88
89    return 0;
90 }

Discover the Network Search Capabilities of the Boost Graph Library

In the graph, each actor is assigned a vertex (or node). From this node, the code generates one edge for each movie that the actor was in. For example, if Kevin Bacon starred with Jim Cummins in Balto, the edge connecting Kevin Bacon and Jim Cummins is labeled Balto.

First, the code reads in a data file containing a series of records formatted with an actor's name, a movie they were in together, and another actors's name (see lines 60-68 of kevin-bacon.cpp). Specifically, this data is stored in an edge-adjacency list called simply Graph (declared in line 66, templated in line 40):

typedef
adjacency_list<vecS, vecS, undirectedS,
               vertex_properties, edge_properties> Graph;

One of the coolest things you get from BGL is the vertex iterators. Lines 71-76 show how to traverse all the vertices from graph "g" that you just read in above. Note the tie() function, which is a tuple helper that allows a std::pair to be broken into two scalers:

71    // Get the vertex for Kevin Bacon
72    Vertex src;
73    graph_traits<Graph>::vertex_iterator i, end;
74    for (tie(i, end) = vertices(g); i != end; ++i)
75       if (g[*i].name == "Kevin Bacon")
76          src = *i;

The solution then can set the bacon_number of Kevin Bacon to zero, which is required to compute distances later, in line 79. Vertices all have a unique number, so a vector sized up to the count of all vertices is appropriate to store that value. Finally, the solution is ready to run the breadth_first_search() algorithm. As the code visits each vertex, it invokes the bacon_number_recorder() from lines 45-56. Basically, inside this visitor function, the solution says that the shortest distance (or bacon number) between adjacent nodes u and v is the shortest distance to v plus one (in other words, the edge you are on):

81    // Perform a breadth first search to compute everyone's
      // Bacon number.
82    breadth_first_search(g, src,
83       visitor(bacon_number_recorder(&bacon_number[0])));
84
85    for (tie(i, end) = vertices(g); i != end; ++i)
86       std::cout << g[*i].name << " has a Bacon number of "
87          << bacon_number[*i] << std::endl;

Last, the solution dumps the output of the entire tree. This time, it traverses the list of vertices again. Thus, it gives you the following output:

Tom Hanks has a Bacon number of 1
Zoya Barantsevich has a Bacon number of 5
Nikolai Panov has a Bacon number of 4
Zoia Karabanova has a Bacon number of 3
William Challee has a Bacon number of 2
P. Biryukov has a Bacon number of 5

Solutions like this one are not just for fun and games; services such as Linked In, Plaxo, and Friendster are generating a new landscape called Social Computing.

Solve Graph Searching Problems

Hopefully, I have demonstrated a relatively quick and painless way of solving some of the classic graph searching problems in computer science. I only scratched the surface of what Boost can handle. Once you've committed your data structures to a graph representation, the possibilities of sorting, searching, and connecting networks are limitless.

For Additional Reading

Certainly anyone who wants to use the Boost Graph Library ought to immediately get his or her hands on The Boost Graph Library: User Guide and Reference Manual by Siek, Lee, and Lumsdaine (2002). As you might expect from the title, the book neatly divides into two nearly equal parts: the first half covers the User Guide, and the second half covers the Reference Manual.

The User Guide starts off with a review of basic concepts of both graphs and Generic Programming. It then delves into classic graph problems such as shortest-path, minimum spanning-tree, maximum flow, knight's tour, and others. A particularly nice touch is a chapter explaining how to integrate Boost code with LEDA and Stanford GraphBase (SGB).

[cover.jpg]

About the Author

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for the Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries, just drop an e-mail to sysop@HAL9K.com.



About the Author

Victor Volkman

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries, just drop an e-mail to sysop@HAL9K.com.

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • On-demand Event Event Date: September 10, 2014 Modern mobile applications connect systems-of-engagement (mobile apps) with systems-of-record (traditional IT) to deliver new and innovative business value. But the lifecycle for development of mobile apps is also new and different. Emerging trends in mobile development call for faster delivery of incremental features, coupled with feedback from the users of the app "in the wild." This loop of continuous delivery and continuous feedback is how the best mobile …

  • Packaged application development teams frequently operate with limited testing environments due to time and labor constraints. By virtualizing the entire application stack, packaged application development teams can deliver business results faster, at higher quality, and with lower risk.

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds