Hamsterdb: a Small, Fast Database That Won't Weigh You Down

What's in the Cage?

Christoph Rupp's hamsterdb is a lightweight, embedded database engine designed for ease of use, high performance, stability, and portability. In the database world, you typically have two extremes. On the high end is the full-featured and sometimes unwieldy relational DBMS with SQL and a daemon/server process (such as Oracle). On the low end are B+tree-based systems, which are essentially just a database engine linked into the application, usually without SQL support. As a lightweight database engine, hamsterdb fits into the latter category. It is very fast, but supports only the minimum set of operations. Because it is embeddable, it has none of the external dependencies or installation hassles of an SQL server. It is a database engine, not a database management system (DBMS), and it has no relational functions or other features provided by SQL. For many apps that need to manage a lot of data but don't need an externally accessible database for report writers or third-party tools, hamsterdb may be your "pet" solution.

Of course, embedded systems such as cellphones and other portable devices, where memory is at a premium, also will benefit from the lightweight hamsterdb. It also supports in-memory databases, which may be helpful for these platforms as well.

Hamsterdb prides itself on fast algorithms and efficient data structures that guarantee high performance in all scenarios. Specifically, the design minimizes redundant disk access and memory allocations. For example, it chooses memory-mapped file operations over the slower read/write I/O mechanism when possible. Hamsterdb has been around the block a few times: it has hundreds of unit tests with an impressive coverage rate of over 90%, and they are executed on each target OS before release.
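To make the memory-mapping point concrete, here is a minimal POSIX sketch that reads the same file two ways: once with conventional buffered reads and once through mmap(). It is not taken from hamsterdb's source; the function names sum_bytes_read() and sum_bytes_mmap() are hypothetical, and the sketch assumes a POSIX system (Windows uses a different mapping API).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of a file with buffered reads: each chunk is
   copied into buf before it is examined. Returns -1 on error. */
long sum_bytes_read(const char *path)
{
    unsigned char buf[4096];
    long sum = 0;
    size_t n, i;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (!f)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        for (i = 0; i < n; i++)
            sum += buf[i];
    fclose(f);
    return sum;
}

/* Same result through a read-only memory mapping: the file's
   pages are addressed directly, with no intermediate buffer. */
long sum_bytes_mmap(const char *path)
{
    struct stat sb;
    unsigned char *p;
    long sum = 0;
    off_t i;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {          /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }
    p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < sb.st_size; i++)
        sum += p[i];
    munmap(p, (size_t)sb.st_size);
    return sum;
}
```

Both functions return the same value for the same file. The mapped version avoids copying each chunk into a user-space buffer, which is the kind of saving the paragraph above describes.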

Because it's written in generic ANSI C, hamsterdb runs on many architectures including Intel/AMD, PowerPC, UltraSPARC, ARM, RISC, and others. Tested operating systems include Win32, Win64, Windows CE, Linux, and Solaris 9. The file formats are OS independent, so you can read a file written in Solaris 9 on Windows and vice versa. For object-oriented purists, there is indeed a ham::db class available. If you're working in Windows but outside the C/C++ environment, you still can access hamsterdb via .NET, Java, or Python wrappers.

Hamsterdb offers your choice of two hassle-free licenses: the GNU General Public License version 2 for non-commercial use or a closed-source license for commercial use, where you buy as many developer seats as you need.

Getting Started with Hamsterdb

Of course, the first order of business is downloading the package. After you untar it, you'll see Visual Studio 2005 project files for building static and dynamic (DLL with import library) versions of hamsterdb. Similar project files are also provided for a half-dozen or so sample programs. An additional set of project files provides buildable Windows CE library targets. Unlike a lot of open source projects, hamsterdb is completely self-contained, so there is no scavenger hunt to locate critical dependent libraries and tools.

The only glitch, if it can be called one, was that the Doxygen-generated HTML help files were not included. Fortunately, I am pretty familiar with Doxygen and used the following commands to regenerate the docs:

cd \hamsterdb-1.0.4
doxygen documentation\doxyfile

In this article, you'll only have time to look at one example in-depth. As usual, it will be a simple example but explained thoroughly. The following test program demonstrates several basic features, such as the following:

  • Creating the database
  • Inserting data
  • Looking up data
  • Erasing data

Because it's kind of longish, I'll interweave code samples with explanation. If you want to look at the code unfettered by my commentary, that's okay too!

 1 #include <stdio.h>
 2 #include <string.h>
 3 #include <stdlib.h> /* for exit() */
 4 #include <ham/hamsterdb.h>
 5 void error(const char *foo, ham_status_t st);   /* defined after main() */
 6 #define LOOP 10
 7
 8 int main(int argc, char **argv)
 9 {
10    int i;
11    ham_status_t st;        /* status variable */
12    ham_db_t *db;           /* hamsterdb database object */
13    ham_key_t key;          /* the structure for a key */
14    ham_record_t record;    /* the structure for a record */
15
16    memset(&key,    0, sizeof(key));
17    memset(&record, 0, sizeof(record));
18
19    st=ham_new(&db);
20    if (st!=HAM_SUCCESS)
21       error("ham_new", st);
22
23    st=ham_create(db, "test.db", 0, 0664);
24    if (st!=HAM_SUCCESS)
25       error("ham_create", st);

First, let me note that I've removed the #ifdefs for Windows CE in this example because they don't affect how it works. The first real API call, ham_new() in line #19, gets you a db object you'll continue to use until you call ham_delete(). In line #23, you create the actual database file, which is named by the second parameter, "test.db". If you wanted to use an in-memory database, you would simply pass in NULL here along with the HAM_IN_MEMORY_DB flag described below. The third parameter is a set of flags that you can use to tune performance and behavior, and the fourth is the file access mode (0664 here). I'll only mention a few of the more interesting flags in passing:

  • HAM_WRITE_THROUGH: Immediately write modified pages to the disk. This slows down all database operations, but may provide integrity in case of a crash.
  • HAM_IN_MEMORY_DB: No file will be created, and the database contents are lost after the database is closed.
  • HAM_RECORD_NUMBER: Creates an "auto-increment" database.
  • HAM_ENABLE_DUPLICATES: Enable duplicate keys for this Database. By default, duplicate keys are disabled.
  • HAM_LOCK_EXCLUSIVE: Place an exclusive lock on the file. Only one process may hold an exclusive lock for a given file at a given time.
  • HAM_ENABLE_TRANSACTIONS: Enables Transactions for this database.

Even more goodies are available through ham_create_ex(). It lets you tune the cache size, page size, and B+tree index key size.

26
27    for (i=0; i<LOOP; i++) {    //demonstrate insert functions
28       key.size=sizeof(i);
29       key.data=&i;
30       record.size=sizeof(i);
31       record.data=&i;
32       st=ham_insert(db, 0, &key, &record, 0);
33       if (st!=HAM_SUCCESS)
34          error("ham_insert", st);
35    }
36

In the next section (lines 26-36), you simply insert 10 records with the data values 0 through 9. The ham_insert() call on line #32 is the standard method of getting data into the database. The second parameter is a transaction handle (or NULL if you don't care to use transactions). The next parameter is the primary key you will associate with the record data. If you create the database with the HAM_RECORD_NUMBER flag, the system will generate this for you. The fourth parameter is simply a pointer to the record structure that describes the data you're going to insert (note that the size was set on line #30). The last parameter is a set of insertion flags: HAM_OVERWRITE replaces a record with a matching key, and HAM_DUPLICATE forces an additional record if the key already exists.
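The three insertion behaviors (reject an existing key, overwrite it, or add a duplicate) are easier to compare side by side. The following is a toy in-memory model only, not hamsterdb code: every toy_* name and TOY_* constant is hypothetical, and a plain array stands in for the B+tree. It assumes that an insert with no flags is refused when the key already exists.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 32

enum { TOY_OK = 0, TOY_DUPLICATE_KEY = -1, TOY_FULL = -2 };   /* status codes */
enum { TOY_OVERWRITE = 1, TOY_DUPLICATE = 2 };                /* insert flags */

typedef struct { int key; int value; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; size_t count; } toy_table;

void toy_init(toy_table *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Index of the first entry with this key, or -1 if there is none. */
static int toy_find_index(const toy_table *t, int key)
{
    size_t i;
    for (i = 0; i < t->count; i++)
        if (t->e[i].key == key)
            return (int)i;
    return -1;
}

/* Insert with flag-controlled handling of an already existing key. */
int toy_insert(toy_table *t, int key, int value, unsigned flags)
{
    int idx;

    assert(t != NULL);
    idx = toy_find_index(t, key);
    if (idx >= 0 && (flags & TOY_OVERWRITE)) {
        t->e[idx].value = value;        /* replace the stored record */
        return TOY_OK;
    }
    if (idx >= 0 && !(flags & TOY_DUPLICATE))
        return TOY_DUPLICATE_KEY;       /* default: refuse the insert */
    if (t->count == TOY_MAX)
        return TOY_FULL;
    t->e[t->count].key = key;           /* new key, or an explicit duplicate */
    t->e[t->count].value = value;
    t->count++;
    return TOY_OK;
}

/* Number of entries stored under this key. */
size_t toy_count(const toy_table *t, int key)
{
    size_t i, n = 0;
    for (i = 0; i < t->count; i++)
        if (t->e[i].key == key)
            n++;
    return n;
}
```

Inserting the same key twice with no flags fails the second time. With TOY_OVERWRITE the count for that key stays at one, and with TOY_DUPLICATE it grows to two.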

37    for (i=0; i<LOOP; i++) {    // retrieve the data
38       key.size=sizeof(i);
39       key.data=&i;
40
41       st=ham_find(db, 0, &key, &record, 0);
42       if (st!=HAM_SUCCESS)
43          error("ham_find", st);
44
45       if (*(int *)record.data!=i) {
46          printf("ham_find() ok, but returned bad value\n");
47          return (-1);
48       }
49    }


The next thing you try is to query the database to find those records you just inserted. Because you're not using SQL, you are limited to matching exact records one-for-one based on keys. If you wanted to run some type of query and scan the database more closely, you would use the cursor functions (which are outside the scope of this limited introduction). The first two parameters to ham_find() are the database handle and transaction handle (or NULL). The function returns either HAM_SUCCESS or HAM_KEY_NOT_FOUND depending on how it went, and fills in the record data if it worked.
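The difference between an exact-match lookup and a cursor-style scan can be sketched over a plain sorted array of keys. This is only a conceptual stand-in, not hamsterdb's cursor API: find_exact() and scan_range() are hypothetical names, and the array plays the role of the ordered B+tree index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two ints for bsearch(). */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Exact-match lookup: returns a pointer to the key, or NULL if the
   key is absent. This is the only kind of question a find-by-key
   call can answer. */
const int *find_exact(const int *keys, size_t n, int key)
{
    assert(keys != NULL);
    return bsearch(&key, keys, n, sizeof(int), cmp_int);
}

/* Cursor-style scan: walk the keys in order and collect those in
   [lo, hi]. Returns how many keys were copied into out. */
size_t scan_range(const int *keys, size_t n, int lo, int hi,
                  int *out, size_t out_cap)
{
    size_t i, found = 0;

    assert(keys != NULL && out != NULL);
    for (i = 0; i < n && keys[i] <= hi; i++)
        if (keys[i] >= lo && found < out_cap)
            out[found++] = keys[i];
    return found;
}
```

With the ten keys 0 through 9 from the demo, find_exact() answers "is key 4 present?", while scan_range() answers "which keys lie between 3 and 6?", which is the kind of question that calls for a cursor.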

50
51    st=ham_close(db, 0);
52    if (st!=HAM_SUCCESS)
53       error("ham_close", st);
54    st=ham_open(db, "test.db", 0);
55    if (st!=HAM_SUCCESS)
56       error("ham_open", st);
57
58    for (i=0; i<LOOP; i++) {    // delete the data
59       key.size=sizeof(i);
60       key.data=&i;
61       st=ham_erase(db, 0, &key, 0);
62       if (st!=HAM_SUCCESS)
63          error("ham_erase", st);
64    }

In lines 51-64, you show off that you can close and reopen the database and pick up right where you left off. The ham_erase() function works about as you'd expect: it is an implicit search-and-destroy in one step. It returns HAM_SUCCESS or HAM_KEY_NOT_FOUND.

66    for (i=0; i<LOOP; i++) {
67       key.size=sizeof(i);
68       key.data=&i;
69
70       st=ham_find(db, 0, &key, &record, 0);
71       if (st!=HAM_KEY_NOT_FOUND)
72          error("ham_find", st);
73    }
74
75    st=ham_close(db, 0);
76    if (st!=HAM_SUCCESS)
77       error("ham_close", st);
78
79    ham_delete(db);
80    printf("success!\n");
81    return (0);
82 }
83
84 void error(const char *foo, ham_status_t st)
85 {
86    printf("%s() returned error %d: %s\n", foo, st,
         ham_strerror(st));
87    exit(-1);
88 }
89

In this last segment, lines #66-74, you try to find those records you just deleted. Normally, you wouldn't do things exactly this way, but it's just a demo so you're trying lots of variations on a theme. Finally, you close down the database file with ham_close() and delete the instance with ham_delete() on line #79.

Conclusion

Well, if hamsterdb isn't the simplest record-manager API I've seen in the past 30 years, I don't know what is. Certainly, it has the easiest learning curve of any B+tree system I've ever used. There are a ton of features that I didn't even get very near in this demo, including:

  • Cursors for navigating the data
  • Encryption support
  • Custom-comparison functions for sorting
  • Zlib-based compression of record data
  • Transaction and rollback

If you're developing a lean-and-mean application, hamsterdb may be the small and fast addition that you need for simple database needs.

About the Author

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for the Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest, and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries; just send an email to sysop@HAL9K.com.


