Data Replication Prototype Using ADO

Here is a version of a data replication prototyping attempt made almost nine years ago. Some parts of it might still be of interest. Initially, it used a proprietary library built on top of ODBC, but it can be easily modified to use any database interfacing layer. A demo application is drafted using ADO and the Nwind.mdb MS Access sample that comes with the MSDEV distribution (in MS Access 2000 or newer format) as well as the Northwind MS SQL 2000 database.

For this application to work properly with the latter one, the Cascade Delete Related Records option should be enabled on the relationship between the Order Details and Orders tables. Microsoft Jet 4.0 OLE DB Provider or Microsoft OLE DB Provider for MS SQL Server can be used. This utility is not limited by ADO/MSVC, though; for example, it easily can be used with data coming in the XML format utilizing XSLT or ported to Java or C#. Moreover, Fast (Bulk) Load functionality of MS SQL Server, that is available in the OLE DB interface (IRowsetFastLoad) but not in ADO, was added to the prototype.

This prototype engine uses real or artificially assigned unique indices of the inter-database meaning to identify records. Auto-Numbers or Counters are handled properly during the insertion of records. The replication plan is built by initializing collections of replication entry or link descriptors in the replicator class’s Init method. See DemoCopyHelper.cpp for an example.

There are four macros describing the structure and order of replication introduced in CopyHelper.h:

  • COPY_ENTRY
  • COPY_LINK
  • COPY_XLINK
  • COPY_RLINK

The replication graph structure consists of several ‘has-a’ object trees, possibly interfering and containing self-linked branches. The COPY_ENTRY macro describes tree root items. In the single item replication case, this means that all the data in the Table with the defined value of the defined Field should be replicated using the defined Filter for search and cleanup. The defined Field in the table being copied to will be replaced by ‘Id To’. To make database replication possible, there are COPY_ENTRY macros with a ‘No Filter’ Filter. It means that all data in the Table should be copied. The Field in this case should be the primary key (PK) one. In this case, ‘Id To’ and ‘Id From’ will be ignored. There is also a COPY_ENTRY_PK(ArrayId, SlaveTable, SlaveField) macro, where ArrayId is a pointer to an Array containing IDs of records to be replicated. The Predefined Field here is the table’s PK.

Entries start walking through a tree structure described by the COPY_LINK macros. This macro means that all Slave Table records with Slave (FK) Field values equal to the PK ones in the Master Table should be copied and Slave Field values in the table being copied to will be replaced by PK ones of the corresponding data receiving side Master Table record having been copied. “Indirect” entries referring to the link slave tables also exist. These entries do not start tree processing. They are used together with links to support filters consisting of several fields. Several entries can be applied to one table simultaneously. Links are always treated separately.

The COPY_XLINK macro describes links being outside ‘has-a’ trees (for example, denormalized tables). It requests updating Slave Field values in the Slave Table according to the PK in the Master Table. This x-link requires the Master Table to be replicated first. It is important because it can cause deadlock and failure of the replication process if links and x-links are constructed in the wrong way and cross each other. If the Master Table record with the proper PK is not available, and a matching record is not found using the Unique Index filter, a new record in the Master Table is created and its data is copied from the corresponding record on the input side.

R-link, created by the COPY_RLINK macro, is similar to the one created by COPY_XLINK. The main difference is that it does not require the Master Table to be replicated first. This macro is designed for links to reference tables. These links are out of scope of the tree-like replication structure.

X- and r-links share cache maps for input and output primary keys organized by master table type for replication optimization. During database migration, reference tables are copied, and, correspondingly, cache maps are filled before the main replication routine.

Most links require no customization. There is a set of template-based classes for other links. The base class CDataHandler contains virtual functions. Their implementations do nothing or execute default activities:


virtual BOOL DeleteRecord()
{
return GetTblCopyTo()->DeleteRecord();
}
virtual void CorrectTableData() {}
virtual BOOL AfterUpdate() { return TRUE; }

DeleteRecord() should be used for tables with records that cannot be deleted automatically. In this case, the link constructor should contain a SetLeaveData(lkLeaveUnique) call for COPY_ENTRY’s correct functioning, and only records matching by unique index will be deleted.

CorrectTableData() is introduced for tables where not all the data is set by the replication routine. It can be also used for data transformation.

AfterUpdate() is designed for activities that should take place after updating the actual table. This set of virtual functions can be extended easily.

google-sparsehash, an extremely memory-efficient hash_map and hash_set implementation [1] is used here to improve scalability.

Boost library [2], of version 1.32 or higher, is required for the prototype source compilation.

It is worth noting among implementation details that a convenient FOREACH macro by Eric Niebler [3], [4] was used for brevity.

References

[1] google-sparsehash Project

[2] Boost C++ Libraries

[3] Boost File Vault – the BOOST FOREACH macro

[4] Artima Developer – The C++ Source – Conditional Love: FOREACH Redux

More by Author

Get the Free Newsletter!

Subscribe to Data Insider for top news, trends & analysis

Must Read