Logician: A Table-based Rules Engine Suite In C++/.NET/JavaScript using XML

Introduction

Application logic, particularly business rules, can be messy and time-consuming to maintain in code. If all your application logic is hard-coded, it can eventually lead to massive if-then-else or select-case code segments that could grow into huge nightmares. Developers have more important problems to solve and things to do than to maintain a mountain of string compares, boolean tests, or stored procedures. There have been numerous attempts at rules engines (aka "inference engines"), but many of them require the writing of even more cryptic-looking code that is hard if not impossible for non-developers to maintain. De-coupling business logic from an application certainly makes for more robust and maintainable code, and provides you the ability to let non-developer subject matter experts maintain the data and rules model, provided it is logical and easy to understand. Of course, you can always link parts of your application data to a database, but that still requires a lot of developer work to define the data model and queries needed for every unique "rule-driven" event in your application, not to mention the possibility of DB performance bottlenecks in a networked or limited resource environment. A table-based rules engine can be a very powerful and flexible solution for your application logic and automation needs. In a web environment, the Logician JavaScript libraries can also offload a lot of server CPU onto the user’s browser and eliminate laggy server callbacks.

Decision Tables

A Decision Table, or "Truth Table" as a mathematician or electrical/computer engineer might call it, is simply a spreadsheet defining the possible solutions to a problem given a set of input conditions. For example, suppose we are mixing paint, and I show you the following spreadsheet:

PaintColor1  PaintColor2  ResultColor
Red          Blue         Purple
Red          Yellow       Orange
Blue         Yellow       Green

Without writing a single line of code, or even offering any more background information about the problem, the meaning of the data is clear. Programmatically, we can think of this as a series of if-then-else statements reading left to right, top to bottom:


if (PaintColor1 == "Red" && PaintColor2 == "Blue")
{
  ResultColor = "Purple";
}
else if (PaintColor1 == "Red" && PaintColor2 == "Yellow")
{
  ResultColor = "Orange";
}
//....etc

Or in SQL:


-- the table name ColorMixingTable is illustrative
SELECT ResultColor FROM ColorMixingTable WHERE PaintColor1 = @PaintColor1 AND PaintColor2 = @PaintColor2

Any one of these 3 solutions gets the job done, but the first (the spreadsheet) is certainly easier to comprehend for non-developers and is the way a lot of practical engineering and/or business data is maintained in the real world. Using a DB to drive the rule certainly de-couples the logic from your source code to some extent, but in a web environment you will have to use server callbacks or web services to retrieve the data, slowing the application down. What if the logic needs to change, and we start mixing 3 colors of paint? For the DB, you might have to go back and change the DB table schema and your select statement/stored procedures. If you had a few hundred combinations of colors and then added a 3rd input parameter to the hard-coded method, you are in for a whole lot of monotonous, error-prone coding. If you had gone with the decision table, you likely would have very little work to do, other than to copy the new rules XML file (that a non-developer/subject matter expert likely edited for you) with the added 3rd input to your website or application. In this tutorial, I’ll show you how to use the open source Logician Suite to accomplish this task. With the Logician package you get three basic components: a decision table evaluator library, a decision table editor, and a dynamic data/class modeler and rules engine library.
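To make the contrast with hard-coded branches concrete, here is a minimal standalone sketch that keeps the paint rules as data. It is not Logician's API: `Rule`, `kColorRules`, and `MixColors` are hypothetical names chosen for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical rule row: two inputs and one output, mirroring the spreadsheet.
struct Rule {
    std::string paintColor1;
    std::string paintColor2;
    std::string resultColor;
};

// The rules live in data; a new combination is a new row, not a new branch.
const std::vector<Rule> kColorRules = {
    {"Red", "Blue", "Purple"},
    {"Red", "Yellow", "Orange"},
    {"Blue", "Yellow", "Green"},
};

// Scan rows top to bottom and return the output of the first matching row,
// or an empty string when no row matches.
std::string MixColors(const std::string& c1, const std::string& c2) {
    for (const Rule& rule : kColorRules) {
        if (rule.paintColor1 == c1 && rule.paintColor2 == c2)
            return rule.resultColor;
    }
    return "";
}
```

The lookup code never changes when rows are added. In a real deployment the rows would come from the rules XML file rather than from a compiled-in array.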

Decision Table Engine Background

As stated before, the table rules read sequentially: for a given series of inputs, the output(s) are determined. In the example above, they work top to bottom, left to right. The decision table evaluator (EDSEngine library) creates a truth table given the information supplied to it by your code and the stored XML rule data. By itself it is stateless, but it is easy to determine and supply the necessary variables. The basic steps that occur are:

1. Code determines it needs to evaluate a table, asks EDSEngine what input values from the current application "state" it needs.
2. EDSEngine loads the table and returns the list of inputs in the table.
3. Code provides the corresponding list of current values for those inputs.
4. EDSEngine evaluates the decision table and returns results.

You should be able to automate steps 1-3 in your code, depending on how you design your data model. Something as simple as this might work:

//C++
#include <map>
#include <string>
#include <vector>
//also include the EDSEngine header that declares CKnowledgeBase
using namespace std;

map<string, string> mAppData; //application state as attribute-value pairs
CKnowledgeBase m_TableEvaluator;

//...application stuff, you loaded the rules file, etc

//could reuse this function for all similar application events
string GetSingleSolution(string tableToEvaluate, string nameOfOutput)
{
  vector<string> inputsNeeded = m_TableEvaluator.GetInputDependencies(tableToEvaluate);
  //from our application data, obtain the values
  for (size_t i = 0; i < inputsNeeded.size(); i++)
    m_TableEvaluator.SetInputValue(inputsNeeded[i], mAppData[inputsNeeded[i]]);

  //EDSEngine supports returning multiple true results on a single line, but in this case we expect just a single result (the first one it finds)
  vector<string> results = m_TableEvaluator.EvaluateTable(tableToEvaluate, nameOfOutput);
  if (results.size() > 0)
    return results[0];
  else
    return "";
}

//declared after GetSingleSolution so the call below compiles
string GetResultingColor()
{
  return GetSingleSolution("ColorMixingTable", "ResultColor");
}

See the source code for this example in the ColorMixConsole application. Rule tables are stored as XML and, when "compiled" by the DecisionLogic table editor utility, are linked together into a single XML file. All the values stored within the rules engine are natively strings since they get serialized to XML. In order to optimize performance, string compares are avoided when possible by numerically tokenizing all of the stored values in the rules table and any input values passed in. That way, it is just comparing numbers most of the time. So in memory, the previous paint color table looks more like:

PaintColor1   PaintColor2   ResultColor
0             1             3
0             2             4
1             2             5
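One way to arrive at a numeric form like this is to intern each string the first time it is seen. The following is a minimal sketch of that idea, not EDSEngine's actual implementation; `Tokenizer`, `Tokenize`, and `Detokenize` are hypothetical names, and the order in which IDs are handed out is an assumption.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string interner: each distinct string gets the next integer ID.
class Tokenizer {
public:
    // Return the existing token for a value, or assign the next free one.
    int Tokenize(const std::string& value) {
        auto it = ids_.find(value);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(strings_.size());
        ids_.emplace(value, id);
        strings_.push_back(value);
        return id;
    }

    // Recover the original string (de-tokenization); throws on an unknown ID.
    const std::string& Detokenize(int id) const { return strings_.at(id); }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> strings_;
};
```

Tokenizing Red, Blue, Yellow, Purple, Orange, and Green in that order reproduces the IDs 0 through 5 used in the table. After that, matching a rule cell against an input is an integer comparison.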

Suppose we pass "Blue" and "Yellow" to the previous paint table. The values for PaintColor1 and PaintColor2 that we are testing are likewise assigned 1 (Blue) and 2 (Yellow). You can also perform the following boolean operations on an input value; for these, de-tokenization will occur:
> : greater-than, alpha or numerical
< : less-than, alpha or numerical
!= or <> : not-equal to
[x,y] : range of values, inclusive ends
(x,y) : range of values, exclusive ends. You can mix [] and ()
= : not used explicitly, this is the default behavior for a rule cell and does not require the string to be de-tokenized
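The operators listed here can be illustrated with a small cell matcher. This is a sketch under stated assumptions, not EDSEngine's parser: `CompareValues` and `CellMatches` are hypothetical names, values are compared numerically only when both sides parse fully as numbers, and cells are assumed to contain no surrounding whitespace.

```cpp
#include <cstdlib>
#include <string>

// Compare numerically when both values parse fully as numbers,
// otherwise alphabetically. Returns <0, 0, or >0.
int CompareValues(const std::string& a, const std::string& b) {
    char* endA = nullptr;
    char* endB = nullptr;
    double x = std::strtod(a.c_str(), &endA);
    double y = std::strtod(b.c_str(), &endB);
    bool numeric = !a.empty() && !b.empty() && *endA == '\0' && *endB == '\0';
    if (numeric) return (x < y) ? -1 : (x > y) ? 1 : 0;
    return a.compare(b);
}

// Hypothetical evaluator for one input cell against one input value.
bool CellMatches(const std::string& cell, const std::string& input) {
    if (cell.empty()) return true;  // blank cells are always true
    // Check the two-character operators before the single '<'.
    if (cell.compare(0, 2, "!=") == 0 || cell.compare(0, 2, "<>") == 0)
        return CompareValues(input, cell.substr(2)) != 0;
    if (cell[0] == '>') return CompareValues(input, cell.substr(1)) > 0;
    if (cell[0] == '<') return CompareValues(input, cell.substr(1)) < 0;

    char open = cell.front(), close = cell.back();
    std::size_t comma = cell.find(',');
    bool isRange = (open == '[' || open == '(') &&
                   (close == ']' || close == ')') &&
                   comma != std::string::npos;
    if (isRange) {
        std::string low = cell.substr(1, comma - 1);
        std::string high = cell.substr(comma + 1, cell.size() - comma - 2);
        int lo = CompareValues(input, low);
        int hi = CompareValues(input, high);
        bool lowOk = (open == '[') ? lo >= 0 : lo > 0;     // [ includes the end
        bool highOk = (close == ']') ? hi <= 0 : hi < 0;   // ( excludes it
        return lowOk && highOk;
    }
    return input == cell;  // default behavior: plain equality
}
```

For example, the mixed range `(10,20]` rejects an input of 10 but accepts 20. Only the final equality case could be served by comparing tokens; every other branch needs the original string.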

At run-time, once you pass in the input values for the table, it is broken down sequentially into a series of boolean cells, where the value of each cell is either true or false. Any input cell that you leave blank is always considered true. So if we passed the values PaintColor1 = "Blue" and PaintColor2 = "Yellow" our previous decision table looks a lot like a logical AND gate:

PaintColor1   PaintColor2   ResultColor
F             F             F
F             T             F
T             T             T <==This is our solution, corresponding to the tokenized memory value of 5, whose string value is "Green"
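The truth table can be reproduced with a few lines of code. The sketch below is an illustration only: `CellTruths` and `RowIsTrue` are hypothetical names, cells are compared by plain equality, and the caller is assumed to supply one input value per column.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One boolean per input cell: blank cells are always true, other cells are
// true when they equal the supplied input value for that column.
// Assumes inputValues has at least as many entries as rowCells.
std::vector<bool> CellTruths(const std::vector<std::string>& rowCells,
                             const std::vector<std::string>& inputValues) {
    std::vector<bool> truths;
    for (std::size_t i = 0; i < rowCells.size(); ++i)
        truths.push_back(rowCells[i].empty() || rowCells[i] == inputValues[i]);
    return truths;
}

// A row fires only when every one of its input cells is true (logical AND).
bool RowIsTrue(const std::vector<bool>& truths) {
    return std::all_of(truths.begin(), truths.end(), [](bool b) { return b; });
}
```

With the inputs Blue and Yellow, the row (Red, Yellow) yields F, T and does not fire, while (Blue, Yellow) yields T, T and does.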

Other EDSEngine Features of Note 

You can specify more than one value in an input cell. This is called an "OR", and the test will check them against the input value just like an "or" in code: if (value1 == test || value2 == test || value3 == test) then do something… is abbreviated as value1|value2|value3 in a cell. There is also a notion of "Global ORs" if you design the rules XML using the table designer tool, DecisionLogic. Listing many values out can be a lot of extra typing, so you can define a single list of values as a variable and reuse that variable in all of your project tables. In an output cell, the "|" delimiter acts like an "and" (&&). In this way your solution can return multiple values. The results of a table evaluation are always returned as an array (vector in C++). There is also the notion of a table being "Get One" or "Get All", which means the table designer intended for you to either return just the result of the first true row, or the combined unique results of all true rows. This is selectable in the DecisionLogic designer for every table. You of course always have the option to override it in code.
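The "|" handling and the Get One / Get All distinction can be sketched as follows. This is a simplified single-input illustration, not the EDSEngine implementation; `SplitCell`, `InputCellMatches`, `Row`, and `Evaluate` are hypothetical names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a cell on the '|' delimiter.
std::vector<std::string> SplitCell(const std::string& cell) {
    std::vector<std::string> parts;
    std::stringstream stream(cell);
    std::string part;
    while (std::getline(stream, part, '|')) parts.push_back(part);
    return parts;
}

// Input cell: blank is always true; otherwise any '|' alternative may match.
bool InputCellMatches(const std::string& cell, const std::string& input) {
    if (cell.empty()) return true;
    std::vector<std::string> options = SplitCell(cell);
    return std::find(options.begin(), options.end(), input) != options.end();
}

// Hypothetical single-input table row: one input cell and one output cell.
struct Row {
    std::string inputCell;
    std::string outputCell;
};

// getAll == false: outputs of the first true row only ("Get One").
// getAll == true: unique outputs of every true row, in table order ("Get All").
std::vector<std::string> Evaluate(const std::vector<Row>& table,
                                  const std::string& input, bool getAll) {
    std::vector<std::string> results;
    for (const Row& row : table) {
        if (!InputCellMatches(row.inputCell, input)) continue;
        for (const std::string& value : SplitCell(row.outputCell)) {
            if (std::find(results.begin(), results.end(), value) == results.end())
                results.push_back(value);
        }
        if (!getAll) break;  // "Get One" stops at the first true row
    }
    return results;
}
```

Either way the caller receives a vector, which matches the statement that results are always returned as an array.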

You can dynamically concatenate values into cells at run-time using the get() keyword. For instance, suppose we need an output of text for a price list display, and want to drive the text by rule. We might want it to read: "You have purchased X items of price P". In the table we could create an output: "You have purchased get(QtyOfItems) items of price get(ItemPrice)" where QtyOfItems and ItemPrice are values in the application state that would have been supplied. You can also use a get() in an input to create a more dynamic test. Instead of an input cell of ">55", it could be ">get(SomeValue)".
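The substitution performed by get() can be pictured as a simple text expansion against the application state. The sketch below is a naive scan written for illustration, not EDSEngine's code; `ExpandGets` is a hypothetical name, and replacing an unknown name with an empty string is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replace every get(Name) in a cell with the value of Name from the
// application state. Unknown names become an empty string (an assumption).
std::string ExpandGets(const std::string& cell,
                       const std::map<std::string, std::string>& state) {
    const std::string marker = "get(";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = cell.find(marker, pos);
        if (start == std::string::npos) break;
        std::size_t end = cell.find(')', start);
        if (end == std::string::npos) break;  // unterminated get(: copy as-is
        out += cell.substr(pos, start - pos);  // text before the keyword
        std::string name = cell.substr(start + marker.size(),
                                       end - start - marker.size());
        auto it = state.find(name);
        if (it != state.end()) out += it->second;
        pos = end + 1;
    }
    out += cell.substr(pos);  // remaining text after the last keyword
    return out;
}
```

With SomeValue set to 55 in the state, the input cell ">get(SomeValue)" expands to ">55" before the comparison is made.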

Run-time scripting with Python (C++/C#) and JavaScript (all ports) is supported in output cells so you can perform mathematical calculations and implement more advanced rules. Your output cell will just contain the Python or JavaScript code snippet within the proper keyword, js() or py(). For a single line of code it might look like:

js(return (56 * 3).toString())

//Note: you can actually omit the "return" and ".toString()" for a single line of code:
js(56 * 3)
//Combine equations and variables
js(56 * get(MultValueFromCode))

If your code has multiple lines/functions make sure it explicitly returns a string at the end or you will get a type-casting error. This becomes more useful when combined with the get() keyword like: js(get(value1) * get(value2)). Also note that Python is only supported in the C++/C# implementation of EDSEngine. JavaScript-based scripting is a bit more portable for the web being the native run-time scripting language of web browsers.

A rather advanced but flexible feature is callback parameters. There are special table evaluation functions with overloads to support passing additional data to EDSEngine, that is also passed to the JavaScript or Python code (EvaluateTableWithParameter). The basic idea is maybe you want to send some text or XML data from your application to a rule, modify it in the script, and pass the modifications back along with the usual result. You can find more details in the developer’s documentation if the feature might be useful to you.

Relational Object Model and Implementing a Rules Engine

The use of the Relational Object Model library will demonstrate how you can extend EDSEngine with your own features for a full-blown rules engine. Instead of writing explicit classes to model physical products in an eCommerce setting, it may be useful to model the product using a tree-like object structure, similar to XML. For instance, suppose we were modeling a car. We might write C++ classes like:

class CPriceableItem
{
public:
  CPriceableItem();
  string CatalogNumber;
  double Price;
  double Cost;
};


class CEngine : public CPriceableItem
{
public:
  CEngine();
  string EngineType;
};

class CTires : public CPriceableItem
{
public:
  CTires();
  string TireType;
};
//etc, keep inheriting and adding special attributes to each class

If we model the whole Car as XML and work with it directly, the final state could instead be formed like:

<Object name='Car'>
  <Object name='Engine'>

    <Attribute EngineType='V6' Price='9000' Cost='4000' CatalogNumber='V6-OCTC-GM'></Attribute>
  </Object>

  <Object name='Tires'>
    <Attribute TireType='17inch' Price='500' Cost='175' CatalogNumber='GY17'></Attribute>
  </Object>

  <!--And so forth.....-->
</Object>

In the remainder of the tutorial we will take advantage of ROM’s built-in automation with EDSEngine. You may also find it convenient not to have to write code for many algorithms you may need, such as summing up the total price of the Car object. ROM supports treating the data like XML and supports XPATH queries. You could just use an XPATH query to get the total price of the Car object, and could even do it in an output cell of a table rule using the eval() keyword: "Total price is eval(sum(//Attribute/@Price))", yielding the final text result of "Total price is 9500".
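For comparison, this is roughly what the tree-shaped model and the price total look like when written by hand. It is a minimal sketch, not the ROM library's API: `ObjectNode`, `AddChild`, and `SumAttribute` are hypothetical names, and attributes are stored directly on each node rather than in a separate Attribute element.

```cpp
#include <cstdlib>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic node: a name, free-form attributes, and child nodes.
struct ObjectNode {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<std::unique_ptr<ObjectNode>> children;

    explicit ObjectNode(std::string nodeName) : name(std::move(nodeName)) {}

    // Create a child owned by this node and return a pointer to it.
    ObjectNode* AddChild(const std::string& childName) {
        children.push_back(std::unique_ptr<ObjectNode>(new ObjectNode(childName)));
        return children.back().get();
    }
};

// Recursively total a numeric attribute over a node and all its descendants.
// Nodes without the attribute contribute nothing.
double SumAttribute(const ObjectNode& node, const std::string& attribute) {
    double total = 0.0;
    auto it = node.attributes.find(attribute);
    if (it != node.attributes.end())
        total += std::strtod(it->second.c_str(), nullptr);
    for (const auto& child : node.children)
        total += SumAttribute(*child, attribute);
    return total;
}
```

A Car node with an Engine child priced at 9000 and a Tires child priced at 500 totals 9500. New product parts need no new classes, only new nodes and attributes.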

When using the ROM component, decision tables are evaluated against a particular "Object" node context, and can fall back to the parent nodes when an input dependency value is not found in the current context. You can also use XPATH queries in your input column headers instead of dealing with input values in code, or to specify a relation between multiple "Object" nodes. See the project documentation for more information. It should be noted that the internal data storage mechanism is not XML since that would create a performance bottleneck. However, the current state can be serialized at any time, and is updated whenever a query is made.
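The lookup of an input value through the node context and its ancestors might be pictured like this. The sketch is an assumption about the general idea, not the ROM implementation; `Context` and `ResolveInput` are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical context node: its own attributes plus a link to its parent.
struct Context {
    std::map<std::string, std::string> attributes;
    const Context* parent = nullptr;
};

// Look for a value in the current context first, then in each ancestor in turn.
// Returns true and fills 'value' when found; false when no node has it.
bool ResolveInput(const Context& start, const std::string& name,
                  std::string& value) {
    for (const Context* node = &start; node != nullptr; node = node->parent) {
        auto it = node->attributes.find(name);
        if (it != node->attributes.end()) {
            value = it->second;
            return true;
        }
    }
    return false;
}
```

A table evaluated against an Engine node would therefore see attributes set on the enclosing Car node, unless the Engine node defines its own value under the same name.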
