Introduction
In the previous article, I introduced an alternative database solution for .NET, the post-relational database Matisse. I used SQL programming demos to demonstrate its object-oriented features such as user-defined types, inheritance, and polymorphism.
Before we start writing .NET programs for Matisse, we need to know how to define database schemas. Although this is slightly different from relational databases, this is not difficult. If you are familiar with UML, you are almost ready to start. The key differentiator in schema definition has to do with the way you define relationships. Relationships work in ways similar to the relational Primary Key – Foreign Key concept, but they come with some specific advantages, as I will demonstrate in this article.
This article is part of a series. The next ones will cover .NET programming step by step, which is the ultimate goal of this series of articles.
UML, SQL DDL, or ODL?
Matisse supports three types of schema definition languages, UML, SQL DDL, and ODL (Object Definition Language). So, which one should you use? The answer is whichever one you like, because you can switch from one to the other whenever you need. For instance, you can define the database schema with SQL DDL at first, and then export the schema into an ODL file for a later update.
Here are a few tips:
SQL DDL is very handy when you need to do some tests, e.g., adding a class with a couple of attributes, or defining/dropping an index on an attribute. DDL is also useful when you need to migrate a relational schema into Matisse.
If you are familiar with C++ or IDL (Interface Definition Language), ODL is easy to understand. With ODL, you define your schema in a declarative way.
Rational Rose UML provides you with a more intuitive understanding of the model, and helps categorize a large schema into packages. However, Rational Rose does not come with a standard installation of Matisse.
Classes and Attributes
There are only three fundamental elements in schema definition for Matisse. These are class, attribute, and relationship. First, let us look at class and attribute.
Class is like a relational table, and attribute is like a relational column. The major differences are:
- You can use inheritance with classes.
- You can define methods for classes.
- Attribute can be of list type, e.g., list of integers.
Here is the definition of the class Manager in DDL, which inherits from Employee and has an attribute Title:
CREATE CLASS Manager UNDER Employee ( Title VARCHAR(64) );
Note that “CREATE CLASS” and “CREATE TABLE” are identical. By default, an attribute defined by DDL is NULLABLE. If the Title attribute cannot be NULL, the definition should use NOT NULL:
CREATE CLASS Manager UNDER Employee ( Title VARCHAR(64) NOT NULL );
Do you need to define a primary key in each class? Basically, the answer is no, because each class has an implicit OID attribute. But, if relational columns used as primary keys are meaningful in the application domain, you will obviously include these attributes in the class.
All the available built-in data types are listed in the Matisse Data Type Reference.
Relationships
The third fundamental element in schema definition is the relationship. A relationship defines an association between two classes. It works like the relational Primary Key – Foreign Key concept, but relationship is more intuitive, easier to maintain, and works faster for queries.
I will use the same schema as the one that I used in my first article. There are two relationships in the UML diagram: ManagedBy/Manages and Members/WorksIn.
The first relationship, ManagedBy/Manages, associates the Project class with the Manager class, and puts two constraints, which are:
- A Manager manages zero or more projects, and
- A Project always needs to be managed by exactly one Manager.
If you are using Rational Rose, you simply export the diagram to a Matisse database. Then, the database has the relationship with the constraints. The relevant part of the equivalent SQL DDL statements look like this:
CREATE CLASS Manager UNDER Employee ( /* other properties here */ Manages REFERENCES (Project) INVERSE Project.ManagedBy ); CREATE CLASS Project ( /* other properties here */ ManagedBy REFERENCES (Manager) CARDINALITY (1, 1) INVERSE Manager.Manages );
INVERSE is a new syntax to indicate that the two references (Manages and ManagedBy) are related to each other. Another new syntax is CARDINALITY, which specifies the minimum and maximum number of objects that can participate in the relationship. CARDINALITY can be omitted, in which case the cardinality of the relationship is 0 to many (the case for Manages in class Manager). Although the most common cardinalities are (0, 1), (1, 1), (0, n) and (1, n), you may use any number like (0, 5).
Some of the things I really like about the Matisse schema definition are that you do not need to write triggers to maintain referential integrity, worry about orphan rows after deletion or update of parent rows, or use some tricks for insertion of rows. You never have orphan rows (or dangling reference) problems with Matisse. Matisse always enforces the referential integrity.
The other relationship, Members/WorksIn, associates the class Employee and the class Project with many-to-many cardinality. With relational databases, you need to introduce an intermediate table in this case, but this is not needed here with Matisse. You simply express the many-to-many relationship:
CREATE CLASS Project ( -- other properties here Members REFERENCES (Employee) CARDINALITY (1, -1) -- here -1 indicates many INVERSE Employee.WorksIn ); CREATE CLASS Employee ( -- other properties here WorksIn REFERENCES (Project) INVERSE Project.Members );
Although both relationships above are bi-directional, mono-directional relationships are also available. Suppose that a project consists of many sub tasks (class Task), a task object does not need to know to which project it belongs. In other words, you do not have to navigate from a task object to a project object. Then, the class definition in DDL would look like this:
CREATE CLASS Project (
/* other properties here */
Tasks REFERENCES (Task)
);
Indexes and Entry-Point
After completing your logical model, you may need to define some indexes for better performance. When you define an index on an attribute (or up to four attributes), there are two things you need to know:
- The attribute(s) for the index criteria must be “NOT NULL“, i.e., not NULLABLE.
- If the attribute is of the string type, it needs to be defined with maximum size using VARCHAR(n).
In order to define an index on BirthDate of Employee and another one on Name of Employee, the class needs to be defined like this:
CREATE TABLE Employee ( Name VARCHAR(255) NOT NULL, BirthDate DATE NOT NULL, -- others );
The next DDL statements define an index on the attribute BirthDate of the Employee class, and another index on the attribute Name of Employee.
CREATE INDEX birthdate_idx ON Employee (BirthDate); CREATE INDEX emp_name_idx ON Employee (Name);
Matisse also has a full-text indexing feature, which is called ‘Entry-Point dictionary’. For example, the next DDL statement defines an entry-point dictionary with full-text indexing on the Description attribute of Project:
CREATE ENTRY_POINT DICTIONARY proj_desc_ep_dict ON Project (Description) MAKE_ENTRY "make-full-text-entry";
The next SELECT query returns projects whose description contains the word ‘.NET’:
SELECT * FROM Project WHERE ENTRY_POINT(proj_desc_ep_dict) = '.NET';
Schema Templates
In the Enterprise Manager’s Query Analyser window, you can get the basic schema templates in SQL DDL by right-clicking in the window. There are templates for defining classes with inheritance or relationships, SQL methods, and indexes. Note that there is also Help in the same menu, which covers other DDL statements.
Summary and Next Article
I outlined here the basics of schema definition with Matisse.
Although relationships may be a new notion as the way to define database schema for some developers, I believe it is not difficult but actually intuitive, especially for those already familiar with UML modeling. If you have questions, do not hesitate to post a comment to this article. I will be happy to respond.
My next article will describe the first steps for .NET programming: how to access the database from .NET.
Appendix 1: The Complete SQL DDL Used in this Article
CREATE TABLE Task ( TaskName STRING, StartDate DATE, EndDate DATE ); CREATE TABLE Project ( ProjectName VARCHAR(255), Budget NUMERIC(19,2), Description STRING, Members REFERENCES (Employee) CARDINALITY (1, -1) INVERSE Employee.WorksIn, ManagedBy REFERENCES (Manager) CARDINALITY (1, 1) INVERSE Manager.Manages, Tasks REFERENCES (Task) ); CREATE TABLE Employee ( Name VARCHAR(255) NOT NULL, BirthDate DATE NOT NULL, WorksIn REFERENCES (Project) INVERSE Project.Members ); CREATE TABLE Manager UNDER Employee ( Title VARCHAR(255), Manages REFERENCES (Project) INVERSE Project.ManagedBy ); CREATE INDEX birthdate_idx ON Employee (BirthDate); CREATE INDEX emp_name_idx ON Employee (Name); CREATE ENTRY_POINT DICTIONARY proj_desc_ep_dict ON Project ( Description ) MAKE_ENTRY "make-full-text-entry";
Note that VARCHAR and STRING in DDL are identical except that VARCHAR can specify maximum size of string.
Appendix 2: Equivalent Schema in ODL
interface Employee : persistent { attribute String<255> Name; attribute Date BirthDate; relationship Set<Project> WorksIn inverse Project::Members; mt_index birthdate_idx unique_key FALSE criteria {Employee::BirthDate MT_ASCEND}; mt_index emp_name_idx unique_key FALSE criteria {Employee::Name MT_ASCEND}; }; interface Manager : Employee : persistent { attribute String<64> Nullable Title; relationship Set<Project> Manages inverse Project::ManagedBy; }; interface Project : persistent { attribute String<64> Nullable ProjectName; attribute Numeric(19,2) Nullable Budget; attribute String Nullable Description; mt_entry_point_dictionary proj_desc_ep_dict entry_point_of Description unique_key FALSE make_entry_function "make-full-text-entry"; relationship Set<Employee> Members[1, -1] inverse Employee::WorksIn; relationship Manager ManagedBy inverse Manager::Manages; relationship Set<Task> Tasks; }; interface Task : persistent { attribute String Nullable TaskName; attribute Date Nullable StartDate; attribute Date Nullable EndDate; };