Using “M” and the Microsoft Oslo SDK to Develope a TSQL Domain Specific Language

My last article explained why a developer chooses a Domain Specific Language (DSL) and how an M DSL can be incorporated into a Windows application. The article was a starting point for a Visual Studio developer interested in DSLs, M, the Oslo SDK, and how all the components collaborate to render a solution.

In this article, I’m going to demonstrate some more advanced material. In particular, I’ll delve more into how a runtime can more effectively leverage the Oslo SDK data structures generated by M.

What is “Oslo”?

If you’ve been following my articles Oslo’s goals and technologies are familiar to you. If not, here is a brief summary. Oslo is composed of the following components displayed in Figure 1.

Oslo’s goal is to deliver a foundation for building and storing models of all types. Models are application metadata formatted for runtime consumption. Separate Microsoft initiatives aim to build runtimes and tooling into applications like Visual Studio that are Oslo model aware.

The M Language is composed of MSchema, MGraph, and MGrammar. A complete introduction to M is beyond the scope of this article. I introduced MSchema, MGraph, and MGrammar in my prior articles.

There is an underlying set of .NET Framework classes supporting all the functionality above. In my last article I demonstrated how some of these .NET Framework classes work. As I mentioned earlier, in this article I’m taking the demonstration a step further and delivering something more advanced.

Something “Advanced”

It’s no surprise that the best way to demonstrate some more advanced features is to do something, well, “advanced”. The sample application demonstrating more advanced features is depicted below.

The sample application is built on top of the January 2009 Oslo SDK CTP.

If you’re new to TSQL and unfamiliar with SELECT statements, later in the article I’ll dissect the SELECT statement.

As I mentioned earlier, one of the application components is a DSL written in M. Sample DSL code I call “Simple SQL” is below.

(Contact AS Contact)
(Employee AS Manager)

(Employee AT ContactId INTERSECT Contact AT ContactId)
(Employee AT ManagerId INTERSECT Manager AT EmployeeId)

Data: Employee.NationalIdNumber

There are three parts to the “Simple SQL” DSL program structure:

  • A section declaring the tables. Tables must be between “()”. Repeated tables must use the “AS” statement to recast the table with a new name. A TSQL concept called aliasing.
  • An “INTERSECT” section indicating how the relational data is joined between table.
  • A “Data” section containing the fields and the table with the originating field.
Like many DSLs, the syntax is expressed using the domain’s “business language”, making development more approachable to a wider audience. As long as a business user understands some basic relational concepts they’ll be capable of writing “Simple SQL” DSL code.

There’s one more topic I need to cover before exploring the sample runtime anatomy.

Essence of TSQL SELECT

A complete introduction to TSQL is beyond the scope of this article. I’ll only cover some basics from the perspective of the sample. The SELECT statement generated by the sample appears below.

, Employee.LoginId
, Contact.LastName
, Manager.LoginId

FROM HumanResources.Employee AS Employee
INNER JOIN Person.Contact AS Contact ON Employee.ContactId = Contact.ContactId
INNER JOIN HumanResources.Employee AS Manager ON Employee.ManagerId = Manager.EmployeeId

While there are many parts and options in a SELECT statement, I’m only employing the following:

  • The select list containing the fields being returned in the data.
  • The FROM clause containing the source tables.
  • INNER JOIN indicating how source tables are linked by their relationships. Inner joins specify that only results satisfying the expression after “ON” are returned in the result set. In the example above, only Employee information with a ContactId in the contact table and a ManagerId in the Employee table are included in the data.

Earlier I mentioned aliasing. Sometimes a table has a relationship between its key field and another field. In the example above, a Manager is also an Employee. So to distinguish between the Employee table related to the employee and the Employee table related to the Manager, the Manager Employee table is aliased.

Now that I’ve shown what the application does and shared some background on business problems the sample solves, I’m going to tackle the application internals starting with the “M” components.

MGrammar Revisited

MGrammar is the portion of the M language tasked with expressing parsing. MGrammar expresses patterns in text data and how to pull data from the parse text. The core of MGrammar are rules. There are three types of rules:

  • Tokens – think of tokens as the “words” in the program. In tokens, a developer describes specific patterns in the text.
  • Syntaxes are more like the sentences. Syntaxes arrange the tokens and other syntaxes and define the format of the data pulled from the text.
  • Interleaves define ignorable input.

There is more to M and MGrammar than rules. I’m going to explain some of these other features as I show how the “Simple SQL” has been implemented in M.

Specifying Keywords

There are 3 parts to the M portion of the sample application. As you may have observed in the DSL sample above, there are a number of keywords in the sample input text. Here is the MGrammar keyword section in the M code.
  @{ Classification[“Keyword”]}
@{ Classification[“Keyword”]}
token AT = “AT”;
@{ Classification[“Keyword”]}
token DataFieldQualifier = “Data:”;
@{ Classification[“Keyword”]}
token Alias = “AS”;

Keywords are simply tokens. The Attributes on the keyword are directives for Intellipad. When developing in Intellipad, keywords will be highlighted in Intellipad.

I like to think of M development the way I think about other types of development. So, I like to partition my M application similar to the way I partition a .NET application written in C#. I like to assemble a collection of tokens and then compose the tokens into Syntaxes and then Syntaxes upon Syntaxes.

Partitioning Syntaxes and Controlling a Production

The first partition of Syntaxes appears below

syntax Main = s:SelectInputs+ =>  { s };
syntax SelectInputs = l:DataFields | l:Sets | l:Tables => l;

syntax DataFields = DataFieldQualifier d:((ListText))+ => Data[valuesof(d)];
syntax Sets = “(” t1:ContentText AT fld1:ContentText INTERSECT t2:ContentText AT fld2:ContentText “)”
=> INTERSECT[t1,fld1,t2,fld2];
syntax TableReg = “(” reg:(ContentText) “)” => TableReg [reg];
syntax TableAlias = “(” alias:(ContentText Alias ContentText) “)” => TableAlias [alias];
syntax Tables = t:TableAlias | t:TableReg => t;
interleave Ignore = Garbage;
token Garbage = (‘ ‘ | ‘\n’ | ‘\r’)+;
token ContentText = ContentChar+;
token ContentChar = “a”..”z”|”A”..”Z”|”0″..”9″;
token ListChar = ContentChar | “.”;
token ListText = ListChar+;

Syntaxes are divided into the following patterns:

  • Patterns identifying the tables are divided into the pattern signifying the Alias table, TableAlias Syntax and a pattern matching a regular table TableReg syntax.
  • Patterns matching the Data called DataFields.
  • Finally, a pattern matching the table joins called Sets.

Productions are encapsulated in the syntax appearing to the right of the “=>”. Functions like “valuesof” change the shape of the data making it less hierarchal. “{}” and “[]” indicate unordered (“{}”) or ordered (“[]”) output.

Turning now to the .NET portion of the application, I’ll show how the M code is utilized by the application.

More by Author

Must Read