Using "M" and the Microsoft Oslo SDK to Develope a TSQL Domain Specific Language

My last article explained why a developer chooses a Domain Specific Language (DSL) and how an M DSL can be incorporated into a Windows application. The article was a starting point for a Visual Studio developer interested in DSLs, M, the Oslo SDK, and how all the components collaborate to render a solution.

In this article, I'm going to demonstrate some more advanced material. In particular, I'll delve more into how a runtime can more effectively leverage the Oslo SDK data structures generated by M.

What is "Oslo"?

If you've been following my articles Oslo's goals and technologies are familiar to you. If not, here is a brief summary. Oslo is composed of the following components displayed in Figure 1.



Click here for larger image

Source: "Microsoft PDC 2008 - A Lap Around Olso" Figure 1 Oslo Architecture

  • "M", a language for composing models and DSLs
  • The Repository a SQL Server database designed for storing models
  • Quadrant, a tool for editing and viewing model data

Currently, Quadrant is only available to PDC attendees. M and the Repository come with the Oslo SDK available on the Oslo Developer Center site http://msdn.microsoft.com/en- us/oslo/default.aspx.

Oslo's goal is to deliver a foundation for building and storing models of all types. Models are application metadata formatted for runtime consumption. Separate Microsoft initiatives aim to build runtimes and tooling into applications like Visual Studio that are Oslo model aware.

The M Language is composed of MSchema, MGraph, and MGrammar. A complete introduction to M is beyond the scope of this article. I introduced MSchema, MGraph, and MGrammar in my prior articles.

There is an underlying set of .NET Framework classes supporting all the functionality above. In my last article I demonstrated how some of these .NET Framework classes work. As I mentioned earlier, in this article I'm taking the demonstration a step further and delivering something more advanced.

Something "Advanced"

It's no surprise that the best way to demonstrate some more advanced features is to do something, well, "advanced". The sample application demonstrating more advanced features is depicted below.



Click here for larger image

Figure 2: SQL MGrammar application

Using a DSL syntax developed in M, the runtime parses the syntax, generates a TSQL SELECT statement from the parsed data, executes the SELECT against the SQL Server AdventureWorks database, and displays the Resultset in a WPF GridView.

The sample application is built on top of the January 2009 Oslo SDK CTP.

If you're new to TSQL and unfamiliar with SELECT statements, later in the article I'll dissect the SELECT statement.

As I mentioned earlier, one of the application components is a DSL written in M. Sample DSL code I call "Simple SQL" is below.

  (Employee)
  (Contact AS Contact)
  (Employee AS Manager)
  
  (Employee AT ContactId INTERSECT Contact AT ContactId)
  (Employee AT ManagerId INTERSECT Manager AT EmployeeId)
  
  Data: Employee.NationalIdNumber
  Employee.LoginId
  Contact.LastName
  Manager.LoginId

There are three parts to the "Simple SQL" DSL program structure:

  • A section declaring the tables. Tables must be between "()". Repeated tables must use the "AS" statement to recast the table with a new name. A TSQL concept called aliasing.
  • An "INTERSECT" section indicating how the relational data is joined between table.
  • A "Data" section containing the fields and the table with the originating field.
Like many DSLs, the syntax is expressed using the domain's "business language", making development more approachable to a wider audience. As long as a business user understands some basic relational concepts they'll be capable of writing "Simple SQL" DSL code.

There's one more topic I need to cover before exploring the sample runtime anatomy.

Essence of TSQL SELECT

A complete introduction to TSQL is beyond the scope of this article. I'll only cover some basics from the perspective of the sample. The SELECT statement generated by the sample appears below.

  SELECT 
  Employee.NationalIdNumber 
  , Employee.LoginId 
  , Contact.LastName 
  , Manager.LoginId 

FROM HumanResources.Employee AS Employee  
INNER JOIN Person.Contact AS Contact ON Employee.ContactId = Contact.ContactId  
INNER JOIN HumanResources.Employee AS Manager ON Employee.ManagerId = Manager.EmployeeId

While there are many parts and options in a SELECT statement, I'm only employing the following:

  • The select list containing the fields being returned in the data.
  • The FROM clause containing the source tables.
  • INNER JOIN indicating how source tables are linked by their relationships. Inner joins specify that only results satisfying the expression after "ON" are returned in the result set. In the example above, only Employee information with a ContactId in the contact table and a ManagerId in the Employee table are included in the data.

Earlier I mentioned aliasing. Sometimes a table has a relationship between its key field and another field. In the example above, a Manager is also an Employee. So to distinguish between the Employee table related to the employee and the Employee table related to the Manager, the Manager Employee table is aliased.

Now that I've shown what the application does and shared some background on business problems the sample solves, I'm going to tackle the application internals starting with the "M" components.

MGrammar Revisited

MGrammar is the portion of the M language tasked with expressing parsing. MGrammar expresses patterns in text data and how to pull data from the parse text. The core of MGrammar are rules. There are three types of rules:
  • Tokens - think of tokens as the "words" in the program. In tokens, a developer describes specific patterns in the text.
  • Syntaxes are more like the sentences. Syntaxes arrange the tokens and other syntaxes and define the format of the data pulled from the text.
  • Interleaves define ignorable input.

There is more to M and MGrammar than rules. I'm going to explain some of these other features as I show how the "Simple SQL" has been implemented in M.

Specifying Keywords

There are 3 parts to the M portion of the sample application. As you may have observed in the DSL sample above, there are a number of keywords in the sample input text. Here is the MGrammar keyword section in the M code.
  @{ Classification["Keyword"]}       
          token INTERSECT = "INTERSECT"; 
          @{ Classification["Keyword"]}       
          token AT = "AT"; 
          @{ Classification["Keyword"]}       
          token DataFieldQualifier = "Data:";
          @{ Classification["Keyword"]}       
          token Alias = "AS";

Keywords are simply tokens. The Attributes on the keyword are directives for Intellipad. When developing in Intellipad, keywords will be highlighted in Intellipad.

I like to think of M development the way I think about other types of development. So, I like to partition my M application similar to the way I partition a .NET application written in C#. I like to assemble a collection of tokens and then compose the tokens into Syntaxes and then Syntaxes upon Syntaxes.

Partitioning Syntaxes and Controlling a Production

The first partition of Syntaxes appears below

syntax Main = s:SelectInputs+ =>  { s };
syntax SelectInputs = l:DataFields | l:Sets | l:Tables => l;
      
syntax DataFields = DataFieldQualifier d:((ListText))+ => Data[valuesof(d)];
syntax Sets = "(" t1:ContentText AT fld1:ContentText INTERSECT t2:ContentText AT fld2:ContentText ")" 
=> INTERSECT[t1,fld1,t2,fld2];
syntax TableReg = "(" reg:(ContentText) ")" => TableReg [reg];
syntax TableAlias = "(" alias:(ContentText Alias ContentText) ")" => TableAlias [alias];
syntax Tables =  t:TableAlias | t:TableReg => t;
interleave Ignore = Garbage;        
token Garbage = (' ' | '\n' | '\r')+;
token ContentText = ContentChar+;
token ContentChar = "a".."z"|"A".."Z"|"0".."9";
token ListChar = ContentChar | ".";
token ListText = ListChar+;

Syntaxes are divided into the following patterns:

  • Patterns identifying the tables are divided into the pattern signifying the Alias table, TableAlias Syntax and a pattern matching a regular table TableReg syntax.
  • Patterns matching the Data called DataFields.
  • Finally, a pattern matching the table joins called Sets.

Productions are encapsulated in the syntax appearing to the right of the "=>". Functions like "valuesof" change the shape of the data making it less hierarchal. "{}" and "[]" indicate unordered ("{}") or ordered ("[]") output.

Turning now to the .NET portion of the application, I'll show how the M code is utilized by the application.

Using "M" and the Microsoft Oslo SDK to Develope a TSQL Domain Specific Language

Run

Executing the parsing and generating data all starts behind the WPF Application "Run" button. The Run button code appears below.

  GridView view;
  DBMediator med = new DBMediator();
  GridViewLayoutBuilder viewBuild = new GridViewLayoutBuilder();
  DataSet ds = null;
  List configList = null;
  MParserBuilder parser = new MParserBuilder();
  MGraphGenerator graphGen = new MGraphGenerator();
  MGraphIterator iterator = null;
  MGraphTSQLQueryVisitor visitor = new MGraphTSQLQueryVisitor();
  List visitors = new List();
  
  parser.Build(_mgFile);
  
  graphGen.Run(this._inputText.Text, parser.Parser);
  
  visitor.Graph = graphGen.Builder;
  
  visitors.Add(visitor);
  
  iterator = new MGraphIterator(graphGen.Builder,
                    graphGen.GraphRoot, visitors);
  
  iterator.Traverse();
  
  this.SQLCode.Text = visitor.TSQL;
  
  
  
  //Now view the data
  view = (GridView)this.SQLData.View;
  
  ds = med.GetDataSet(this.SQLCode.Text);
  
  configList = 
      (new GridViewColumnConfigBuilder(ds.Tables[0].Columns)).GetColumnNames();
  
  viewBuild.View = view;
  viewBuild.SetColumns(configList);
  
  this.SQLData.DataContext = ds;
  this.SQLData.ItemsSource = ds.Tables[0].Rows;

Classes declared at the top of the code have the following responsibilities.

  • GridView handles rendering the DataSet data.
  • DBMediator accepts a SQL query and returns a DataSet.
  • GridViewLayoutBuilder changes the GridView to match the DataSet
  • MParserBuilder builds the Oslo DynamicParser from the M code
  • MGraphGenerator uses the DynamicParser to build a GraphBuilder object.
  • MGraphIterator navigates the GraphBuilder object, invoking a IGraphVisitor object at each node.
  • MGraphTSQLQueryVisitor is the implementation of IGraphVisitor and handles SELECT statement construction.

I'll highlight the critical parts of each class beginning with the Iterator.

Navigating a Graph -- Iterator

In my prior article I demonstrated how a developer can incorporate the M code. You can view that article here http://www.codeguru.com/columns/experts/article.php/ c16043/ . In the prior article I navigated an Oslo Data structure called GraphBuilder.

GraphBuilder is generated by the Oslo classes from the text input and the MGrammar loaded in the application. GraphBuilder is a hiearchal data structure somewhat matching the shape of the MGraph-like data generated in the Tree view of Intellipad. A sample from Intellipad appears below.



Click here for larger image

Figure 3: Intellipad MGraph

To consume the data I chose to construct code following two well known software patterns, the Iterator pattern embodied in the MGraphIterator class and the Visitor pattern embodied in the IMGraphIterator interface. For a complete description of the patterns refer to the sources at the end of the article.

Also, the Iterator I built does not implement IEnumerable, so you can't utilize the C# foreeach statement with the Iterator. The Iterator name refers to the pattern not the implementation.

MGraphIterator and a class implementing IMGraphIterator work like this. As viewed above, GraphBuilder is a tree-like data structure. MGraphIterator recursively traverses the nodes on the GraphBuilder invoking the Accept function on a list of IMGraphIterator based classes. A portion of the code appears below.

          public void Traverse ()
          {
              WalkTheNodes(_graph, _root,_visitors);
          }
  
          private void WalkTheNodes(System.Dataflow.GraphBuilder graph, object nodeOn, List<IMGRAPHVISITOR> visitors)
          {
              bool isNode = false;
              
              foreach (object enumNext in graph.GetSuccessors(nodeOn))
              {
                  isNode = graph.IsNode(enumNext);
  
                  if ((isNode)) //Skip the non-nodes
                  {
                      foreach (IMGraphVisitor visitor in visitors)
                      {
                          visitor.Accept(enumNext);
                      }
  
                      WalkTheNodes(graph, enumNext, visitors);//Recurse to next node   
                  }
                  
              }
          }

I envisioned making the Iterator/Visitor interaction a reusable pattern. Different M languages could produce different GraphBuilders, but I thought that every GraphBuilder would be traversed in a similar manner. So, I decided to defer GraphBuilder consumption to an implementation of an interface following the Visitor pattern.

Now I'll show how I implemented a visitor for a GraphBuilder created from my "Simple SQL" DSL.

The Visitor

MGraphTSQLQueryVisitor implements the IGraphVisitor interface. Main parts of the Accept function appear below.
          public void Accept(object node)
          {
              List<OBJECT> vals;
              string seqLabel = "";
              object seqLabelObj = null;
              string tempVal = "";
              string originalVal = "";
  
              seqLabelObj = _graph.GetSequenceLabel(node);
  
              if (seqLabelObj != null) //Sometimes label returns null value
              {
                  seqLabel = seqLabelObj.ToString();
  
                  if (seqLabel == "TableReg")
                  {
  ..
                  }
  
                  if (seqLabel == "TableAlias") 
                  {
    ..              
  }
  
                  if (seqLabel == "INTERSECT") 
                  {
  ..
  
                  }
  
                  if (seqLabel == "Data")
                  {
                      vals = GetSuccessorsTo(node);
  
                      foreach (object obj in vals)
                      {
                          _selectFieldSection.Add( GetFirstSuccessorAtDepth(obj, 5));
  
                      }
  
                  }
  
              }
          }
  

As you can see above MGraphTSQLQueryVisitor executes different code depending on the visited Node. When a node contains a root with the first word in one of the projections, Accept retrieves the information in levels below the Node being visited.

MGraphTSQLQueryVisitor leverages a class called TSQLTableNameSchemaResolver. TSQLTableNameSchemaResolver is pseudo database-aware. Business users are probably not aware of Schema information. So TSQLTableNameSchemaResolver qualifies the utilized tables with the correct schema information. I envisioned this class querying system tables in the database to match tables entered in the DSL code.

Some internal collections store what is retrieved from GraphBuilder so that the SELECT statement can be assembled and executed. WPF application code handles data display.

Displaying Data

The sample is a WPF Application. So the GridView rendering classes I'll review shortly use the WPF controls. A complete review of WPF GridView controls is beyond the scope of this article. Sources at the end of the article contain more WPF information.

DBAccess executes the generated TSQL, creating a DataSet from standard ADO.NET code.

Using the View attached to the SQLData Control on Window1, GridViewLayoutBuilder builds a GridView based on a ColumnConfig collection assembled by the GridViewColumnConfigBuilder class from column information in the DataSet. Code assembling the Grid columns appears below.

          public void SetColumns(List<COLUMNCONFIG> columnConfig)
          {
              GridViewColumn col = null;
  
              foreach (ColumnConfig config in columnConfig)
              {
                  col = new GridViewColumn();
  
                  col.Width = 50;
                  col.DisplayMemberBinding = new Binding(config.BindingPath);
                  col.Header = config.DisplayName;
  
                  View.Columns.Add(col);
              }
  
          }

GridViewColumnConfigBuilder formats the Binding path to point to the correct data inside the DataSet. WPF leverages the BindingPath to find the column data inside the DataSet.

Conclusion

In a prior article I showed how to incorporate a DSL written in M into a .NET application. This article tackled a more advanced task. In particular, it implemented a more advanced DSL called "Simple SQL" and demonstrated more sophisticated patterns for consuming data structures generated by the DSL and the Oslo SDK.

Sources

Oslo SDK documentation and Samples shipping with Oslo SDK CTP January 2009
MSDN - "Textual Domain Specific Languages for Developers - Part 2"
Software Patterns - Iterator
Software Patterns - Visitor
WPF sources - GridView Overview


About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server. Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security. Jeff is a Microsoft BizTalk MVP. Jeff spends his spare time with his wife Sherrill and daughter Alexandra.

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • Live Event Date: October 29, 2014 @ 1:00 p.m. ET / 10:00 a.m. PT It's well understood how critical version control is for code. However, its importance to DevOps isn't always recognized. The 2014 DevOps Survey of Practice shows that one of the key predictors of DevOps success is putting all production environment artifacts into version control. In this eSeminar, Gene Kim will discuss these survey findings and will share woeful tales of artifact management gone wrong! Gene will also share examples of how …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds