Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
Date: 1/31/2018 @ 2 p.m. ET
T-SQL has been the conventional method of writing database objects such as stored procedures, triggers, aggregates, and so forth. SQL Server 2005 introduces a new array of possibilities because of its tight integration with the .NET Framework. It allows you to write stored procedures, triggers, and other such objects in a .NET-compliant language such as C# and VB.NET, compile them as a dynamic link library, and register them inside SQL Server.
T-SQL is a great tool for managing relational data and working with set-based operations. It leverages upon numerous database features, such as query plans and the caching of query plans and their relevant results at many steps inside SQL Server, which make it a natural choice for data-related operations inside SQL Server. However, the unwieldiness of T-SQL makes it less than a pleasure to deal with in situations that involve work such as complex mathematical computations, recursive operations, and heavily procedural tasks. The usual workaround goes as follows:
- Extract the data out of SQL Server.
- Pull it into your business or data layer.
- Work with it in the layer using a .NET-compliant language such as C# or VB .NET.
- Pass the processed data back to a database object written in T-SQL, such as a stored procedure, to save the data for you.
This may not be the desirable solution when the amount of data being ferried across a network boundary is large or chatty in nature. With the ability to write and host .NET code inside SQL Server 2005, you now can prevent the network roundtrips and host the necessary .NET code right inside SQL Server.
This leads to the question that if you can write C# and VB .NET code and host it right inside SQL Server 2005, do you even need the business layer anymore? Before the water gets any muddier, let me address this first and have you look at the appropriate use of CLR integration with SQL Server.
Appropriate Use of CLR Integration
It is quite tempting to think that now that you can write and host .NET code right inside SQL Server 2005; you don't even need a business or data layer anymore. So, let me make this absolutely clear: CLR integration is not a replacement for your business or data layer. SQL Server 2005 is still a database; it is not intended to become an application server in your architecture.
But, .NET code inside SQL Server does have its rightful place and usage in your system architecture. The eventual decision to use it is dictated by ease of programming or maintainability and the performance of code in general. Look at performance first, followed by ease of programming and maintainability.
T-SQL versus CLR: a performance comparison
When it comes to core performance, remember three cardinal rules when choosing between T-SQL and CLR:
- Set-based operations tend to work better in T-SQL. This is because Microsoft has tweaked T-SQL and the query engine in general over many years to use advanced algorithms for processing and caching set-based result sets effectively inside the database. In comparison, CLR tends to use an object-based data manipulation and access strategy unless CLR is using T-SQL underneath.
- Procedural and recursive code works much better in CLR. Historically, CLR code and higher-level programming languages such as C# and VB .NET have offered better support for procedural and recursive code. This is because such languages are designed from the ground up to support per method call-based stack frames and activation records, and they do not suffer from limits such as a maximum call depth of 32 in recursive operations. In other words, if you tried to calculate a factorial using recursion in T-SQL, you could not calculate the factorial of 33 because T-SQL cannot call deeper than 32 stored procedures, but CLR wouldn't have any problem doing such an operation. Also, because stack frames are not shared, you would also get much better concurrent performance.
- You cannot take the above two rules as gospel. Too many other factors, such as compiled code versus interpreted code, the overhead of loading the CLR inside SQL Server, access restriction enforcement, library of helper functions, optimization logic that you may build inside your code, and much more, may have an effect on performance. The only definitive way to find out is to write the two objects, but the above two rules can serve as a rough yardstick.
Note: You can still do recursion in T-SQL using CTE (Common Table Expressions), which are subject to a default (but configurable) call depth of 100 in SQL Server 2005.
T-SQL versus CLR: ease of programming and maintainability
Hardcore performance is important, but it is not the only important facet of system architectures. In fact, when it comes to deciding between writing maintainable code and writing code that simply saves a few micro or even nanoseconds, most developers will—and should—opt for easy-to-program and -maintain code.
Consider the following common scenarios in which you would have to make this choice:
- Imagine doing an XSLT transform in T-SQL versus CLR. With a lot of pain, you could do an XSLT transform in T-SQL, but an equivalent CLR stored procedure would do that with no problem at all. In other words, the availability of the entire .NET framework in CLR objects presents significant and compelling reasons to prefer CLR over T-SQL.
- Think about doing outer joins between two tabular result sets. CLR will have to compare them on a row-by-row, column-by-column basis, moving and copying matching rows around. Not only would this be extremely slow, it would be difficult to write a generic enough routine to cover all possible cases. T-SQL would be the preferred choice here.
- T-SQL can leverage database transactions, or it can leverage the Microsoft Distributed Transaction Coordinator (MSDTC) to create a distributed transaction that enlists only another database using the BEGIN DISTRIBUTED TRANSACTION command. It is unable to enlist, say, an Oracle database or a non-database resource manager within the same transaction. CLR, with the help of System.Transactions, has no problem achieving this.
- T-SQL offers very limited support for nested transactions and breaking away from a current running transaction. This is possible in CLR by creating a loop back connection by adding "enlist = false" in the connection string.
- Anytime a transition occurs from CLR to T-SQL, a nested stack frame or execution scope is created. The security restrictions are not chained across such a call, and such code is treated as dynamic SQL, which means its query plans cache may not work effectively if the structure of the query keeps changing, nor is it subject to compile-time syntax checking.
- CLR has the ability to pipeline results. In other words, if a stored procedure returns 100 rows, the first row cannot be read in T-SQL until the stored procedure has finished returning the hundredth row. CLR, on the other hand, gives you fine level control to begin streaming results without having finished processing the entire result set.
CLR Integration Versus Extended Stored Procedures
Writing code in a higher-level language isn't a brand new concept. In fact, even in SQL Server 2000 you could write extended stored procedures in a language such as C++ and run them directly inside the database. So, what does CLR integration offer that extended stored procedures don't? Consider the following:
- CLR objects inside SQL Server allow you to neatly categorize them into three buckets of access restrictions. The developers at Microsoft took a long, hard look at every available class in the .NET framework and categorized them into three categories. Depending upon the access restriction your CLR object is under, the CLR object is restricted from using certain features of the framework. The three categories are as follows:
- SAFE: This is the default level and it is the most restrictive. This means that your code does not need any external resources in addition to the operation. Safe code can access data from the local SQL Server databases or perform computations and business logic.
- EXTERNAL_ACCESS: This level signifies that certain external resources, such as files, networks, Web services, environmental variables, and the Registry, are accessible.
- UNSAFE: This level, which you should try very hard to avoid, specifies that your code is allowed to do anything. In other words, you are requesting to be free of any granular-level control, and thus have the same permissions as an extended stored procedure.
- Extended stored procedures are written in C++. CLR objects can be written in .NET-compliant languages such as C# or VB.NET, which are safer and easier to use.
- CLR code is much more reliable than the native unmanaged code that extended stored procedures are written in. This is because you are freed from issues such as memory management, and your code is tied to appropriate access restrictions based on the code access security model of the .NET Framework.
- CLR code has the ability to work with newly introduced data types such as varchar(max), varbinary(max), and XML.
- CLR code has the ability to latch on to the current running database connection, also referred to as the context connection. Extended procedures have no option but to create a loop back connection, which involves a significant overhead.
- CLR objects can be of various types, stored procedures, triggers, UDFs, TVFs, and aggregates, whereas extended stored procedures can only be extended stored procedures.
- Native unmanaged code runs faster than .NET 2.0 code in most cases. This means that unless your code explicitly needs to work with the local data store it is operating on, native unmanaged code or extended stored procedures will in general perform better than CLR objects. There may be other factors affecting the actual performance, such as native-to-managed code transitions. This, however, is a very weak reason to give up the compelling advantages that the CLR presents you in general. In fact, with the introduction of CLR integration, you should almost never have to write extended stored procedures.
Just the Introduction
This article presented a high-level introduction to CLR integration with SQL Server 2005. My next article will demonstrate writing a few common objects in CLR, a few practical debugging tips, and pitfalls to watch out for. Stay tuned!
About the Author
Sahil Malik has worked for a number of top notch clients in Microsoft technologies ranging from DOS to .NET. He is the author of Pro ADO.NET 2.0 and co-author of Pro ADO.NET with VB.NET 1.1. Sahil is currently also working on a multimedia series on ADO.NET 2.0 for Keystone Learning. For his community involvement, contributions, and speaking, he has also been awarded the Microsoft MVP award.