Writing Database Objects in CLR: Advanced Scenarios

My previous article demonstrated how to write a simple table valued function (TVF) that accepts an integer and returns a table of results, breaking down the input of value into its factors. Thus, if you pass in 2, you get 1 and 2, but if you pass in 15, you get 1, 3, 5, and 15.

But you may ask, 'what does that have to do with a database?' It turns out—not much. And when you write code to be run inside SQL Server 2005, more often than not, you will want to deal with other database objects, much in the same way T-SQL would.

This article carries the discussion forward by writing another object: a CLR-stored procedure that uses the TVF written in the previous article. This stored procedure will accept a positive integer and return all prime numbers less than the input. A prime number is a number that is divisible by 1 and itself, so you easily could use the TVF, find the number of factors, and return the numbers that qualify as prime numbers.

You can further optimize this method by filtering a list of previously found prime numbers and trying to factorize using only them. Yet another way would be to use the "Sieve of Eratosthenes." Eratosthenes (276-196 B.C.) invented a method for efficiently constructing tables of prime numbers. Using this method, write down a list of integers beginning with 2 and ending with some number, say, 15:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15

First, mark all multiples of 2:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15
      *   *   *   *     *     *

Find the next unmarked number, which in this case is 3, and then mark all its multiples:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15
      *   *   * * *     *     *  *

Keep doing this until no more new unmarked numbers are left. The final unmarked numbers are the primes less than 15:

  2 3 5 7 11 13

A few things are apparent from the problem description:

  1. For the first approach, you need to call another object on the same database (the TVF that you already wrote).
  2. In the first approach of calculating prime numbers, you also need to maintain and work with previously found prime numbers, possibly in a database table. But, you will be able to send back prime numbers as you find them, one by one.
  3. In the Sieve of Eratosthenes approach of calculating prime numbers, you do not need to maintain a list of prime numbers, but you won't be able to send results back row by row. Instead, you have to finish finding all prime numbers before you can send the first one back.

The second approach doesn't seem to talk to underlying database components, so it sounds simpler. Let's look at the Sieve of Eratosthenes first.

Sending Results Back One by One

This article skips the steps for creating and debugging the CLR-stored procedure because they are fairly similar to writing a TVF. You can refer to my previous article for those.

The following is the stored procedure that uses Sieve of Eratosthenes to calculate prime numbers:

[Microsoft.SqlServer.Server.SqlProcedure]
public static void FindPrimes(int input)
{
   // NumberSieve is a 2-based strongly typed collection
   // See code download for full implementation.
   NumberSieve mySieve = new NumberSieve(input);

   // The enumerator in NumberSieve will return only unmarked numbers
   foreach (SieveEntry unmarkedNumber in mySieve) {
      for (int i = unmarkedNumber.Number + 1; i <= input; i++) {
         if (mySieve[i].IsDivisibleBy(unmarkedNumber.Number))
            mySieve[i].IsMarked = true;
      }
   }

   // So at this point all that is left are prime numbers
   SqlDataRecord record = 
      new SqlDataRecord(
         new SqlMetaData("PrimeNumber", SqlDbType.Int)) ;
   SqlContext.Pipe.SendResultsStart(record);
   foreach (SieveEntry prime in mySieve) {
      record.SetInt32(0, prime.Number)  ;
      SqlContext.Pipe.SendResultsRow(record);
   }
   SqlContext.Pipe.SendResultsEnd();
}

Interestingly, the above code is very simple. It is simply split in two parts. The first part calculates prime numbers using a strongly typed custom collection called NumberSieve that is 2-based. The NumberSieve holds SieveEntry instances.

Note: You can check the associated code download for the actual implementation of the associated business objects, NumberSieve and SieveEntry. Because that code demonstrates a rather interesting use of Generics and strongly typed collections, it is worth a look, although out of scope of this article.

Once the prime numbers are filtered out, the second part of the above stored procedure sends the results back to the caller. You can pick this code apart to see what it is doing.

The first thing the code does is set up the structure of the records that the tabular data will contain:

   SqlDataRecord record =
      new SqlDataRecord(
         new SqlMetaData("PrimeNumber", SqlDbType.Int)) ;
   SqlContext.Pipe.SendResultsStart(record);

The next thing it does is start sending the prime numbers one by one. Note that the custom enumerator implemented on NumberSieve instance, mySieve, returns only unmarked numbers:

   foreach (SieveEntry prime in mySieve) {
      record.SetInt32(0, prime.Number)  ;
      SqlContext.Pipe.SendResultsRow(record);
   }

At this point, the only unmarked numbers should be prime numbers.

The final thing the code does is send an end marker signifying that this tabular data has completed:

   SqlContext.Pipe.SendResultsEnd();

The first notable thing about this stored procedure is that you have to calculate all results before you can send the first. So, it places a lot of pressure on memory. Also, once you do have the results to send, you have to send them one by one using SqlContext.Pipe.SendResultsRow.

The second thing to note is that this operation is wholly contained in memory—nothing written on the disk, no interaction with the TVF you have already written. Well, one big advantage of running a completely in-memory operation is speed—up to a point. Why up to a point? Because after a certain point, you use up so much memory representing your business object instances that your performance degrades asymptotically. I ran a quick performance test on my rather average desktop, and you can see the amount of memory calculating primes took in Figure 1.

Figure 1. Sieve of Eratosthenes Stored Procedure Performance Results

As you can see, the performance degradation is not linear. The more numbers you pass in, the worse it gets. This is primarily because so much memory is being allocated in the NumberSieve instance that the garbage collector is firing up way too often, which affects your performance negatively.

So next, look at an alternate mechanism that is better suited to larger input numbers using persistent storage: a database table.

Writing Database Objects in CLR: Advanced Scenarios

Sending Results Back in One Shot

An alternative to the Sieve of Eratosthenes for calculating prime numbers is to simply find them the brute force way: by finding divisors. You can do this easily using the divisors TVF written in my previous article. That means if the following query gives the output "1", the number is prime:

Select count(*) from divisor(@Input)

So, as you find prime numbers, you can keep inserting them into a table and return them in one shot at the end. But this means you need a table in which to store the prime numbers. So, set that up first by using the following script:

Create Table PrimeNumber
(
   PrimeNumber Int
)

With that set up, you can write the stored procedure that will populate the above table as it finds primary numbers and reads from the above table to return the results. The following is the code for this stored procedure:

[Microsoft.SqlServer.Server.SqlProcedure]
public static void FindPrimes2(int input)
{
   using (SqlConnection contextConnection = 
      new SqlConnection("context connection=true"))
   {
      try
      {
         contextConnection.Open();

         // Setup commands
         SqlCommand divisorCountCmd = 
            contextConnection.CreateCommand();
         divisorCountCmd.CommandText = 
            "Select count(*) from divisor(@Input)";
         divisorCountCmd.Parameters.AddWithValue(
            "@Input", 2);// set dummy value

         SqlCommand insertPrimeCommand = 
            contextConnection.CreateCommand();
         insertPrimeCommand.CommandText = 
            "Insert into PrimeNumber Values (@Input)";
         insertPrimeCommand.Parameters.AddWithValue(
            "@Input", 2); // dummy value

         for (int i = 2; i < input; i++)
         {
            divisorCountCmd.Parameters[0].Value = i;
            int numDivisors = (int) divisorCountCmd.ExecuteScalar();
            if (numDivisors == 1) // Its a prime number
            {
               insertPrimeCommand.Parameters[0].Value = i;
               if (insertPrimeCommand.ExecuteNonQuery() != 1)
                  throw new Exception(
                     "Cannot insert into PrimeNumber");
            }
         }

         // By now we should have filtered all the numbers out.
         SqlCommand fetchPrimes = contextConnection.CreateCommand();
         fetchPrimes.CommandText =
            "Select PrimeNumber from PrimeNumber";
         SqlDataReader primeRdr =
            fetchPrimes.ExecuteReader(CommandBehavior.CloseConnection);
         SqlContext.Pipe.Send(primeRdr);
         primeRdr.Close();
      }
      catch (Exception)
      {
         throw; // no exception logging infrastructure for demo :-/
      }
      finally
      {
         contextConnection.Close();
      }
   }
}

Notice something strange? Observe the connection string for the connection being used:

using (SqlConnection contextConnection =
   new SqlConnection("context connection=true"))

Well, you need a connection because you have a few commands (divisorCountCmd, insertPrimeCmd, and fetchPrimes) that actually work with underlying database objects. Also, you need to specify a connection string to get a valid and usable connection object. So, you should specify a connection string that looks like this:

"Server=Win2k3-smalik; Database=Test; Integrated Security=SSPI"

Although that would certainly work, it sounds incredibly wasteful to go over the entire networking stack, re-authenticate, and then create a brand new connection—when you are already on a connection. So, wouldn't it be nice if you could simply get a hold of the existing connection? That is exactly what the connection string "context connection = true" would do. It hands you back the connection you are already on.

Once you do have a hold of the context connection, the rest is plain ADO.NET code. Put simply, if the number of divisors of a number (other than the number itself) is 1, the number indeed is a prime number. When a prime number is found, the code simply inserts it into the PrimeNumber table using the insertPrimeCmd SqlCommand.

Finally, it uses a simple SqlDataReader to stream the results back in one shot using the following line of code:

SqlContext.Pipe.Send(primeRdr);

Also, because you are trying to ensure that the number of divisors other than the number itself is only 1, don't try to divide by every single number out there. Instead, try to focus on only the prime numbers less than the input number. This can easily be done by modifying the TVF code to the following:

[Microsoft.SqlServer.Server.SqlFunction(
   FillRowMethodName = "RowFiller", DataAccess = DataAccessKind.Read,
   TableDefinition = "Divisor int")]
public static IEnumerable Divisor(Int32 input)
{
   ArrayList divisors = new ArrayList();
   using (SqlConnection contextConnection = 
      new SqlConnection("context connection = true"))
   {
      try
      {
         contextConnection.Open();
         SqlCommand fetchPrimes = contextConnection.CreateCommand();
         fetchPrimes.CommandText = 
            "Select PrimeNumber from PrimeNumber";
         SqlDataReader primeRdr = 
            fetchPrimes.ExecuteReader(CommandBehavior.CloseConnection);
         if (primeRdr.HasRows)
         {
            while (primeRdr.Read())
            {
               int i = primeRdr.GetInt32(0);
               if (input % i == 0)
               {
                  divisors.Add(i);
               }
            }
         }
         primeRdr.Close();
      }
      finally
      {
         contextConnection.Close();
      }
      return divisors;
   }
}

public static void RowFiller(object row, out int Divisor)
{
   Divisor = (int)row ;
}

Now, if you rerun the stored procedure and check its performance degradation over an increasing set of numbers, you'll see the results in Figure 2.

Figure 2. Divisors TVF Stored Procedure Performance Results

As you can see, the downgrade in performance is far more linear this time. In fact, because the bottleneck is the PrimeNumber table, you can use all sorts of conventional database tricks to speed up inserts into that table.

Note: Writing to a table can never be as fast as working completely in memory.

Another interesting point is that the context connection was opened in two places: one in the stored procedure and second in the TVF. This doesn't mean you can have as many instances of context connections open at any given time. In fact, you can have only one context connection concurrently open inside one stack frame that was opened within the same stack frame. Thus, because the stored procedure calling the T-SQL, which in turn calls the TVF, nests such stack frames or data activation records, you can concurrently maintain two open context connections—even though they are really the same connection.

Your other option, of course, is to simply use a non-context connection. It will be more expensive than context connections, but you have the full power of connection pooling right inside SQL Server 2005. After all, it is the same CLR inside or outside SQL Server 2005.

Some Lingering Issues

Of course, the remaining issue with this code is a PrimeNumber table that needs to be cleaned between runs of the stored procedure. Wouldn't it be nice if somehow you could just roll back the entire transaction? But, what if your operation required some work being done in T-SQL, some in SQLCLR (or multiple SQLCLR objects), and some in your business layer or data layer using SqlCommand running under a SqlTransaction? What if you were not using a context connection but a connection to a separate SQL Server 2005 database? What if it was a SQL Server 2000 database, or even an Oracle database being used through SQLCLR? Or what if it wasn't a database at all, what if it was some non-database operation that needed to be wrapped up inside a transaction?

I will cover these possibilities and more in my next article, which will discuss the integration of System.Transactions and SQLCLR.



Downloads

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • Protecting business operations means shifting the priorities around availability from disaster recovery to business continuity. Enterprises are shifting their focus from recovery from a disaster to preventing the disaster in the first place. With this change in mindset, disaster recovery is no longer the first line of defense; the organizations with a smarter business continuity practice are less impacted when disasters strike. This SmartSelect will provide insight to help guide your enterprise toward better …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds