Writing Database Objects in CLR: Advanced Scenarios

My previous article demonstrated how to write a simple table valued function (TVF) that accepts an integer and returns a table of results, breaking down the input of value into its factors. Thus, if you pass in 2, you get 1 and 2, but if you pass in 15, you get 1, 3, 5, and 15.

But you may ask, ‘what does that have to do with a database?’ It turns out—not much. And when you write code to be run inside SQL Server 2005, more often than not, you will want to deal with other database objects, much in the same way T-SQL would.

This article carries the discussion forward by writing another object: a CLR-stored procedure that uses the TVF written in the previous article. This stored procedure will accept a positive integer and return all prime numbers less than the input. A prime number is a number that is divisible by 1 and itself, so you easily could use the TVF, find the number of factors, and return the numbers that qualify as prime numbers.

You can further optimize this method by filtering a list of previously found prime numbers and trying to factorize using only them. Yet another way would be to use the “Sieve of Eratosthenes.” Eratosthenes (276-196 B.C.) invented a method for efficiently constructing tables of prime numbers. Using this method, write down a list of integers beginning with 2 and ending with some number, say, 15:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15

First, mark all multiples of 2:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15
      *   *   *   *     *     *

Find the next unmarked number, which in this case is 3, and then mark all its multiples:

  2 3 4 5 6 7 8 9 10 11 12 13 14 15
      *   *   * * *     *     *  *

Keep doing this until no more new unmarked numbers are left. The final unmarked numbers are the primes less than 15:

  2 3 5 7 11 13

A few things are apparent from the problem description:

  1. For the first approach, you need to call another object on the same database (the TVF that you already wrote).
  2. In the first approach of calculating prime numbers, you also need to maintain and work with previously found prime numbers, possibly in a database table. But, you will be able to send back prime numbers as you find them, one by one.
  3. In the Sieve of Eratosthenes approach of calculating prime numbers, you do not need to maintain a list of prime numbers, but you won’t be able to send results back row by row. Instead, you have to finish finding all prime numbers before you can send the first one back.

The second approach doesn’t seem to talk to underlying database components, so it sounds simpler. Let’s look at the Sieve of Eratosthenes first.

Sending Results Back One by One

This article skips the steps for creating and debugging the CLR-stored procedure because they are fairly similar to writing a TVF. You can refer to my previous article for those.

The following is the stored procedure that uses Sieve of Eratosthenes to calculate prime numbers:

[Microsoft.SqlServer.Server.SqlProcedure]
public static void FindPrimes(int input)
{
   // NumberSieve is a 2-based strongly typed collection
   // See code download for full implementation.
   NumberSieve mySieve = new NumberSieve(input);

   // The enumerator in NumberSieve will return only unmarked numbers
   foreach (SieveEntry unmarkedNumber in mySieve) {
      for (int i = unmarkedNumber.Number + 1; i <= input; i++) {
         if (mySieve[i].IsDivisibleBy(unmarkedNumber.Number))
            mySieve[i].IsMarked = true;
      }
   }

   // So at this point all that is left are prime numbers
   SqlDataRecord record =
      new SqlDataRecord(
         new SqlMetaData("PrimeNumber", SqlDbType.Int)) ;
   SqlContext.Pipe.SendResultsStart(record);
   foreach (SieveEntry prime in mySieve) {
      record.SetInt32(0, prime.Number)  ;
      SqlContext.Pipe.SendResultsRow(record);
   }
   SqlContext.Pipe.SendResultsEnd();
}

Interestingly, the above code is very simple. It is simply split in two parts. The first part calculates prime numbers using a strongly typed custom collection called NumberSieve that is 2-based. The NumberSieve holds SieveEntry instances.

Note: You can check the associated code download for the actual implementation of the associated business objects, NumberSieve and SieveEntry. Because that code demonstrates a rather interesting use of Generics and strongly typed collections, it is worth a look, although out of scope of this article.

Once the prime numbers are filtered out, the second part of the above stored procedure sends the results back to the caller. You can pick this code apart to see what it is doing.

The first thing the code does is set up the structure of the records that the tabular data will contain:

   SqlDataRecord record =
      new SqlDataRecord(
         new SqlMetaData("PrimeNumber", SqlDbType.Int)) ;
   SqlContext.Pipe.SendResultsStart(record);

The next thing it does is start sending the prime numbers one by one. Note that the custom enumerator implemented on NumberSieve instance, mySieve, returns only unmarked numbers:

   foreach (SieveEntry prime in mySieve) {
      record.SetInt32(0, prime.Number)  ;
      SqlContext.Pipe.SendResultsRow(record);
   }

At this point, the only unmarked numbers should be prime numbers.

The final thing the code does is send an end marker signifying that this tabular data has completed:

   SqlContext.Pipe.SendResultsEnd();

The first notable thing about this stored procedure is that you have to calculate all results before you can send the first. So, it places a lot of pressure on memory. Also, once you do have the results to send, you have to send them one by one using SqlContext.Pipe.SendResultsRow.

The second thing to note is that this operation is wholly contained in memory—nothing written on the disk, no interaction with the TVF you have already written. Well, one big advantage of running a completely in-memory operation is speed—up to a point. Why up to a point? Because after a certain point, you use up so much memory representing your business object instances that your performance degrades asymptotically. I ran a quick performance test on my rather average desktop, and you can see the amount of memory calculating primes took in Figure 1.

Figure 1. Sieve of Eratosthenes Stored Procedure Performance Results

As you can see, the performance degradation is not linear. The more numbers you pass in, the worse it gets. This is primarily because so much memory is being allocated in the NumberSieve instance that the garbage collector is firing up way too often, which affects your performance negatively.

So next, look at an alternate mechanism that is better suited to larger input numbers using persistent storage: a database table.

More by Author

Get the Free Newsletter!

Subscribe to Data Insider for top news, trends & analysis

Must Read