Building a Regular Expression Stream Search with the .NET Framework

String pattern matching techniques abound in Computer Science. Regular Expressions are probably the best-known string pattern matching syntax. Most development tools, languages, frameworks, and libraries offer some form of Regular Expression support.

In the .NET Framework, classes in the System.Text.RegularExpressions namespace contain the framework’s Regular Expression support. On a recent project, we tapped the .NET Framework’s Regular Expression capabilities to search byte Streams in BizTalk POP3 messages using simple Regular Expressions. I covered the BizTalk aspects of our solution in a recent article, “Building a BizTalk Pipeline Content Enricher with SQL Server 2005.” In this article, I’m going to explain how we implemented a Regular Expression byte Stream search class to search the POP3 message supplied by BizTalk.

A Simple Search

You can read my prior article for a more complete explanation of our solution, so here I’m just going to outline the pattern matching requirements.

Repetitive, boring tasks are not something people generally excel at or care to do. Reviewing a daily log file received via email for infrequent errors is boring, repetitive, and, above all, an activity most users eventually neglect.

So, we decided to delegate review duties to a BizTalk POP3 Receive Port configured with a custom pipeline component. BizTalk supplied all but the pattern searching capability. All we needed to do was build a class to scan the byte Stream exposed by the underlying BizTalk classes for a pattern matching an error message in the log.

BizTalk custom component development leverages the capabilities of the .NET Framework. Naturally, we first turned to the .NET Framework for our solution.

Regular Expressions and Regex

A complete introduction to Regular Expressions is beyond the scope of this article. You can review the sources at the end of this article for more details. Our requirements dictated simple patterns such as searching for the word “Error.” Regular Expression syntax, though, can support complicated multi-word, partial-word, and optional-word patterns.
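To make the pattern types mentioned above concrete, here is a minimal sketch that tries one pattern of each kind against a single log line. The log line, the class name PatternSketch, and the specific patterns are all invented for illustration; they are assumptions, not the patterns used in our project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternSketch
{
    // Hypothetical log line, invented for this example.
    public const string SampleLine =
        "02:14:07 Fatal error: backup job failed, disk  full";

    // Returns true when the pattern occurs anywhere in SampleLine,
    // ignoring case.
    public static bool Matches(string pattern)
    {
        return Regex.IsMatch(SampleLine, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Simple word search.
        Console.WriteLine(Matches("Error"));
        // Multi-word: "disk" and "full" separated by any run of whitespace.
        Console.WriteLine(Matches(@"disk\s+full"));
        // Partial word: "fail", "failed", "failure", and so on.
        Console.WriteLine(Matches(@"fail\w*"));
        // Optional word: "error" with or without a leading "fatal".
        Console.WriteLine(Matches(@"(fatal\s+)?error"));
        // A word that does not appear in the line.
        Console.WriteLine(Matches("Warning"));
    }
}
```

The first four calls report a match and the last one does not. RegexOptions.IgnoreCase is a choice made for this sketch; drop it if the search should be case-sensitive.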

The Regex class is the Regular Expression workhorse in the .NET Framework. Some of the Regex class’s methods and properties appear below.

[Serializable]
public class Regex : ISerializable
{
   public Regex(string pattern);
   public Regex(string pattern, RegexOptions options);
   public RegexOptions Options { get; }
   public bool IsMatch(string input);
   public bool IsMatch(string input, int startat);
   public static bool IsMatch(string input, string pattern);
   public static bool IsMatch(string input, string pattern,
      RegexOptions options);
   public Match Match(string input);
   public Match Match(string input, int startat);
   public static Match Match(string input, string pattern);
   public Match Match(string input, int beginning, int length);
   public static Match Match(string input, string pattern,
      RegexOptions options);
   public MatchCollection Matches(string input);
   public MatchCollection Matches(string input, int startat);
   public static MatchCollection Matches(string input,
      string pattern);
   public static MatchCollection Matches(string input,
      string pattern, RegexOptions options);
}

Regex sports a variety of instance and static methods to do different types of Regular Expression string searches. I’ll cover how we used Regex later in this article.

Regex can be instantiated, or you can opt to use the static methods. Instantiating the class lets you store the Regular Expression and its options in the instance and reuse them across searches.
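The two styles can be compared side by side in a short sketch. The class name RegexUsageSketch, its method names, and the sample lines are hypothetical and exist only for this illustration; the Regex constructor, IsMatch, and Match calls are the ones listed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUsageSketch
{
    // Instance form: the pattern and options are stored once and reused.
    private static readonly Regex ErrorPattern =
        new Regex("Error", RegexOptions.IgnoreCase);

    // Counts the lines that contain the stored pattern.
    public static int CountErrors(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (ErrorPattern.IsMatch(line))
            {
                count++;
            }
        }
        return count;
    }

    // Returns the position of the first match, or -1 when there is none.
    public static int FirstMatchIndex(string text)
    {
        Match match = ErrorPattern.Match(text);
        return match.Success ? match.Index : -1;
    }

    // Static form: the pattern is supplied again on every call.
    public static bool ContainsError(string text)
    {
        return Regex.IsMatch(text, "Error", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical log lines, invented for this example.
        string[] lines = { "Job started", "ERROR: disk full", "Job ended" };
        Console.WriteLine(CountErrors(lines));
        Console.WriteLine(FirstMatchIndex("Job ended"));
        Console.WriteLine(ContainsError("no problems"));
    }
}
```

The instance form is a natural fit when the same expression is applied many times, as in a component that checks every incoming message against one pattern.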

As you may have noticed, Regex methods only accept strings, and we needed to work with Streams. Luckily, the bytes read from a .NET Stream can be converted to a string, and a string can be converted back to bytes.

Working with Streams

A complete introduction to the Stream class is beyond the scope of this article. Because our goal was to search Streams for a Regular Expression and the .NET Regex class only accepts strings, I’m going to focus on how we converted Streams to strings.

A Stream is simply a sequence of raw bytes. In .NET, the Stream class in the System.IO namespace is the base class for a variety of other Stream classes. Methods and properties of the Stream class appear below.

[Serializable]
[ComVisible(true)]
public abstract class Stream : MarshalByRefObject, IDisposable
{
   public abstract bool CanRead { get; }
   public abstract bool CanSeek { get; }
   public virtual bool CanTimeout { get; }
   public abstract bool CanWrite { get; }
   public abstract long Length { get; }
   public abstract long Position { get; set; }
   public virtual int ReadTimeout { get; set; }
   public virtual int WriteTimeout { get; set; }
   public virtual IAsyncResult BeginRead(byte[] buffer,
      int offset, int count, AsyncCallback callback, object state);
   public virtual IAsyncResult BeginWrite(byte[] buffer,
      int offset, int count, AsyncCallback callback, object state);
   public virtual void Close();
   protected virtual WaitHandle CreateWaitHandle();
   public void Dispose();
   public virtual int EndRead(IAsyncResult asyncResult);
   public virtual void EndWrite(IAsyncResult asyncResult);
   public abstract void Flush();
   public abstract int Read(byte[] buffer, int offset, int count);
   public virtual int ReadByte();
   public abstract long Seek(long offset, SeekOrigin origin);
   public abstract void SetLength(long value);
   public static Stream Synchronized(Stream stream);
   public abstract void Write(byte[] buffer, int offset,
      int count);
   public virtual void WriteByte(byte value);
}

As you can see, the methods above read and write byte data, as befits a byte stream. Converting bytes to a string is the role of the encoding classes in the System.Text namespace. In our solution, we used the ASCIIEncoding class. The following example illustrates how you can convert bytes to a string by using the ASCIIEncoding class.

// data is a byte[] that was filled by reading from the Stream.
ASCIIEncoding encoder = new ASCIIEncoding();
string copyValue = encoder.GetString(data);
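Putting the pieces covered so far together, the following is a minimal sketch of a Stream search, not the class we ultimately built. The class name StreamSearchSketch, the method StreamContains, the 4096-byte buffer size, and the sample message are assumptions made for illustration. It also assumes the content is ASCII text and small enough to hold in memory as a single string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class StreamSearchSketch
{
    // Reads the rest of the stream, decodes it as ASCII, and tests the pattern.
    // Assumption: the whole stream fits comfortably in memory.
    public static bool StreamContains(Stream source, string pattern)
    {
        byte[] buffer = new byte[4096];
        using (MemoryStream collected = new MemoryStream())
        {
            int bytesRead;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, bytesRead);
            }

            ASCIIEncoding encoder = new ASCIIEncoding();
            string text = encoder.GetString(collected.ToArray());
            return Regex.IsMatch(text, pattern);
        }
    }

    public static void Main()
    {
        // Hypothetical message body, invented for this example.
        byte[] sample = new ASCIIEncoding().GetBytes(
            "line 1 ok\r\nline 2 Error: timeout\r\n");
        using (MemoryStream stream = new MemoryStream(sample))
        {
            Console.WriteLine(StreamContains(stream, "Error"));
        }
    }
}
```

Collecting the bytes before decoding keeps the sketch short, at the cost of buffering the entire message.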

At this point, we have all the tools to compose a solution. There is, however, one other issue to address before we’re ready to assemble the solution.
