Emitting Code with Reflection

In "The Matrix", Morpheus, the character played by Laurence Fishburne, alludes to the rabbit hole in Carroll's "Alice in Wonderland" when he offers to show Neo how deep the rabbit hole goes. Neither the world of "The Matrix" nor Lewis Carroll is what it seems. Carroll was really Charles Dodgson, an Oxford mathematician, and Neo's world is a world of artificial intelligence and human Duracell batteries.

Reflection is Alice's rabbit hole. (One pill makes you larger, and one pill makes you small.) How deep does reflection go? I think the answer is that no one knows yet. On the surface, reflection can be used to support dynamic type discovery. Dig a little deeper and reflection can be used to emit IL (Intermediate Language) dynamically. In twentieth-century vernacular, emitting IL equates to a compiled program writing compiled code.

Microsoft's .NET Reflection technology allows you to write code that effectively writes more code. Reflection can emit complex types; for example, you can emit entire classes. (It will be interesting to see whether someone writes a super worm that evolves over time by emitting IL. The security model is supposed to prevent managed code from ever being able to do this, but there are a lot of smart people who don't work for Microsoft. I digress.)

If a programmer can write code that emits more code, then surely a computer program can be written to write code dynamically. Someone will likely write a smart program that employs common patterns and uses reflection to emit smart code. A fundamental step toward true artificial intelligence is probably the ability of a computer to identify and codify smart abstractions. I'm not an expert on AI, but my understanding of the current state of the technology is that data is dynamic, while dynamic code is supported only to a limited extent. With reflection, emitted IL, and the ability to emit complex types, there may be significant advances in AI in the near future.

The implications of .NET and the impact of reflection are supposition, but imagination is the first step toward innovation.

What does all of this have to do with the average programmer? That is a good question. Part of the answer is that reflection is used for dynamic type conversion, different kinds of type binding, dynamic type discovery, and emitting IL. A second part of the answer is that it is unknown, and that is what makes reflection and emitting IL exciting. To discover the answer to the second half of the riddle you will have to learn to use reflection and figure out where it is handy for your individual projects. This article provides you with a brief overview of classes in the System.Reflection.Emit namespace and demonstrates an advanced aspect of reflection by showing you how to emit a compiled regular expression to an assembly at runtime.

A Brief Explanation of Reflection

There was a concept known as RTTI introduced, if memory serves, in the 1990s. RTTI stands for runtime type information. Microsoft provided a variation of RTTI through QueryInterface, a method of the COM IUnknown interface. QueryInterface allowed a programmer to ask whether an object supported a specific COM interface.

In its simplest form, reflection is an evolution of runtime type information and the ability to query interfaces. Reflection allows a programmer to write code that inquires about all of the namespaces, types, methods, fields, properties, events, modules, parameters, constructors, and references in an assembly. (An assembly is basically an application or library, an EXE or DLL, that carries around extra descriptive information referred to as metadata.)

Reflection supports using this discovered information to invoke and interact with the discovered types.
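To make discovery and invocation concrete, here is a minimal sketch. The Greeter class and the ReflectionDemo module are hypothetical, made up only so there is something to inspect; the reflection calls themselves (Type.GetMethods, Activator.CreateInstance, Type.GetMethod, and MethodInfo.Invoke) are standard .NET APIs.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical sample type, defined only so there is something to inspect.
Public Class Greeter
    Public Property Name As String = "World"

    Public Function Greet(ByVal greeting As String) As String
        Return greeting & ", " & Name & "!"
    End Function
End Class

Module ReflectionDemo
    ' Discovery: names of the public instance methods declared on a type.
    ' For Greeter this includes Greet plus the property accessors get_Name and set_Name.
    Function ListMethodNames(ByVal t As Type) As String()
        Dim flags As BindingFlags = _
            BindingFlags.Public Or BindingFlags.Instance Or BindingFlags.DeclaredOnly
        Dim names As New List(Of String)()
        For Each m As MethodInfo In t.GetMethods(flags)
            names.Add(m.Name)
        Next
        Return names.ToArray()
    End Function

    ' Invocation: create an instance and call a method chosen by name at runtime.
    ' If no public method has that name, GetMethod returns Nothing.
    Function InvokeByName(ByVal t As Type, ByVal methodName As String, _
                          ByVal ParamArray args() As Object) As Object
        Dim instance As Object = Activator.CreateInstance(t)
        Dim method As MethodInfo = t.GetMethod(methodName)
        Return method.Invoke(instance, args)
    End Function

    Sub Main()
        For Each name As String In ListMethodNames(GetType(Greeter))
            Console.WriteLine("Method: " & name)
        Next
        Console.WriteLine(InvokeByName(GetType(Greeter), "Greet", "Hello"))
    End Sub
End Module
```

Main lists the discovered method names (GetMethods does not guarantee any particular order) and then prints Hello, World!, even though nothing in Main calls Greet directly.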

A Brief Explanation of Emitting IL

IL is an intermediate language that sits somewhere between the Visual Basic .NET code you write and the native machine code for the specific platform you are running on. (Yes, I am suggesting that there will be a Visual Basic .NET for Linux, though that is only an educated guess.) Think of IL as roughly analogous to Java bytecode. You can use the ildasm.exe utility to view the IL in an assembly. Listing 1 shows an example of MSIL (Microsoft IL).

Listing 1: Microsoft IL code viewed in the ildasm.exe utility.

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = 
               ( 01 00 00 00 ) 
  // Code size       14 (0xe)
  .maxstack  8
  IL_0000:  nop
  IL_0001:  newobj     instance void EmitDemo.Form1::.ctor()
  IL_0006:  call       
     void [System.Windows.Forms]System.Windows.Forms.Application::Run(
       class [System.Windows.Forms]System.Windows.Forms.Form)
  IL_000b:  nop
  IL_000c:  nop
  IL_000d:  ret
} // end of method Form1::Main

If you are familiar with assembly language, the IL in Listing 1 will look a bit familiar. Listing 1 is the IL for a shared Main method that constructs an instance of a form and passes that form object to the shared Application.Run method.

When you compile your Visual Basic .NET application, the compiler translates the source code into MSIL and stores it in the assembly. When you actually run the application, the IL is just-in-time (JIT) compiled to machine code and executed.

When you use reflection to emit code, you write code that emits IL like the IL shown in Listing 1. You have to write code that emits every element the resulting assembly needs, such as classes, methods, and individual instructions. Fortunately, there are classes such as the builder classes, ILGenerator, and OpCodes that do a significant amount of the work for you. Unfortunately, this is still not a task to take lightly. Writing code that emits IL requires care and a significant amount of work, but there are people working on tools that are built on top of the existing classes in the Emit namespace. (We'll get back to this in a moment.)

Builders, the ILGenerator, and OpCodes

The System.Reflection.Emit namespace contains classes whose names end with the Builder suffix, an ILGenerator class, and an OpCodes class whose fields represent individual MSIL instructions. Each builder emits a specific kind of element, and you combine them to build up an assembly: AssemblyBuilder defines a dynamic assembly, ModuleBuilder a module within it, TypeBuilder a type, and MethodBuilder a method. The OpCodes class contains shared fields that represent individual IL instructions; for instance, the newobj instruction in Listing 1 is emitted using the OpCodes.Newobj field. The ILGenerator writes those instructions: you obtain one by calling GetILGenerator on a MethodBuilder (or ConstructorBuilder) and then call its Emit method with OpCodes fields to produce the method body.
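The builder chain can be sketched as follows. This is a minimal example, not the author's code: the names DynamicCalculator, Dynamic.Calculator, Add, and BuilderDemo are made up, and it assumes AssemblyBuilder.DefineDynamicAssembly, which is available on .NET Framework 4.5 and later and on .NET Core. Earlier Framework versions, including the one current when this column was written, use AppDomain.CurrentDomain.DefineDynamicAssembly instead.

```vb
Imports System
Imports System.Reflection
Imports System.Reflection.Emit

Module BuilderDemo
    ' Emits a type "Dynamic.Calculator" with a public shared method
    ' Add(Integer, Integer) As Integer, then calls it through reflection.
    Function EmitAndCallAdd(ByVal a As Integer, ByVal b As Integer) As Integer
        Dim name As New AssemblyName("DynamicCalculator")

        ' AssemblyBuilder -> ModuleBuilder -> TypeBuilder -> MethodBuilder
        Dim asmBuilder As AssemblyBuilder = _
            AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run)
        Dim modBuilder As ModuleBuilder = _
            asmBuilder.DefineDynamicModule("DynamicCalculator")
        Dim calcBuilder As TypeBuilder = modBuilder.DefineType( _
            "Dynamic.Calculator", TypeAttributes.Public Or TypeAttributes.Class)
        Dim addBuilder As MethodBuilder = calcBuilder.DefineMethod("Add", _
            MethodAttributes.Public Or MethodAttributes.Static, _
            GetType(Integer), New Type() {GetType(Integer), GetType(Integer)})

        ' The ILGenerator writes the method body one OpCodes field at a time.
        Dim ilGen As ILGenerator = addBuilder.GetILGenerator()
        ilGen.Emit(OpCodes.Ldarg_0)   ' push the first argument
        ilGen.Emit(OpCodes.Ldarg_1)   ' push the second argument
        ilGen.Emit(OpCodes.Add)       ' pop both, push their sum
        ilGen.Emit(OpCodes.Ret)       ' return the value on top of the stack

        ' Finish the type, then invoke the emitted method like any other.
        Dim calcType As Type = calcBuilder.CreateType()
        Dim addMethod As MethodInfo = calcType.GetMethod("Add")
        Return CInt(addMethod.Invoke(Nothing, New Object() {a, b}))
    End Function

    Sub Main()
        Console.WriteLine(EmitAndCallAdd(2, 3))
    End Sub
End Module
```

Main prints 5. Making Add a shared method keeps the sketch short, because the caller never needs an instance of the emitted type.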

I am not going to create a verbose example in this column. (For an example of emitted IL, check out my upcoming book, "The Visual Basic .NET Developer's Book", from Addison-Wesley, available Fall 2002.) Instead, we will look at a tool that already exists for emitting regular expressions to an assembly. If you read "Programming with Regular Expressions" earlier this month, then you know that regular expressions support advanced string pattern matching and replacing, and that compiled regular expressions can improve performance. Let's examine the capabilities already in .NET that support emitting regular expressions to an assembly.

Emitting Compiled Regular Expressions

The System.Text.RegularExpressions.Regex class contains a shared method named CompileToAssembly. To use it, create an array of RegexCompilationInfo objects and pass it to CompileToAssembly. A tool builder at Microsoft, or on behalf of Microsoft, used the System.Reflection.Emit classes to simplify emitting a regular expression to IL and an assembly. Listing 2 demonstrates an example that emits a regular expression to a DLL assembly.

Listing 2: Emitting regular expressions to an external assembly.

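' Requires Imports System.Reflection and Imports System.Text.RegularExpressions
Private Sub EmitRegularExpressionAssembly(ByVal Expression As String, _
  ByVal Title As String)

  ' Describe the class to emit: pattern, options, class name,
  ' namespace, and whether the class is public.
  Dim CompilationInfo() As RegexCompilationInfo = _
    {New RegexCompilationInfo(Expression, RegexOptions.Compiled, _
    Title, "CompiledRegularExpressions", True)}

  ' The assembly is written to the current directory as Regex.dll.
  Dim Name As AssemblyName = New AssemblyName()
  Name.Name = "Regex"
  Regex.CompileToAssembly(CompilationInfo, Name)

End Sub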

Regex.CompileToAssembly accepts an array of RegexCompilationInfo objects to write to the assembly. Listing 2 creates that array inline, with an array initializer. Here is the more verbose equivalent, which uses a temporary variable:

Dim Info As RegexCompilationInfo = New RegexCompilationInfo(Expression, _
  RegexOptions.Compiled, Title, "CompiledRegularExpressions", True)
Dim CompilationInfo() As RegexCompilationInfo = New RegexCompilationInfo() _
  {Info}

Generally, we avoid temporary variables that add nothing to readability, which is why Listing 2 uses the inline form. The Expression argument represents a regular expression, such as ^\d+$, which can be used with Regex.IsMatch to determine whether a string consists entirely of digits. RegexOptions.Compiled indicates that the expression will be a compiled regular expression (refer to "Programming with Regular Expressions" in VB Tech Notes, April 2002, for more information). The Title argument is the name of the emitted regular expression class, and the literal "CompiledRegularExpressions" will be its namespace in the emitted assembly. The final argument, True, indicates that the emitted class is public.

The second argument to Regex.CompileToAssembly is an AssemblyName object, which describes the identity of the assembly: its simple name and, optionally, information such as its version and culture. After CompileToAssembly executes, a DLL named after the assembly (Regex.dll in Listing 2) is written to disk.
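If you want the emitted assembly to carry more identity information, you can fill in more of the AssemblyName before calling CompileToAssembly. The sketch below is a variation on Listing 2 with the pattern and class name fixed; the module name and the version number 1.0.0.0 are arbitrary choices for illustration. It assumes the .NET Framework: on .NET Core and .NET 5 and later, Regex.CompileToAssembly throws PlatformNotSupportedException.

```vb
Imports System
Imports System.Globalization
Imports System.Reflection
Imports System.Text.RegularExpressions

Module EmitWithVersion
    ' Variation on Listing 2 that also stamps a version and a neutral
    ' culture on the emitted assembly (.NET Framework only).
    Sub EmitVersionedRegexAssembly()
        Dim info() As RegexCompilationInfo = _
            {New RegexCompilationInfo("^\d+$", RegexOptions.Compiled, _
                "Digits", "CompiledRegularExpressions", True)}

        Dim name As New AssemblyName()
        name.Name = "Regex"                              ' written as Regex.dll
        name.Version = New Version(1, 0, 0, 0)           ' assembly version metadata
        name.CultureInfo = CultureInfo.InvariantCulture  ' culture-neutral assembly

        Regex.CompileToAssembly(info, name)
    End Sub

    Sub Main()
        EmitVersionedRegexAssembly()
        ' Read the identity back from the file without loading the assembly.
        Console.WriteLine(AssemblyName.GetAssemblyName("Regex.dll").Version)
    End Sub
End Module
```

AssemblyName.GetAssemblyName reads the identity straight from the file, which is a quick way to confirm that the version ended up in the emitted DLL.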

The emitted assembly contains the namespace "CompiledRegularExpressions" with a class named after the Title argument. That class is derived from System.Text.RegularExpressions.Regex and initialized with the regular expression passed in the Expression argument. After the assembly is emitted, you can use reflection to load it dynamically and create an instance of the class, or you can add a reference to the emitted assembly and use it as you would any other assembly. The code in Listing 3 invokes the method in Listing 2, dynamically loads the emitted assembly, and calls the IsMatch method of the new class.

Listing 3: Loading an emitted regular expression assembly and testing the regular expression.

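' Requires Imports System.Reflection. Expression.IsMatch is a late-bound
' call, so this code compiles only with Option Strict Off.
EmitRegularExpressionAssembly("^\d+$", "Digits")
Dim A As [Assembly] = [Assembly].LoadFrom("Regex.dll")

Dim Expression As Object = _
    A.CreateInstance("CompiledRegularExpressions.Digits")
MsgBox(Expression.IsMatch("123"))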

The first statement emits the new assembly. The second statement loads the assembly from Regex.dll. The third statement dynamically creates an instance of the new type, Digits. The fourth statement tests the regular expression with a late-bound call to IsMatch. Assuming everything went okay, MsgBox should display True.
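Listing 3 relies on late binding, which requires Option Strict Off. Because the emitted class derives from Regex, you can instead cast the instance to Regex and keep Option Strict On. The following sketch assumes the .NET Framework (for CompileToAssembly); the module and function names are hypothetical, and the emit step repeats Listing 2 with fixed arguments so the example runs on its own.

```vb
Option Strict On

Imports System
Imports System.Reflection
Imports System.Text.RegularExpressions

Module TypedRegexDemo
    ' Same emit step as Listing 2, with the pattern and class name fixed.
    Sub EmitDigitsAssembly()
        Dim info() As RegexCompilationInfo = _
            {New RegexCompilationInfo("^\d+$", RegexOptions.Compiled, _
                "Digits", "CompiledRegularExpressions", True)}
        Regex.CompileToAssembly(info, New AssemblyName("Regex"))
    End Sub

    ' Loads Regex.dll and uses the emitted class through its Regex base type,
    ' so the call to IsMatch is early-bound and compiles under Option Strict On.
    Function IsMatchWithEmitted(ByVal input As String) As Boolean
        Dim emitted As [Assembly] = [Assembly].LoadFrom("Regex.dll")
        Dim instance As Object = _
            emitted.CreateInstance("CompiledRegularExpressions.Digits")
        If instance Is Nothing Then
            Throw New InvalidOperationException( _
                "CompiledRegularExpressions.Digits was not found in Regex.dll.")
        End If
        ' The emitted class derives from Regex, so this cast succeeds.
        Dim expression As Regex = DirectCast(instance, Regex)
        Return expression.IsMatch(input)
    End Function

    Sub Main()
        EmitDigitsAssembly()
        Console.WriteLine(IsMatchWithEmitted("123"))
        Console.WriteLine(IsMatchWithEmitted("12a"))
    End Sub
End Module
```

Main should print True followed by False. Casting to the base class keeps the code type-checked without requiring a compile-time reference to Regex.dll.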

Summary

Regular expressions provide powerful pattern matching and text replacement capabilities. If you are providing regular expressions to your customers, you might elect to compile those that are used heavily. You do not need to emit regular expressions to use them, nor do you need to emit them to compile a regular expression with the RegexOptions.Compiled option. However, you may want to emit regular expressions that are used heavily or discovered by users. With an emitted expression, the translation from pattern to IL happens once, ahead of time, rather than every time the application constructs the expression at run time, and emitted assemblies offer a simple way to share expressions: just distribute the assemblies.
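For comparison, here is a minimal sketch of the non-emitted route: compiling ^\d+$ in memory with RegexOptions.Compiled and reusing a single instance, since the compilation cost is paid when the Regex object is constructed. The module and function names are hypothetical. Note that in .NET, \d matches any Unicode decimal digit, not only 0 through 9; adding RegexOptions.ECMAScript restricts it to ASCII digits.

```vb
Imports System
Imports System.Text.RegularExpressions

Module DigitsDemo
    ' One shared instance: RegexOptions.Compiled pays its compilation cost
    ' when the Regex is constructed, so build it once and reuse it.
    Private ReadOnly Digits As New Regex("^\d+$", RegexOptions.Compiled)

    ' True when the text is one or more digits and nothing else.
    Function IsAllDigits(ByVal text As String) As Boolean
        Return Digits.IsMatch(text)
    End Function

    Sub Main()
        Console.WriteLine(IsAllDigits("123"))    ' True
        Console.WriteLine(IsAllDigits("12a3"))   ' False
        Console.WriteLine(IsAllDigits(""))       ' False: + needs at least one digit
    End Sub
End Module
```

This route needs no extra assembly and works on every .NET version; emitting only pays off when you want to skip the run-time compilation step or share the compiled expressions as DLLs.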

Reflection, emitting IL, and regular expressions are just three advantages that those who adopt .NET will have over their competition. My intuition tells me these are significant advantages, and my experience with .NET software development has confirmed that intuition time and again.

About the Author

Paul Kimmel is a freelance writer for Developer.com and CodeGuru.com. Look for his recent book, Visual Basic .NET Unleashed, at a bookstore near you.