Application Modernization: What Is It and How to Get Started
The Common Language Runtime (CLR), or more precisely any implementation of the Common Language Infrastructure (CLI) specification, executes code inside the bounds of a well-defined type system called the Common Type System (CTS). The CTS is part of the CLI, which is standardized through Ecma International and the International Organization for Standardization (ISO). It defines a set of structures and services that programs targeting the CLR may use, including a rich type system for building abstractions out of built-in and custom abstract data types. The CTS constitutes the interface between managed programs and the runtime itself, in a language-agnostic manner.
To illustrate the diversity of languages that the CTS supports, consider four, each of which has a publicly available compiler targeting the CLR: C#, C++/CLI, Python, and F#:
- C# is a (mostly) statically typed, imperative, C-style language. It offers very few features that step outside of the CLR's verifiable type safety, and employs a heavily object-oriented view of the world. C# also offers some interesting functional language features such as first-class functions and their close cousins, closures, and continues to move in this direction with the addition of, for example, type inference and lambdas in new versions of the language. This is, at the time of this writing, the most popular programming language on the CLR platform.
- C++/CLI is an implementation of the C++ language targeting the CTS and the CIL instruction set. Programmers in this language often step outside of the bounds of verifiable type safety, directly manipulating pointers and memory segments. The compiler does, however, support compilation options to restrict programs to a verifiable subset of the language. The ability to bridge the managed and unmanaged worlds with C++ is remarkable: many existing unmanaged programs can be recompiled to run under the CLR's control, gaining the benefits of garbage collection and (mostly) verifiable IL.
- Python (available on the CLR through IronPython), like C#, deals with data in an object-oriented fashion. But unlike C#, and much like Visual Basic, it prefers to infer as much as possible and to defer to runtime as many decisions as it can that would traditionally have been resolved at compile time. Programmers in this language never deal directly with raw memory, and always live inside the safe confines of verifiable type safety. Productivity and ease of programming are often of utmost importance for such dynamic languages, making them amenable to scripting and lightweight program extensions. But they still must produce code that resolves typing and other CLR-related mapping issues somewhere between compile time and runtime. Some say that dynamic languages are the way of the future. Thankfully, the CLR supports them just as well as any other type of language.
- Lastly, F# is a typed, functional language derived from OCaml (itself a member of the ML family of languages), which offers type inference and scripting-like interoperability features. F# certainly exposes a very different syntax to the programmer than, say, C#, VB, or Python. In fact, many programmers with a background in C-style languages might find the syntax quite uncomfortable at first. It offers a mathematical style of type declarations and manipulations, and many other useful features that are more prevalent in functional languages, such as pattern matching. F# is a great language for scientific and mathematically oriented programming.
Each of these languages exposes a different view of the type system, with differences that are sometimes extreme but often subtle, and all compile into abstractions from the same CTS and instructions from the same Common Intermediate Language (CIL). Libraries written in one language can be consumed from another. A single program can even be composed from multiple parts, each written in whatever language is most appropriate, and combined to form a single managed assembly. Also notice that the idea of verification makes it possible to prove type safety, yet still work around entire portions of the CTS when necessary (such as when manipulating raw memory pointers in C++). The security system provides facilities for placing restrictions on the execution of unverifiable code.
The Importance of Type Safety
Not so long ago, unmanaged assembly, C, and C++ programming were the de facto standard in industry, and types, when present, weren't much more than ways to name memory offsets. For example, a C structure is really just a big sequence of bits with names (that is, fields) for accessing precise offsets from the base address. Pointers to structures can be made to point at incompatible instances, and data can be indexed into and manipulated freely. C++ is admittedly a huge step in the right direction, but there generally wasn't any runtime system enforcing that memory accesses followed the type system's rules. In all unmanaged languages, there was a way to get around the illusion of type safety.
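The idea that fields are only names for offsets, and that the same bytes can be read as an unrelated type, can be seen in a small Python sketch. It uses the standard `ctypes` and `struct` modules to mimic a C layout; the `Point` structure and its fields are hypothetical names chosen for illustration.

```python
import ctypes
import struct


class Point(ctypes.Structure):
    # Two 32-bit integers laid out back to back, as a C compiler would do.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


# Field names are just labels for byte offsets from the base address.
x_offset = Point.x.offset
y_offset = Point.y.offset
total_size = ctypes.sizeof(Point)

# Build an instance and view the same memory as raw bytes.
p = Point(1, 2)
raw = bytes(p)

# Nothing in the bytes themselves records their type. Here the four
# bytes of a 32-bit float are read back as an unsigned integer.
float_bytes = struct.pack("<f", 1.0)
as_int = struct.unpack("<I", float_bytes)[0]

print(x_offset, y_offset, total_size)
print(hex(as_int))
```

The reinterpretation succeeds without complaint, which is exactly the problem: without a runtime that tracks types, the result is whatever the bit pattern happens to mean under the new interpretation.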
This approach to programming has proven to be quite error prone, leading to hard-to-diagnose bugs and a movement toward completely type-safe languages. (To be fair, languages with memory safety were available well in advance of C. LISP, for instance, uses a virtual machine and garbage-collected environment similar to the CLR.) Over time, safe languages and compilers have grown in popularity, as has the use of static detection to notify developers about operations that could lead to memory errors. Other languages, such as VB6 and Java, fully employ type safety through a runtime to increase programmer productivity and the robustness of programs. If language constructs are permitted to bypass compiler type checking, the runtime catches and deals with illegal casts in a controlled manner, for instance by throwing an exception. The CLR follows in this spirit.
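A runtime-checked cast of the kind described above might look roughly like the following. This is a minimal sketch in Python, not the CLR's actual mechanism; `checked_cast`, `InvalidCastError`, and `describe` are hypothetical names introduced for illustration.

```python
from typing import Any, Type, TypeVar

T = TypeVar("T")


class InvalidCastError(TypeError):
    """Hypothetical exception raised when a checked cast fails."""


def checked_cast(value: Any, target: Type[T]) -> T:
    # The check happens at runtime, against the object's actual type,
    # regardless of what the calling code assumed about it.
    if not isinstance(value, target):
        raise InvalidCastError(
            f"cannot cast {type(value).__name__} to {target.__name__}"
        )
    return value


def describe(value: Any) -> str:
    try:
        number = checked_cast(value, int)
        return f"int: {number}"
    except InvalidCastError as exc:
        # The failure is reported in a controlled way instead of the
        # value being silently treated as the wrong type.
        return f"rejected: {exc}"
```

Calling `describe(42)` takes the successful path, while `describe("42")` ends up in the exception handler. The point is that the bad cast is detected and surfaced, not that this particular exception type is used.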
Proving Type Safety
The CLR execution environment takes responsibility for ensuring that type safety is proven prior to executing any code. This safety cannot be subverted by untrusted malicious programs, which ensures that memory corruption is not possible. Strictly speaking, this applies only to verifiable code: by using unverifiable code constructs, you can create programs that violate these restrictions wholesale. Doing so generally means that your programs won't be able to execute in partial trust without a special security policy.
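To give a feel for what proving type safety before execution can mean, here is a toy verifier for an invented three-instruction stack language. It is a sketch only: the instruction names, `verify`, and `run_if_verified` are assumptions made for illustration, and real CIL verification is far more involved.

```python
from typing import List, Tuple

Instruction = Tuple[str, ...]


class VerificationError(Exception):
    """Raised when a program cannot be proven type safe."""


def verify(program: List[Instruction]) -> List[str]:
    """Simulate the operand stack using types only, never real values."""
    stack: List[str] = []
    for position, instr in enumerate(program):
        op = instr[0]
        if op == "push_int":
            stack.append("int")
        elif op == "push_str":
            stack.append("str")
        elif op == "add_int":
            # add_int needs two ints on the stack and leaves one int.
            if len(stack) < 2:
                raise VerificationError(f"{position}: stack underflow")
            right, left = stack.pop(), stack.pop()
            if (left, right) != ("int", "int"):
                raise VerificationError(
                    f"{position}: add_int applied to {left} and {right}"
                )
            stack.append("int")
        else:
            raise VerificationError(f"{position}: unknown instruction {op}")
    return stack


def run_if_verified(program: List[Instruction]) -> bool:
    # Hypothetical loader policy: refuse to execute anything unverified.
    try:
        verify(program)
    except VerificationError:
        return False
    return True
```

Because the check walks the instructions with types in place of values, a program that would add an integer to a string is rejected before a single instruction runs.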
There are also situations where unmanaged interoperability supplied by a trusted library can be tricked into performing incorrect operations. For example, if a trusted managed API in the Base Class Libraries (BCL) blindly accepts an integer and passes it to an unmanaged bit of code, that unmanaged code might use the integer to index into an array. A malicious user could intentionally pass an invalid index to provoke a buffer overflow. It is the responsibility of trusted library developers to ensure that such program errors are not present.
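The scenario above can be simulated without any real unmanaged code. In this sketch a flat `bytearray` stands in for native memory, and all names (`MEMORY`, `unmanaged_read`, `read_blindly`, `read_checked`) are hypothetical; it does not reflect any actual BCL API.

```python
# A flat bytearray stands in for unmanaged memory. The array the API is
# meant to expose occupies bytes 0..3; the bytes after it belong to
# something else entirely.
MEMORY = bytearray(b"\x0a\x14\x1e\x28" + b"SECRET")
ARRAY_BASE = 0
ARRAY_LENGTH = 4


def unmanaged_read(index: int) -> int:
    # Stand-in for native code: computes an address and reads it,
    # with no knowledge of where the array ends.
    return MEMORY[ARRAY_BASE + index]


def read_blindly(index: int) -> int:
    # The flawed trusted API: forwards the caller's integer unchecked.
    return unmanaged_read(index)


def read_checked(index: int) -> int:
    # The corrected trusted API: validates before crossing the boundary.
    if not 0 <= index < ARRAY_LENGTH:
        raise IndexError(f"index {index} is outside 0..{ARRAY_LENGTH - 1}")
    return unmanaged_read(index)
```

With `read_blindly(4)` the caller reads a byte that lies past the end of the array, while `read_checked(4)` fails at the managed boundary. The validation has to live in the trusted wrapper, since the code on the other side has no way to perform it.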