Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
Many developers (especially C/C++ developers) are concerned about JIT compilation because the end-user's machine is compiling code on the fly, hurting the application's overall performance. I used to be very concerned about this performance cost as well, but having written, built, and used many managed assemblies for several years now, my experience has been that the performance loss due to JIT compiling is not that great. In fact, I believe that JIT compiling is awesome and is capable of producing code that is more efficient and more optimized than a compiler that produces native CPU instructions. For example:
- The JIT compiler knows precisely which CPU the user has installed in their computer causing the JIT compiler to produce native instructions that are specific to the user's machine. When you use an unmanaged compiler, the compiler usually emits code for an Intel Pentium Pro, Pentium II, Pentium III, or Pentium 4 processor.
- The JIT compiler knows if the machine has a single CPU or multiple CPUs installed. If a single CPU is installed, certain thread synchronization mechanisms don't have to be employed.
- When a method is JIT compiled, the compiler emits the native CPU instructions. Some of these instructions contain memory addresses that refer to variables or methods. By contrast, an unmanaged compiler and linker emit native CPU instructions that contain memory addresses when building the resulting file. This file must contain relocation information; if Windows can't load the file at its preferred base address, then the embedded memory addresses are incorrect and they must be fixed-up by the Windows' loader. Dynamic relocation (rebasing) significantly hurts the load time of unmanaged code.
For these reasons and more, JIT compiled code is poised to be a clear performance winner when compared to unmanaged compilers and linkers. And, of course, Microsoft is working quite hard at improving the CLR and its JIT compiler so that it runs faster, produces more optimized code, and uses memory more efficiently.
By far, the most common way to run NGen.exe is by simply specifying the pathname of an .exe or .dll assembly file (using no command-line switches). When you invoke NGen.exe, it loads the CLR and tells the CLR to load the assembly. NGen.exe then forces the JIT compiler to compile every method's IL code into native CPU instructions. Each method's native method code is then collected and all of the code is emitted into a new file that NGen.exe creates. This new file is placed in a directory (something like C:\Windows\Assembly\NativeImages1_v1.0.3705). NGen.exe could be run as part of the installation process of an assembly.
On the surface, this sounds great! It sounds like you get all the benefits of managed code (garbage collection, verification, type safety, and so on) without all the performance problems of managed code. However, in reality JIT compilation is only a small portion of the performance cost related to managed code.
Meanwhile, there are several potential problems with respect to NGen'd files:
- No Intellectual Property Protection. Many people believe that it might be possible to ship NGend files without shipping the files containing the original IL code thereby keeping their intellectual property a secret. Unfortunately, this is not possible. At runtime, the CLR requires access to the assemblys metadata and the NGend files do not contain the metadata.
- NGend Files Can Get Out-Of-Sync. When the CLR loads an NGen'd file it compares a number of attributes about the previously-compile code and the current execution environment. If any of the attributes don't match then the NGen'd file cannot be used and the normal JIT compiler process is used instead.
- Poor Administration. NGen'd file are not automatically deleted when an assembly is uninstalled adversely affecting the .NET Frameworks easy administration and XCOPY deployment story.
- Inferior Load-Time Performance (Rebasing). When Windows loads an NGend file, it checks to see if the file loads at its preferred base address. If the file cant load at its preferred base address, then Windows relocates the file, fixing-up all of the memory address references. This is extremely time consuming because Windows must load the entire file into memory and modify various bytes within the file. For more information about rebasing please see my book: Programming Applications for Microsoft Windows, 4th Edition (Microsoft Press).
- Inferior Execution-Time Performance. When compiling code, NGen cant make as many assumptions about the execution environment as the JIT compiler can. This causes NGen.exe to produce code with a number of memory-reference indirections that arent necessary for JIT compiled code.
- Ignored NGen'd File in Some Domain Load Scenarios. In brief, assemblies can be loaded in a domain-neutral or non-domain-neutral fashion. NGen.exe produces code that assumes that the only assembly loaded in a domain-neutral fashion is MSCorLib.dll (which contains the definition for Object, Int32, String, and more). If, at runtime, it is the case that other assemblies are loaded in a domain neutral fashion, the CLR cannot use code produced by NGen.exe and will resort to JIT compilation. For ASP.NET applications (Web Forms and XML Web services), strongly-named assemblies are always loaded in a domain-neutral fashion and therefore they gain no performance benefit from having a corresponding NGend file.
Due to all the issues listed above, I recommend that NGen.exe only be used for client applications and only when testing shows a measurable difference in load time. In terms of runtime performance NGen.exe will actually hurt performance instead of improve it. Certainly, for server-side applications, NGen.exe makes no sense since only the first client request experiences a performance hit; future client requests run at excellent speed.
About the Author
Jeffrey Richter has concentrated on Windows development since version 1.0 (the version in which all windows were tiled and there was no color). Jeffrey is a Wintellect cofounder and a member of the .NET team at Microsoft. He has also worked on Windows 9x, Windows NT/2000, Microsoft Golf, Visual Studio and Visual C++, and other projects for companies such as Intel and DreamWorks.
Jeffrey has written several books on Windows programming, including Advanced Windows, Programming Applications for Microsoft Windows(formerly Advanced Windows), and Programming Server-Side Applications for Microsoft Windows, all published by Microsoft Press.
# # #