Performance Matters: Choose Your Library Wisely

Recently, someone forwarded me a link to an article describing itself as an independent benchmark, designed to compare the performance of Java, C#, and VB.NET. The conclusion was that Java was faster than VB but slower than C#. This caused a lot of consternation among folks who weren't expecting to see a performance difference between VB and C# at all. Sure enough, it turned out that the difference was that the code in the comparison apps was not, in fact, the same. The C# code looked like this:

/**
 * Write max lines to a file, then read max lines back in from file.
 */
static long io(int ioMax)
{
   long elapsedMilliseconds;
   startTime = DateTime.Now;

   String fileName = "C:\\TestCSharp.txt";
   String textLine = 
          "abcdefghijklmnopqrstuvwxyz1234567890" +
          "abcdefghijklmnopqrstuvwxyz1234567890abcdefgh";
   int i = 0;
   String myLine = "";
   try
   {
     StreamWriter streamWriter = new StreamWriter(fileName);
      while (i++ < ioMax)
      {
         streamWriter.WriteLine(textLine);
      }
      streamWriter.Close();

      i = 0;
      StreamReader streamReader = new StreamReader(fileName);
      while (i++ < ioMax)
      {
         myLine = streamReader.ReadLine();
      }
   }
   catch (IOException e)
   {
     System.Console.Write(e.Message);
   }

   stopTime = DateTime.Now;
   elapsedTime = stopTime.Subtract(startTime);
   elapsedMilliseconds = (long)elapsedTime.TotalMilliseconds;

   Console.WriteLine("IO elapsed time: " + elapsedMilliseconds + 
                     " ms with max of " + ioMax);
   return elapsedMilliseconds;
}

This code writes a long line of text to a file over and over, and then reads it. It's using a StreamReader and a StreamWriter, from the System::IO namespace. Now, from the same benchmark, here's the VB code:

Function IO(ByVal ioMax As Integer) As Long
   Dim milliseconds As Long
   startTime = Now
   Dim fileName As String = "C:\TestVB.txt"
   Dim i As Integer = 0
   Dim myString As String = _
       "abcdefghijklmnopqrstuvwxyz1234567890" & _
       "abcefghijklmnopqrstuvwxyz1234567890abcdefgh"
   Dim readLine As String
   FileOpen(1, fileName, Microsoft.VisualBasic.OpenMode.Output)
   Do While (i < ioMax)
      PrintLine(1, myString)
      i += 1
   Loop
   FileClose(1)
   FileOpen(2, fileName, Microsoft.VisualBasic.OpenMode.Input)
   i = 0
   Do While (i < ioMax)
      readLine = LineInput(2)
      i += 1
   Loop
   FileClose(2)

   stopTime = Now
   elapsedTime = stopTime.Subtract(startTime)
   milliseconds = CLng(elapsedTime.TotalMilliseconds)
   Console.WriteLine("I/O elapsed time: " & milliseconds & _
                     " ms with a max of " & ioMax)
   Console.WriteLine(" i: " & i)
   Console.WriteLine(" readLine: " & readLine)
   Return milliseconds
End Function

This code does the same thing: Writes text to a file and then reads it back, but it's using FileOpen and the like. These are "compatibility" functions for Visual Basic programmers: They're in the Microsoft::VisualBasic namespace and you have to add a reference to Microsoft.VisualBasic.dll to use them. VB.NET apps get the reference automatically, which makes life simpler for VB 6 programmers who are moving to VB.NET. There's something else you should know about these backward-compatible functions: They're slow. Much slower than the StreamReader and StreamWriter combination.

To prove it, I decided to convert both these code snippets into Managed C++, so there's no language issue in the comparison. I wrote a console application to keep everything simple, and put the timing and exception-handling code in the main(), then called separate functions for the actual I/O loops. The main reads like this:

int _tmain()
{
    DateTime startTime, stopTime;
    TimeSpan elapsedTime;
    long elapsedMilliseconds;

    startTime = DateTime::Now;
    try
    {
        IOStreamWriter::main();
    }
    catch (IOException* e)
    {
        Console::WriteLine(e->Message);
    } 
    stopTime = DateTime::Now;
    elapsedTime = stopTime.Subtract(startTime);
    elapsedMilliseconds = (long)elapsedTime.TotalMilliseconds;

    Console::WriteLine(
        "StreamWriter elapsed time: {0} ms with max of {1}",
        __box(elapsedMilliseconds), __box(ioMax));

    startTime = DateTime::Now;
    try
    {
        IOFileOpen::main();
    }
    catch (IOException* e)
    {
        Console::WriteLine(e->Message);
    }
    stopTime = DateTime::Now;
    elapsedTime = stopTime.Subtract(startTime);
    elapsedMilliseconds = (long)elapsedTime.TotalMilliseconds;
    Console::WriteLine(
        "FileOpen elapsed time: {0} ms with max of {1}",
        __box(elapsedMilliseconds), __box(ioMax));
}

(To be honest, it's a little more complicated than this, because on each run of main(), I want to run only one of these blocks to be sure they don't interfere with each other, but I'm leaving that out here to keep it simple.)
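The "one block per run" arrangement could be sketched roughly as follows. This is standard C++ rather than Managed C++, and everything in it is hypothetical: the function names, the labels "streamwriter" and "fileopen", and the stub loops that stand in for the real I/O code.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-ins for the two I/O loops. Each returns the number of
// lines it "processed" so the caller can confirm which workload ran.
int runStreamWriterLoop(int ioMax) { return ioMax; }
int runFileOpenLoop(int ioMax) { return ioMax; }

// Runs exactly one workload, chosen by name, so that a single process launch
// never executes both. Returns -1 if the name is not recognized.
int runOne(const std::string& name, int ioMax)
{
    static const std::map<std::string, std::function<int(int)>> workloads = {
        {"streamwriter", runStreamWriterLoop},
        {"fileopen", runFileOpenLoop},
    };

    const auto it = workloads.find(name);
    if (it == workloads.end())
    {
        std::cerr << "unknown benchmark: " << name << '\n';
        return -1;
    }
    return it->second(ioMax);
}
```

A main() could pass its first command-line argument to runOne, so each launch of the program times a single workload and the two loops cannot influence each other through caches or open file handles.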

The C# block reads like this in C++:

int i = 0;

String* fileName = "C:\\TestStreamWriter.txt";
String* textLine = 
        "abcdefghijklmnopqrstuvwxyz1234567890"
        "abcdefghijklmnopqrstuvwxyz1234567890abcdefgh";
String* Line = "";

StreamWriter* streamWriter = new StreamWriter(fileName);
while (i++ < ioMax)
{
    streamWriter->WriteLine(textLine);
}
streamWriter->Close();

i = 0;
StreamReader* streamReader = new StreamReader(fileName);
while (i++ < ioMax)
{
    Line = streamReader->ReadLine();
}

All I had to do to convert the C# was to change all managed objects to managed pointers (String in C# is String* in C++) and switch from . to -> when calling through them. I added a using namespace directive to keep this code shorter:

using namespace System::IO;

The VB block was harder to convert. I added the reference to Microsoft.VisualBasic.dll and made all the necessary rearrangements for the language syntax (Dim i As Integer becomes int i;) but that wasn't the end of it. The optional parameters in the VB definition of the function aren't optional from C++, so I had to look in the online help for the default values of those parameters and pass them in. What's more, PrintLine doesn't take a string as its second parameter; it takes an array of Object references. I had to create an array and then set the first element to the string I wanted to print. (If you ever have to write C++ code to call a .NET Base Class Library method that takes an array of objects, remember this sample.) The same thing happened with FileClose, which takes an array of integers, not just one integer. Here's how the VB code looked when the conversion to C++ was complete:

    String* fileName = "C:\\TestFileOpen.txt";
    String* textLine =
        "abcdefghijklmnopqrstuvwxyz1234567890"
        "abcdefghijklmnopqrstuvwxyz1234567890abcdefgh";
    String* Line = "";

    Microsoft::VisualBasic::FileSystem::FileOpen(1, fileName,
         Microsoft::VisualBasic::OpenMode::Output, 
         Microsoft::VisualBasic::OpenAccess::Default, 
         Microsoft::VisualBasic::OpenShare::Default,-1);
    String* array __gc[]= { textLine };
    int i=0;
    while (i < ioMax)
    {
        Microsoft::VisualBasic::FileSystem::PrintLine(1, array);
        i += 1;
    }
    int  handle __gc[]= { 1 };
    Microsoft::VisualBasic::FileSystem::FileClose(handle);
    Microsoft::VisualBasic::FileSystem::FileOpen(2, fileName, 
        Microsoft::VisualBasic::OpenMode::Input, 
        Microsoft::VisualBasic::OpenAccess::Default,
        Microsoft::VisualBasic::OpenShare::Default,-1);
    i = 0;
    while (i < ioMax)
    {
        Line = Microsoft::VisualBasic::FileSystem::LineInput(2);
        i += 1;
        //Console::WriteLine(" i: {0}", __box(i));
        //Console::WriteLine(" Line: {0}", Line);
    }
    handle[0] = 2;
    Microsoft::VisualBasic::FileSystem::FileClose(handle);

A using namespace directive here would make things simpler but I want to emphasize that this C++ code is using the VB library. It's weird, but you can do it. I'm about to show you why you don't want to.

I ran a Release build of this code repeatedly, ignoring the first few runs to be sure that all the code was JITted and that anything likely to be cached was cached. After jumping around a bit at first, the timings settled into a steady pattern: The originally VB code takes about seven times as long as the originally C# code. The difference isn't the language, it's the library, and it's a big difference!
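The "ignore the first few runs" step can also be done inside one process. Here is a minimal sketch in standard C++ (not the Managed C++ used in the listings). The function name medianMillis and the choice of the median as the summary are assumptions made for illustration; the measurements above came from rerunning the whole program, not from a harness like this.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `work` warmupRuns + measuredRuns times, throws away the timings of
// the warm-up calls, and returns the median of the remaining timings in
// milliseconds. Returns 0.0 if no measured runs were requested.
double medianMillis(const std::function<void()>& work,
                    int warmupRuns, int measuredRuns)
{
    using Clock = std::chrono::steady_clock;  // monotonic, unlike wall-clock time
    std::vector<double> samples;

    for (int run = 0; run < warmupRuns + measuredRuns; ++run)
    {
        const auto start = Clock::now();
        work();
        const auto stop = Clock::now();
        if (run >= warmupRuns)
        {
            samples.push_back(
                std::chrono::duration<double, std::milli>(stop - start).count());
        }
    }

    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

The median is used here because a single slow outlier (a disk flush, say) shifts it less than it would shift the mean. Something like medianMillis(ioLoop, 3, 10) would discard three runs and summarize the next ten, where ioLoop is whatever callable wraps the I/O being measured.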

Another pattern you're likely to see in originally VB code is references set to Nothing instead of a call to Close() or Dispose(), often with horrendous performance consequences. Does that mean VB.NET is a slow language? Of course not. It does mean that VB 6 programmers need to learn what to do to write high-performance code in VB.NET. If you knew how to use FileOpen and the other methods, why would it occur to you that you needed to go and learn a completely different technique with StreamReader and StreamWriter?

So at this point, perhaps you're thinking that C++ programmers are immune to this sort of old-style thinking that leads them to use the slow, old way instead of the shiny, new, fast way that's in the Base Class Libraries. After all, if we wanted to write to a file, couldn't we just use the ofstream and ifstream approach, with << and >>? That should be nice and fast, shouldn't it?

Well, I tried it. I found the STL way really frustrating with String* and so I redid the whole thing with char*. Here's how that looks:

const char* cfileName = "C:\\TestCRT.txt";
char cLine[255];
const char* ctextLine =
      "abcdefghijklmnopqrstuvwxyz1234567890"
      "abcdefghijklmnopqrstuvwxyz1234567890abcdefgh";

int i = 0;

ofstream of;
of.open(cfileName);
while (i++ < ioMax)
{
    of << ctextLine << '\n';
}
of.close();

i = 0;
ifstream ifs;
ifs.open(cfileName);
while (i++ < ioMax)
{
    ifs.getline(cLine,250);
}
ifs.close();

I bet you expect this to be the fastest of the bunch. Well, it's not! Sure, it's quicker than the FileOpen approach, but it takes about three times as long as the StreamWriter version. (What makes it slower? I'd have to see the insides of all these functions to be sure, but it doesn't seem to be interop transitions: I tried pulling this whole function into another file and compiling that file as unmanaged, and that didn't improve the performance.) So now, who needs to learn the new libraries to make sure their managed applications are as fast as possible?

There are two morals to this story. The first, and most important, is that benchmarking is not for the faint of heart. You have to know what you're measuring—and it isn't always what you think you're measuring. In the original article, the author reported a "difference" between two languages that was actually just a difference between the library choices some users of those languages are likely to make. When I write in VB.NET, I don't use those compatibility functions because I never did large volumes of VB 6 programming, so I don't have those habits. Don't assume all VB.NET programmers will do things the old, slow way.
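One inexpensive way to know what you're measuring is to have the timed code report what it actually did. The sketch below is standard C++ with a hypothetical function name, roundTripLines. It repeats the write-then-read loop but returns how many lines came back intact, so a file that failed to open cannot pass for a very fast run. It illustrates the idea only and is not code from the benchmark.

```cpp
#include <fstream>
#include <string>

// Writes `ioMax` copies of `line` to `path`, reads up to `ioMax` lines back,
// and returns how many of them are identical to `line`. Returns -1 if the
// file cannot be opened for writing or for reading.
int roundTripLines(const std::string& path, const std::string& line, int ioMax)
{
    {
        std::ofstream out(path);
        if (!out) return -1;
        for (int i = 0; i < ioMax; ++i)
        {
            out << line << '\n';
        }
    }   // leaving this scope flushes and closes the output stream

    std::ifstream in(path);
    if (!in) return -1;

    int matched = 0;
    std::string readBack;
    for (int i = 0; i < ioMax && std::getline(in, readBack); ++i)
    {
        if (readBack == line) ++matched;
    }
    return matched;
}
```

If the return value is anything other than ioMax, the elapsed time describes some other workload than the one you meant to measure.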

The second is that, in these tests, the classes in System::IO were the fastest way to work with files. It's worth taking the time, when you find two ways to do the same thing, to discover which is faster. And if you only find one way? Maybe it's worth putting a little effort into seeing if there's another way.
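As a small illustration of checking two ways of doing the same thing, the sketch below times the same write loop through two standard C++ routes: stream insertion and C stdio. The stdio candidate and all three function names are assumptions added for the example. Nothing above measured stdio, and the sketch makes no claim about which route wins on any given machine.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>

// Candidate 1: C++ stream insertion, as in the ofstream loop.
bool writeWithOfstream(const std::string& path, const std::string& line, int count)
{
    std::ofstream out(path);
    for (int i = 0; i < count; ++i)
    {
        out << line << '\n';
    }
    out.close();
    return !out.fail();   // false if the open, a write, or the close failed
}

// Candidate 2: C stdio (hypothetical alternative, chosen only for illustration).
bool writeWithStdio(const std::string& path, const std::string& line, int count)
{
    std::FILE* file = std::fopen(path.c_str(), "w");
    if (file == nullptr) return false;

    bool ok = true;
    for (int i = 0; i < count && ok; ++i)
    {
        ok = std::fputs(line.c_str(), file) >= 0 && std::fputc('\n', file) != EOF;
    }
    const bool closed = std::fclose(file) == 0;
    return ok && closed;
}

// Times a single call of `writer` in milliseconds.
// Returns a negative value if the writer reports failure.
template <typename Writer>
double timeWriteMillis(Writer writer, const std::string& path,
                       const std::string& line, int count)
{
    const auto start = std::chrono::steady_clock::now();
    const bool ok = writer(path, line, count);
    const auto stop = std::chrono::steady_clock::now();
    return ok ? std::chrono::duration<double, std::milli>(stop - start).count()
              : -1.0;
}
```

Both candidates receive the same line and the same count, so any difference in the two timings comes from the library route and not from the workload. A single call is noisy, so repeat each measurement several times before drawing a conclusion.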

About the Author

Kate Gregory is a founding partner of Gregory Consulting Limited (www.gregcons.com). In January 2002, she was appointed MSDN Regional Director for Toronto, Canada. Her experience with C++ stretches back to before Visual C++ existed. She is a well-known speaker and lecturer at colleges and Microsoft events on subjects such as .NET, Visual Studio, XML, UML, C++, Java, and the Internet. Kate and her colleagues at Gregory Consulting specialize in combining software development with Web site development to create active sites. They build quality custom and off-the-shelf software components for Web pages and other applications. Kate is the author of numerous books for Que, including Special Edition Using Visual C++ .NET.



