Changes in System.IO classes in .NET Framework 4.0


Earlier versions of .NET Framework (prior to .NET Framework 4.0 and going back to .NET 2.0) had many APIs in the System.IO namespace for enumerating lines in a file, enumerating files in a directory, etc. These APIs returned arrays which could then be looped over to process each item.

For example, to print all the lines in a file, you can call the File.ReadAllLines API to get an array of strings (each string representing a line) and then print them by iterating over the array.

  string[] allLines = File.ReadAllLines("foo.txt");
  foreach (string line in allLines)
      Console.WriteLine(line);

Similarly, to get a list of all files/directories in a directory, you call the GetFiles/GetDirectories API on the DirectoryInfo class.

  DirectoryInfo dirInfo = new DirectoryInfo(@"c:\windows\system32");
  FileInfo[] arrayFileInfo = dirInfo.GetFiles();
  DirectoryInfo[] arrayDirectoryInfo = dirInfo.GetDirectories();

Issues With the Old APIs

While the above APIs worked as expected, there was a considerable performance hit when they were exercised on large files.
The performance hit comes from the fact that these APIs do all of their work up front, i.e. the call blocks until every line in the file has been read (to populate the array). Imagine the time the operation will take when you are parsing a 1 GB log file. Another issue, in a memory-constrained execution environment, is the amount of memory needed to allocate the array. Even if you are only interested in the first few lines, you still have to pay the penalty of loading all the lines into memory.

New APIs in .NET Framework 4.0

To overcome these issues, in .NET Framework 4.0 the Base Class Library folks over at the Common Language Runtime team built new APIs around enumerators rather than arrays. These APIs are efficient on large inputs because they don't read all the lines into memory at once. Also, since lines are handed to you one at a time as you iterate, you can stop iterating at any point without paying for the lines you never looked at, which was the penalty we saw in the older APIs.
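To make the "stop at any point" idea concrete, here is a minimal sketch. The FirstLines helper and the sample file name are made up for illustration; only File.ReadLines comes from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FirstLinesDemo
{
    // Hypothetical helper: returns at most `count` lines from the file.
    public static List<string> FirstLines(string path, int count)
    {
        var result = new List<string>();
        if (count <= 0)
            return result;

        foreach (string line in File.ReadLines(path))
        {
            result.Add(line);
            if (result.Count >= count)
                break; // leaving the loop disposes the enumerator and closes the file
        }
        return result;
    }

    static void Main()
    {
        // Sample file created here only so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "readlines-demo.txt");
        File.WriteAllLines(path, new[] { "one", "two", "three", "four", "five" });

        foreach (string line in FirstLines(path, 2))
            Console.WriteLine(line);

        File.Delete(path);
    }
}
```

With File.ReadAllLines, the same helper would have had to read all five lines (or all of a 1 GB log) before returning the first two.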

The new APIs are:

  • File.ReadLines (+1 overload)
  • File.WriteAllLines (+1 overload)
  • File.AppendAllLines (+1 overload)
  • DirectoryInfo.EnumerateDirectories (+2 overloads)
  • DirectoryInfo.EnumerateFiles (+2 overloads)
  • DirectoryInfo.EnumerateFileSystemInfos (+2 overloads)
  • Directory.EnumerateDirectories (+2 overloads)
  • Directory.EnumerateFiles (+2 overloads)
  • Directory.EnumerateFileSystemEntries (+2 overloads)
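On the writing side, the new File.WriteAllLines overloads and File.AppendAllLines accept an IEnumerable<string>, so the caller does not have to build a string[] first. The sketch below is only an illustration; the WriteAndReadBack helper and the file name are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WriteLinesDemo
{
    // Hypothetical helper: writes a lazily generated sequence, appends one
    // more line, then reads the file back.
    public static List<string> WriteAndReadBack(string path)
    {
        // The lines are produced one by one as WriteAllLines asks for them.
        IEnumerable<string> numbered = Enumerable.Range(1, 3).Select(i => "line " + i);

        File.WriteAllLines(path, numbered);
        File.AppendAllLines(path, new List<string> { "line 4" });

        return File.ReadLines(path).ToList();
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "writelines-demo.txt");

        foreach (string line in WriteAndReadBack(path))
            Console.WriteLine(line);

        File.Delete(path);
    }
}
```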

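These new APIs work with IEnumerable<T> rather than arrays. The read and enumerate methods return an IEnumerable<T>, so the caller neither waits for nor allocates the full array of objects that the earlier methods returned, and the write methods accept one.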

The application developer can use the returned iterator to iterate, reducing the startup disk I/O experienced in the older APIs.

Hands On

Here is how the APIs can be used:

  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Text;
  using System.IO;

  namespace FileEnumerators
  {
      class Program
      {
          static void Main(string[] args)
          {
              // Old API: the call itself reads the whole file into an array.
              DateTimeOffset tstart = new DateTimeOffset(DateTime.Now);

              string[] oldlines = File.ReadAllLines(Environment.ExpandEnvironmentVariables(@"%TEMP%\registry.reg"));

              DateTimeOffset tstop = new DateTimeOffset(DateTime.Now);
              TimeSpan difference = tstop - tstart;
              Console.WriteLine("Time taken with old API = " + difference.ToString());

              tstart = new DateTimeOffset(DateTime.Now);
              for (int i = 0; i < oldlines.Length; i++)
              {
                  // Don't do anything. Just cycle through.
              }
              tstop = new DateTimeOffset(DateTime.Now);
              TimeSpan cycleDifference = tstop - tstart;
              Console.WriteLine("Cycle time taken with old API = " + cycleDifference.ToString());

              // New API: the call only hands back an enumerator.
              // Note the verbatim string (@), otherwise "\r" is read as a carriage return.
              tstart = new DateTimeOffset(DateTime.Now);
              IEnumerable<string> allLines = File.ReadLines(Environment.ExpandEnvironmentVariables(@"%TEMP%\registry.reg"));
              tstop = new DateTimeOffset(DateTime.Now);
              difference = tstop - tstart;
              Console.WriteLine("Time taken with new API = " + difference.ToString());

              tstart = new DateTimeOffset(DateTime.Now);
              foreach (string str in allLines)
              {
                  // Don't do anything. Just cycle through.
              }
              tstop = new DateTimeOffset(DateTime.Now);
              cycleDifference = tstop - tstart;
              Console.WriteLine("Cycle time taken with new API = " + cycleDifference.ToString());
          }
      }
  }

The difference shows up clearly in the results. My system is a dual-core machine on which one of the two CPUs was already pegged by another process. I exported my registry to a temp file and ran the above code on that file. On a DEBUG build, the results are below:

Time taken with old API = 00:00:11.6660156
Cycle time taken with old API = 00:00:00.0087891
Time taken with new API = 00:00:00
Cycle time taken with new API = 00:00:09.9238281

The new API (which returns an enumerator) does not block to read all the contents of a large file. Instead, it returns the enumerator immediately, and the file is read as the loop asks for each line. That is why the cycle time with the new API is close to what the old API spent up front. The old API loads everything into memory, so it takes the hit at the call; its cycle time is almost zero, however, since it does not have to go back to the hard drive to get the strings.

Usage of the Performant APIs

The new APIs can be very useful when you want to list all files/directories in a directory that has a lot of content, including a lot of sub-directories.
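As a rough sketch of that scenario, the code below walks a small directory tree with Directory.EnumerateFiles and SearchOption.AllDirectories. The FindLogFiles helper, the directory name and the file names are all invented for the example; I do not print an expected listing because the order in which entries come back is not something to rely on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class EnumerateFilesDemo
{
    // Hypothetical helper: names of all *.log files under root, sub-directories included.
    public static List<string> FindLogFiles(string root)
    {
        return Directory.EnumerateFiles(root, "*.log", SearchOption.AllDirectories)
                        .Select(f => Path.GetFileName(f))
                        .ToList();
    }

    static void Main()
    {
        // Small directory tree built under the temp folder just for this sketch.
        string root = Path.Combine(Path.GetTempPath(), "enumerate-demo");
        string sub = Path.Combine(root, "sub");
        Directory.CreateDirectory(sub);
        File.WriteAllText(Path.Combine(root, "a.log"), "a");
        File.WriteAllText(Path.Combine(sub, "b.log"), "b");
        File.WriteAllText(Path.Combine(sub, "c.txt"), "c");

        foreach (string name in FindLogFiles(root))
            Console.WriteLine(name);

        // Any() stops at the first match, so the rest of the tree is not walked.
        bool hasText = Directory.EnumerateFiles(root, "*.txt", SearchOption.AllDirectories).Any();
        Console.WriteLine("Has .txt files: " + hasText);

        Directory.Delete(root, true);
    }
}
```

The Any() call is where the enumerator pays off: with GetFiles, the whole tree would be scanned into an array before you could ask whether it is empty.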

The new APIs can also be used where we want to read lines in a very large text file.
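Because File.ReadLines returns an IEnumerable<string>, it also composes with LINQ, which is handy for scanning big log files. The following is a sketch under assumed names: CountErrors, FirstError and the "ERROR" line prefix are illustrative only, and a tiny sample file stands in for a large one.

```csharp
using System;
using System.IO;
using System.Linq;

static class LogScanDemo
{
    // Hypothetical helper: counts matching lines without keeping the file in memory.
    public static int CountErrors(string path)
    {
        return File.ReadLines(path)
                   .Count(l => l.StartsWith("ERROR", StringComparison.Ordinal));
    }

    // Hypothetical helper: stops reading as soon as the first match is found.
    public static string FirstError(string path)
    {
        return File.ReadLines(path)
                   .FirstOrDefault(l => l.StartsWith("ERROR", StringComparison.Ordinal));
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "logscan-demo.log");
        File.WriteAllLines(path, new[]
        {
            "INFO start",
            "ERROR disk full",
            "INFO retry",
            "ERROR timeout"
        });

        Console.WriteLine(CountErrors(path)); // prints 2
        Console.WriteLine(FirstError(path));  // prints ERROR disk full

        File.Delete(path);
    }
}
```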


In the above article, we saw how the new APIs in System.IO help improve the performance by using enumerators, hence reducing the initial lookup time.
