Changes in System.IO classes in .NET Framework 4.0

Written By
CodeGuru Staff
Nov 23, 2010

Introduction

Earlier versions of .NET Framework (prior to .NET Framework 4.0 and going back to .NET 2.0) had many APIs in the System.IO namespace for enumerating lines in a file, enumerating files in a directory, etc. These APIs returned arrays which could then be looped over to process each item.

For example, to print all the lines in a file, you can call the File.ReadAllLines API to get an array of strings (each string representing a line) and then print them by iterating over the array.

  string[] allLines = File.ReadAllLines("foo.txt");
  foreach(string line in allLines)
  {
      Console.WriteLine(line);
  }

Similarly, to get a list of all files or directories in a directory, you call the GetFiles or GetDirectories API on the DirectoryInfo class.

  DirectoryInfo dirInfo = new DirectoryInfo(@"c:\windows\system32");
  FileInfo[] arrayFileInfo = dirInfo.GetFiles();
  DirectoryInfo[] arrayDirectoryInfo = dirInfo.GetDirectories();

Issues With the Old APIs

While the above APIs work as expected, they carry a considerable performance cost on large files and large directories.
The cost comes from the fact that these APIs are eager: the call blocks until every line in the file has been read, or every entry in the directory has been listed, so that the array can be populated. Imagine how long that takes when you are parsing a 1 GB log file. In a memory-constrained execution environment, the memory needed to allocate the array is a second problem. And even if you are only interested in the first few lines, you still pay the penalty of loading all the lines into memory.

New APIs in .NET Framework 4.0

To overcome these issues, in .NET Framework 4.0, the Base Class Library folks on the Common Language Runtime team built new APIs that return enumerators rather than arrays. These new APIs are efficient because they don’t read all the lines into memory at once. And because lines are read on demand as you iterate, you can stop iterating at any point without paying for the lines you never look at, which is the penalty we saw with the older APIs.
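The early-exit behaviour described above can be sketched as follows. This is a minimal illustration, not code from the original article: the class name `FirstLinesDemo` and the helper `Head` are hypothetical, and the sample file is a throwaway temp file created only so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FirstLinesDemo
{
    // Hypothetical helper: returns at most `count` lines from the start of a file.
    // File.ReadLines is lazy, so no further lines are requested once `count` have been yielded.
    public static List<string> Head(string path, int count)
    {
        return File.ReadLines(path).Take(count).ToList();
    }

    static void Main()
    {
        // Build a throwaway sample file so the sketch is self-contained.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, Enumerable.Range(1, 100000).Select(i => "line " + i));

            // Only the first three lines are pulled from the enumerator.
            foreach (string line in Head(path, 3))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With File.ReadAllLines in place of File.ReadLines, the same helper would allocate an array for all 100,000 lines before Take could discard most of them.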

The new APIs are:

  • File.ReadLines (+1 overload)
  • File.WriteAllLines (+1 overload)
  • File.AppendAllLines (+1 overload)
  • DirectoryInfo.EnumerateDirectories (+2 overloads)
  • DirectoryInfo.EnumerateFiles (+2 overloads)
  • DirectoryInfo.EnumerateFileSystemInfos (+2 overloads)
  • Directory.EnumerateDirectories (+2 overloads)
  • Directory.EnumerateFiles (+2 overloads)
  • Directory.EnumerateFileSystemEntries (+2 overloads)

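The read and enumerate methods work by returning an IEnumerable<T>, which produces items on demand as you iterate instead of building the whole array up front, as the earlier methods did. File.WriteAllLines and File.AppendAllLines go the other way: their new overloads accept an IEnumerable<string>, so lines can be written without first being collected into an array.

The application developer can iterate over the returned sequence and process each item as it arrives, which avoids the up-front disk I/O experienced with the older APIs.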
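As a rough sketch of how the reading and writing sides fit together, the following filters one file into another without holding the full set of lines in an array. The class `FilterLinesDemo`, the helper `CopyMatchingLines`, and the sample log lines are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FilterLinesDemo
{
    // Hypothetical helper: copies only the lines containing `marker` from source to destination.
    // Source and destination must be different files, because the source stays open while writing.
    public static void CopyMatchingLines(string source, string destination, string marker)
    {
        // Nothing is read yet; the query runs as WriteAllLines pulls lines from it.
        IEnumerable<string> matches = File.ReadLines(source).Where(line => line.Contains(marker));
        File.WriteAllLines(destination, matches);
    }

    static void Main()
    {
        string source = Path.GetTempFileName();
        string destination = Path.GetTempFileName();
        try
        {
            // Made-up sample data.
            File.WriteAllLines(source, new[] { "INFO start", "ERROR disk full", "INFO done", "ERROR timeout" });

            CopyMatchingLines(source, destination, "ERROR");

            foreach (string line in File.ReadLines(destination))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(source);
            File.Delete(destination);
        }
    }
}
```

Run as written, this should print the two ERROR lines in their original order.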

Hands On

Here is how the APIs can be used:

  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Text;
  using System.IO;

  namespace FileEnumerators
  {
      class Program
      {
          static void Main(string[] args)
          {
              DateTimeOffset tstart = new DateTimeOffset(DateTime.Now);

              string[] oldlines = File.ReadAllLines(Environment.ExpandEnvironmentVariables(@"%TEMP%\registry.reg"));

              DateTimeOffset tstop = new DateTimeOffset(DateTime.Now);
              TimeSpan difference = tstop - tstart;
              Console.WriteLine("Time taken with old API = " + difference.ToString());
              tstart = new DateTimeOffset(DateTime.Now);
              for (int i = 0; i < oldlines.Length; i++)
              {
                  // Don't do anything. Just cycle through
              }
              
              tstop = new DateTimeOffset(DateTime.Now);
              TimeSpan cycleDifference = tstop - tstart;
              Console.WriteLine("Cycle time taken with old API = " + cycleDifference.ToString());
              
              tstart = new DateTimeOffset(DateTime.Now);
              IEnumerable<string> allLines = File.ReadLines(Environment.ExpandEnvironmentVariables(@"%TEMP%\registry.reg"));
              tstop = new DateTimeOffset(DateTime.Now);
              difference = tstop - tstart;
              Console.WriteLine("Time taken with new API = " + difference.ToString());
              tstart = new DateTimeOffset(DateTime.Now);
              foreach (string str in allLines)
              {
                  // Don't do anything. Just cycle through
              }
              tstop = new DateTimeOffset(DateTime.Now);
              cycleDifference = tstop - tstart;
              Console.WriteLine("Cycle time taken with new API = " + cycleDifference.ToString());
          }
      }
  }

The results speak for themselves. I exported my registry to a temp file and ran the above code on that file, on a dual-core system where one of the two CPUs was already pegged at a constant load. On a DEBUG build, the results are below:

Time taken with old API = 00:00:11.6660156
Cycle time taken with old API = 00:00:00.0087891
Time taken with new API = 00:00:00
Cycle time taken with new API = 00:00:09.9238281

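The new API, which returns an enumerator, does not block to read all the contents of a large file. Instead, it returns the enumerator immediately and does the reading during iteration, which is where its cost shows up in the cycle time. The old API loads everything into memory, so it takes the hit up front; its cycle time is then almost zero because it does not have to go back to the hard drive to get the strings. The total time is of the same order in both cases. What changes is when the work happens and how much has to be held in memory at once.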
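One way to make the difference visible is to measure how long each API takes to hand back just the first line. The sketch below is an assumption-laden variant of the benchmark, not the author's code: it uses System.Diagnostics.Stopwatch rather than DateTime.Now, generates its own sample file instead of a registry export, and the names `FirstLineTiming`, `FirstLineEager`, and `FirstLineLazy` are hypothetical. Timings will vary by machine and disk cache state, so no figures are shown.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class FirstLineTiming
{
    // Hypothetical helper: loads every line into an array, then returns the first one.
    public static string FirstLineEager(string path)
    {
        return File.ReadAllLines(path)[0];
    }

    // Hypothetical helper: pulls a single line from the lazy sequence and stops.
    public static string FirstLineLazy(string path)
    {
        return File.ReadLines(path).First();
    }

    static void Main()
    {
        // Generated sample file; the size is an arbitrary choice for illustration.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, Enumerable.Range(1, 1000000).Select(i => "line " + i));

            Stopwatch watch = Stopwatch.StartNew();
            string eager = FirstLineEager(path);
            watch.Stop();
            Console.WriteLine("ReadAllLines: first line '" + eager + "' after " + watch.Elapsed);

            watch = Stopwatch.StartNew();
            string lazy = FirstLineLazy(path);
            watch.Stop();
            Console.WriteLine("ReadLines:    first line '" + lazy + "' after " + watch.Elapsed);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Stopwatch is used here because it is designed for measuring elapsed intervals, whereas DateTime.Now has coarser resolution, which may be why the original run reports exactly 00:00:00 for the ReadLines call.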

Usage of the Performant APIs

The new APIs can be very useful when you want to list all files or directories in a directory that has a lot of content, including many subdirectories.
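A minimal sketch of that directory scenario is shown here, assuming you want only the first few matches from a deep tree. The class `EnumerateFilesDemo`, the helper `FirstMatches`, and the temp directory layout are hypothetical and exist only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class EnumerateFilesDemo
{
    // Hypothetical helper: returns up to `count` files under `root` (including subdirectories)
    // that match `pattern`. Enumeration stops as soon as `count` matches have been found.
    public static List<string> FirstMatches(string root, string pattern, int count)
    {
        return Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories)
                        .Take(count)
                        .ToList();
    }

    static void Main()
    {
        // Build a small throwaway directory tree.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string nested = Path.Combine(root, "nested");
        Directory.CreateDirectory(nested);
        try
        {
            File.WriteAllText(Path.Combine(root, "a.log"), "a");
            File.WriteAllText(Path.Combine(nested, "b.log"), "b");
            File.WriteAllText(Path.Combine(nested, "c.txt"), "c");

            // The order in which files are returned is not guaranteed.
            foreach (string file in FirstMatches(root, "*.log", 10))
            {
                Console.WriteLine(Path.GetFileName(file));
            }
        }
        finally
        {
            Directory.Delete(root, true);
        }
    }
}
```

The equivalent call to Directory.GetFiles would walk the whole tree and build the full array before the first file name could be inspected.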

The new APIs are also a good fit when you need to read the lines of a very large text file.

Summary

In this article, we saw how the new APIs in System.IO improve performance by using enumerators, which reduces the initial wait before the first item is available.
