Using ExcelPackage to Create Excel 2007 Files on the Server

Introduction

The adoption of XML as the native file format for Excel 2007 has opened up a whole host of new possibilities. Now, for the first time, you can use standard tools to create native Excel files on the server. No longer is there a need to pollute your beautiful server room with a headless PC running Excel and an old VB6 app that uses OLE Automation to create reports. Such things can be consigned to the trash heap of IT history.

However, the Office Open XML standard is so new there are precious few good code samples. This article aims to correct the situation by providing a flexible set of open source class libraries that you can use to create Excel spreadsheets on the server.

Why Create Excel Spreadsheets on the Server?

Excel has long been recognized as the de facto standard when it comes to presenting management reports. The unique combination of great calculation engine, excellent charting facilities, and the possibility to perform "what if" analysis make it the "must have" business intelligence tool.

So, when I came to replace my aging management reporting infrastructure, I set one key requirement: the new system must be web-based and provide a "download in Excel" option. For mt business intelligence project, I built the data warehouse using SQL Server 2005 populated from PeopleSoft and Novient implementations using SQL Server 2005 Integration Services (SSIS). The OLAP cube was built on SQL Server 2005 Analysis Services (SSAS). SQL Server 2005 Reporting Services (SSRS) provides the web-based access to management reports and the all important "download in Excel" option. So, why do you need to create Excel on the server?

The problem with SQL Server 2005 Reporting Services is that it the Excel spreadsheets it generates are "dumb." They contain no formula—just the raw data. So, the recipient cannot perform a "what if" analysis by changing a few values and expecting the spreadsheet to recalculate.

I considered a number of ways to overcome this issue, but by far the most attractive is to create the Excel spreadsheet on the server, straight from the OLAP cube data. So, I created a web-part for SharePoint Server 2007 so the user could enter their criteria and view the report online via Excel Services. Of course, users can download the file for off-line viewing in Excel 2007 or even Excel 2003 file format. This SharePoint web-part and its associated web service that does the file format conversion will be the topic of another article.

The Open Source ExcelPackage Assembly

The ExcelPackage assembly is a set of classes and wrappers around the .NET 3.0 System.IO.Packaging API and the new SpreadsheetML file format. It extracts away the complexity of dealing with the individual XML components that make up the new Excel 2007 file format. The classes are published as an assembly called ExcelPackage that you can install in the GAC and use as the basis of your own applications. In the sprit of open source projects, if you want to help extend the functionality offered by the ExcelPackage assembly, join the team over at the ExcelPackage Open XML project.

Download sample code here.

Creating an Excel Spreadsheet from Scratch

Sample 1 shows how to create a new Excel spreadsheet containing some basic data and calculations. See how this is achieved.

using OfficeOpenXml;    // namespace for the ExcelPackage assembly
b&
FileInfo newFile = new FileInfo(@"C:\mynewfile.xlsx");
using (ExcelPackage xlPackage = new ExcelPackage(newFile)) { b& }

This creates a new instance of the all-important ExcelPackage class that gives you access to the Excel workbook and worksheets. If mynewfile.xlsx already exists, ExcelPackage will open the existing file. Otherwise, mynewfile.xlsx will be created from scratch.

Start by adding a new worksheet called "Tinned Goods" and adding some basic data and a simple calculation:

ExcelWorksheet worksheet =
   xlPackage.Workbook.Worksheets.Add("Tinned Goods");
// write some titles into column 1
worksheet.Cell(1, 1).Value = "Product";
b&
worksheet.Cell(4, 1).Value = "Peas";
worksheet.Cell(5, 1).Value = "Total";

// write some values into column 2
worksheet.Cell(1, 2).Value = "Tins Sold";

ExcelCell cell = worksheet.Cell(2, 2);
cell.Value = "15";                    // tins of Beans sold
string calcStartAddress =             // we want this for the formula
   cell.CellAddress;
worksheet.Cell(3, 2).Value = "32";    // tins of Carrots sold
b&
worksheet.Cell(5, 2).Formula = string.Format(SUM({0}:{1})",
   calcStartAddress, calcEndAddress);

If all this seems a bit too easy—well, yes it is! The ExcelPackage assembly does all the hard work of creating the XML elements that are needed to represent an Excel worksheet, the Excel rows, the Excel cells, and so forth. All you need to do is connect in the data! The ExcelWorksheet class has all the properties and methods needed to create and manipulate worksheets. A number of supporting classes (such as ExcelCell, ExcelRow, ExcelColumn, ExcelHeaderFooter, and so on) provide properties and methods of each worksheet component. They also provide helper functions that make it easy to manipulate Excel data. For example, the ExcelCell.GetCellAddress(iRow, iColumn) method turns your row and column integers into Excel-style cell addresses.

Ok, so in the sample some of the data is too wide for the column; you can change the column size:

worksheet.Column(1).Width = 15;

Next, add some headers and footers to the spreadsheet. Note how to use the PageNumber and NumberOfPages constants to insert codes into the footer text. This causes Excel to insert the page number and the number of pages in the document footer.

worksheet.HeaderFooter.oddHeader.CenteredText = "Tinned Goods Sales";
// add the page number to the footer plus the total number of pages
worksheet.HeaderFooter.oddFooter.RightAlignedText =
   string.Format("Page {0} of {1}", ExcelHeaderFooter.PageNumber,
                 ExcelHeaderFooter.NumberOfPages);

Okay, write some real hard code. Insert a line into the worksheet so you can add some more data. This will screw up your formula because it will reference the wrong set of rows (in other words, the new row will not be included in the total).

worksheet.InsertRow(3);

Well no, the formula is correct. The InsertRow method not only updates all the row and cell references in the underlying XML, but also updates all the formulas in the spreadsheet! Okay, you now have your report, but you want to ensure your corporate search engine can find the file later. So, add some standard and custom document properties.

xlPackage.Workbook.Properties.Title  = "Sample 1";
xlPackage.Workbook.Properties.Author = "John Tunnicliffe";
xlPackage.Workbook.Properties.SetCustomPropertyValue
   ("EmployeeID", "1147");

Now, save the file and all its components.

xlPackage.Save();

Below is a screenshot of the final output showing the header and the document properties.

Download sample code here.

Using ExcelPackage to Create Excel 2007 Files on the Server

Reading Data from an Excel Spreadsheet

Sample 2 shows how to read data from an existing Excel spreadsheet. You will use the spreadsheet generated by Sample 1 as the source document. To output the contents of column 2 to the console, this is all you need:

using (ExcelPackage xlPackage = new ExcelPackage(existingFile))
{
   // get the first worksheet in the workbook
   ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets[1];
   int iCol = 2;    // the column to read

   // output the data in column 2
   for (int iRow = 1; iRow < 6; iRow++)
      Console.WriteLine("Cell({0},{1}).Value={2}", iRow, iCol,
                        worksheet.Cell(iRow, iCol).Value);

   // output the formula in row 6
   Console.WriteLine("Cell({0},{1}).Formula={2}", 6, iCol,
                     worksheet.Cell(6, iCol).Formula);

}    // the using statement calls Dispose(), which closes the
     // package.

[sample2_output.jpg]

Download sample code here.

Using a Template to Create an Excel Spreadsheet

Sample 3 shows how to create a new Excel spreadsheet based on an existing file and populate it with data from a database. This is a much better approach because you can quickly create a spreadsheet with the right formula and the correct corporate 'look and feel' using Excel 2007. You then can have the calculations in your template validated by the business before starting to write any code. This whole approach saves a lot of coding time!

Before running the code sample, open the template and take a look at its content. You will see it already has the desired layout and all the formula and formatting required for the title and total lines. However, it only has room for three "data rows" (in other words, rows 5, 6, & 7). You will see how to cope with this later.

[sample3_template2.jpg]

Start by creating a new Excel spreadsheet based on a template.

using OfficeOpenXml;    // namespace for the ExcelPackage
assembly
...
FileInfo newFile = new 
FileInfo(@"C:\sample3.xlsx");
FileInfo template = new 
FileInfo(@"C:\sample3template.xlsx");
using (ExcelPackage xlPackage = new 
ExcelPackage(newFile, template)) {...}

Behind the scenes, the ExcelPackage constructor simply copies the template and opens the new package. Now, obtain a reference to the existing worksheet and initialize some variables:

ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets["Sales"];
ExcelCell cell;
const int startRow = 5;
int row = startRow;

Next, open a connection to the database and run the query. This example uses data from the AdventureWorks sample database, so you will need this installed if you want to run the sample.

while (sqlReader.Read())
{
   int col = 1;
   // we have our total formula on row 7, so push them down so we
   // can insert more data
   if (row > startRow) worksheet.InsertRow(row);

   // our query has the columns in the right order, so simply
   // iterate through the columns
   for (int i = 0; i < sqlReader.FieldCount; i++)
   {
      // use the email address as a hyperlink for column 1
      if (sqlReader.GetName(i) == "EmailAddress")
      {
         // insert the email address as a hyperlink for the name
         string hyperlink =
            "mailto:" + sqlReader.GetValue(i).ToString();
         worksheet.Cell(row, 1).Hyperlink =
            new Uri(hyperlink, UriKind.Absolute);
      }
      else
      {
         // do not bother filling cell with blank data
         // (also useful if we have a formula in a cell)
         if (sqlReader.GetValue(i) != null)
         worksheet.Cell(row, col).Value =
            sqlReader.GetValue(i).ToString();
        col++;
      }
   }
   row++;
}

Now you have filled your worksheet with the entire dataset. Note how you use the e-mail address as a hyperlink. Using hyperlinks is useful when you want to link one report up with another.

The purist among you will notice that all the data is written into the cell as a string. However, the Cell(row, col).Value = "xxx" property assignment code checks whether the value is a number or a string and sets the cell's data type accordingly.

As mentioned earlier, the template only has room for three data rows. You cope with this by simply inserting rows into the template, thereby pushing the "Total" row down the sheet. The InsertRow method automatically updates the formula in the "Total" row so that they take into account the extra rows.

Because you have inserted a whole set of rows into the spreadsheet, they will not have the correct style. Youe correct this by simply iterating through the new rows and copying the style from the first row to all the other rows.

// First copy the styles from startRow to the new rows.
for (int iCol = 1; iCol <= 7; iCol++)
{
   cell = worksheet.Cell(startRow, iCol);
   for (int iRow = startRow; iRow <= row; iRow++)
   {
      worksheet.Cell(iRow, iCol).StyleID = cell.StyleID;
   }
}

Download sample code here.

Using ExcelPackage to Create Excel 2007 Files on the Server

The Power of Named Styles

Anyone familiar with styling HTML with CSS will understand the power and flexibility of using named styles rather than updating the style of every individual element. With named styles, the look and feel of the whole spreadsheet can be altered by changing one style definition. This capability was introduced in Excel 2003, but Excel 2007 goes one step further and makes it a dream to create a template using named styles.

You apply two built-in named styles to highlight the top achieving and the worst performing sales reps.

// style the first row as they are the top achiever
worksheet.Cell(startRow, 6).Style = "Good";
// style the last row as they are the worst performer
worksheet.Cell(row, 6).Style = "Bad";

The biggest problem with named styles in Excel 2007 is that, if they are not used in your template, Excel strips out the definition when the file is saved. This is a real headache. There are two ways to cope with this: (1) add extra rows that have styles applied and delete them later (which is the technique used in this sample) or (2) load your own style.xml file that contains all the definitions you want to use.

Shared Formulas

Excel 2007 has a neat feature which saves a lot of coding when it comes to applying the same formula to a range of cells. A formula in one cell can be marked as "shared" and all cells referenced by the shared formula obtain their own version of the formula. So, if cell E5 has the formula D5*12, cell E6 would have the formula D6*12, and so forth. To set up a shared formula, simply call the CreateSharedFormula method. In the following example, the formula in cell E5 is marked as "shared" and all the other cells in the range E5:E21 are assigned their own variation of the formula.

worksheet.CreateSharedFormula(worksheet.Cell(5, 5),
                              worksheet.Cell(21, 5));

Ensuring that Your Formulas Are Recalculated on File, Open

One problem I came across with Excel 2007 is that it does not automatically re-calculate the spreadsheet when it is re-opened—even when the Calculate option set to automatic! This is because the existing cells in the template have both a formula and a value in the cell. So, Excel just assumes the value is correct and does not attempt to re-compute the formula. Of course, you have just added twenty rows of data and updated the formula references in the XML—but Excel has no way of knowing this, so it assumes the values must be right!

The only way to force the recalculation is to ensure the cell has no value, just a formula. So, the RemoveValue() method becomes very useful for all formula in the worksheet. Hence:

worksheet.Cell(22, 5).RemoveValue();

Because of this phenomenon, you changed the ExcelCell.Formula property assignment code so that it removes the cell's value when you assign the cell a formula.

The final output of Sample 3 code should look something like this—much more professional than anything that can be achieved by starting from scratch.

[sample3_output2.jpg]

Integrity Issues

As soon as you start deleting rows or even worksheets from the package, you have the potential for integrity issues. Your formulas will reference cells (or worksheets) that no longer exist. The ExcelPackage assembly does a good job of tidying up after you, but it cannot cope with complex situations. You will soon know if you have an integrity problem; Excel will complain bitterly when opening the newly created file.

A classic problem is the calcChain.xml file. This tells Excel in what order the calculations should be processed. So, if you delete a row that is referenced by the calcChain, Excel will complain. However, if you simply remove the calcChain.xml,from the package, Excel re-creates it when the file is opened—and does not complain! This is an easy fix. The ExcelPackage assembly does exactly that—deletes the calcChain.xml file from the template so that Excel simply re-creates it when the file is opened.

The ExcelPackage assembly also provides you with direct access to each of the XML documents that make up the package. You can write your own code to manipulate the XML directly. However, if you choose to do this, be careful to ensure the XML conforms to the new Office Open XML standard. Otherwise, Excel will simply strip out your all hard work as "badly formed."

Debugging Your Application

If you want to understand exactly what is been written into each of the component XML files by the ExcelPackage assembly, simply add the following line of code:

xlPackage.DebugMode = true;

This will cause the assembly to output the raw XML files in the same location as the output file. You will see a sub-folder called 'xl' and another callled 'docProps'.

Summary

This article has demonstrated just how easy it is to create Excel-based reports on the server using the open source ExcelPackage assembly. I hope you will be able to join me in extending the functionality of the assembly over at the ExcelPackage Open XML project. There is plenty still to do; charting, conditional formatting, inserting comments, to name just a few!

Good luck with your project!

Dr. John Tunnicliffe



About the Author

John Tunnicliffe

Dr John Tunnicliffe is an experienced system designer with hands-on experience of implementing business intelligence solutions based on the latest Microsoft SQL Server OLAP technologies, delivered to the end-user through SharePoint 2007 portals.

Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • IBM Worklight is a mobile application development platform that lets you extend your business to mobile devices. It is designed to provide an open, comprehensive platform to build, run and manage HTML5, hybrid and native mobile apps.

  • New IT trends to support worker mobility — such as VDI and BYOD — are quickly gaining interest and adoption. But just as with any new trend, there are concerns and pitfalls to avoid.  Download this paper to learn the most important considerations to keep in mind for your VDI project.

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds