How to Read a MS Outlook (.msg) File Using ATL and MFC

Introduction

This article describes my first attempt at reading a .msg file, which MS Outlook 2000 generates when you save an email to disk. It explains how to read the file, which is an OLE2 compound document.

Before getting into how to extract the message body from a .msg file, I suggest that you read this article, http://www.fileformat.info/format/outlookmsg/. It describes the structure of a .msg file in detail.

.MSG File Structure

I used VC++ 6.0 as my development platform to parse the .msg file. The VC++ 6.0 development environment ships with the DocFile viewer, 'DFVIEW.EXE'. If you open any .msg file with this tool, you will see the nodes it contains displayed as a tree.

The viewer makes it clear that this compound file is made up of nodes and subnodes. Your interest is in the __substg1.0_xxxxxxxx nodes, each of which holds one piece of information. Not all of that information is readable, but some nodes hold plain ASCII text. In particular, the __substg1.0_xxxxxxxx nodes whose names end with 001E contain data in ASCII text format, so the main focus will be on these nodes.
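
To make the naming convention concrete, here is a minimal sketch that splits a node name into its two halves. The eight hex digits after "__substg1.0_" are a four-digit property ID followed by a four-digit property type, and 001E is the type used for 8-bit strings. The names SubstgName, ParseSubstgName, and IsAnsiTextStream are hypothetical helpers of mine, not part of any Windows API.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper type: the two halves of a "__substg1.0_" node name.
struct SubstgName
{
    unsigned long propId;    // first four hex digits, e.g. 0x1000
    unsigned long propType;  // last four hex digits, e.g. 0x001E
};

// Splits a name such as "__substg1.0_1000001E" into its property ID and
// property type. Returns false if the name does not have that shape.
bool ParseSubstgName(const std::string& name, SubstgName& out)
{
    const std::string prefix = "__substg1.0_";
    if (name.size() != prefix.size() + 8 ||
        name.compare(0, prefix.size(), prefix) != 0)
        return false;

    const std::string hex = name.substr(prefix.size());
    if (hex.find_first_not_of("0123456789ABCDEFabcdef") != std::string::npos)
        return false;

    out.propId   = std::strtoul(hex.substr(0, 4).c_str(), NULL, 16);
    out.propType = std::strtoul(hex.substr(4, 4).c_str(), NULL, 16);
    return true;
}

// True when the node holds 8-bit string data (type 001E).
bool IsAnsiTextStream(const std::string& name)
{
    SubstgName parsed;
    return ParseSubstgName(name, parsed) && parsed.propType == 0x001E;
}
```

With a helper like this, a program could list every readable text node in a file instead of testing for a fixed set of names.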

How to Extract Mail Information from the .msg File

The main objective of this article is to identify the nodes that hold the recipients' information, the subject, and the mail body. Here are the __substg1.0_xxxxxxxx nodes that are important to you:

  1. The __substg1.0_0E04001E node holds the recipients as shown in the To field (display names or addresses).
  2. The __substg1.0_0E1D001E node holds the subject of the mail.
  3. The __substg1.0_1000001E node holds the mail body if the mail was sent in plain-text format; otherwise, this node may be absent.
  4. The __substg1.0_1013001E node holds the mail body in HTML format. If the __substg1.0_1000001E node is absent, look in this node to extract the mail body.

Double-click on these nodes in the DocFile viewer to see the corresponding information.
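
The fallback rule in items 3 and 4 can be sketched as follows. This is only an illustration: it assumes the node contents have already been read into a map keyed by node name, and StreamMap and PickMailBody are hypothetical names introduced here.

```cpp
#include <map>
#include <string>

// Hypothetical container: node name -> node contents, filled elsewhere.
typedef std::map<std::string, std::string> StreamMap;

// Returns the plain-text body if present, otherwise the HTML body,
// otherwise an empty string. isHtml reports which one was chosen.
std::string PickMailBody(const StreamMap& streams, bool& isHtml)
{
    isHtml = false;

    // Prefer the plain-text body node.
    StreamMap::const_iterator it = streams.find("__substg1.0_1000001E");
    if (it != streams.end())
        return it->second;

    // Fall back to the HTML body node.
    it = streams.find("__substg1.0_1013001E");
    if (it != streams.end())
    {
        isHtml = true;
        return it->second;
    }
    return std::string();
}
```

Returning the isHtml flag lets the caller decide whether the text needs its markup stripped before display.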

Implementation in MFC & ATL to Read a .msg File

I suggest that you read the article "An Introduction to structured storage", by Kenn Scribner, in which I found a simple, generic approach to reading from and writing to a structured storage file. I have used Scribner's approach to read a .msg file and added functionality to extract the mail body, subject, and recipient address. I hope this article will help others who want to read .msg files from a VC++ application.

My program has two functions. The first checks whether the supplied file is an OLE2 document file and, if it is, opens the storage file with the Win32 API StgOpenStorage(). The second function then tries to read information from the four nodes mentioned above. Here is the code. Before writing it, include the 'atlbase.h' file in your class header file.

// Open the storage (.msg file) for reading.

void CMSGFileDlg::OnOK()
{
   // TODO: Add extra validation here
   HRESULT hr = S_OK;
   try
   {
      USES_CONVERSION;
      TCHAR strFilePath[MAX_PATH+1] = {0};
      _tcscpy(strFilePath, _T("C:\\Test Mail.msg"));

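      // Check for valid storage file. StgIsStorageFile() returns S_OK
      // for a storage file and S_FALSE otherwise, so FAILED() alone
      // would miss the "not a storage file" case.
      hr = StgIsStorageFile(T2W(strFilePath));

      if(hr != S_OK)
      {
         // Error, This is not a valid storage file!
         AfxMessageBox(_T("Not a Storage file"),
                       MB_OK | MB_ICONINFORMATION);
         return;
      }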
      else
      {
         // This is a storage file

         CComPtr<IStorage> pIStorage;
         hr = StgOpenStorage(T2W(strFilePath),
            NULL,
            STGM_DIRECT |
            STGM_READ |
            STGM_SHARE_EXCLUSIVE,
            NULL,
            0,
            &pIStorage);

         if(FAILED(hr))    // Error in Opening storage file
            throw hr;
         else              // Read Storage file
         {
            hr = ReadStorage(pIStorage);
            if(FAILED(hr))
            {
               throw hr;
            }
         }
      }
   }
   catch(HRESULT hrError)
   {
      LPVOID lpMsgBuf = NULL;
      ::FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
         FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
         NULL, hrError,
         MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
         (LPTSTR)&lpMsgBuf,
         0,
         NULL);

      // Display Error message.
      if ( lpMsgBuf )
      {
         AfxMessageBox((LPTSTR)lpMsgBuf,MB_OK | MB_ICONINFORMATION);
         // Free the buffer.
         LocalFree(lpMsgBuf);
      }    // end of if
   }
   CDialog::OnOK();
}
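
OnOK() leaves the storage check to StgIsStorageFile(). As a rough, portable illustration of what makes a file an OLE2 compound document, the sketch below tests a buffer for the eight-byte signature that opens every such file. LooksLikeCompoundFile and kOle2Signature are hypothetical names of mine, and this check is an assumption-level approximation for illustration, not a replacement for the Win32 call.

```cpp
#include <cstddef>
#include <cstring>

// The eight bytes that open every OLE2 compound file.
static const unsigned char kOle2Signature[8] =
    { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

// Returns true if the buffer starts with the compound-file signature.
// 'data' is expected to hold at least the first bytes of the file.
bool LooksLikeCompoundFile(const unsigned char* data, std::size_t size)
{
    return data != NULL && size >= sizeof(kOle2Signature) &&
           std::memcmp(data, kOle2Signature, sizeof(kOle2Signature)) == 0;
}
```

A check like this can be handy for rejecting obviously wrong files (for example, a .zip renamed to .msg) on platforms where the structured storage API is not available.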

This part extracts the mail body, subject, and recipient address from the .msg file and then displays it in a message box.

HRESULT CMSGFileDlg::ReadStorage(IStorage *pIStorage)
{
   HRESULT hr = S_OK;
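   STATSTG statstg = {0};   // zeroed so the catch block never frees garbage
   BOOL bNodeFound = FALSE;

   try
   {
      // Get this storage's name
      hr = pIStorage->Stat(&statstg,STATFLAG_DEFAULT);
      if(FAILED(hr))
         throw hr;

      TCHAR strNode[MAX_PATH+1] = {0};
      USES_CONVERSION;
      _tcscpy(strNode,W2T(statstg.pwcsName));

      // Freeing block of memory allocated thru OLE allocator;
      // reset the pointer so it is not freed a second time.
      CoTaskMemFree(statstg.pwcsName);
      statstg.pwcsName = NULL;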

      // Iterate storage node and trying to collect info
      // from selected node.

      CComPtr<IEnumSTATSTG> pIEnumStatStg;
      hr = pIStorage->EnumElements(0,NULL,0,&pIEnumStatStg);

      if(FAILED(hr))
         throw hr;

      hr = pIEnumStatStg->Next(1,&statstg,NULL);
      if(FAILED(hr))
         throw hr;

      // Next() returns S_OK while elements remain, S_FALSE at the end,
      // and a failure code on error, so loop only while it succeeds.
      while(hr == S_OK)
      {
         // Element names are wide strings, so compare them directly
         // instead of converting each one with W2T.
         if((wcscmp(statstg.pwcsName, L"__substg1.0_0E04001E") == 0)
            || (wcscmp(statstg.pwcsName, L"__substg1.0_0E1D001E") == 0)
            || (wcscmp(statstg.pwcsName, L"__substg1.0_1000001E") == 0)
            || (wcscmp(statstg.pwcsName, L"__substg1.0_1013001E") == 0))
         {
            bNodeFound = TRUE;
         }
         else
            bNodeFound = FALSE;

         if(bNodeFound)
         {
            switch(statstg.type)
            {
               case STGTY_STORAGE:
               {
                  CComPtr<IStorage> pIChildStorage;
                  hr = pIStorage->OpenStorage(statstg.pwcsName,
                     0,
                     STGM_READ |
                     STGM_SHARE_EXCLUSIVE,
                     0,
                     0,
                     &pIChildStorage);

                  if(FAILED(hr))
                     throw hr;

               }
               break;
               case STGTY_STREAM:
               {
                  CComPtr<IStream> pIStream;
                  hr = pIStorage->OpenStream(statstg.pwcsName,
                     0,
                     STGM_READ |
                     STGM_SHARE_EXCLUSIVE,
                     0,
                     &pIStream);

                  if(FAILED(hr))
                     throw hr;

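                  // 001E nodes hold 8-bit (ANSI) text, so read into a
                  // char buffer. Only the first MAX_PATH bytes are shown.
                  char strData[MAX_PATH+1] = {0};
                  DWORD dwRead = 0;
                  hr = pIStream->Read(strData,MAX_PATH,&dwRead);
                  if(FAILED(hr))
                     throw hr;
                  // Displaying information in message box; CString
                  // converts the ANSI text in Unicode builds too.
                  AfxMessageBox(CString(strData));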
               }
               break;
            }    // End of Switch
         } // End of if...

         // Freeing block of memory allocated thru OLE allocator;
         // reset the pointer so a stale value is never freed again.
         CoTaskMemFree(statstg.pwcsName);
         statstg.pwcsName = NULL;

         hr = pIEnumStatStg->Next(1, &statstg, NULL);
      }    // End of While Loop...
   }
   catch(HRESULT hrError) 
   {
      // Freeing block of memory allocated thru OLE allocator
      if ( statstg.pwcsName )
         CoTaskMemFree(statstg.pwcsName);

      hr = hrError;
   }

   return hr;
}
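
ReadStorage() reads a single MAX_PATH-sized chunk from each node, so a long mail body is cut short. One way to read a whole node is to keep calling Read() until it reports zero bytes. The sketch below shows that loop against FakeStream, a hypothetical stand-in for IStream that I use only so the example runs anywhere. A real IStream::Read returns an HRESULT instead of a bool, so the failure test would need adapting.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for IStream: hands out its bytes on request.
class FakeStream
{
public:
    explicit FakeStream(const std::string& bytes) : m_bytes(bytes), m_pos(0) {}

    // Copies up to cb bytes into pv and reports the count in *pcbRead,
    // in the style of IStream::Read. Returns false only on a bad argument.
    bool Read(void* pv, unsigned long cb, unsigned long* pcbRead)
    {
        if (pv == NULL || pcbRead == NULL)
            return false;
        unsigned long left = static_cast<unsigned long>(m_bytes.size() - m_pos);
        unsigned long n = std::min(cb, left);
        std::memcpy(pv, m_bytes.data() + m_pos, n);
        m_pos += n;
        *pcbRead = n;
        return true;
    }

private:
    std::string m_bytes;
    std::size_t m_pos;
};

// Reads the stream chunk by chunk until a read returns zero bytes.
template <typename Stream>
bool ReadAllBytes(Stream& stream, std::string& out)
{
    char chunk[260];
    unsigned long got = 0;
    out.clear();
    do
    {
        if (!stream.Read(chunk, sizeof(chunk), &got))
            return false;
        out.append(chunk, got);
    } while (got > 0);
    return true;
}
```

Appending with an explicit byte count, rather than relying on a terminating zero, keeps the result correct even when the node data contains no terminator.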


About the Author

Mufti Mohammed

Small-Talk and Small-Programming working together.
