Outlook Express (OE) Reader Class

Introduction

This article presents an Outlook Express (IE4) reader class that can read through your OE files and retrieve such things as folder names, message subjects and message bodies.

Copyright Notice

Copyright (C) 2000 Mladen Bonev
The author does not take any responsibility for ANY damages caused by using this source!

Acknowledgements

Based on initial work by Tom Gallagher. Please refer to the More Detail Documentation Notes section below in order to view Mr. Gallagher's documentation.

COE4Reader Class

Here's a partial definition of the COE4Reader class.
class COE4Reader // Outlook Express 4 Reader Class
{
public:
  COE4Reader();
  ~COE4Reader();

  // Open / close an OE4 mail folder file
  BOOL OpenFile(LPCSTR szFileName);
  void Close();

  // State queries
  inline const BOOL IsOpen() { return m_bOpened; }
  inline const BOOL IsClosed() { return m_bClosed; }

  // Mailbox access: return the internal pointer, or (presumably) copy into a caller-supplied structure
  inline const LPMAILBOX GetMailbox() { return m_pMailbox; }
  inline BOOL GetMailbox(LPMAILBOX lpMailbox);

  // Message access; the count comes from the IDX header's nItems field
  inline const int GetMessageCount() { return m_pMailbox->IdxHdr.nItems; }
  const LPMAILMSG GetMessageAt(int nIndex);
  BOOL GetMessageAt(int nIndex, LPMAILMSG lpMsg);
};

For more information, please refer to the comments in the attached source.

More Detail Documentation Notes

Because much of my code is based on Tom Gallagher's previous work and as a courtesy to him, I am presenting his documentation below "as is".

Copyright (c) 1998 Tom Gallagher (teg@cableinet.co.uk)

Based on initial work by Michael Santovec (michael_santovec@prodigy.net) and Jeff Evans (evansj@shaw.wave.ca)

** Standard Disclaimers Apply ! **

** Use this information at your own risk etc etc **

Outlook is a trademark of Microsoft (probably) ! :)

Outlook File Format Decoder

This is not a very complete program, but there should be enough information here for people to create programs to browse their Outlook messages from other OSes if required. It could also be used to import Outlook messages into other mail readers.

See outlook.c and outlook.h for detailed information on the file format.

I used RHIDE and DJGPP to build this program under Windows 98.

GCC under Linux was also used for testing.

Comments & improvements welcome

Below is a brief description of the file formats.

Notes:

DWORD is a 32 bit unsigned value
WORD is a 16 bit unsigned value
BYTE is an 8 bit unsigned value

The source assumes a 32 bit natural integer size.

This can be changed in common.h

A lot of the structure members are called pad or flags. These are values whose usage I am not too sure of. There is probably a lot of information in these files to do with message deletion and compaction, i.e. when a message is deleted it is only flagged (need to find this flag!!) and it is only truly deleted when the mailbox is compacted.

In the file outlook.c there is a macro called DBG. Define it as follows to see exactly what the parser is doing:

#define DBG(s) s

I stole some code from WINE to convert a Win32 FILETIME structure into a UNIX time_t.
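
For readers who do not have the WINE source at hand, the conversion can be sketched as follows. This is not the WINE code mentioned above, just a minimal illustration that relies on two well-known facts: a FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC, and that date lies 11644473600 seconds before the UNIX epoch. The helper name FileTimeToUnix and the clamping of pre-1970 values to 0 are assumptions made for this example.

```cpp
#include <cstdint>
#include <ctime>

// Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (UNIX epoch).
constexpr std::uint64_t kEpochDiffSeconds = 11644473600ULL;
// A FILETIME tick is 100 ns, so there are 10,000,000 ticks per second.
constexpr std::uint64_t kTicksPerSecond = 10000000ULL;

// Hypothetical helper: converts the low and high 32-bit halves of a
// Win32 FILETIME into a UNIX time_t (whole seconds, fraction dropped).
inline std::time_t FileTimeToUnix(std::uint32_t low, std::uint32_t high)
{
    const std::uint64_t ticks =
        (static_cast<std::uint64_t>(high) << 32) | low;
    const std::uint64_t seconds = ticks / kTicksPerSecond;

    // Assumption for this sketch: dates before 1970 are clamped to 0.
    if (seconds < kEpochDiffSeconds)
        return 0;
    return static_cast<std::time_t>(seconds - kEpochDiffSeconds);
}
```

The Received and Sent fields of an IDX entry could be passed through a helper like this, one 32-bit half at a time, after reading them from the file.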

Remember that the structures use single-byte packing.
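
To show why the packing matters, here is a minimal sketch of the fixed-size per-message block described in the IDX File section (DataSize through Priority). The struct names IdxMsgInfo and FileTime are made up for this example, the typedefs stand in for the Windows types (drop them if windows.h is included), and #pragma pack is a compiler extension, though MSVC, GCC and Clang all accept it.

```cpp
#include <cstdint>

// Stand-ins for the Windows types used in the format description.
typedef std::uint32_t DWORD;
typedef std::uint16_t WORD;

// Same layout as a Win32 FILETIME: two 32-bit halves, 8 bytes in total.
struct FileTime {
    DWORD Low;
    DWORD High;
};

#pragma pack(push, 1)   // single-byte packing: no padding between members
struct IdxMsgInfo {     // hypothetical name for the "1 * { ... }" block
    DWORD    DataSize;  // length of this structure + the strings after it
    DWORD    Pad6;      // always zero according to the notes
    FileTime Received;
    FileTime Sent;
    WORD     Priority;
};
#pragma pack(pop)

// 4 + 4 + 8 + 8 + 2 = 26 bytes. Without the pragma a typical compiler
// would round the structure up to 28 bytes and misread what follows.
static_assert(sizeof(IdxMsgInfo) == 26, "IdxMsgInfo must be byte-packed");
```

The static_assert makes the build fail if the packing is lost, which is cheaper than debugging offsets that are off by two bytes.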

IDX File

The IDX file maintains various pointers into the actual message data file (MBX) and stores some information that has been parsed from the message data file. This information is stored in the IDX file so that it does not have to be reparsed every time, therefore speeding up access to the mail folder.

{
 DWORD Magic;    JMF9 (IDX)
 DWORD Version;  A file format version number?? 
 DWORD nItems;   No. entries in this IDX file (no. msgs in MBX file) 
 DWORD nBytes;   No. bytes in the IDX file 
 DWORD Unk1;
 DWORD Unk2;
 BYTE  Pad[40];
 BYTE  Pad2[16];

 nItems * {
  DWORD Flags;
  DWORD Unk1;         Always zero ? 
  DWORD EntryNum;     Index Number 
  DWORD FilePos;      Position from the start of the file to this header 
  DWORD nBytes;       Number of bytes in this index entry 
  DWORD MBXOffset;    Offset of message data in the mbx file 
  DWORD MBXSize;      Total Size of the message + header in MBX file 
  BYTE  Pad5[6];
  WORD  Attach;       Flags to do with Attachments ??
  BYTE  Pad1[4];
  WORD  Pad2;
  DWORD MsgSize;      Total data length of message 

  Total Number of bytes in the next attachment section in this file
  (nSeparators * sizeof(next structure) + some padding)

  DWORD nAttachBytes; 

  BYTE  Pad3[48];

 Number of MIME or (UUENCODE) attachments + 1
 I called it separators to start with because it's actually the number of
 boundary markers, but really the initial text in the message with
 attachments counts as an attachment so this could be renamed to the
 number of attachments.

 DWORD nSeparators;  

 BYTE  Pad4[12];
 DWORD   Flag1;
 DWORD   Flag2;

 Offset to actual message data (skipping Internet headers)

 DWORD   Offset;     
 DWORD   Pad[15];
        
 nSeparators * {

 The next few fields are offsets into the MBX file.

 To get an absolute file offset :-

 DataStart + MBXOffset + MBX_MSG_HDR_LEN

 Where MBX_MSG_HDR_LEN is 0x10 (a tiny header at the start
 of each mail message in the MBX file).

 DWORD Pad[7];

 Offset to the data for this part of a multi part MIME

 DWORD DataStart;    

 Offset to the end of the data (points one char past the end)

 DWORD DataEnd;      

 Start of Internet headers for this part of the multi part MIME 
 or UUENCODED message for this section only.  (e.g the 
 MIME Content type etc or the UUENCODE begin line).
            
 DWORD HeaderStart;  

 I couldn't figure these two out (anyone else?!?)
 Maybe flags and an index value.  Perhaps the MIME type
 encoded in here as a binary value somehow?
    
 DWORD Pad1;
 DWORD Idx;

 Pointer to the start of the multi part boundary line
 For UUENCODED messages this is the same as header start because
 the start of the headers represents the boundary too.
                    
 DWORD BoundaryOffset;
        
 These appear to be zero all the time
            
 DWORD Pad2[10];
}
        
1 * {
 The length of this structure + the total length of strings
 following this structure.

 DWORD DataSize;     

 This DWORD is always zero; perhaps it's a 64 bit data size ??

 DWORD Pad6;         

 Win32 FILETIME structures (size 8 bytes)

 FILETIME Received;
 FILETIME Sent;

 This is the priority as specified in the Internet headers

 WORD  Priority;
}
        
There are then 7 variable length strings in this order:-

 Subject    
 Sender
 POPServer
 Username
 MailAccount
 POP3 Login Name
 Account Description

 7 * {
  WORD Length;
  Length * {
   BYTE Data;
  }
 }
 }     End of one nItems entry
}
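
The seven trailing strings of an IDX entry can be read with a small loop. The sketch below assumes the WORD length is stored little-endian (as on the x86 machines Outlook Express ran on) and works on a buffer that already starts at the first string. The function name ReadIdxStrings is hypothetical, and the description above does not say whether the stored length includes a terminating NUL, so the bytes are returned untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads `count` strings, each stored as a little-endian WORD length
// followed by that many bytes. Returns false if the buffer is too short.
inline bool ReadIdxStrings(const std::uint8_t* buf, std::size_t size,
                           std::size_t count,
                           std::vector<std::string>& out)
{
    std::size_t pos = 0;
    out.clear();
    for (std::size_t i = 0; i < count; ++i) {
        if (size - pos < 2)                 // not enough room for Length
            return false;
        const std::size_t len =
            static_cast<std::size_t>(buf[pos]) |
            (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2;
        if (size - pos < len)               // not enough room for Data
            return false;
        out.emplace_back(reinterpret_cast<const char*>(buf + pos), len);
        pos += len;
    }
    return true;
}
```

Called with count set to 7, the resulting vector would hold Subject, Sender, POPServer, Username, MailAccount, POP3 Login Name and Account Description, in that order.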

MBX File

The MBX file stores the actual message data that comes from the POP3 server.
{
 DWORD Magic;            JMF6
 DWORD Version;          ? 03000101 ?
 DWORD nMsgs;            No. msgs including those marked for deletion
 DWORD LastUsedMsgNum;
 DWORD nBytes;           Size of the MBX file in bytes
 BYTE  Pad[64];

 nMsgs * {
  DWORD Magic;        0x7F007F00
  DWORD Idx;          Index in the MBX file
  DWORD TotalLen;     Total Length taken by this entry
  DWORD MsgLen;       Total length of all message data
    
    
  TotalLen * {

   The message data exactly as it comes from the POP3 server

   BYTE Data
  }
 }
}
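
Putting the IDX and MBX descriptions together, locating a message (or one part of it) in the MBX file could look roughly like this. It is a sketch under stated assumptions: the whole MBX file has already been loaded into memory, all DWORDs are little-endian, and the per-message magic compares equal to 0x7F007F00 when read that way. The names MbxMsgHeader, ReadMbxMsgHeader and PartFileOffset are invented for this example; only MBX_MSG_HDR_LEN and the field names come from the notes above.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t MBX_MSG_MAGIC   = 0x7F007F00u;
constexpr std::uint32_t MBX_MSG_HDR_LEN = 0x10;  // tiny per-message header

// Reads a little-endian 32-bit value from four bytes.
inline std::uint32_t ReadLE32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

struct MbxMsgHeader {
    std::uint32_t Magic;
    std::uint32_t Idx;
    std::uint32_t TotalLen;
    std::uint32_t MsgLen;
};

// Decodes the message header found at mbxOffset (the MBXOffset field of
// an IDX entry). Fails if the header is out of range or the magic differs.
inline bool ReadMbxMsgHeader(const std::uint8_t* mbx, std::size_t size,
                             std::uint32_t mbxOffset, MbxMsgHeader& hdr)
{
    if (mbxOffset > size || size - mbxOffset < MBX_MSG_HDR_LEN)
        return false;
    const std::uint8_t* p = mbx + mbxOffset;
    hdr.Magic    = ReadLE32(p);
    hdr.Idx      = ReadLE32(p + 4);
    hdr.TotalLen = ReadLE32(p + 8);
    hdr.MsgLen   = ReadLE32(p + 12);
    return hdr.Magic == MBX_MSG_MAGIC;
}

// Absolute file offset of one MIME/UUENCODE part:
// DataStart + MBXOffset + MBX_MSG_HDR_LEN, as given in the IDX notes.
inline std::uint64_t PartFileOffset(std::uint32_t mbxOffset,
                                    std::uint32_t dataStart)
{
    return static_cast<std::uint64_t>(dataStart) + mbxOffset + MBX_MSG_HDR_LEN;
}
```

Checking the magic before trusting TotalLen and MsgLen is a cheap guard against a stale or corrupt IDX entry pointing into the middle of another message.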

Downloads

Download demo project - 16 Kb
Download source - 6 Kb


Comments

  • How to extract emails from Outlook Express 5.0, 6.0

    Posted by Legacy on 04/17/2001 12:00am

    Originally posted by: gopal


    I am working on a project where I need to decode Outlook Express 5.0, 6.0, etc.
    The above code works with the Outlook Express 4.0 file format,
    but in Outlook Express 5.0 and later the file format has changed,

    so kindly help me to decode it.

    your suggestion is valuable for me.

    thanks in advance.

    gopal.
