Cleaning Your Address List

Introduction
Three Reasons Why You Must Clean Your List Now
A Simple Answer to a Complex Problem
Three Easy Steps
The Samples
Conclusion
Downloads

Introduction

Lately at Quiksoft, we have been talking a lot about cleaning up our e-mail address list. Many of our customers have been asking how to reliably track the status of outbound e-mail messages, and how to update their address database when a message is returned undeliverable, otherwise known as a bounce.

In this article you will learn:

Three very important reasons why you must clean your e-mail address list now
What you need to know about how SMTP servers route bounced messages
The secret to automatically matching bounced messages to addresses in your database
The difference between hard and soft bounces and why you should track both
A bonus secret for tracking failures on a mailing-by-mailing basis

This article also contains downloadable sample code that will:

Encode your outbound messages with the proper information so that they can be matched to your address database if they are returned undeliverable
Scan your bounced messages and flag the addresses in your database
Provide you with tons of phrases found in typical bounced messages, which can be used to programmatically discover their meaning

Three Reasons Why You Must Clean Your List Now

I used to think that the quality of my list didn’t matter. I thought it would be better to send to the entire list and let failures take care of themselves. But that was then, and this is now. Over the years, experience has taught me three reasons why it is important to keep a clean list:

Some popular mail servers may block all mail from you if you repeatedly send mail to a bad address on their domain.
Repeatedly sending e-mail to bad addresses wastes bandwidth. Even if bandwidth is not an issue now, this problem will grow in scale with time.
If you are going to do any type of response tracking, you must subtract out the failures for an accurate report.

So with these reasons in mind, I set out to clean our address list. But how to do it reliably was the question…

A Simple Answer to a Complex Problem

To clean our address list, I would have to identify bad addresses and flag them in our address database so that I did not send e-mail to them anymore. I decided that I did not want to delete the bad addresses; I just wanted to flag them as being bad. But how do you determine that an address is bad?

Most SMTP servers will accept mail addressed to just about anyone in their domain, and only later figure out that the user does not exist. That means that whatever app you use to send mail will almost never know that there is a problem. As far as your app is concerned, the SMTP server accepted the message—period.

I tried looking at so-called “address verifier” components. These components check the e-mail address for syntactical errors and for non-existent domains, but they cannot actually tell if the user part of the address is valid. I used several of these to validate buggs.bunny@microsoft.com and was excited to find that Buggs does work at Microsoft these days, but when I sent him an e-mail, it bounced back with the following message: “Delivery to the following recipients failed: buggs.bunny@microsoft.com”. The truth is that these “address verifier” components were no better at verifying addresses than my app was, so they were of no use to me.

So how do you reliably determine if an address is good? The answer is—you can’t. But you can determine if an address is bad when a message sent to it is returned undeliverable (bounced), and that is the key to solving this problem.

The best part of this solution is that it is not dependent on extended SMTP features. It will work all the time provided that the recipient’s mail server correctly adheres to RFC-821, the minimum requirements for any SMTP server. The SMTP protocol as outlined in RFC-821 provides for a notification mechanism when a message cannot be delivered. This notification mechanism works by creating a new e-mail message which is sent to the original sender to inform them that their message was not delivered. This e-mail message is commonly referred to as a bounce. The first step to cleaning our address list is to funnel the bounced messages into a central location where they can be programmatically analyzed.

The following three-step process will enable you to capture bounced messages, figure out which address in your database they belong to, and flag the record.

Three Easy Steps

Step 1. Use a bounce box

The first step in cleaning your list is to trap bounced messages in a central location. We suggest that you create a “bounce box.” A bounce box is a dedicated e-mail account that is set up to trap returned messages; for example, bounce@yourdomain.com. To be sure that returned messages find their way to your bounce box, you must understand how these messages are routed by SMTP servers.

When a message is submitted to an SMTP server, it is tagged with a reverse-path. The reverse-path is specified by the sending application with the MAIL FROM: command as outlined in the SMTP RFC-821. The reverse-path is the path that the server should use to communicate with the original sender of the message, and therefore the reverse-path is typically the e-mail address of the sender (the from address).

The SMTP server stores the reverse-path internally, not in the actual message, and forwards it with the message through any relay servers as necessary until the message encounters an error or reaches its destination. Because the reverse-path is not recorded in the actual message, it is typical to add a From: header to the e-mail message which contains the address of the sender and an optional friendly name, as in “Joe Sender” <joe.sender@domain.com>. Mail readers use the From: header to display who a message is from.

It is very important to understand that the reverse-path and the address in the From: header need not be the same. Therefore, it is possible to send a message which will be displayed by mail readers as coming from joe.sender@domain.com, but has a reverse-path of some_other_address@domain.com.

Once you understand the difference between the reverse-path and the From: header, and the roles they play, you are on your way to building messages that will be displayed in a friendly manner if delivered, or will be returned to your centralized bounce box if there is a failure.
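
As a rough illustration of the difference between the reverse-path and the From: header, here is a minimal sketch in Python using only the standard library (the article's own samples use VBScript and the EasyMail objects instead). The helper name build_message, the bounce address, and the host mail.yourdomain.com are placeholders chosen for this sketch, and the actual send is left commented out.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_message(friendly_name, friendly_addr, recipient, subject, body):
    """Build a message whose From: header carries the friendly sender."""
    msg = EmailMessage()
    msg["From"] = formataddr((friendly_name, friendly_addr))
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message("Joe Sender", "joe.sender@domain.com",
                    "jane.recipient@domain.com", "Subject...", "Message text")

# The reverse-path travels in the SMTP envelope (MAIL FROM), not in the message.
envelope_sender = "bounce@yourdomain.com"

# To actually send (host name is a placeholder):
# import smtplib
# with smtplib.SMTP("mail.yourdomain.com") as smtp:
#     smtp.send_message(msg, from_addr=envelope_sender,
#                       to_addrs=["jane.recipient@domain.com"])
```

The point of the sketch is that the two addresses are set in different places: the friendly address goes into the message headers, while the bounce address is handed to the server separately when the message is submitted.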

Step 2. Add custom data to bounced messages

This step requires that your mail server is capable of being configured to use a wildcard address. In other words, it needs to be able to route all mail to bounce*@yourdomain.com to one specific account such as bounce@yourdomain.com. If your mail server does not support wildcard addresses, you can accomplish the same thing by using a “catch-all” box and a dedicated domain.

You can then append custom data to the end of the account name portion of the reverse-path, and the message will still be delivered to the bounce@yourdomain.com account. For example, suppose each e-mail address in your database is identified by a unique numerical id. You can encode this id into your bounce address: if the recipient address is jane.recipient@domain.com and the id of this address in your database is 1063, you could build an address such as bounce_1063@yourdomain.com.

You can then send a message to jane.recipient@domain.com and specify bounce_1063@yourdomain.com as the reverse-path by passing that address to the SMTP server with the MAIL FROM command, i.e., MAIL FROM:<bounce_1063@yourdomain.com>. To provide a friendly “from” name or address for Jane’s mail reader to display, you can add a From: header to the message. You could use From: “Joe Sender” <joe.sender@domain.com>.
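
The encoding described above can be sketched in a few lines of Python. The function name make_bounce_address is hypothetical and the default domain is the article's placeholder; the only thing taken from the article is the bounce_<id>@yourdomain.com format.

```python
def make_bounce_address(address_id, domain="yourdomain.com"):
    """Encode a database id into the account part of the reverse-path."""
    address_id = int(address_id)
    if address_id < 0:
        raise ValueError("address id must be non-negative")
    return f"bounce_{address_id}@{domain}"


reverse_path = make_bounce_address(1063)
mail_from_command = f"MAIL FROM:<{reverse_path}>"
print(mail_from_command)  # MAIL FROM:<bounce_1063@yourdomain.com>
```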

The sample at the end of this article shows how easily this can be done.

If the message is delivered successfully, Jane’s mail reader will display it as coming from Joe Sender. If for some reason the message is undeliverable, an “undeliverable mail” notification message will be sent to bounce_1063@yourdomain.com. Because your mail server has been instructed to deliver all messages for bounce*@yourdomain.com to bounce@yourdomain.com, this returned message should now land in your bounce box.

Additionally, because each bounce is sent to the address specified by the original message’s reverse-path, each of these messages should have your custom bounce address in the To: header. In other words, each of the messages in the bounce box will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce. Our testing has indicated, however, that some mail servers use the From: address of the original message as the To: address of the resulting bounce. This is not what the RFC calls for, but we have a fix for that too. If the To: header address does not begin with bounce_, you can scan the message’s “Received” headers and find your bounce address there. The sample code shows you how this is done.
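
To make the To: header check and the Received header fallback concrete, here is a minimal Python sketch using the standard email package. The function name find_address_id, the regular expression, and both sample messages are assumptions made up for this illustration; real bounces vary widely in layout.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Hypothetical pattern: "bounce_" followed by a numeric id, then "@domain".
BOUNCE_RE = re.compile(r"\bbounce_(\d+)@[\w.-]+", re.IGNORECASE)


def find_address_id(raw_message):
    """Return the id encoded in the bounce address, or None if absent."""
    msg = message_from_string(raw_message)

    # First choice: the To: header of the bounce.
    for _name, addr in getaddresses(msg.get_all("To", [])):
        match = BOUNCE_RE.search(addr)
        if match:
            return int(match.group(1))

    # Fallback: look for the bounce address in the Received: headers.
    for received in msg.get_all("Received", []):
        match = BOUNCE_RE.search(received)
        if match:
            return int(match.group(1))
    return None


normal_bounce = (
    "From: postmaster@example.net\n"
    "To: bounce_1063@yourdomain.com\n"
    "Subject: Undeliverable mail\n"
    "\n"
    "Delivery failed.\n"
)
misrouted_bounce = (
    "Received: from mx.example.net by mail.yourdomain.com\n"
    "\tfor <bounce_1063@yourdomain.com>; Tue, 1 Jan 2002 09:00:00 -0500\n"
    "From: postmaster@example.net\n"
    "To: joe.sender@domain.com\n"
    "Subject: Undeliverable mail\n"
    "\n"
    "Delivery failed.\n"
)
print(find_address_id(normal_bounce))
print(find_address_id(misrouted_bounce))
```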

Following these rules, you can now easily match bounced messages up to your database, as you will see…

Step 3. Retrieve the bounced messages and update your database

At this point, assuming you have sent mail as prescribed above, and some of those messages were returned, you will have one or more messages in your bounce box. Each of these messages will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce.

Now, it is important to understand that there are two types of bounces: hard and soft. Permanent failures, such as a nonexistent account or domain, are considered hard bounces. Other failures, such as a full mailbox or blocked domain, are considered soft bounces. Instead of flagging your addresses as good or bad, your database can keep a running count of hard and soft bounces for each address. That way, your mailing application can be more intelligent about determining which addresses to exclude from future mailings. For example, you might only want to send mail to any addresses with fewer than eight soft bounces and fewer than two hard bounces. I usually do not like to exclude someone from future mailings unless they have had more than one hard bounce. Just to be sure that the address is really invalid, I look for at least two hard bounces.
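
The exclusion rule in the example above amounts to a simple threshold test. A minimal Python sketch follows; the function name should_mail is hypothetical, and the default limits are simply the example figures from the paragraph above (fewer than two hard bounces, fewer than eight soft bounces), which you would tune for your own list.

```python
def should_mail(hard_bounces, soft_bounces, max_hard=2, max_soft=8):
    """Return True if the address is still eligible for future mailings."""
    return hard_bounces < max_hard and soft_bounces < max_soft


print(should_mail(hard_bounces=1, soft_bounces=3))  # one hard bounce is tolerated
print(should_mail(hard_bounces=2, soft_bounces=0))  # two hard bounces excludes it
```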

Your application will have to scan the text of the bounced messages, looking for phrases that indicate the reason for the bounce. It will look for such phrases as “delivery failure,” “box full,” and so forth. (The downloadable sample code includes a database of the phrases we have discovered in typical bounced messages.) Your app will determine if each bounce is hard or soft based on the phrase it finds in the message.
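
As a rough sketch of phrase scanning, the Python below checks the body text against a short list of signatures. Everything here is an assumption for illustration: the function name classify_bounce, the numeric codes, and the phrase list itself, which is a tiny hypothetical stand-in for the much larger phrase database in the downloadable sample. The first matching phrase in the list decides the result, which is a design choice of this sketch only.

```python
SOFT, HARD = 1, 2

# Hypothetical signatures for illustration only; a real list is far longer.
SIGNATURES = [
    ("user unknown", HARD),
    ("no such user", HARD),
    ("box full", SOFT),
    ("quota exceeded", SOFT),
]


def classify_bounce(body_text, signatures=SIGNATURES):
    """Return HARD, SOFT, or None when no known phrase is found."""
    text = body_text.lower()
    for phrase, bounce_type in signatures:
        if phrase in text:
            return bounce_type
    return None


print(classify_bounce("550 User unknown") == HARD)
print(classify_bounce("Delivery delayed: Box Full") == SOFT)
```

Returning None for unrecognized text lets the caller leave the message in the bounce box for a person to review.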

Once your app determines whether the bounce is hard or soft, it can increment the hard_bounces or soft_bounces field in the database accordingly. It then can delete the message from the bounce box. If your app cannot determine whether the message is a hard or soft bounce, the message can be left in the bounce box. Periodically, the messages remaining in the bounce box can be analyzed by a human who can visually determine why they were not identified by the phrase scanner algorithm. The algorithm can then be updated to catch this type of message. Once your app is run again, it should handle this message properly and clear it from the bounce box. As time goes on, your phrase scanning algorithm should improve more and more. If you start with the phrases included with the downloadable sample code, your app should immediately identify just about every bounced message.
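
The update step can be sketched in Python with an in-memory SQLite database standing in for the Access file. The table and column names (email_table, hard_bounces, soft_bounces) follow the samples in this article; the function name record_bounce, the numeric codes, and the test data are assumptions for this sketch.

```python
import sqlite3

SOFT, HARD = 1, 2


def record_bounce(conn, address_id, bounce_type):
    """Increment the matching counter; return False if nothing was recorded."""
    column = {SOFT: "soft_bounces", HARD: "hard_bounces"}.get(bounce_type)
    if column is None:
        return False  # unknown type: leave the message for human review
    cur = conn.execute(
        f"UPDATE email_table SET {column} = {column} + 1 WHERE id = ?",
        (address_id,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_table (id INTEGER PRIMARY KEY, address TEXT, "
    "hard_bounces INTEGER DEFAULT 0, soft_bounces INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO email_table (id, address) "
    "VALUES (1063, 'jane.recipient@domain.com')"
)

# Each entry is (address id, bounce type) as produced by earlier steps.
bounce_box = [(1063, HARD), (1063, SOFT), (1063, None)]

# Messages that could not be recorded stay in the bounce box.
remaining = [item for item in bounce_box if not record_bounce(conn, *item)]
counts = conn.execute(
    "SELECT hard_bounces, soft_bounces FROM email_table WHERE id = 1063"
).fetchone()
```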

The Samples

The following VBScript samples interface with an Access database that contains the e-mail addresses. The second sample also interfaces with an XML file that contains the phrases typically found in bounced messages. The downloadable code includes the source code shown below along with the Access and XML files. The samples listed on this page vary slightly from the downloadable code, as the code below has been edited to fit the newsletter format.

Sample 1: Constructing and sending the message

In this sample, we will send a message with a friendly address in the From: header, and our bounce address specified as the reverse-path. This example uses VBScript and the EasyMail SMTP object. The SMTP object contains a FromAddr property, and by default the SMTP object will use the value specified by this property for both the reverse-path and automatic creation of the From: header. We will override this behavior by setting the OptionFlags property to 1, which turns off the automatic creation of the From: header. We will then create the From: header ourselves with the AddCustomHeader() method.

'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02V4BFDSFFDFSD62"
strMailServer="mail.yourdomain.com"
strBounceBoxDomain="yourdomain.com"
strFriendlyFromName="Joe Sender"
strFriendlyFromAddress="joe.sender@domain.com"
'End To Do

Dim objSMTP, Data, RS, nRetVal

'create EasyMail SMTP object and set basic properties
Set objSMTP = CreateObject("EasyMail.SMTP")
objSMTP.LicenseKey = strLicenseKey
objSMTP.MailServer = strMailServer
objSMTP.OptionFlags = 1
objSMTP.AddCustomHeader "From", _
           """" & strFriendlyFromName & """" &_
           " <" & strFriendlyFromAddress & ">"
objSMTP.Subject = "Subject..."
objSMTP.BodyText = "Message text"

'set up database and select addresses.
'This sample uses an Access database.
Set cnnData = CreateObject("ADODB.Connection")
strConnection = "DBQ=email_database.mdb"
cnnData.Open "DRIVER=" &_
           "{Microsoft Access Driver (*.mdb)};" &_
            strConnection
Set RS = CreateObject("ADODB.RecordSet")
'1 = adOpenKeyset cursor, 3 = adLockOptimistic lock
RS.Open "SELECT hard_bounces, id, name, address" &_
        " FROM email_table" &_
        " WHERE hard_bounces < 2" &_
        " AND soft_bounces < 4", cnnData, 1, 3

'send to each address selected
Do While RS.EOF = False

  'encode record id in from address
  objSMTP.FromAddr = "bounce_" & RS("id") &_
                      "@" & strBounceBoxDomain
  objSMTP.AddRecipient RS("name"), RS("address"), 1
  nRetVal = objSMTP.Send

  'if the recipient's address fails right
  'away, we mark it as a hard bounce now.
  If nRetVal = 8 Then
     RS("hard_bounces") = RS("hard_bounces") + 1
     'save the change explicitly
     RS.Update
  End If

  'remove the recipients
  objSMTP.Clear 1

  RS.MoveNext

Loop

'free remaining resources
RS.Close
cnnData.Close

Sample 2: Scanning the bounced messages and updating your database

This sample uses the EasyMail POP3 object to download each message in our bounce box. Each message is parsed and the body text is scanned for specific phrases to determine whether the message is a hard or a soft bounce. Once the code determines the type of bounce, it parses the id off of the To: address, which identifies the address in our database. If the To: address does not begin with “bounce,” it scans the Received headers for the bounce address by using the TimeStamps collection. The sample then updates the soft_bounces and hard_bounces fields in the database accordingly before deleting the message from the bounce box. If the type of bounce cannot be determined, the message is left in the bounce box for human analysis that will be used to improve the phrase scanning code in the future. The phrases used to identify bounced messages are read from an XML file.

'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02E00220B529204B62"
strMailServer= "mail.yourdomain.com"
strAccount= "bounce_account"
strPassword= "bounce_password"
'End To Do

Main

Sub Main()

  Dim objPOP3, nCnt
  Dim nBounceType, nId, nPos1, nPos2
  Dim strBodyText, strToAddr, nOrdinal
  Dim strConnection, nRetVal

  'create the EasyMail POP3 object and assign
  'the basic properties
  Set objPOP3 = CreateObject("EasyMail.POP3")
  objPOP3.LicenseKey = strLicenseKey
  objPOP3.MailServer = strMailServer
  objPOP3.Account = strAccount
  objPOP3.Password = strPassword

  'connect to the mail server
  nRetVal = objPOP3.Connect()
  If Not nRetVal = 0 Then
     MsgBox "Error connecting to mail server."
     exit sub
  End If

  'prepare the database and select our e-mail table
  Set cnnData = CreateObject("ADODB.Connection")
  strConnection = "DBQ=email_database.mdb"
  cnnData.Open "DRIVER=" &_
           "{Microsoft Access Driver (*.mdb)};" &_
            strConnection

  Set rs = CreateObject("ADODB.RecordSet")
  rs.Open "SELECT * FROM email_table", cnnData, 1, 3

  'get the count of messages waiting in the
  'bounce box and download and process each one
  nCnt = objPOP3.GetDownloadableCount()
  For x = 1 To nCnt
    nOrdinal = objPOP3.DownloadSingleMessage(x)
    If nOrdinal < 0 Then
       MsgBox "There was an error downloading " &_
              "the message. " & nOrdinal
       exit sub
    End If
    strBodyText = objPOP3.Messages(nOrdinal).BodyText

    'get id from To: address. Clear the value
    'left over from the previous message first.
    strToAddr = ""
    set objMsgs = objPOP3.Messages
    For Each Recip In objMsgs(nOrdinal).Recipients
       strToAddr = Recip.Address
       If LCase(Left(strToAddr, 6)) = "bounce" Then
          Exit For
       End if
    Next

    'if address is not found, try searching
    'timestamps (AKA received headers)
    If Not LCase(Left(strToAddr, 6)) = "bounce" Then
       For Each TimeS In objMsgs(nOrdinal).Timestamps
         strToAddr = TimeS.For
         If LCase(Left(strToAddr, 6)) = "bounce" Then
            Exit For
         End if
       Next
    End If

    'if it is a bounce message we will process it
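    If LCase(Left(strToAddr, 6)) = "bounce" And _
                     InStr(strToAddr, "_") > 0 Then
       nPos1 = InStr(strToAddr, "_") + 1
       nPos2 = InStr(strToAddr, "@")

       'clear the id so that a malformed address
       'cannot reuse the id of the previous message
       nId = ""
       If nPos2 > nPos1 Then
          nId = Mid(strToAddr, nPos1, nPos2 - nPos1)
       End If

       'call the IdentifyBounce routine which scans
       'the bodytext for the phrases found in our
       'xml file
       nBounceType = IdentifyBounce(strBodyText)

       If nBounceType > 0 And IsNumeric(nId) Then

         'the message has been identified as a hard
         'or soft bounce so update the database.
         'Find searches forward from the current
         'row, so start again from the first record.
         If Not (rs.BOF And rs.EOF) Then rs.MoveFirst
         rs.Find ("id=" & nId)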
         If rs.EOF = False and rs.BOF=False Then
           If nBounceType = 1 Then
               rs("soft_bounces")=rs("soft_bounces")+1
           Else
               rs("hard_bounces")=rs("hard_bounces")+1
           End If
           'update changes
           rs.update
         End If
         'delete the message from the bounce box
         objPOP3.DeleteSingleMessage x

       elseif nBounceType = 0 then

          'If nBounceType is 0, it is a warning
          'message or auto-response, so we will
          'delete the message from the bounce box.
          objPOP3.DeleteSingleMessage x
       End If
    End If

    'free resources used by the parsed message. This
    'call does not delete messages from the server.
    objPOP3.Messages.DeleteAll

 Next

 'disconnect from mail server and free remaining resources
 objPOP3.Disconnect
 rs.Close
 cnnData.Close
 msgbox "Operation Complete."

End sub

Function IdentifyBounce(strBodyText)

   Set st = CreateObject("ADODB.Stream")
   Set rs = CreateObject("ADODB.RecordSet")

   st.Open
   st.LoadFromFile ("bounce_signatures.xml")

   rs.Open st
   rs.Sort = "weight DESC"

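   IdentifyBounce = -1

   'signatures are visited from highest to lowest
   'weight and every match overwrites the result, so
   'when several match, the lowest weight is returned
   Do While Not rs.EOF
      If InStr(1, strBodyText, rs("signature"), _
         vbTextCompare) Then
         IdentifyBounce = rs("weight")
      End If
      rs.MoveNext
   Loop
   rs.Close
   st.Close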

End Function

Conclusion

I hope you found this article useful in your efforts to clean your address list. If you have any suggestions for future topics, please let me know. You can find my contact information at the bottom of this page.

Bonus. Measuring failures from a specific mailing

Some of our customers want to measure the count of delivery failures for each mailing they do. We showed you how to embed an id into the “reverse-path” so that it is easy to match the bounced message up with the address in your database, but you can even go a step further by inserting a mailing identifier as well.

Let’s say you want to keep track of the number of bounced messages for a specific mailing, and let’s assume that each mailing is represented by a row in a table. The row has a unique id field which is the mailing identifier. You can encode the mailing identifier onto the account portion of the reverse-path like this: bounce_1063_34@yourdomain.com, where 1063 is the id of the address and 34 is the id of the mailing. You can then modify your database update routine to flag the number of hard and soft bounces for each mailing as well as each address.
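
Decoding such an address can be sketched in Python as follows. The function name parse_bounce_address and the regular expression are assumptions for this illustration; the address formats are the ones described in this article, with the mailing id treated as optional so that plain bounce_<id> addresses still parse.

```python
import re

# Hypothetical pattern: bounce_<address id>[_<mailing id>]@domain
BOUNCE_RE = re.compile(r"^bounce_(\d+)(?:_(\d+))?@", re.IGNORECASE)


def parse_bounce_address(address):
    """Return (address_id, mailing_id), or None if the address does not match.

    mailing_id is None when the address carries only the address id.
    """
    match = BOUNCE_RE.match(address.strip())
    if match is None:
        return None
    address_id = int(match.group(1))
    mailing_id = int(match.group(2)) if match.group(2) else None
    return address_id, mailing_id


print(parse_bounce_address("bounce_1063_34@yourdomain.com"))  # (1063, 34)
print(parse_bounce_address("bounce_1063@yourdomain.com"))     # (1063, None)
```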

©2002 Quiksoft Corporation. All rights reserved. Unauthorized duplication or distribution prohibited. Quiksoft, EasyMail, EasyMail Objects, EasyMail .Net Edition, EasyMail Advanced API, EasyMail SMTP Express, and MailStore are trademarks of Quiksoft Corporation. Other trademarks mentioned are the property of their legal owner.