I Set Up an Email Server and I Liked it

February 24, 2014

Seantis is still one of those dinosaurs that hosts its own email. For years, an aging Postfix installation served us well. We recently rebuilt the server using Puppet. This is the tale of how it went, how it was done and how test-driven development helped.

But why?

Running your own email server is something many people no longer want to deal with. I myself use Fastmail for my private addresses.

But we like to run our own services at Seantis, and we've been hosting email for roughly 20 customers and 200 users for years now.

We never had to work a lot to keep everything going. And it's nice to not depend on someone else's support when things go awry.

Complexity, thy Name is Email Servers.

I'm pretty new at managing email servers, but the cautionary tales of woe from people who have done it left an impression on me. So to prepare for the migration I read through most of "The Book of Postfix".

It's not a terribly exciting read, but if you need to administer a Postfix server it's a must. It's definitely preferable to tutorials, where the how is often more important than the why.

The hard thing to understand about Postfix's configuration is that the same settings govern both sending email as an authorized user and receiving email from an external sender.

Because of this complexity, adding features and fixing issues has a good chance of causing unintended side effects.

This is why I opted to use test-driven development from the get-go.

Puppet Test Image with Test-Data

We're using Puppet and Vagrant to set up and test our infrastructure before we deploy it.

To set up test-only data, and to use the same password for everything in development only, we use a custom Facter fact that lets us write Puppet code which only runs in our Vagrant images.

This fact simply checks for the presence of the /vagrant folder:

require 'facter'
Facter.add(:running_in_vagrant) do
  setcode do
    Facter::Util::Resolution.exec("test -e /vagrant && echo 'yes'")
  end
end

We can then use this fact in puppet:

if $::running_in_vagrant {
    # a resource needs a title followed by a colon
    notify { "We're running vagrant here.": }
}

In the case of our Postfix server we have something a bit more elaborate, which lets us add test email addresses and domains to the Postfix server running in Vagrant:

class postfix::test {
    if $::running_in_vagrant {
        $mailboxes = {
            'example.org' => {
                'root@example.org' => [],
                'user@example.org' => [],
                'outbound@example.org' => [],
                'alias@example.org' => [
                    'root@example.org',
                    'user@example.org'
                ],
                'spammer@example.org' => [],
                'any@example.org' => []
            }
        }
        $spam_senders = ['spammer@example.org']
        $client_whitelist = []
        $send_as_any_domain_user = ['any@example.org']
    } else {
        $mailboxes = {}
        $spam_senders = []
        $client_whitelist = []
        $send_as_any_domain_user = []
    }
}

Those values are merged when we look up our real addresses from Hiera. For example:

$mailboxes = merge(hiera('mailboxes'), $postfix::test::mailboxes)

In production, $postfix::test::mailboxes will be empty and will therefore not have an impact.
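To make the effect of the merge concrete, here is a rough Python model of it. It assumes the behaviour of the `merge` function from Puppet's stdlib, which merges shallowly and lets the rightmost hash win on duplicate keys. The domain `customer.example` and its mailbox are made-up stand-ins for whatever Hiera returns.

```python
def merge(*hashes):
    """Shallow merge: later hashes override earlier ones on duplicate keys."""
    result = {}
    for h in hashes:
        result.update(h)
    return result


# hypothetical production data, standing in for hiera('mailboxes')
production = {
    'customer.example': {'info@customer.example': []},
}

# what postfix::test provides inside a Vagrant image
vagrant_only = {
    'example.org': {
        'root@example.org': [],
        'alias@example.org': ['root@example.org', 'user@example.org'],
    },
}

# in production the test hash is empty, so nothing changes
in_production = merge(production, {})

# in Vagrant the test domain is added next to the real ones
in_vagrant = merge(production, vagrant_only)
```

Because the merge is shallow, a test domain that also exists in Hiera would replace the real entry for that domain instead of being combined with it. Using a reserved domain such as example.org for the test data avoids that.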

Using Python to Test the Mailserver

Python has a lot of built-in as well as third-party modules that deal with POP3, Telnet, IMAP, Sieve scripts and so on. Using these modules allows us to easily test all the current features and security checks our email server supports.

I uploaded most of our tests to a gist: https://gist.github.com/href/9185284

Since this is a lot to take in I want to focus on two tests to show off the concept behind this.

Checking for an Open Relay

A mail server which is an open relay is on the fast track to being added to spam blacklists. An open relay is a mail server that sends emails for clients which are not authorized to do so. The example below shows this quite well.

If our email server were an open relay, a spammer could use it to send an email from bill.gates@microsoft.com to mark.zuckerberg@facebook.com. Of course this is not what we want. Our server should only send emails from users that it manages.

import unittest
from smtplib import SMTP

class TestPostfixServer(unittest.TestCase):

    server = 'mail.seantis.dev'

    def smtp(self, server=None, port=25):
        # connect to the given server, or to the test image by default
        smtp = SMTP()
        smtp.connect(server or self.server, port)
        return smtp

    def test_not_an_open_relay(self):
        smtp = self.smtp()
        smtp.ehlo('microsoft.com')
        smtp.docmd('mail from: bill.gates@microsoft.com')
        code, msg = smtp.docmd('rcpt to: mark.zuckerberg@facebook.com')

        # smtplib returns the reply text as bytes on Python 3
        self.assertIn(b'Relay access denied', msg)
        self.assertEqual(code, 554)

In this test we connect to the test image, pretend to be bill.gates@microsoft.com and try to send an email to mark.zuckerberg@facebook.com. The recipient is not really important here. What we really need to check is that unauthorized senders cannot send email through our servers.

As part of the test suite, this test is run every time a change is made in our Puppet code. So we can be certain that we don't enter a security nightmare when we change something.

Since this is a very important test, we also check it through MxToolbox, along with blacklist monitoring.
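The same check can also be split into two small helpers, so that the decision logic can be exercised without a running mail server. This is a minimal sketch, not the code from the gist: `rcpt_reply` and `is_relay_denied` are hypothetical names, and port 25, the timeout and the default HELO name are assumptions.

```python
from smtplib import SMTP


def rcpt_reply(host, sender, recipient, port=25, helo='example.com'):
    """Run EHLO, MAIL FROM and RCPT TO, then return the RCPT TO reply.

    Nothing is actually sent, because the DATA command is never issued.
    """
    with SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo(helo)
        smtp.mail(sender)
        return smtp.rcpt(recipient)  # (code, message) tuple


def is_relay_denied(code, msg):
    """Return True if the reply is the rejection the test above expects."""
    # smtplib hands back bytes on Python 3, so normalize to text first
    if isinstance(msg, bytes):
        msg = msg.decode('utf-8', 'replace')
    return code == 554 and 'Relay access denied' in msg
```

Inside a test this could be used as `self.assertTrue(is_relay_denied(*rcpt_reply(self.server, sender, recipient)))`, which makes it easy to loop over several external sender and recipient pairs.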

Testing Spam Detection

To keep spam at bay we use SpamAssassin. To test it, we send an email with the GTUBE signature, which always triggers a very high spam score.

We do something similar with our virus scanner, which also supports a trigger signature.

If an email is classified as spam, Dovecot, our preferred IMAP server, puts it into the spam folder.

This test is more complex because Postfix, Dovecot and SpamAssassin all take part. The open relay test above only needed to connect to Postfix, which kept it short and sweet.

To concentrate on the test I kept all imports and class methods out of this one. If you would like to see the whole code, refer to the gist at https://gist.github.com/href/9185284

def test_receive_spam(self):

    # the GTUBE signature
    signature = (
        'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD'
        '-ANTI-UBE-TEST-EMAIL*C.34X'
    )

    # generate a unique id so we can find the message again later
    text = uuid4().hex

    # send the email through python's smtp sendmail function
    self.send_mail(
        sender='user@example.org',
        recipient='root@example.org',
        subject=text,
        body=signature
    )

    # use a short timeout to wait for the processing to finish
    self.wait_until_processed()

    # use Imbox (<https://github.com/martinrusev/imbox>) to easily
    # get all messages from an account
    imbox = Imbox(
        self.server,
        username='root@example.org',
        password='test',
        ssl=True
    )

    # Ensure that the message is not in the normal folder
    messages = [m for m in imbox.messages() if m[1].subject == text]
    self.assertEqual(len(messages), 0)

    # Ensure that the message was put into the spam folder
    messages = [
        m for m in imbox.messages(folder='Spam') if m[1].subject == text
    ]
    self.assertEqual(len(messages), 1)

    # Ensure that the spam status header was set correctly
    headers = messages[0][1].headers
    headers = dict((h['Name'], h['Value']) for h in headers)

    self.assertTrue(headers['X-Spam-Status'].startswith('Yes'))

Running this test results in a message being delivered to Postfix, where it is handed off to SpamAssassin and then given to Dovecot. Dovecot checks the spam header and, as a result, puts the message into the spam folder.
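The test leans on two helpers that are not shown here, `send_mail` and `wait_until_processed`. The real versions are in the gist. What follows is only a sketch of how such helpers might look: `build_message` and `wait_until` are hypothetical names, and the timeout and polling interval are arbitrary defaults.

```python
import time
from email.message import EmailMessage

# the GTUBE signature used in the test above
GTUBE = (
    'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD'
    '-ANTI-UBE-TEST-EMAIL*C.34X'
)


def build_message(sender, recipient, subject, body):
    """Build a plain-text message that could be handed to
    smtplib.SMTP.send_message()."""
    message = EmailMessage()
    message['From'] = sender
    message['To'] = recipient
    message['Subject'] = subject
    message.set_content(body)
    return message


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() until it returns something truthy or the
    timeout expires. Returns True on success and False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling for the message until it shows up, instead of sleeping for a fixed time, keeps the test fast when the mail pipeline is quick and still tolerant when it is slow.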

Not Everything was Tested that Way

Unfortunately, these tests can only cover so much. I was about to pull the trigger on the migration when I realized that the UUIDs from the old server were not correctly migrated to the new server (I switched from Courier to Dovecot).

I pretty much did the rest correctly, but there was still quite a bit of thrill involved in the final migration. Still, the outage for the bulk of our clients was maybe 30 minutes, and no mails were lost.

In the end, luckily, only a number of small issues had to be resolved, and automated testing was no hurdle for them.

I know there are tools out there that check your mail server setup without you having to write your own code, but writing code that checks if something works is one of the funner ways of really learning about what's going on.