blog.shukriadams.com

Game devops and other things

ZFS email alerting that actually works

ZFS is that reliable old-timey filesystem that is easy to set up and keeps your data pretty safe, but you'll likely have noticed that getting ZFS email alerts to work isn't always straightforward. And you do want alerts when one of your ZFS disk arrays starts failing, right? Well, this is how we do it here at Conglomerated Game Co.

Requirements

Our setup needs to do the following basic things.

  • it must be installed in unattended mode, so we can get this into Ansible
  • it should work with a real-world SMTP service like Amazon SES
  • it should work on Debian-based OSes, in this case both Ubuntu and Proxmox, because that's our flavor

To that end we use sSMTP as our mail client. It can be installed and configured unattended (unlike, say, Postfix), and its configuration process isn't garbage like Sendmail's, which expects you to edit m4 macro files. We're also using an external SMTP provider that just works, as opposed to self-hosted SMTP (uncontrollable laughter) or Gmail's magic now-you-see-it-now-you-don't SMTP service.

Send a mail

Install sSMTP

apt-get install ssmtp -y

Confirm that it's standing in for sendmail

sendmail -V

> sSMTP x.xx (Not sendmail at all)

Configure sSMTP

nano /etc/ssmtp/ssmtp.conf

And ensure these values are set

hostname=myemaildomain.com
mailhub=email-smtp.someregion.amazonaws.com:587
UseSTARTTLS=YES
AuthUser=<username>
AuthPass=<password>
root=you@example.com

In my case I'm using Amazon SES, so hostname is the domain my SES account is authorized to send email from. This is a spam-prevention measure.

root is a user alias, and here we're setting the email address root will receive ZFS alerts at. More on this later, but set it for now. The address given must be one you can receive mail at.
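Since the goal is an unattended install, here is a minimal sketch of writing the same config from a script instead of nano. The function name and the SSMTP_* variable names are my own placeholders, not anything sSMTP defines, and the target path is passed in so you can try it against a scratch file first.

```shell
#!/bin/sh
# Sketch only: write an sSMTP config without opening an editor.
# write_ssmtp_conf and the SSMTP_* variables are hypothetical names
# chosen for this example; sSMTP itself knows nothing about them.
write_ssmtp_conf() {
    conf_path="$1"

    # Refuse to write a half-empty config if a value is missing.
    : "${SSMTP_HOSTNAME:?set SSMTP_HOSTNAME}"
    : "${SSMTP_MAILHUB:?set SSMTP_MAILHUB}"
    : "${SSMTP_USER:?set SSMTP_USER}"
    : "${SSMTP_PASS:?set SSMTP_PASS}"
    : "${SSMTP_ROOT_ADDR:?set SSMTP_ROOT_ADDR}"

    # The file holds a password, so create it readable by its owner only.
    # Loosen this if non-root users on the machine also need to send mail.
    (
        umask 077
        cat > "$conf_path" <<EOF
hostname=${SSMTP_HOSTNAME}
mailhub=${SSMTP_MAILHUB}
UseSTARTTLS=YES
AuthUser=${SSMTP_USER}
AuthPass=${SSMTP_PASS}
root=${SSMTP_ROOT_ADDR}
EOF
    )
}

# Example call, run as root once the variables are exported:
#   write_ssmtp_conf /etc/ssmtp/ssmtp.conf
```

In Ansible you would more likely reach for a template task, but the keys and values end up the same.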

Let's send a test email from this machine

nano mail.txt

Set its content to

From: <sender>@myemaildomain.com
Subject: testing

This is a test mail.

Note that once again, my sender (from) address is on the domain I am authorized to send emails from. Email this to yourself with

ssmtp you@example.com < mail.txt

Here, you@example.com stands in for, again, the email address you can receive mail at. If you got it, congratulations, you're still able to configure 80s-era technology that everyone relies on but no one can easily self-host anymore.
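If you'd rather skip the editor for the test mail too, the message can be built on the fly and piped straight in. This is a rough sketch: build_test_mail is a made-up helper name, and both addresses in the usage comment are placeholders you need to swap for your own.

```shell
#!/bin/sh
# Sketch only: print a minimal test message (headers, blank line, body).
# build_test_mail is a hypothetical helper, not part of sSMTP.
# Arguments: sender address, subject, optional body text.
build_test_mail() {
    from="$1"
    subject="$2"
    body="${3:-This is a test mail.}"
    # The empty line between the headers and the body is required.
    printf 'From: %s\nSubject: %s\n\n%s\n' "$from" "$subject" "$body"
}

# Example call (placeholder addresses, needs a configured ssmtp):
#   build_test_mail sender@myemaildomain.com testing | ssmtp you@example.com
```

The sender still has to be on the domain your SMTP provider lets you send from, exactly as with the file-based version.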

ZFS

Assuming you've already gotten ZFS installed and your pools configured etc, let's get straight to configuring the ZFS daemon, which is responsible for mail alerts.

nano /etc/zfs/zed.d/zed.rc

This needs the following values

ZED_EMAIL_ADDR="root"
ZED_EMAIL_PROG="sendmail"
ZED_EMAIL_OPTS=" @ADDRESS@ "
ZED_NOTIFY_VERBOSE=1

root seems to be the only name ZFS understands and uses. I've tried countless other options and values. The ZFS documentation states you can add any email address or even a list of them, but then the ZFS documentation makes a great many claims. In my experience, anything other than root in this field does not work. This also means you can only ever have one receiver, so you'll need to set up receiver groups on the receiving side. Them's the breaks. Remember that root alias in the sSMTP config above? That root corresponds to this root.

ZED_EMAIL_PROG is sendmail, which is actually being handled by sSMTP.

ZED_EMAIL_OPTS can contain an address template only. You cannot include a subject, because ssmtp does not accept a subject as a command-line argument. Your emails will therefore always be subject-less. Once again, deal with it. 'Tis better to have received a confusing alert than none at all.

ZED_NOTIFY_VERBOSE should be 1 if you want emails every time a scrub is run, and 0 if you want emails only when errors occur. Your call, but you probably want 1 to test your setup, then you can revert to 0 once everything works.
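For an unattended setup, the same four values can be set without opening an editor. What follows is a sketch under a few assumptions: set_zed_option is a helper name I made up, it relies on GNU sed (which is what Debian ships), and it assumes the value contains no |, & or backslash characters, which holds for the values used here. It writes every value in double quotes, which is fine because zed.rc is read as shell.

```shell
#!/bin/sh
# Sketch only: set KEY="VALUE" in a zed.rc-style file.
# set_zed_option is a hypothetical helper, not something ZFS ships.
# If the key already exists (commented out or not) the line is replaced,
# otherwise a new line is appended.
# Assumes GNU sed and a value without |, & or backslash characters.
set_zed_option() {
    file="$1"
    key="$2"
    value="$3"
    line="${key}=\"${value}\""

    if grep -Eq "^#?[[:space:]]*${key}=" "$file"; then
        sed -i -E "s|^#?[[:space:]]*${key}=.*|${line}|" "$file"
    else
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example calls, run as root:
#   set_zed_option /etc/zfs/zed.d/zed.rc ZED_EMAIL_ADDR root
#   set_zed_option /etc/zfs/zed.d/zed.rc ZED_EMAIL_PROG sendmail
#   set_zed_option /etc/zfs/zed.d/zed.rc ZED_EMAIL_OPTS ' @ADDRESS@ '
#   set_zed_option /etc/zfs/zed.d/zed.rc ZED_NOTIFY_VERBOSE 1
```

Running it twice leaves the file unchanged the second time, which is what you want from something Ansible will call repeatedly.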

Save and restart the ZFS daemon (you should do this every time you change the zed config)

systemctl restart zfs-zed.service

Scrub a pool

zpool scrub mypool

You should get an email when the scrub exits. If you don't, you can normally see most ZFS daemon errors by running

systemctl status zfs-zed.service

This will for example let you know which user it tried to email, and in cases where ssmtp aliasing or domain aren't set properly, which values were actually used.

Advanced testing

Scrubbing large pools can take a long time, and if you're spending a lot of time waiting on scrub alerts, you can create a tiny ZFS pool from small files instead of disks. Create two 64MB files full of zeros, then build a mirrored pool from them

mkdir /my-zfs-test

dd if=/dev/zero of=/my-zfs-test/disk1.img bs=1M count=64

dd if=/dev/zero of=/my-zfs-test/disk2.img bs=1M count=64

zpool create mypool mirror /my-zfs-test/disk1.img /my-zfs-test/disk2.img

This pool should exit its scrub almost immediately. Even better, if you want to force the pool to fail, try the following. Install ZFS test tools

apt-get install zfs-test -y

Then set all access attempts to one of these fake disks to look like an IO error

zinject -d /my-zfs-test/disk1.img -e io -T all -f 100 mypool

Any attempt to change a file in the pool will now cause the pool to go into error state. Test your alerts, then put the pool back into working state with

zinject -c all
zpool clear mypool
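Once you're done testing you'll probably want the throwaway pool gone again. Here's a small sketch of a teardown. The run and teardown_test_pool helpers and the DRY_RUN switch are my own inventions for this example, and the defaults simply mirror the pool name and directory used above. With DRY_RUN=1 it only prints what it would do, which is worth trying before you let anything near zpool destroy.

```shell
#!/bin/sh
# Sketch only: tear down the file-backed test pool.
# run, teardown_test_pool and DRY_RUN are hypothetical names for this
# example. Defaults match the pool and directory created above.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

teardown_test_pool() {
    pool="${1:-mypool}"
    dir="${2:-/my-zfs-test}"
    run zinject -c all          # clear any injected faults
    run zpool destroy "$pool"   # destroy the test pool
    run rm -rf "$dir"           # delete the backing image files
}

# Example calls:
#   DRY_RUN=1 teardown_test_pool    # print the commands only
#   teardown_test_pool              # actually do it, as root
```

Double-check the pool name before running it for real, since zpool destroy does exactly what it says.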

Conclusion

There you have it. ZFS email alerts. The solution above isn't perfect, but it works for what it does, and that's a whole lot better than the gnawing worry that a server is slowly and silently dying in a corner, like what happened with Linus …