ZFS email alerting that actually works
ZFS is that reliable old-timey filesystem that is easy to set up and keeps your data pretty safe, but you'll likely have noticed that getting ZFS email alerts to work isn't always straightforward. And you do want alerts when one of your ZFS disk arrays starts failing, right? Well, this is how we do it here at Conglomerated Game Co.
Requirements
Our setup needs to do the following basic things.
- it must be installed in unattended mode, so we can get this into Ansible
- it should work with a real-world SMTP service like Amazon SES
- it should work on Debian-based OSes, in this case both Ubuntu and Proxmox, because that's our flavor
To that end we use sSMTP as our mail client. It can be installed and configured unattended (unlike, say, Postfix), and its configuration process isn't garbage like Sendmail's, which expects you to edit actual m4 macro code. We're also using an external SMTP provider that just works, as opposed to self-hosted SMTP (uncontrollable laughter) or Gmail's magic now-you-see-it-now-you-don't SMTP service.
Send a mail
Install sSMTP
apt-get install ssmtp -y
Confirm that it's standing in for sendmail
sendmail -V
> sSMTP x.xx (Not sendmail at all)
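For an unattended run, the same check can be scripted. This is a minimal sketch, assuming the version banner contains the string sSMTP as in the output above. The function name is_ssmtp_banner is made up for illustration.

```shell
#!/bin/sh
# Returns success if the given version banner comes from sSMTP.
# is_ssmtp_banner is a hypothetical helper name, not part of sSMTP.
is_ssmtp_banner() {
    case "$1" in
        *sSMTP*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Demo with the banner format shown above. On a real host, pass in
# "$(sendmail -V 2>&1)" instead of the literal string.
if is_ssmtp_banner "sSMTP x.xx (Not sendmail at all)"; then
    echo "sendmail is provided by sSMTP"
fi
```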
Configure sSMTP
nano /etc/ssmtp/ssmtp.conf
And ensure these values
hostname=myemaildomain.com
mailhub=email-smtp.someregion.amazonaws.com:587
UseSTARTTLS=YES
AuthUser=<username>
AuthPass=<password>
root=you@example.com
In my case I'm using Amazon SES, so hostname is the domain my AWS SES account is authorized to send emails from. This is a spam prevention measure on the SES side.
root is a user alias, and here we're setting the email address root will receive ZFS alerts at. More on this later, but set it for now. The address given must be one you can receive mail at.
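Since the goal is an unattended install, the file can also be written by a script instead of an editor. The following is only a sketch: write_ssmtp_conf and the variable names are hypothetical, the values are placeholders, and it writes to a scratch file so it can be tried safely. On a real host the target would be /etc/ssmtp/ssmtp.conf.

```shell
#!/bin/sh
# Writes the sSMTP settings discussed above to the file given as $1.
# write_ssmtp_conf and the variable names are hypothetical, chosen for this sketch.
write_ssmtp_conf() {
    cat > "$1" <<EOF
hostname=${MAIL_DOMAIN}
mailhub=${SMTP_HOST}:${SMTP_PORT}
UseSTARTTLS=YES
AuthUser=${SMTP_USER}
AuthPass=${SMTP_PASS}
root=${ALERT_ADDR}
EOF
}

# Placeholder values; supply real ones from wherever you keep secrets.
MAIL_DOMAIN="myemaildomain.com"
SMTP_HOST="email-smtp.someregion.amazonaws.com"
SMTP_PORT="587"
SMTP_USER="<username>"
SMTP_PASS="<password>"
ALERT_ADDR="you@example.com"

# Scratch file for the demo; use /etc/ssmtp/ssmtp.conf on a real host.
conf="$(mktemp)"
write_ssmtp_conf "$conf"
cat "$conf"
```

Redirecting into an existing file keeps that file's owner and permissions, which matters here because the file holds a password.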
Let's send a test email from this machine
nano mail.txt
set its content
From: <sender>@myemaildomain.com
Subject: testing

This is a test mail.
Note that, once again, my sender (From) address is on the domain I am authorized to send emails from. Email this to yourself with
ssmtp you@example.com < mail.txt
Here, you@example.com stands in for, again, the email address you can receive mail at. If you got it, congratulations, you're still able to configure '80s-era technology that everyone relies on but no one can easily self-host anymore.
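The test message can also be built on the fly rather than in an editor, which is handy in a provisioning script. A minimal sketch follows. build_test_mail and the alerts@ sender address are hypothetical. The one detail that matters is the empty line between the headers and the body.

```shell
#!/bin/sh
# Prints a minimal mail message: headers, one empty line, then the body.
# build_test_mail is a hypothetical helper name used only in this sketch.
build_test_mail() {
    printf 'From: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3"
}

# Print the message so it can be inspected. The sender address is a placeholder.
build_test_mail "alerts@myemaildomain.com" "testing" "This is a test mail."

# To actually send it on a configured host, pipe it into ssmtp:
#   build_test_mail "alerts@myemaildomain.com" "testing" "This is a test mail." | ssmtp you@example.com
```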
ZFS
Assuming you've already gotten ZFS installed and your pools configured etc, let's get straight to configuring the ZFS Event Daemon (ZED), which is responsible for mail alerts.
nano /etc/zfs/zed.d/zed.rc
This needs the following values
ZED_EMAIL_ADDR="root"
ZED_EMAIL_PROG="sendmail"
ZED_EMAIL_OPTS=" @ADDRESS@ "
ZED_NOTIFY_VERBOSE=1
root seems to be the only name ZFS understands and uses. I've tried countless other options and values. The ZFS documentation states you can add any email address or even a list of them, but then the ZFS documentation makes a great many claims. In our experience, no other value in this field works. This also means you can only ever have one receiver, so you'll need to figure out receiver groups on the receiver side - them's the breaks. Remember that root alias in the sSMTP config above? That root corresponds to this root.
ZED_EMAIL_PROG is sendmail, which is actually being handled by sSMTP.
ZED_EMAIL_OPTS can contain an address template only. You cannot include a subject, as ssmtp does not accept a subject as a command line argument. Therefore your emails will always be subject-less - once again, deal with it. 'Tis better to have received a confusing alert than none at all.
ZED_NOTIFY_VERBOSE should be 1 if you want emails every time a scrub is run, and 0 if you want emails only when errors occur. Your call, but you probably want 1 to test your setup, then you can revert to 0 once everything works.
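To apply these four settings without opening an editor, something like the following could work. It is a sketch under assumptions: set_zed_option is a hypothetical helper, it relies on GNU sed (the default on Debian-based systems), and it does not handle values containing |, & or backslashes. The demo edits a scratch file standing in for /etc/zfs/zed.d/zed.rc.

```shell
#!/bin/sh
# Sets KEY="VALUE" in a zed.rc-style file: rewrites an existing line
# (commented out or not) if there is one, otherwise appends a new line.
# set_zed_option is a hypothetical helper, not part of ZFS.
set_zed_option() {
    file="$1"
    key="$2"
    value="$3"
    if grep -Eq "^#?${key}=" "$file"; then
        sed -i -E "s|^#?${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Scratch file with two commented-out entries as a stand-in for the real config.
rc="$(mktemp)"
printf '%s\n' '#ZED_EMAIL_ADDR="root"' '#ZED_NOTIFY_VERBOSE=0' > "$rc"

set_zed_option "$rc" ZED_EMAIL_ADDR "root"
set_zed_option "$rc" ZED_EMAIL_PROG "sendmail"
set_zed_option "$rc" ZED_EMAIL_OPTS " @ADDRESS@ "
set_zed_option "$rc" ZED_NOTIFY_VERBOSE "1"
cat "$rc"
```

The helper always quotes the value, so ZED_NOTIFY_VERBOSE ends up as "1" rather than 1. Since zed.rc is read as shell variable assignments, the two forms should be equivalent.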
Save and restart the ZFS daemon (you should do this every time you change the zed config)
systemctl restart zfs-zed.service
Scrub a pool
zpool scrub mypool
You should get an email when the scrub exits. If you don't, you can normally see most ZFS daemon errors by running
systemctl status zfs-zed.service
This will for example let you know which user it tried to email, and in cases where ssmtp aliasing or domain aren't set properly, which values were actually used.
Advanced testing
Scrubbing large pools can take a long time, and if you're spending a lot of time waiting on scrub alerts, you can create a tiny ZFS pool from small files instead of disks. Create two 64MB files full of zero data, then build a mirrored pool from them
mkdir /my-zfs-test
dd if=/dev/zero of=/my-zfs-test/disk1.img bs=1M count=64
dd if=/dev/zero of=/my-zfs-test/disk2.img bs=1M count=64
zpool create mypool mirror /my-zfs-test/disk1.img /my-zfs-test/disk2.img
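A mirror needs two distinct backing files, so it can help to generate the file names in a loop rather than typing them out. This is a rough sketch. make_test_disks is a hypothetical helper, and the optional size argument exists only so the function can be tried with smaller files. For a real test pool, stick with 64 as above.

```shell
#!/bin/sh
# Creates disk1.img and disk2.img, filled with zeros, in the directory $1.
# $2 is the size of each file in MB and defaults to 64, the size used above.
# make_test_disks is a hypothetical helper name used only in this sketch.
make_test_disks() {
    dir="$1"
    size_mb="${2:-64}"
    mkdir -p "$dir"
    for n in 1 2; do
        dd if=/dev/zero of="$dir/disk$n.img" bs=1M count="$size_mb" 2>/dev/null
    done
}

# On a real host, as root:
#   make_test_disks /my-zfs-test
#   zpool create mypool mirror /my-zfs-test/disk1.img /my-zfs-test/disk2.img
```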
This pool should exit its scrub almost immediately. Even better, if you want to force the pool to fail, try the following. Install ZFS test tools
apt-get install zfs-test -y
Then set all access attempts to one of these fake disks to look like an IO error
zinject -d /my-zfs-test/disk1.img -e io -T all -f 100 mypool
Any attempt to change a file in the pool will now cause the pool to go into error state. Test your alerts, then put the pool back into working state with
zinject -c all
zpool clear mypool
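If you run this test often, the inject and clear steps can be wrapped in a small script. The sketch below defaults to a dry run that only prints the commands, so it is safe to try anywhere. The helper names run, inject_fault and clear_fault are made up for this example, and the touch step assumes the pool is mounted at its default location /<poolname>.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
# Set DRY_RUN=0 and run as root to execute the commands for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Make all IO to one backing file fail, then force a write to the pool.
inject_fault() {
    pool="$1"
    disk="$2"
    run zinject -d "$disk" -e io -T all -f 100 "$pool"
    run touch "/$pool/alert-test"   # assumes the default mountpoint /<pool>
    run sync
}

# Remove all injected faults and reset the pool's error state.
clear_fault() {
    pool="$1"
    run zinject -c all
    run zpool clear "$pool"
}

inject_fault mypool /my-zfs-test/disk1.img
# ...check your inbox for the alert, then:
clear_fault mypool
```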
Conclusion
There you have it. ZFS email alerts. The solution above isn't perfect, but it works for what it does, and that's a whole lot better than the gnawing worry that a server is slowly and silently dying in a corner, like what happened with Linus …