ZFS Health and Status Monitoring

What is the recommended way to monitor ZFS health and status? I am shamelessly using knowledge transferred from FreeNAS to FreeBSD to configure the S.M.A.R.T. daemon.
However, I don't see zfsd on TrueOS 10.0.3, and running

Code:
zpool status -x

seems rather impractical. Is there a collectd plugin for zpool status -x, or something that reports to an SNMP daemon, either the built-in one or net-snmp?
 
SirDice said:
Err.. So what's bsnmpd(1) doing then?
Thanks for bringing bsnmpd to my attention. That brain fart about "FreeBSD has no native SNMP daemon" is the second-stupidest thing I have written on this forum. Could you please address the ZFS monitoring part of my question?
 
getopt said:
Have a look at /etc/periodic/daily/404.status-zfs
I already saw it while reading the TrueOS and FreeNAS configuration files carefully. These people know what they are doing.
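For reference, that report is off by default; going by /etc/defaults/periodic.conf, something along these lines in /etc/periodic.conf should turn it on (the scrub knob is optional and quoted from memory):
Code:
# /etc/periodic.conf
# enable the daily ZFS status mail from 404.status-zfs
daily_status_zfs_enable="YES"
# optionally let periodic(8) trigger scrubs as well (800.scrub-zfs)
daily_scrub_zfs_enable="YES"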
 
Have a look at devd(8) and /etc/devd.conf (or possibly /etc/devd/zfs.conf), which show how you can react to various ZFS events. I have yet to find any comprehensive documentation on all the available ZFS events, but the examples in the devd.conf files are a good start.
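For example, an entry along these lines logs checksum events as they happen. This is only a rough sketch from memory; the exact event type strings and variables are best copied from the stock /etc/devd/zfs.conf.
Code:
notify 10 {
	match "system"		"ZFS";
	match "type"		"misc.fs.zfs.checksum";
	# placeholder action; the stock file shows which variables each event provides
	action "logger -p local7.warn -t ZFS 'checksum error detected on a pool'";
};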
 
I run zpool scrub ... once a night and capture the output in a logfile. If the output looks unusual, or if the exit code from the command is non-zero, an e-mail is sent to the admin.
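A minimal sketch of that kind of job (pool name, log path and recipient are placeholders, and my real script does a bit more checking):
Code:
#!/bin/sh
# Nightly scrub wrapper: start a scrub, keep the output in a logfile and
# mail the admin if the command itself fails.
POOL="tank"
LOG="/var/log/zpool-scrub.log"
ADMIN="root"

if ! zpool scrub "$POOL" > "$LOG" 2>&1; then
    mail -s "zpool scrub failed on $(hostname)" "$ADMIN" < "$LOG"
fi
# append the current status so the log shows what the pool looked like
zpool status "$POOL" >> "$LOG" 2>&1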

And my monitoring tool (100 lines of pedestrian, boring python) runs zpool status, looks for all the pools I expect, and makes sure they are shown as "online" with no extra messages. If yes, it gives ZFS a green light; otherwise it gets a yellow light, and the knowledgeable user has to look in the log file.

All very simple, yet surprisingly sturdy. No SNMP integration, just a bunch of boring cron jobs and short scripts.
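For the curious, the core of that status check boils down to something like this (a shell sketch of the idea; the pool list is a placeholder):
Code:
#!/bin/sh
# Make sure every expected pool exists and reports ONLINE; anything else
# turns the light yellow and the details go to stdout/the log.
EXPECTED="tank backup"
LIGHT=green

for pool in $EXPECTED; do
    health=$(zpool list -H -o health "$pool" 2>/dev/null)
    if [ "$health" != "ONLINE" ]; then
        echo "pool $pool: health is '${health:-missing}'"
        LIGHT=yellow
    fi
done
# zpool status -x prints "all pools are healthy" when there is nothing to report
zpool status -x | grep -q "all pools are healthy" || LIGHT=yellow

echo "ZFS: $LIGHT"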
 
Rather, you should watch for anything that isn't "0" in the tabular output of zpool status (though drive status is also important, of course!). That is usually an early warning of problems. I run a scrub about twice a month; in my opinion it is not necessary to do it every day. The counters in that zpool table are also updated during normal access. With checksums on, ZFS will notice errors at least as early as S.M.A.R.T. does. ZFS also won't lie to you; S.M.A.R.T. sometimes does, and sometimes it is simply broken.
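Something along these lines can be dropped into a cron job to catch non-zero counters; it is only a sketch and assumes the usual NAME/STATE/READ/WRITE/CKSUM layout of the config table in zpool status:
Code:
#!/bin/sh
# Exit non-zero (and print the offending rows) if any READ/WRITE/CKSUM
# counter in the zpool status table is not 0.
zpool status | awk '
    $1 == "NAME" && $2 == "STATE" { table = 1; next }   # start of the config table
    NF == 0                       { table = 0 }         # a blank line ends it
    table && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
        print "non-zero error counter:", $0
        bad = 1
    }
    END { exit bad }
'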
 
If anybody is interested, I wrote a couple of scripts and templates for Zabbix to monitor ZFS. Pools and filesystems are detected using Zabbix' LLD. It's nowhere near finished but it does keep track of availability (status of RAID for example) and capacity. I'm planning to add some of the zpool iostat parameters but I haven't gotten around to it yet.
 
If anybody is interested, I wrote a couple of scripts and templates for Zabbix to monitor ZFS. Pools and filesystems are detected using Zabbix' LLD. It's nowhere near finished but it does keep track of availability (status of RAID for example) and capacity. I'm planning to add some of the zpool iostat parameters but I haven't gotten around to it yet.
Is it still possible to get a look at your Zabbix-ZFS scripts & templates?
 
This isn't what I was working on; I used it as a basis. But it should get you going:
Save this as /usr/local/etc/zabbix22/zabbix_agentd.conf.d/userparam_zfs.conf:
Code:
# Discover all pools and filesystems and emit them as Zabbix LLD JSON.
UserParameter=vfs.zpool.discovery,/usr/local/bin/sudo /sbin/zpool list -H -o name | perl -e 'while(<>){chomp;push(@P,qq(\t\t{"{#POOLNAME}":).qq("$_"}));};print qq({\n\t"data":[\n);print join(",\n",@P)."\n";print qq(\t]\n}\n);'
UserParameter=vfs.zfs.discovery,/usr/local/bin/sudo /sbin/zfs list -H -o name | perl -e 'while(<>){chomp;push(@P,qq(\t\t{"{#FSNAME}":).qq("$_"}));};print qq({\n\t"data":[\n);print join(",\n",@P)."\n";print qq(\t]\n}\n);'
# Fetch a single property; the sed strips the trailing "x" from ratio values
# like compressratio ("1.23x") so Zabbix can store them as numbers.
UserParameter=vfs.zfs.get[*],/usr/local/bin/sudo /sbin/zfs get -o value -Hp $2 $1 | sed -e 's/x$//'
UserParameter=vfs.zpool.get[*],/usr/local/bin/sudo /sbin/zpool get -Hp -o value $2 $1 | sed -e 's/x$//'
Save this as /usr/local/etc/sudoers.d/zabbix:
Code:
Defaults:zabbix !requiretty
zabbix  ALL=(root)  NOPASSWD:   /sbin/zpool
zabbix  ALL=(root)  NOPASSWD:   /sbin/zfs
Then import the attached XML template and assign "Template App ZFS" to a host with ZFS.
(Rename the file to .xml; the forum wouldn't let me save it as XML.)
 

A short explanation: the template only has discovery rules. It will dynamically find all pools and filesystems. For each pool or filesystem it will then fetch some of the interesting values, like compression ratio, dedup ratio, usage, etc. It will also create graphs for usage and compression ratio for each pool and filesystem. There's also a trigger that will alert if a pool is in a degraded state. There's no trigger for >80% full yet, but it should be relatively easy to add one.
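For the record, a capacity trigger prototype could look roughly like this, assuming the vfs.zpool.get[...] item key from the post above and Zabbix 2.2-style expression syntax; the 80 is an arbitrary threshold:
Code:
{Template App ZFS:vfs.zpool.get[{#POOLNAME},capacity].last(0)}>80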
 
I'm not sure whether the commands work if you run them under the zabbix user account. It's possible it doesn't have enough privileges to get the information.
 
You shouldn't need to elevate permissions for the zfs/zpool get/list commands. I run extremely similar commands from the zabbix user directly. (I think I used awk instead of perl to make the discovery response.)
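In case it helps, an awk version of the pool discovery (run directly as the zabbix user, without sudo) can look something like this:
Code:
UserParameter=vfs.zpool.discovery,/sbin/zpool list -H -o name | awk 'BEGIN{printf "{\"data\":["} {printf "%s{\"{#POOLNAME}\":\"%s\"}",s,$1;s=","} END{print "]}"}'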
 