dch
Developer
My system started hanging a few weeks ago, and I suspect a hardware problem. It's a hard hang: the whole system freezes but never reboots. This post is about finding a way to trigger a reboot when the system hangs, not about the underlying problem itself!
My mainboard has both IPMI and a BIOS-enabled hardware watchdog, which seems to be set to around the 5-minute mark. I haven't yet found how to tell the BIOS watchdog that the system is running, so I turned it off, as 5 minutes of uptime is not my thing.
There are ichwd(4) and watchdogd(8), which in theory should be sufficient:
Code:
# kldload ipmi
# kldload ichwd
# ls /dev/fido
/dev/fido
# watchdogd -d
watchdogd: mlockall failed: Cannot allocate memory
...
That's an ominous error message!
After loading these drivers, dmesg reports:
Code:
[23] ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
[23] ipmi0: KCS mode found at io 0xca2 on acpi
[23] ipmi0: IPMI device rev. 1, firmware rev. 3.45, version 2.0, device support mask 0xbf
[23] ipmi0: Number of channels 2
[23] ipmi0: Attached watchdog
[23] ipmi0: Establishing power cycle handler
[25] ipmi1 failed to probe on isa0
[45] ichwd0: <Intel Wellsburg watchdog timer> on isa0
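To see at a glance which watchdog providers actually attached, the boot messages can be filtered for driver lines that mention a watchdog. This is a minimal sketch, assuming dmesg-style text with an optional `[NN] ` uptime prefix as in the log above; `list_watchdogs` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the boot messages in which a driver instance
# (e.g. ipmi0, ichwd0) mentions a watchdog. Reads dmesg-style text on stdin.
list_watchdogs() {
    # Drop the optional "[NN] " uptime prefix, then keep lines of the form
    # "<driver><unit>: ...watchdog..." (case-insensitive).
    sed -E 's/^\[[0-9]+\] //' | grep -iE '^[a-z]+[0-9]+: .*watchdog'
}

# Usage on a live system:
#   dmesg | list_watchdogs
```

Fed the log above, this keeps only the `ipmi0: Attached watchdog` and `ichwd0: <Intel Wellsburg watchdog timer> on isa0` lines; `ipmi1 failed to probe` is skipped because it has no `driver0:` prefix.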
As an alternative, the sysutils/freeipmi port provides bmc-watchdog(8), which reports:
Code:
root@wintermute /u/h/dch# bmc-watchdog --get
Timer Use: SMS/OS
Timer: Stopped
Logging: Enabled
Timeout Action: None
Pre-Timeout Interrupt: None
Pre-Timeout Interval: 0 seconds
Timer Use BIOS FRB2 Flag: Clear
Timer Use BIOS POST Flag: Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag: Clear
Timer Use BIOS OEM Flag: Clear
Initial Countdown: 0 seconds
Current Countdown: 0 seconds
This appears to be completely unrelated, but it might do the job if it communicates directly with the BMC?
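The two fields that matter for a reboot-on-hang are `Timer` and `Timeout Action`: with the timer stopped and the action set to None, the BMC will do nothing when the machine freezes. A small sketch that checks this from the `--get` output, assuming freeipmi prints `Running`/`Stopped` for the timer and `None` when no action is configured (as above); `wd_will_reset` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `bmc-watchdog --get` output on stdin, print the
# timer state and timeout action, and succeed only if the BMC would act on a
# hang. Assumes "Running"/"Stopped" and "None" as the freeipmi value strings.
wd_will_reset() {
    awk -F': *' '
        $1 == "Timer"          { timer = $2 }
        $1 == "Timeout Action" { action = $2 }
        END {
            printf "timer=%s action=%s\n", timer, action
            # Exit status 0 only if the countdown is running and an action is set.
            exit !(timer == "Running" && action != "None" && action != "")
        }'
}

# Usage:
#   bmc-watchdog --get | wd_will_reset || echo "BMC watchdog will not reset this box"
```

For the output shown above it prints `timer=Stopped action=None` and exits non-zero.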
I will try this next and update the thread as I go; the hang can take hours to occur, so this could take some time.
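One plausible setup for that experiment is to run bmc-watchdog as a daemon that arms the BMC's SMS/OS timer with a hard-reset action and then keeps resetting it; if the OS hangs, the resets stop and the BMC acts. This is a sketch under assumptions: the option meanings in the comments are my reading of bmc-watchdog(8) and should be checked against the installed version, `start_bmc_watchdog`, `DRY_RUN`, `WD_TIMEOUT` and `WD_RESET` are hypothetical names, and the 900/60-second values are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper around freeipmi's bmc-watchdog daemon mode.
# DRY_RUN=1 (the default) only prints the command instead of running it.
start_bmc_watchdog() {
    # -d              run as a daemon that periodically resets the timer
    # -u 4            timer use: SMS/OS (matches "Timer Use: SMS/OS" above)
    # -p 0            no pre-timeout interrupt
    # -a 1            timeout action: hard reset
    # -F -P -L -S -O  clear the BIOS FRB2/POST/OS Load/SMS-OS/OEM timer use flags
    # -i SECONDS      initial countdown (illustrative default: 900)
    # -e SECONDS      reset period in daemon mode (illustrative default: 60)
    set -- -d -u 4 -p 0 -a 1 -F -P -L -S -O \
        -i "${WD_TIMEOUT:-900}" -e "${WD_RESET:-60}"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo bmc-watchdog "$@"
    else
        bmc-watchdog "$@"
    fi
}

# Usage (as root, on the machine with the BMC):
#   DRY_RUN=0 start_bmc_watchdog
```

The reset period has to stay well below the countdown, otherwise the BMC would reset a perfectly healthy machine; after starting it, `bmc-watchdog --get` should show the timer running with a Hard Reset action.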