Hello
I have a ProLiant DL360 G5 running FreeBSD 11.0-RELEASE-p1 with MPD and OpenBGP.
Server config:
CPU: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz - 2x2 Core
RAM: 8GB
Network: HP NC373i Multifunction Gigabit Server Adapter (bce driver)
The server runs the MPD service with 550-580 incoming connections, and it also runs a BGP service.
Peak CPU load: less than 40%
Peak network load: about 600 Mbit/s (bce0 - uplink, bce1 - VLANs, MPD incoming)
Peak PPS: about 120k
The system reboots exactly 24 hours after it starts.
Messages right before the reboot:
Code:
Apr 14 14:10:37 mpd-bgp kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(7886): Watchdog timeout occurred, resetting!
Apr 14 14:10:37 mpd-bgp kernel: bce0: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan3212: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: bce1: /usr/src/sys/dev/bce/if_bce.c(7886): Watchdog timeout occurred, resetting!
Apr 14 14:10:37 mpd-bgp kernel: bce1: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2010: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2008: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2009: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan1999: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan4: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2680: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2015: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2152: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan7: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2153: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2013: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2543: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2320: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2002: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2542: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2151: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2003: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2541: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan10: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2540: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2520: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2021: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2006: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2521: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2020: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2007: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2102: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2760: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2101: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2004: link state changed to DOWN
Apr 14 14:10:37 mpd-bgp kernel: vlan2005: link state changed to DOWN
Apr 14 14:10:38 mpd-bgp kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
After that, the normal system startup messages follow.
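To see quickly which interfaces hit the watchdog and when, the log can be filtered with a small script. This is only a minimal sketch in Python: the regular expression is my own guess at the line layout shown above, it assumes the wrapped "Watch" / "dog" pieces have been joined back into one line, and the `watchdog_events` name is made up for this example.

```python
import re
from collections import defaultdict

# Pattern for the watchdog lines shown above. Assumption: each message sits
# on a single line (the wrapped "Watch" / "dog" halves have been rejoined).
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+):\s+.*Watchdog timeout occurred"
)


def watchdog_events(lines):
    """Return {interface: [timestamp, ...]} for every watchdog timeout line."""
    events = defaultdict(list)
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            events[match.group("ifname")].append(match.group("stamp"))
    return dict(events)


# Three lines taken from the log above (watchdog lines rejoined).
SAMPLE = [
    "Apr 14 14:10:37 mpd-bgp kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(7886): "
    "Watchdog timeout occurred, resetting!",
    "Apr 14 14:10:37 mpd-bgp kernel: bce0: link state changed to DOWN",
    "Apr 14 14:10:37 mpd-bgp kernel: bce1: /usr/src/sys/dev/bce/if_bce.c(7886): "
    "Watchdog timeout occurred, resetting!",
]

if __name__ == "__main__":
    for ifname, stamps in watchdog_events(SAMPLE).items():
        print(ifname, stamps)
```

Feeding it the lines of /var/log/messages instead of `SAMPLE` would show whether both bce ports always time out in the same second, as they do in the excerpt above.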
The kernel is built without IPv6, and I have tried many variations of the sysctl parameters.
Current settings:
/boot/loader.conf
Code:
geom_mirror_load="YES"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
zfs_load="YES"
net.link.ifqmaxlen=2048
net.isr.maxthreads=2
net.isr.bindthreads=1
kern.maxusers=1024
net.graph.maxdata=65536
net.graph.maxalloc=65536
net.inet.tcp.soreceive_stream=1
hw.bce.verbose=1
hw.bce.tso_enable=0
hw.pci.enable_msix=0
/etc/sysctl.conf
Code:
vfs.zfs.arc_max=4294967296
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
net.inet.icmp.drop_redirect=1
kern.ipc.somaxconn=32768
net.inet.tcp.sendbuf_inc=16384
kern.ipc.maxsockbuf=2621440
net.graph.recvspace=1024000
net.graph.maxdgram=1024000
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
kern.ipc.nmbclusters=262144
kern.maxvnodes=1000000
net.inet.tcp.maxtcptw=280960
net.inet.tcp.nolocaltimewait=1
net.inet.icmp.icmplim=2000
security.bsd.see_other_uids=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.unprivileged_proc_debug=0
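Since I changed these values many times, it is worth confirming that the running kernel really uses what the file says. A minimal sketch in Python, assuming `sysctl -n <name>` is available on the box; the helper names (`parse_conf`, `read_sysctl`, `mismatches`) are made up for this example.

```python
import subprocess


def parse_conf(text):
    """Parse name=value lines (sysctl.conf style), skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip().strip('"')
    return settings


def read_sysctl(name):
    """Read one live value with `sysctl -n`; return None if the call fails."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name], capture_output=True, text=True
        )
    except OSError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def mismatches(expected, reader=read_sysctl):
    """Return {name: (configured, live)} for every value that differs."""
    diff = {}
    for name, want in expected.items():
        live = reader(name)
        if live != want:
            diff[name] = (want, live)
    return diff
```

Calling `mismatches(parse_conf(open("/etc/sysctl.conf").read()))` on the server would list any setting that did not take effect. The comparison is plain string equality, so a value the kernel prints in a different form would show up as a mismatch and has to be checked by eye.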
There is absolutely no relation between the reboots and the system load. 14:00 is a time with less than a quarter of the peak load: CPU ~10%, network ~140 Mbit/s, RAM ~2 GB used (of 8 GB total).
There is no visible reason for it, but the system reboots exactly 24 hours after startup.
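The "exactly 24 hours" claim can be checked by subtracting the boot time from the time of the watchdog message. A minimal sketch in Python: the boot timestamp and the year below are hypothetical placeholders (syslog lines carry no year), only the 14:10:37 reset time comes from my log, and the 5 minute tolerance is an arbitrary choice.

```python
from datetime import datetime, timedelta


def uptime_before_reset(boot, reset, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the time elapsed between boot and the watchdog reset."""
    return datetime.strptime(reset, fmt) - datetime.strptime(boot, fmt)


def is_about_24h(delta, tolerance=timedelta(minutes=5)):
    """True when delta is within `tolerance` of exactly 24 hours."""
    return abs(delta - timedelta(hours=24)) <= tolerance


if __name__ == "__main__":
    # Hypothetical boot time and year; only the reset time is from the log.
    delta = uptime_before_reset("2017-04-13 14:09:50", "2017-04-14 14:10:37")
    print(delta, is_about_24h(delta))
```

Running this over several reboots would show whether the interval is really constant or only roughly one day.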
I have seen setups with three times as many MPD connections and twice the traffic, but those used an Intel 82576 (igb driver).
I can't see any cause other than the network, but why 24 hours?!
It breaks my brain.
On lists.freebsd.org I found a thread about the bce watchdog timeout:
https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082268.html
Code:
This may be caused by DMA alignment problems.
See
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
for a recent thread about the msk driver. The msk maintainer Yonghyeon
Pyun has opted for super safe options of 32K alignment!
It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or
BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see
whether it makes any difference.
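For reference, this is what an alignment value means in general. The sketch below is plain Python for illustration only, not code from the bce driver; the addresses are made up and the function names are my own. Only the value 4096 comes from the quoted message.

```python
def _check_power_of_two(alignment):
    """Alignment values are expected to be powers of two (16, 4096, ...)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")


def is_aligned(addr, alignment):
    """True when addr is a multiple of alignment."""
    _check_power_of_two(alignment)
    return addr & (alignment - 1) == 0


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    _check_power_of_two(alignment)
    return (addr + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    addr = 0x2010  # made-up buffer address
    print(is_aligned(addr, 16), is_aligned(addr, 4096))
    print(hex(align_up(addr, 4096)))
```

A larger alignment only restricts where a buffer may start, so every buffer aligned to 4096 bytes also satisfies any smaller power-of-two alignment.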
Could this help? And if I make these changes to if_bcereg.h, will it break the network subsystem?
Has anyone seen something similar?
I really love FreeBSD and don't want to migrate to Linux ((((
Please help.