Hi!
I have a production backup server that runs a backup shell script every 5 minutes from root's crontab. Depending on the time, the script uses scp to fetch backup files from different servers. I installed this backup server over a year ago and it ran without problems. I upgraded it to 14.2 on December 21, and since then the server has been “halting/stopping” around 0300-0430 almost every night, and a few times around 1700, which messes up the backups running at that time.
I reinstalled the backup server from scratch a couple of days ago. Same halting/dropping/stopping.
It’s FreeBSD 14.2 on a vm-bhyve host (also 14.2, updated Dec 21) with plenty of resources, running on an HPE DL380 G10; the server is idling. Other VMs are running normally, but they don’t run crontab every 5 min, so, yeah.. I don’t know if the backup server is just halting for 10-15 minutes or what, but it puts everything in a queue. I know that Zabbix gets no connections from the server and can’t connect to it during this time; there is simply nothing in the charts. And there is nothing in any log on any server, except one thing, the queue (see below). I checked them all, VMs and host.. nothing abnormal in any log. I haven’t seen the problem on other vm-bhyve hosts and VMs so far.
Below are parts of /var/log/cron on the backup server (the only abnormal thing I see, apart from the line in messages further down). Today, something happened after 04:05. But what? There are no errors or strange entries in the logs on the vm-bhyve host, only on the backup server. My script shows up in the log as /home/backup253/get.sh >/dev/null 2>&1 and runs every 5 min. As you can see, CMD (/usr/libexec/save-entropy) ran at 03:44 and 03:55. It should have run at 04:11, but it ran at 04:15:45. It should have run at 04:22, but it ran at 04:24:57. My script is also queuing. Everything is put in a queue. Zabbix lost the connection at 04:06:11 and the backup server reconnected at 04:26:16. This happens once almost every night between 0300 and 0430, at different times. My system logs are not big at all (rotating logs).
Code:
Jan 2 03:40:00 EDA-253-BackUp02 /usr/sbin/cron[65984]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 03:44:00 EDA-253-BackUp02 /usr/sbin/cron[89793]: (operator) CMD (/usr/libexec/save-entropy)
Jan 2 03:45:00 EDA-253-BackUp02 /usr/sbin/cron[97133]: (root) CMD (/usr/libexec/atrun)
Jan 2 03:45:00 EDA-253-BackUp02 /usr/sbin/cron[97976]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 03:50:00 EDA-253-BackUp02 /usr/sbin/cron[61123]: (root) CMD (/usr/libexec/atrun)
Jan 2 03:50:00 EDA-253-BackUp02 /usr/sbin/cron[61294]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 03:55:00 EDA-253-BackUp02 /usr/sbin/cron[18595]: (operator) CMD (/usr/libexec/save-entropy)
Jan 2 03:55:00 EDA-253-BackUp02 /usr/sbin/cron[19122]: (root) CMD (/usr/libexec/atrun)
Jan 2 03:55:00 EDA-253-BackUp02 /usr/sbin/cron[19142]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:00:00 EDA-253-BackUp02 /usr/sbin/cron[59380]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:00:00 EDA-253-BackUp02 /usr/sbin/cron[59485]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:01:00 EDA-253-BackUp02 /usr/sbin/cron[58425]: (root) CMD (adjkerntz -a)
Jan 2 04:05:00 EDA-253-BackUp02 /usr/sbin/cron[42438]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:05:00 EDA-253-BackUp02 /usr/sbin/cron[42735]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:12:36 EDA-253-BackUp02 /usr/sbin/cron[93476]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:15:45 EDA-253-BackUp02 /usr/sbin/cron[95036]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:15:45 EDA-253-BackUp02 /usr/sbin/cron[95360]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:15:45 EDA-253-BackUp02 /usr/sbin/cron[95883]: (operator) CMD (/usr/libexec/save-entropy)
Jan 2 04:24:57 EDA-253-BackUp02 /usr/sbin/cron[99537]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:24:57 EDA-253-BackUp02 /usr/sbin/cron[98962]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:24:57 EDA-253-BackUp02 /usr/sbin/cron[99894]: (operator) CMD (/usr/libexec/save-entropy)
Jan 2 04:24:58 EDA-253-BackUp02 /usr/sbin/cron[46]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:25:00 EDA-253-BackUp02 /usr/sbin/cron[26536]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:25:00 EDA-253-BackUp02 /usr/sbin/cron[26845]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:30:00 EDA-253-BackUp02 /usr/sbin/cron[36717]: (root) CMD (/usr/libexec/atrun)
Jan 2 04:30:00 EDA-253-BackUp02 /usr/sbin/cron[38612]: (root) CMD (/home/backup253/get.sh >/dev/null 2>&1)
Jan 2 04:31:00 EDA-253-BackUp02 /usr/sbin/cron[69126]: (root) CMD (adjkerntz -a)
Jan 2 04:33:00 EDA-253-BackUp02 /usr/sbin/cron[9016]: (operator) CMD (/usr/libexec/save-entropy)
Jan 2 04:35:00 EDA-253-BackUp02 /usr/sbin/cron[54416]: (root) CMD (/usr/libexec/atrun)
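To pick the delayed entries out of a longer log, something like the following might help. It is only a sketch: it assumes the syslog layout shown above (timestamp in the third field) and relies on the observation that healthy cron entries in this log start at second :00. The function name late_cron_entries is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print cron log entries that did not start at
# second :00. Assumes the syslog layout "Mon D HH:MM:SS host proc[pid]: ...",
# so the third whitespace-separated field is the HH:MM:SS timestamp.
late_cron_entries() {
    awk '
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && /cron\[[0-9]+\]/ {
            split($3, t, ":")
            # Cron normally fires at the top of the minute; anything
            # else suggests the job was started late.
            if (t[3] != "00")
                print
        }
    ' "$@"
}

# Example (reads the given files, or stdin when none are given):
#   late_cron_entries /var/log/cron
```

Run against the excerpt above, this should list only the 04:12:36, 04:15:45, 04:24:57 and 04:24:58 entries, which makes it easier to compare the stall windows night by night.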
The only “error” I have seen since December 21 (the 14.2 update) is this in /var/log/messages:
Code:
Jan 2 04:24:57 EDA-253-BackUp02 sshd[40124]: error: beginning MaxStartups throttling
I read this as my scripts being put in a queue. It is the first and last error I see. I could change MaxStartups, but that’s not the real problem.
I will install test servers around the retirement today and run crontab every 5 min and every 1 min to stress this host and others, to see if I can catch something. I will also add new cron scripts on the server that write info to local files.
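A cron script that writes to local files could be as small as the heartbeat below. This is a rough sketch, not a tested tool: the function name, the file locations and the 90 second threshold are all assumptions chosen for illustration. It remembers the time of the previous run and writes a line to a local log whenever the distance to the current run is larger than expected.

```shell
#!/bin/sh
# Hypothetical heartbeat helper; names, paths and the default threshold
# are illustrative assumptions, not part of the existing setup.
# Usage: heartbeat STATE_FILE LOG_FILE [MAX_GAP_SECONDS] [NOW_EPOCH]
heartbeat() {
    state=$1
    log=$2
    max_gap=${3:-90}
    now=${4:-$(date +%s)}   # NOW_EPOCH can be passed in for testing

    if [ -r "$state" ]; then
        last=$(cat "$state")
        # Ignore an empty or corrupted state file.
        case $last in
            ''|*[!0-9]*) last=$now ;;
        esac
        gap=$((now - last))
        if [ "$gap" -gt "$max_gap" ]; then
            echo "GAP ${gap}s: last beat $last, this beat $now" >> "$log"
        fi
    fi
    # Remember this run for the next comparison.
    echo "$now" > "$state"
}
```

Called once a minute from cron, for example as heartbeat /var/tmp/heartbeat.state /var/tmp/heartbeat.log, it would leave one GAP line per stall. Running the same thing on the vm-bhyve host at the same time might show whether only the guest stops or the host does too.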
14.1 worked just fine (and 13.X before that), 14.2 nope. Same configs the whole year, nothing changed except the 14.2 update. I will dig through every log on the other servers and hosts in case I missed something on the servers updated to 14.2. I have only updated 19 servers so far (normal servers, not running crontab every 5 min; this was the first one). I will not update any other prod servers to 14.2 right now! :/
Could it be vm-bhyve on 14.2? It was updated on December 21. Another host is also updated and I can’t see the problem there, but it’s not under load. My other vm-bhyve host is still on 14.1 and runs heavy CPU stuff (VMs with up to 30 vCPUs) and many VMs with 5 min cron jobs. No problem.
I’m reaching out here in case anyone has the same queue problem and some hints on how to debug this. It’s a production server, so I can’t do any crazy stuff. But I can install test servers (which I will do).
Any hint?