Hello FreeBSD Forums! First time poster, please be gentle
I updated my physical server to the latest patch level last night via the usual
freebsd-update fetch install
and pkg update; pkg upgrade
dance (I have no ports installed, just a few packages, including chrony and telegraf). All seemed to be well until I realised that telegraf was no longer recording my time stats! Not the end of the world, but I thought I'd investigate.
Code:
15:18:14 nick@filer:/home/nick/ uname -a
FreeBSD filer.int.aoeu.uk 12.2-RELEASE-p3 FreeBSD 12.2-RELEASE-p3 GENERIC amd64
15:18:19 nick@filer:/home/nick/ pkg info | grep "telegraf\|chrony"
chrony-4.0 System clock synchronization client and server
telegraf-1.17.0 Time-series data collection
Here's what telegraf has to say:
Code:
15:15:16 root@filer:/root/ tail -n 2 /var/log/telegraf/telegraf.log
2021-02-14T15:14:50Z E! [inputs.chrony] Error in plugin: failed to run command /usr/local/bin/chronyc -n tracking: Command timed out. - 506 Cannot talk to daemon
2021-02-14T15:15:05Z E! [inputs.chrony] Error in plugin: failed to run command /usr/local/bin/chronyc -n tracking: Command timed out. - 506 Cannot talk to daemon
Interesting! Well, we updated to chrony-4.0 from 3.<something> yesterday, so maybe the tracking command syntax has changed?
Code:
15:10:28 nick@filer:/home/nick/ sudo /usr/local/bin/chronyc -n tracking
Password:
Reference ID : D8EF2304 (216.239.35.4)
Stratum : 2
Ref time (UTC) : Sun Feb 14 15:08:35 2021
System time : 0.000031922 seconds slow of NTP time
Last offset : +0.000048788 seconds
RMS offset : 0.000102199 seconds
Frequency : 24.948 ppm slow
Residual freq : +0.002 ppm
Skew : 0.027 ppm
Root delay : 0.006732919 seconds
Root dispersion : 0.000752528 seconds
Update interval : 1044.4 seconds
Leap status : Normal
Looks normal enough to me. OK, so maybe there's a file permission issue:
Code:
15:38:38 nick@filer:/home/nick/ sudo ps aux | grep telegraf | grep -v grep
root 6384 0.0 0.0 10844 2292 - Ss 01:37 0:00.09 daemon: /usr/local/bin/telegraf[6385] (daemon)
telegraf 6385 0.0 0.2 4991656 67228 - S 01:37 1:30.37 /usr/local/bin/telegraf --quiet --config=/usr/local/etc/telegraf.conf
^ So there's a daemon(8) wrapper running as root, and a forked child running as telegraf that does the data collection. Let's check the chronyc executable:
Code:
15:18:20 nick@filer:/home/nick/ ls -lah /usr/local/bin/chronyc
-rwxr-xr-x 1 root wheel 91K Feb 6 12:44 /usr/local/bin/chronyc
Nope, anyone should be able to run it. Let's try running it as telegraf:
Code:
15:21:21 nick@filer:/home/nick/ sudo -u telegraf /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID : D8EF2304 (216.239.35.4)
Good output, but it took a very long time to run! Let's compare with various accounts:
Code:
15:21:32 nick@filer:/home/nick/ time sudo /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID : D8EF2304 (216.239.35.4)
sudo /usr/local/bin/chronyc -n tracking 0.00s user 0.01s system 93% cpu 0.006 total
head -n 1 0.00s user 0.00s system 13% cpu 0.005 total
^ root is fast
Code:
15:22:21 nick@filer:/home/nick/ time sudo -u telegraf /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID : D8EF2304 (216.239.35.4)
sudo -u telegraf /usr/local/bin/chronyc -n tracking 0.01s user 0.00s system 0% cpu 7.127 total
head -n 1 0.00s user 0.00s system 0% cpu 7.126 total
^ WTF? 7 seconds!
OK, so maybe there's a problem with sudo: what with the recent vulnerability, maybe there's been some kind of regression? Let's try as me, without sudo:
Code:
15:23:10 nick@filer:/home/nick/ time /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID : D8EF2304 (216.239.35.4)
/usr/local/bin/chronyc -n tracking 0.00s user 0.00s system 0% cpu 7.076 total
head -n 1 0.00s user 0.00s system 0% cpu 7.076 total
^ Still slow, now with no sudo!
OK, so I think we can rule sudo out. I ran truss against chronyc but didn't see any output; I guess the client doesn't need to make many system calls if it's just talking to the local server. Could some sort of hostname lookup be failing somewhere? tcpdump shows no port 53 traffic at all, though.
Is the problem wider than just chronyc? Seems not, at least for ping (the total time below includes typing my sudo password):
Code:
15:28:43 nick@filer:/home/nick/ time sudo -u telegraf /sbin/ping -c 1 1.1.1.1
Password:
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=55 time=1.696 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.696/1.696/1.696/0.000 ms
sudo -u telegraf /sbin/ping -c 1 1.1.1.1 0.00s user 0.01s system 0% cpu 9.008 total
I did hit the issue a few people have mentioned, where the pkg upgrade process claims an account has gone missing (in my case _tss), but running
pwd_mkdb -p /etc/master.passwd
seemed to fix that. I've not messed with /etc/master.passwd before; maybe that has broken something in PAM? My system is very vanilla: nothing will have been edited in PAM, and the only sudo config I have in place allows %wheel to run as root with password auth. I've got a completely untouched /etc/nsswitch.conf, in case that is likely to be a suspect.
I'm pretty stuck at this point. Any advice on how to delve deeper into this would be much appreciated!
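The telegraf log above reports that the command timed out, which suggests telegraf stops waiting for chronyc after some fixed limit. A minimal sketch for reproducing that outside telegraf uses timeout(1), which exits with status 124 when the limit is hit. The 5-second `LIMIT` default and the `run_limited` helper name are assumptions for illustration, not values taken from telegraf.

```shell
#!/bin/sh
# Run a command under a wall-clock limit, roughly the way a poller such as
# telegraf might. timeout(1) exits with status 124 when the limit is hit.
# LIMIT defaults to 5 seconds: an ASSUMED value, not read from telegraf.
LIMIT=${LIMIT:-5}

run_limited() {
    timeout "$LIMIT" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${LIMIT}s: $*" >&2
    fi
    return "$status"
}

# Example (cache sudo credentials first so the password prompt is not timed):
#   sudo -v
#   run_limited sudo -u telegraf /usr/local/bin/chronyc -n tracking
```

If the telegraf run trips the limit while the root run does not, that lines up with the log without needing telegraf at all.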
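The per-account timing comparison above can be repeated in one go. This is a rough sketch: `elapsed_secs` and `compare_accounts` are hypothetical helper names, `CHRONYC` is an assumed override for the binary path, and whole-second resolution is assumed to be enough to spot a 7-second stall.

```shell
#!/bin/sh
# Whole-second timer: coarse, but enough to tell a ~7 s stall from an
# instant reply. "date +%s" works with both FreeBSD and GNU date.
elapsed_secs() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Time chronyc as the current user (no sudo), then via sudo as other
# accounts. The account list is illustrative; run "sudo -v" first so a
# password prompt does not end up inside the measurement.
compare_accounts() {
    chronyc=${CHRONYC:-/usr/local/bin/chronyc}
    printf '%-10s %ss\n' "$(id -un)" "$(elapsed_secs "$chronyc" -n tracking)"
    for user in root telegraf; do
        printf '%-10s %ss\n' "$user" \
            "$(elapsed_secs sudo -u "$user" "$chronyc" -n tracking)"
    done
}
```

Calling `compare_accounts` prints one line per account, which makes it easy to rerun after each change you try.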
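Since chronyc is just talking to the local server, one thing worth ruling out is a stall on one particular control address. chronyc's manual documents a default request timeout of 1000 ms, doubled on each retry, with 2 retries. If those defaults apply here, one unresponsive address would cost about 1 + 2 + 4 = 7 seconds before chronyc gave up on it, which is close to the 7.1 s totals above. That is only a hypothesis. The sketch below times each address separately with `-h`; the address list and socket path are assumptions, and `probe_addresses` is a hypothetical name.

```shell
#!/bin/sh
# Time one chronyc request per control address, so a stall on a single
# address stands out. The list is ASSUMED to match chronyc's documented
# default for -h; the socket path may differ for the FreeBSD package.
CHRONYC=${CHRONYC:-/usr/local/bin/chronyc}

probe_addresses() {
    for addr in /var/run/chrony/chronyd.sock 127.0.0.1 ::1; do
        start=$(date +%s)
        if "$CHRONYC" -h "$addr" -n tracking >/dev/null 2>&1; then
            result=ok
        else
            result=failed
        fi
        end=$(date +%s)
        printf '%-30s %-7s %ss\n' "$addr" "$result" "$((end - start))"
    done
}

# Usage: run once as an unprivileged user, then once as root, and compare.
#   probe_addresses
# "sockstat -p 323" lists sockets on chronyd's command port.
```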
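The truss run mentioned above printed nothing useful. Assuming truss(1)'s `-f` (follow children), `-D` (time since the previous event) and `-o` (write to a file) options, a timestamped trace can be saved and then sorted by gap. The `slowest_events` helper is hypothetical, and its parsing assumes the timestamp is the first decimal number among the first three fields of each line, so check it against a real trace before relying on it.

```shell
#!/bin/sh
# List the trace lines that followed the longest pauses. Intended for a
# trace captured with "truss -f -D -o FILE ...". ASSUMED line format: the
# timestamp is the first decimal number among the first three fields
# (after an optional "pid:" prefix).
slowest_events() {
    trace=$1
    count=${2:-5}
    awk '{
        for (i = 1; i <= 3 && i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+$/) { print $i "\t" $0; next }
    }' "$trace" | LC_ALL=C sort -rn | head -n "$count"
}

# Usage:
#   truss -f -D -o /tmp/chronyc.truss /usr/local/bin/chronyc -n tracking
#   slowest_events /tmp/chronyc.truss
```

The system call at the top of the list is the one chronyc was waiting in, which should say whether the 7 seconds go on a socket, a lookup, or something else.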
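Because pkg upgrade complained about a missing account and the password database was rebuilt with pwd_mkdb, a quick sanity check is whether the relevant accounts still resolve through nsswitch. `check_accounts` is a hypothetical helper built on getent(1).

```shell
#!/bin/sh
# Report whether each named account resolves through the system's
# password database (as configured in nsswitch.conf).
check_accounts() {
    for user in "$@"; do
        if entry=$(getent passwd "$user"); then
            printf 'ok      %s\n' "$entry"
        else
            printf 'MISSING %s\n' "$user"
        fi
    done
}

# Usage:
#   check_accounts telegraf _tss "$(id -un)"
# If your pwd_mkdb supports -C (see pwd_mkdb(8)), it checks the format of
# /etc/master.passwd without rewriting any database files.
```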