How do I troubleshoot a system freeze?

Hello,

I'm looking for a general troubleshooting guide rather than a fix for this particular issue.

I installed 14.2 x64 with KDE 6.2.5 on an AMD Ryzen 8500G.
The system just freezes roughly every 10 minutes and then reboots.
I started with the quarterly packages and then moved to latest, but nothing changed.
I could not find any core files.

I will give it another try with GNOME, but I suspect it is the video driver (also installed via pkg).

How would I even start troubleshooting something like this on FreeBSD?

P.S. I've been using it for 20 years but never needed graphics until now.

Momchil
 
System freezes are probably the most difficult problems to diagnose. On the mainframe we had a tool, Stand Alone Dump (SAD), which we could IPL (boot) to take a dump of the running system. UNIX, Linux and *BSD are different: the system dump facility is part of the kernel and can therefore itself be affected by the freeze.

If you have DDB enabled with a break key combination, you can break into DDB to examine the system or take a system dump. But this assumes that DDB itself isn't affected.
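A minimal sketch of checking for DDB and enabling the console break key, assuming a FreeBSD 14 kernel with the vt(4) console. The sysctl names `debug.kdb.available` and `kern.vt.kbd_debug` are real, but it is worth confirming them on your release with `sysctl -d`; the helper function names are hypothetical. With the desktop running you will probably not see the db> prompt, so DDB commands such as `dump` and then `reset` (see ddb(4)) would have to be typed blind.

```sh
#!/bin/sh
# Sketch only: check for DDB and allow the console debugger key (FreeBSD 14, vt(4)).
# Helper names are hypothetical; confirm the sysctls with `sysctl -d <name>`.

# Print the debugger back ends compiled into the running kernel (e.g. "ddb gdb").
ddb_backends() {
    sysctl -n debug.kdb.available
}

# Succeed if the given back-end list contains "ddb".
has_ddb() {
    case " $1 " in
        *" ddb "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Let vt(4) honour the debugger key combination (Ctrl+Alt+Esc in the default keymap).
enable_ddb_key() {
    sysctl kern.vt.kbd_debug=1
}

# Usage (as root):
#   has_ddb "$(ddb_backends)" && enable_ddb_key
```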

You say that the system freezes, eventually rebooting after ten minutes. This strongly suggests that an overlay of memory has occurred. A memory overlay (I use the IBM mainframe terminology) could be a buffer overrun, or an incorrect address in a pointer resulting in some random location in memory being written to. When the "owner" of that memory (another part of the kernel) then references it, it may either hang or overwrite some other memory location, and the chain of corruption continues. Think of it as a malignant cancer spreading through kernel memory.

With little or no diagnostic information, how do we diagnose and fix this? In the UNIX/Linux/*BSD world we must guess. Do you have any ports or packages that install kernel modules, e.g. drm-kmod, virtualbox-kmod or any other port/package that installs a kernel module? Were these ports installed using pkg or make install?

Try rebuilding and reinstalling all ports that install kernel modules, by hand, not through pkg, using a recently updated ports tree. In my experience this fixes 99% of hangs leading to crashes of this sort.
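A minimal sketch of that rebuild, assuming an up-to-date ports tree in /usr/ports. It uses the real `pkg query` glob mode (`-g`) and the `%o` (origin) format; the function names are hypothetical. As a hint of why this may help here: the package versions posted further down (e.g. drm-61-kmod-6.1.92.1401000_3) embed 1401000, which is the __FreeBSD_version of 14.1, suggesting those modules were built against 14.1 rather than the running 14.2 kernel.

```sh
#!/bin/sh
# Sketch: print commands that rebuild every installed *kmod package from ports.
# Assumes an up-to-date ports tree in $PORTSDIR (default /usr/ports).
# Nothing is built here; review the plan, then pipe it to sh as root.

PORTSDIR=${PORTSDIR:-/usr/ports}

# Origins (category/name) of installed packages whose name ends in "kmod".
kmod_origins() {
    pkg query -g '%o' '*kmod'
}

# One rebuild command for one origin, e.g. graphics/drm-61-kmod.
rebuild_cmd() {
    printf 'make -C %s/%s reinstall clean\n' "$PORTSDIR" "$1"
}

# Turn every kmod origin into a rebuild command.
print_rebuild_plan() {
    kmod_origins | while read -r origin; do
        rebuild_cmd "$origin"
    done
}

# Usage (as root):
#   print_rebuild_plan          # review the commands
#   print_rebuild_plan | sh     # build and reinstall
```

A later `pkg upgrade` may replace the locally built modules with packaged ones again; `pkg lock` on each of them is one way to prevent that.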

If the above doesn't fix the problem you will need to obtain a kernel dump otherwise our only other option is to guess. Hopefully educated guesses but guesses nonetheless.
 
If you know that the system freezes after 10 minutes, I wonder whether running an SSH server on that machine would help: you could access the logs from a remote client (if you have one, of course) and monitor what is happening just before the reboot. Not sure it would help, though.
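A minimal sketch of that idea, run on a second computer, assuming sshd is enabled on the FreeBSD box (`sysrc sshd_enable=YES` and `service sshd start`). "user@freezebox" and the function names are placeholders. It only captures what syslogd manages to write before the hang; a panic message will usually not make it.

```sh
#!/bin/sh
# Sketch: follow the suspect machine's log from another computer and keep a
# timestamped local copy, so the last lines before the reboot survive it.

# Prefix every line read on stdin with the local time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Follow /var/log/messages over ssh; the ServerAlive options make ssh notice
# within about 15 seconds when the remote end stops answering.
watch_remote_log() {
    ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=3 "$1" \
        'tail -F /var/log/messages' | stamp_lines | tee -a "${2:-freeze-watch.log}"
}

# Usage: watch_remote_log user@freezebox
```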

It would not be a surprise if the graphics part of your CPU were involved, which means rebuilding the problematic port manually as cy@ implied; that is probably also why nxjoseph asked about your installed graphics modules.
 
If it reboots on its own it is not a freeze.

Make sure you have saving of kernel cores turned on, because with the graphics up you cannot see a panic message live and in color.
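A minimal /etc/rc.conf fragment for that, assuming the machine has a swap partition large enough to hold the dump; `sysrc dumpdev=AUTO` sets the same thing from a root shell. After the next unexpected reboot, look in /var/crash for vmcore.N; crashinfo(8) can turn it into a readable core.txt.N summary. This only helps if the kernel actually panics; a hard hang may leave nothing behind.

```sh
# /etc/rc.conf fragment: keep kernel crash dumps.
# On panic the kernel writes a dump to the dump device; at the next boot
# savecore(8) copies it into $dumpdir as vmcore.N plus info.N.
dumpdev="AUTO"        # use the first suitable swap device listed in /etc/fstab
dumpdir="/var/crash"  # where savecore(8) stores the dumps
```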
 
Correct, but it may appear to the customer to be frozen.

Sometimes (often, in fact) one might experience a freeze, e.g. with interrupts disabled, in which the machine appears to be frozen, and may very well be, until it executes code that causes a panic.

On the mainframe I'd see (because we had the tools to do so) the machine in a tight compare-and-swap loop waiting on a lock while it itself held a lock wanted by another kernel thread. Again, the user would think this is a freeze when it isn't. In fact, there is almost never such a thing as a frozen O/S. It is always executing code, because the timer will interrupt the loop. The only time an O/S can actually be frozen is when interrupts are disabled and the kernel executes the HLT instruction. Then only an NMI (on Intel) can bring it out of the freeze. On some architectures (e.g. IBM mainframe) no such instruction and no such interrupt as NMI exist, so in that case it is literally frozen.

When we talk about a frozen kernel the context is from the customer's point of view.
 
Hi,

Can you post the output of % pkg info -x -g \*kmod? Thanks.

drm-61-kmod-6.1.92.1401000_3
drm-kmod-20220907_3
gpu-firmware-kmod-20241114,1
realtek-re-kmod-1100.00.1401000_1

Frankly, I'm impressed with the attention and time you guys have spent on this. Thank you.

cy@, I'm not comfortable using the kernel debugging facility. I went through the manual page and it does not seem that complicated, but it will need some tinkering. I intend to test out bhyve, so I'm guessing I should invest the time required, as there is a kernel component to it as well.

Fishfry, this seems like the way to go: replace the installed graphics driver with a generic one and test it out.

The only "fancy" part of my setup is that I get my audio via HDMI to the speakers built into my monitor,
but there is nothing I can do about that, as there is no VGA port on the motherboard.
 