My board is a Gigabyte MJ11-EC1, which is known for its incompatibility with ASPM. The usual fix is to turn ASPM off. On Linux I do that with the kernel parameter "pcie_aspm=off".
On OPNsense I have already set the equivalent tunable "hw.pci.enable_aspm = 0", but I can't confirm whether it is working. On Linux I can list PCIe AER errors with timestamps in journalctl. On OPNsense those errors aren't even listed in dmesg.
(Linux also has "pcie_port_pm=off", which seems to disable power management only on the PCI bridges and seems to fix the problem too, without disabling ASPM completely. But I couldn't find an equivalent tunable for FreeBSD.)
The only place I found AER error counters is "pciconf -lbcevV igb1", which shows the counters without timestamps. The counters also don't reset after a reboot, unlike on Linux.
Code:
ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
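To make that line easier to track, here is a minimal Python sketch that pulls the numbers out of it. The function name `parse_aer_line` and the dict keys are my own choices, not part of pciconf. One caveat, hedged: my understanding is that these three numbers count how many distinct error bits are currently latched in the AER status registers, not how many error events occurred, which would also explain why they survive a reboot.

```python
import re

# Matches the AER summary line printed by `pciconf -lbcevV`, e.g.
#   ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
AER_LINE = re.compile(
    r"ecap 0001\[(?P<offset>[0-9a-fA-F]+)\] = AER (?P<version>\d+) "
    r"(?P<fatal>\d+) fatal (?P<non_fatal>\d+) non-fatal "
    r"(?P<corrected>\d+) corrected"
)


def parse_aer_line(line):
    """Return the AER fields as a dict, or None if this is not an AER line."""
    m = AER_LINE.search(line)
    if m is None:
        return None
    return {
        # Offset of the AER capability in config space (hex in the output).
        "offset": int(m.group("offset"), 16),
        "version": int(m.group("version")),
        "fatal": int(m.group("fatal")),
        "non_fatal": int(m.group("non_fatal")),
        "corrected": int(m.group("corrected")),
    }


sample = "ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected"
print(parse_aer_line(sample))
```

Feeding it every line of the pciconf output and keeping the non-None results gives one record per device that exposes AER.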
I would like to confirm whether the tunable has fixed the problem, so I need a way to monitor the PCIe error counters.
The devices with errors are as follows:
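One way to get timestamps would be to poll pciconf and log only when the numbers change. The sketch below is an assumption about how that could look, not a tested tool: `extract_aer`, `describe_change`, `read_aer` and `monitor` are hypothetical names, and the 60 second interval is arbitrary. If the numbers really count latched status bits, a repeat of an error type that is already latched would not show up as a change.

```python
import re
import subprocess
import time
from datetime import datetime

AER_RE = re.compile(r"AER \d+ (\d+) fatal (\d+) non-fatal (\d+) corrected")


def extract_aer(pciconf_output):
    """Return (fatal, non_fatal, corrected) from pciconf text, or None."""
    m = AER_RE.search(pciconf_output)
    return tuple(int(g) for g in m.groups()) if m else None


def describe_change(device, old, new, now):
    """Return a timestamped log line if the AER numbers changed, else None."""
    if new is None or old == new:
        return None
    stamp = now.isoformat(timespec="seconds")
    after = "/".join(str(n) for n in new)
    if old is None:
        # First sample for this device: record it as the baseline.
        return f"{stamp} {device}: AER fatal/non-fatal/corrected baseline {after}"
    before = "/".join(str(n) for n in old)
    return f"{stamp} {device}: AER fatal/non-fatal/corrected {before} -> {after}"


def read_aer(device):
    """Run `pciconf -lbcevV <device>` and extract the AER numbers."""
    result = subprocess.run(
        ["pciconf", "-lbcevV", device],
        capture_output=True, text=True, check=True,
    )
    return extract_aer(result.stdout)


def monitor(devices, interval=60):
    """Poll the given devices forever and print a line on every change."""
    last = {}
    while True:
        for dev in devices:
            current = read_aer(dev)
            message = describe_change(dev, last.get(dev), current, datetime.now())
            if message:
                print(message, flush=True)
            last[dev] = current
        time.sleep(interval)

# Example call on the firewall itself (runs until interrupted):
#   monitor(["igb0", "igb1", "pcib2"])
```

Redirecting the output to a file, or piping it into logger(1), would give a timestamped record comparable to what journalctl shows on Linux.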
Code:
pcib2@pci0:1:0:0: class=0x060400 rev=0x04 hdr=0x01 vendor=0x1a03 device=0x1150 subvendor=0x1a03 subdevice=0x1150
vendor = 'ASPEED Technology, Inc.'
device = 'AST1150 PCI-to-PCI Bridge'
class = bridge
subclass = PCI-PCI
cap 05[50] = MSI supports 1 message, 64 bit
cap 01[78] = powerspec 3 supports D0 D1 D2 D3 current D0
cap 10[80] = PCI-Express 2 PCI bridge max data 128(256) RO NS
max read 512
link x1(x1) speed 5.0(5.0) ASPM disabled(L0s/L1)
cap 0d[c0] = PCI Bridge subvendor=0x1a03 subdevice=0x1150
ecap 0002[100] = VC 1 max VC0
ecap 0001[800] = AER 1 0 fatal 1 non-fatal 1 corrected
PCI-e errors = Correctable Error Detected
Unsupported Request Detected
Non-fatal = Unsupported Request
Corrected = Advisory Non-Fatal Error
pcib3@pci0:0:1:4: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1453 subvendor=0x1458 subdevice=0x1000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 00h-0fh) PCIe GPP Bridge'
class = bridge
subclass = PCI-PCI
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 10[58] = PCI-Express 2 root port max data 256(512) RO NS ARI disabled
max read 128
link x1(x2) speed 2.5(8.0) ASPM disabled(L1)
slot 0 power limit 0 mW
cap 05[a0] = MSI supports 1 message, 64 bit
cap 0d[c0] = PCI Bridge subvendor=0x1458 subdevice=0x1000
cap 08[c8] = HT MSI fixed address window enabled at 0xfee00000
ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
ecap 0019[270] = PCIe Sec 1 lane errors 0x1
ecap 001e[370] = L1 PM Substates 1
ecap 001d[380] = Downstream Port Containment 1
ecap 0023[3c4] = Designated Vendor-Specific 1
PCI-e errors = Correctable Error Detected
pcib4@pci0:0:1:5: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1453 subvendor=0x1458 subdevice=0x1000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 00h-0fh) PCIe GPP Bridge'
class = bridge
subclass = PCI-PCI
cap 01[50] = powerspec 3 supports D0 D3 current D0
cap 10[58] = PCI-Express 2 root port max data 128(512) RO NS ARI disabled
max read 128
link x1(x1) speed 2.5(8.0) ASPM disabled(L1)
slot 0 power limit 0 mW
cap 05[a0] = MSI supports 1 message, 64 bit
cap 0d[c0] = PCI Bridge subvendor=0x1458 subdevice=0x1000
cap 08[c8] = HT MSI fixed address window enabled at 0xfee00000
ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
ecap 0019[270] = PCIe Sec 1 lane errors 0x1
ecap 001e[370] = L1 PM Substates 1
ecap 001d[380] = Downstream Port Containment 1
ecap 0023[3c4] = Designated Vendor-Specific 1
PCI-e errors = Correctable Error Detected
igb0@pci0:3:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1533 subvendor=0x1458 subdevice=0x1000
vendor = 'Intel Corporation'
device = 'I210 Gigabit Network Connection'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xee800000, size 524288, enabled
bar [18] = type I/O Port, range 32, base 0x4000, size 32, enabled
bar [1c] = type Memory, range 32, base 0xee880000, size 16384, enabled
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
max read 512
link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
ecap 0003[140] = Serial 1 18c04dffffbb755a
ecap 0017[1a0] = TPH Requester 1
PCI-e errors = Correctable Error Detected
Unsupported Request Detected
Non-fatal = Unsupported Request
Corrected = Replay Timer Timeout
Advisory Non-Fatal Error
igb1@pci0:4:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1533 subvendor=0x1458 subdevice=0x1000
vendor = 'Intel Corporation'
device = 'I210 Gigabit Network Connection'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xee700000, size 524288, enabled
bar [18] = type I/O Port, range 32, base 0x3000, size 32, enabled
bar [1c] = type Memory, range 32, base 0xee780000, size 16384, enabled
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
max read 512
link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
ecap 0003[140] = Serial 1 18c04dffffbb755b
ecap 0017[1a0] = TPH Requester 1
PCI-e errors = Correctable Error Detected
Unsupported Request Detected
Non-fatal = Unsupported Request
Corrected = Replay Timer Timeout
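The link lines in the listing above may already answer part of the question. My reading, which is an assumption about the pciconf output format, is that in "ASPM disabled(L0s/L1)" the value before the parentheses is the current ASPM setting and the value inside is what the link supports. A minimal sketch that collects this per device (`aspm_states` is a hypothetical helper name):

```python
import re

# Device header line, e.g. "igb1@pci0:4:0:0: class=0x020000 ..."
DEVICE_RE = re.compile(r"^(\S+)@pci\d+:\d+:\d+:\d+:")
# Link line fragment, e.g. "ASPM disabled(L0s/L1)"
ASPM_RE = re.compile(r"ASPM (\S+?)\((\S+?)\)")


def aspm_states(pciconf_output):
    """Map device name -> (current ASPM state, supported ASPM states)."""
    states = {}
    device = None
    for raw in pciconf_output.splitlines():
        line = raw.strip()
        header = DEVICE_RE.match(line)
        if header:
            device = header.group(1)
            continue
        link = ASPM_RE.search(line)
        if link and device is not None:
            states[device] = (link.group(1), link.group(2))
    return states


# Excerpt taken from the listing above.
sample = """\
pcib2@pci0:1:0:0: class=0x060400 rev=0x04 hdr=0x01 vendor=0x1a03 device=0x1150
link x1(x1) speed 5.0(5.0) ASPM disabled(L0s/L1)
pcib3@pci0:0:1:4: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1453
link x1(x2) speed 2.5(8.0) ASPM disabled(L1)
"""

for name, (current, supported) in aspm_states(sample).items():
    print(f"{name}: ASPM {current} (supports {supported})")
```

If every device reports "disabled" as its current state, that would suggest the tunable took effect, although it says nothing about whether new errors are still occurring.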