NVMe drive issues...

Sorry to revive this old thread; I'm running into a "me too"-ish problem.
After further investigation, it appears that this particular drive (WD Black SN770) has a firmware bug that can be corrected with new firmware from Western Digital.
A Linux guide for the SN770 is here: https://gist.github.com/sorend/38aa32b1b07124575026918e5201e299
I tried to apply it on FreeBSD with no luck, but I feel I could be close.
My setup: FreeBSD 14.1-RELEASE-p5 on a Dell Precision Tower 3620.
"kenv" extract:
Code:
smbios.bios.reldate="04/08/2024"
smbios.bios.revision="2.30"
smbios.bios.vendor="Dell Inc."
smbios.bios.version="2.30.0"
smbios.chassis.maker="Dell Inc."
smbios.chassis.type="Desktop"
smbios.memory.enabled="33554432"
smbios.planar.maker="Dell Inc."
smbios.planar.product="09WH54"
smbios.planar.version="A00"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.family="Precision"
smbios.system.maker="Dell Inc."
smbios.system.product="Precision Tower 3620"
dmesg extract:
Code:
nvme0: <Generic NVMe Device> mem 0xef000000-0xef003fff at device 0.0 on pci2
nda0 at nvme0 bus 0 scbus5 target 0 lun 1
nda0: <WD_BLACK SN770 2TB 731100WD [redacted serial]>
nda0: nvme version 1.4
nda0: 1907729MB (3907029168 512 byte sectors)

Errors logged when the NVMe controller fails (this now happens at boot time):
Code:
nvme0: Resetting controller due to a timeout and possible hot unplug.
nvme0: resetting controller
nvme0: failing outstanding i/o
nvme0: FLUSH sqid:3 cid:119 nsid:1
nvme0: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:3 cid:119 cdw0:0
(nda0:nvme0:0:0:1): FLUSH. NCB: opc=0 fuse=0 nsid=1 prp1=0 prp2=0 cdw=0 0 0 0 0 0
(nda0:nvme0:0:0:1): CAM status: Unknown (0x420)
(nda0:nvme0:0:0:1): Error 5, Retries exhausted
nda0 at nvme0 bus 0 scbus5 target 0 lun 1
nda0: <WD_BLACK SN770 2TB 731100WD [redacted serial]> s/n [redacted serial] detached
GEOM_MIRROR: Device [redacted]: provider nda0p2 disconnected.
(nda0:nvme0:0:0:1): Periph destroyed
nvme0: IDENTIFY (06) sqid:0 cid:0 nsid:0 cdw10:00000001 cdw11:00000000
nvme0: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 p:0 sqid:0 cid:0 cdw0:0
etc...

I only have remote access to this host, and I'm really not willing to travel there just to install and boot Windows so I can (perhaps) flash the drive's firmware with the WD Dashboard tool.

So, following https://gist.github.com/sorend/38aa32b1b07124575026918e5201e299 , I figured that the nvme tool used in that guide is equivalent to the nvmecontrol we have in FreeBSD.

I grabbed https://wddashboarddownloads.wdc.com/wdDashboard/config/devices/lista_devices.xml
which led me to https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_2TB/731130WD/device_properties.xml
and from there I downloaded 731130WD.fluf from https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_2TB/731130WD/731130WD.fluf , which seems to be the firmware for my drive.
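
For anyone repeating the download step, a minimal sketch follows. It assumes the URL layout seen for the SN770 (model directory, then version directory, then <version>.fluf) also holds for other drives, which is not verified here. wd_fw_url is a made-up helper name, and fetch(1) is the stock FreeBSD downloader.

```shell
#!/bin/sh
# Hypothetical helper: build the firmware URL from the model directory and
# the firmware version, following the layout of the SN770 link above.
# Assumption: other models use the same directory layout (not verified).
WD_BASE="https://wddashboarddownloads.wdc.com/wdDashboard"

wd_fw_url() {
    model=$1      # e.g. WD_BLACK_SN770_2TB
    version=$2    # e.g. 731130WD
    echo "${WD_BASE}/firmware/${model}/${version}/${version}.fluf"
}

# Usage (nothing is downloaded when this file is merely sourced):
#   fetch "${WD_BASE}/config/devices/lista_devices.xml"
#   fetch "$(wd_fw_url WD_BLACK_SN770_2TB 731130WD)"
```

The model directory and version string still have to be read out of lista_devices.xml and device_properties.xml by hand; the helper only saves retyping the long URL.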

So far, I've tried stopping and restarting the drive with devctl to make it reappear in the system; the results look fairly good:
Code:
root@fbsd > devctl disable nvme0
<2>1 2024-10-31T14:46:03.249345+01:00 fddl kernel - - - nvme0: detached
root@fbsd  > devctl enable nvme0
<2>1 2024-10-31T14:46:08.493052+01:00 fddl kernel - - - nvme0: <Generic NVMe Device> mem 0xef000000-0xef003fff at device 0.0 on pci2
root@fbsd  > nvmecontrol identify nvme0
nvmecontrol: Identify request failed  <= not that good, it seems... but I don't know if this command works for that drive
Anyway, since the drive is not detached, I tried to flash it:
Code:
root@fbsd  > nvmecontrol firmware -f 731130WD.fluf nvme0
nvmecontrol: slot -1 specified but controller only supports 2 slots
I've also tried nvme0ns1 (seen somewhere while experimenting).
 
slot -1 specified but controller only supports 2 slots
It is telling you right there.
Add a slot number to your options.
These are firmware 'slots' on the drive for storing different firmware images.
Slot -1 does not sound right.
It will not overwrite the current firmware: you put the new firmware in a slot and then switch to it.
smartctl will show you the slots on the NVMe drive.
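
To make the slot point concrete, here is a small sketch that refuses to build the command unless the slot is in range. fw_cmd is a hypothetical helper; it only prints the command rather than running it, and it assumes slots are numbered from 1 up to the count the controller reports (2 in the error above). -s and -f are the slot and image options of nvmecontrol firmware.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print an "nvmecontrol firmware" command line
# with an explicit slot, after checking the slot against the supported count.
# Assumption: valid slots are 1..max_slots.
fw_cmd() {
    slot=$1; max_slots=$2; image=$3; ctrl=$4
    case $slot in
        ''|*[!0-9]*)
            # Rejects empty input and anything non-numeric, including "-1".
            echo "slot must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$slot" -lt 1 ] || [ "$slot" -gt "$max_slots" ]; then
        echo "slot $slot is outside 1..$max_slots" >&2
        return 1
    fi
    echo "nvmecontrol firmware -s $slot -f $image $ctrl"
}

# Usage: fw_cmd 1 2 731130WD.fluf nvme0
```

Printing instead of executing is deliberate: a firmware download is not something to trigger from a copy-pasted snippet.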
 
nvmecontrol is quite user friendly:
Code:
root@kg-core2:~ # nvmecontrol firmware help
Neither a replace ([-f path_to_firmware]) nor activate ([-a]) firmware image action
was specified.
Usage:
    nvmecontrol firmware <args> controller-id|namespace-id

Download firmware image to controller
Options:
 -a, --activate                - Attempt to activate firmware
 -s, --slot=<NUM>              - Slot to activate and/or download firmware to
 -f, --firmware=<FILE>         - Firmware image to download
This is on:
Code:
root@kg-core2:~ # freebsd-version -ku
13.4-RELEASE-p1
13.4-RELEASE-p1
 
The disk is now dead under Windows too (it makes the machine freeze or reboot without a blue screen), so I'm asking for an RMA. It looks like a poor SanDisk product sold under the WD brand. Perhaps the "SN" in the model name stands for SaNdisk's trashy hardware.
 
New disk received under warranty (they seem to offer a 5-year warranty, at least in Western Europe). So I upgraded its firmware just for fun, using nvmecontrol and devctl. I'll share my experiments in case they make life easier for someone:
(run on FreeBSD 14.1, PXE-booted from an mfsBSD image; thanks to Martin Matuška for this incredible tool)

Identify the new NVMe drive:
Code:
root@mfsbsd:~ # nvmecontrol devlist
 nvme0: WD_BLACK SN850X 2000GB
    nvme0ns1 (1907729MB)

root@mfsbsd:~ # nvmecontrol identify nvme0
Controller Capabilities/Features
================================
Vendor ID:                   15b7
Subsystem Vendor ID:         15b7
Model Number:                WD_BLACK SN850X 2000GB
Firmware Version:            620331WD
Funny, SanDisk exchanged the SN770 drive for an SN850X.
Just grab new firmware for it:
https://wddashboarddownloads.wdc.com/wdDashboard/config/devices/lista_devices.xml to find this device
which leads to:
then:
620361WD.fluf

The update itself:
Code:
root@mfsbsd:~ # nvmecontrol firmware  -s 1 -a  -f 620361WD.fluf nvme0
You are about to download and activate firmware image (620361WD.fluf) to controller nvme0.
This may damage your controller and/or overwrite an existing firmware image.
Are you sure you want to continue? (yes/no) yes
New firmware image activated and will take effect after next controller reset.
Controller reset can be initiated via 'nvmecontrol reset nvme0'

I missed the last line, so I restarted the controller my own way:
Code:
root@mfsbsd:~ # devctl disable nvme0
root@mfsbsd:~ # devctl enable nvme0

Then check whether the firmware was properly updated:
Code:
root@mfsbsd:~ # nvmecontrol identify nvme0
Controller Capabilities/Features
================================
Vendor ID:                   15b7
Subsystem Vendor ID:         15b7
Model Number:                WD_BLACK SN850X 2000GB
Firmware Version:            620361WD
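
If you want to script that check, here is a minimal sketch that pulls the firmware version out of the identify output. fw_version is a made-up helper name, and it assumes the "Firmware Version:" line is formatted as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "nvmecontrol identify" output on stdin and print
# the value of the first "Firmware Version:" line.
fw_version() {
    awk -F': *' '/^Firmware Version:/ { print $2; exit }'
}

# Usage:
#   nvmecontrol identify nvme0 | fw_version
#   [ "$(nvmecontrol identify nvme0 | fw_version)" = "620361WD" ] && echo "updated"
```

Comparing the value before and after the controller reset is a quick way to confirm that the new image was actually activated.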

Seems to work perfectly fine... hope this helps.
 