I've repurposed an old HP X1600 storage server as an archive file server. It has an HP SmartArray P420 in HBA mode and a 12-drive SAS backplane. The drives are 1 TB Western Digital RAID Edition disks. As part of putting this server back into service, I've started zeroing all the hard disks. On one of them, a curious error occurred during zeroing:
Code:
# dd_rescue /dev/zero /dev/da1
dd_rescue: (info): Using softbs=128.0kiB, hardbs=4.0kiB
dd_rescue: (warning): Not using sparse writes for non-seekable output
dd_rescue: (info): ipos: 821438464.0k, opos: 821438464.0k, xferd: 821438464.0k
errs: 0, errxfer: 0.0k, succxfer: 821438464.0k
+curr.rate: 12770kB/s, avg.rate: 13186kB/s, avg.load:-O0.+%
dd_rescue: (warning): write /dev/da1 (821439360.0kiB): Invalid argument
dd_rescue: (warning): assumption rd(131072) == wr(-22) failed!
dd_rescue: (warning): write /dev/da1 (821439359.0kiB): Invalid argument!
dd_rescue: (warning): write /dev/da1 (821439488.0kiB): Invalid argument
dd_rescue: (warning): assumption rd(131072) == wr(-22) failed!
dd_rescue: (warning): write /dev/da1 (821439487.0kiB): Invalid argument!
dd_rescue: (warning): write /dev/da1 (821439616.0kiB): Invalid argument
dd_rescue: (warning): assumption rd(131072) == wr(-22) failed!
dd_rescue: (warning): write /dev/da1 (821439615.0kiB): Invalid argument!
dd_rescue: (info): ipos: 903358464.0k, opos: 903358464.0k, xferd: 903358464.0k
errs: 3, errxfer: 0.0k, succxfer: 903358079.9k
+curr.rate: 12508kB/s, avg.rate: 13119kB/s, avg.load:-O0.+%
The kernel logged the following:
Code:
Dec 14 10:58:38 kernel: (da1:ciss0:32:9:0): WRITE(10). CDB: 2a 00 61 ec 57 00 00 01 00 00
Dec 14 10:58:38 kernel: (da1:ciss0:32:9:0): CAM status: SCSI Status Error
Dec 14 10:58:38 kernel: (da1:ciss0:32:9:0): SCSI status: Check Condition
Dec 14 10:58:38 kernel: (da1:ciss0:32:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Dec 14 10:58:38 kernel: (da1:ciss0:32:9:0): Error 22, Unretryable error
Dec 14 10:58:45 kernel: (da1:ciss0:32:9:0): WRITE(10). CDB: 2a 00 61 ec 57 00 00 01 00 00
Dec 14 10:58:45 kernel: (da1:ciss0:32:9:0): CAM status: SCSI Status Error
Dec 14 10:58:45 kernel: (da1:ciss0:32:9:0): SCSI status: Check Condition
Dec 14 10:58:45 kernel: (da1:ciss0:32:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Dec 14 10:58:45 kernel: (da1:ciss0:32:9:0): Error 22, Unretryable error
Dec 14 10:58:52 kernel: (da1:ciss0:32:9:0): WRITE(10). CDB: 2a 00 61 ec 57 00 00 01 00 00
Dec 14 10:58:52 kernel: (da1:ciss0:32:9:0): CAM status: SCSI Status Error
Dec 14 10:58:52 kernel: (da1:ciss0:32:9:0): SCSI status: Check Condition
Dec 14 10:58:52 kernel: (da1:ciss0:32:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Dec 14 10:58:52 kernel: (da1:ciss0:32:9:0): Error 22, Unretryable error
I've decoded the CDB using this tool, resulting in:
Code:
{ name: 'WRITE (10)',
fields:
[ { name: 'OPERATION CODE',
bits: 8,
value: '0x2a',
reserved: false,
obsolete: false },
{ name: 'Obsolete',
bits: 1,
value: '0x0',
reserved: false,
obsolete: true },
{ name: 'Obsolete',
bits: 1,
value: '0x0',
reserved: false,
obsolete: true },
{ name: 'Reserved',
bits: 1,
value: '0x0',
reserved: true,
obsolete: false },
{ name: 'FUA',
bits: 1,
value: '0x0',
reserved: false,
obsolete: false },
{ name: 'DPO',
bits: 1,
value: '0x0',
reserved: false,
obsolete: false },
{ name: 'WRPROTECT',
bits: 3,
value: '0x0',
reserved: false,
obsolete: false },
{ name: 'LOGICAL BLOCK ADDRESS',
bits: 32,
value: '0x61ec5700',
reserved: false,
obsolete: false },
{ name: 'GROUP NUMBER',
bits: 6,
value: '0x0',
reserved: false,
obsolete: false },
{ name: 'Reserved',
bits: 2,
value: '0x0',
reserved: true,
obsolete: false },
{ name: 'TRANSFER LENGTH',
bits: 16,
value: '0x100',
reserved: false,
obsolete: false },
{ name: 'CONTROL',
bits: 8,
value: '0x0',
reserved: false,
obsolete: false } ],
truncated: false }
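For anyone who wants to double-check the decoder output by hand, here is a minimal Python sketch that unpacks the same 10 bytes. The function name `decode_write10` is my own, and the bit positions follow the field order shown in the tool output above (WRPROTECT in the top three bits of byte 1, then DPO, then FUA; a 6-bit GROUP NUMBER in byte 6), so treat it as an illustration rather than a full SBC parser.

```python
import struct


def decode_write10(cdb_hex: str) -> dict:
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Big-endian layout: opcode, flags, 32-bit LBA, group byte,
    # 16-bit transfer length, control byte.
    opcode, flags, lba, group, length, control = struct.unpack(">BBIBHB", cdb)
    return {
        "opcode": opcode,
        "wrprotect": (flags >> 5) & 0x7,  # bits 7..5 of byte 1
        "dpo": (flags >> 4) & 0x1,        # bit 4 of byte 1
        "fua": (flags >> 3) & 0x1,        # bit 3 of byte 1
        "lba": lba,
        "group_number": group & 0x3F,     # low 6 bits, as in the tool output
        "transfer_length": length,        # in logical blocks
        "control": control,
    }


if __name__ == "__main__":
    fields = decode_write10("2a 00 61 ec 57 00 00 01 00 00")
    for name, value in fields.items():
        print(f"{name}: {value:#x}")
```

The transfer length of 0x100 blocks is 256 * 512 bytes = 128 KiB, which matches the softbs=128.0kiB that dd_rescue reported.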
The requested LBA, 0x61ec5700, is 1642878720 in decimal. With 512-byte sectors, 1642878720 * 512 = 841 153 904 640, so the write started at byte offset 841 153 904 640, which is well within the size of the disk (1 000 204 886 016 addressable bytes).
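The same arithmetic as a short Python sketch, in case someone wants to verify it. It assumes 512-byte logical sectors (which is what the multiplication above implies) and takes the disk size and the dd_rescue position from this post; the helper name `check_write` is mine. It also checks the end of the 256-block transfer, not just the starting LBA.

```python
SECTOR_SIZE = 512                  # assumption: 512-byte logical sectors
DISK_BYTES = 1_000_204_886_016     # addressable bytes reported for the disk
LBA = 0x61EC5700                   # from the logged CDB
TRANSFER_BLOCKS = 0x100            # from the logged CDB
DD_RESCUE_POS_KIB = 821_439_360    # first failing position in the dd_rescue output


def check_write(lba: int, blocks: int, disk_bytes: int, sector: int = SECTOR_SIZE) -> dict:
    """Return the byte range of a write and whether it fits on the disk."""
    total_sectors = disk_bytes // sector
    return {
        "start_byte": lba * sector,
        "end_byte": (lba + blocks) * sector,  # exclusive end of the transfer
        "total_sectors": total_sectors,
        "in_range": lba + blocks <= total_sectors,
    }


if __name__ == "__main__":
    result = check_write(LBA, TRANSFER_BLOCKS, DISK_BYTES)
    print(result)
    # Cross-check: the dd_rescue position (in KiB) should map to the same byte offset.
    print("matches dd_rescue position:", DD_RESCUE_POS_KIB * 1024 == result["start_byte"])
```

So both the start and the end of the request are inside the addressable range, and the offset agrees with the position dd_rescue printed for the first failed write.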
So why would the disk (or some other component, such as the backplane or the controller itself) report the requested LBA as out of range?
FreeBSD version: 11.1-RELEASE-p6