Hi,
I am running TrueNAS 12 on my server and recently had some data corruption issues, which caused me to temporarily remove my Intel Optane from the server. After finding out the issues were caused by a broken CPU, I replaced the CPU and wanted to re-add my Intel Optane.
But something went wrong and now I can't unlock my pool anymore... Here's what happened:
I actually followed the same steps that I had successfully performed on FreeNAS 11 when initially installing my Intel Optane about 6 months ago:
- Create a GPT partition table
- gpart create -s gpt nvd0
- Create 3 partitions (I didn't do anything yet with the swap)
- gpart add -i 1 -b 128 -t freebsd-swap -s 16g nvd0
- gpart add -i 2 -t freebsd-zfs -s 20g nvd0
- gpart add -i 3 -t freebsd-zfs nvd0
- Add the 2nd partition as SLOG to my existing password-encrypted RAIDZ2 pool
- zpool add hgstpool log /dev/gptid/cf3dbeb8-c5dc-11ea-8b09-d05099d3fdfe
- Create a non-redundant test pool from partition 3
- zpool create -m /mnt/optanepool optanepool /dev/gptid/d23e29f0-c5dc-11ea-8b09-d05099d3fdfe
- zpool export optanepool
- Import optanepool in the GUI
This last step, importing optanepool in the GUI, did work in FreeNAS 11, but for some reason it doesn't work anymore in TrueNAS 12. It fails with the following error:

Code:
Error: Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 137, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call
return await methodobj(*prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/service.py", line 463, in update
rv = await self.middleware._call(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call
return await methodobj(*prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 972, in nf
args, kwargs = clean_and_validate_args(args, kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 930, in clean_and_validate_args
value = attr.clean(args[args_index + i])
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 607, in clean
data[key] = attr.clean(value)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 167, in clean
value = super(Str, self).clean(value)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 53, in clean
raise Error(self.name, f'Invalid choice: {value}')
middlewared.schema.Error: [aclmode] Invalid choice: DISCARD
I thought: Ok, no problem. I'll just re-create optanepool and enable compression with the "zpool create" command itself. So I exported the pool in the TrueNAS GUI, but then I couldn't create a new zpool over the old one: the old pool was still there, just not visible, because it was exported.
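For reference, this is roughly the command I had in mind. It's only a sketch, not yet tried on TrueNAS 12; lz4 is simply the compression I intended to set, and the gptid is partition 3 from the steps above:

Code:
# re-create the test pool and enable compression on the root dataset at creation time
zpool create -O compression=lz4 -m /mnt/optanepool optanepool /dev/gptid/d23e29f0-c5dc-11ea-8b09-d05099d3fdfe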
Then I made the mistake of thinking it might be better to destroy optanepool from the GUI, so I re-imported optanepool and then exported it again in the GUI, this time with the destroy checkbox selected. At that time my main RAIDZ2 pool was actually still locked after a reboot (I even thought it would be safer to keep it locked).
But, here it comes: destroying optanepool from the TrueNAS GUI didn't just destroy the optanepool partition. No, it destroyed ALL partitions on my Intel Optane, including the partition that I was using as SLOG for my password-encrypted RAIDZ2 pool.
When I now try to unlock my RAIDZ2 pool, it fails with the following error:
Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 977, in nf
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
self.logger.error(
File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: one or more devices is currently unavailable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 361, in run
await self.future
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 397, in __run_body
rv = await self.method(*([self] + args))
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 973, in nf
return await f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool_/encryption_freebsd.py", line 290, in unlock
raise e
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool_/encryption_freebsd.py", line 272, in unlock
await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
return await self._call(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1203, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1209, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1136, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('one or more devices is currently unavailable',)
In the last line, you can clearly read "one or more devices is currently unavailable". In the console, you can also see that it successfully unlocks each HDD, one by one, and after unlocking all 8 HDDs, it then destroys them again:

Code:
Feb 4 10:35:29 data GEOM_ELI: Device gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:29 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:29 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:31 data GEOM_ELI: Device gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:31 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:31 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:32 data GEOM_ELI: Device gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:32 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:32 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:34 data GEOM_ELI: Device gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:34 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:34 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:36 data GEOM_ELI: Device gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:36 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:36 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:38 data GEOM_ELI: Device gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:38 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:38 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:39 data GEOM_ELI: Device gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:39 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:39 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:41 data GEOM_ELI: Device gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe.eli created.
Feb 4 10:35:41 data GEOM_ELI: Encryption: AES-XTS 256
Feb 4 10:35:41 data GEOM_ELI: Crypto: hardware
Feb 4 10:35:42 data GEOM_ELI: Device gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
Feb 4 10:35:42 data GEOM_ELI: Device gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe.eli destroyed.
So apparently my accidentally destroyed SLOG device is causing the pool to become impossible to unlock. I'm stuck, because:
- I can't add a new / replacement SLOG, because I need an unlocked pool for that
- I can't "remove" the destroyed SLOG from my pool, because I need an unlocked pool for that too (see the sketch after this list)
- I tried physically removing the Intel Optane, so that TrueNAS might realize the SLOG is no longer there, but I still can't unlock my pool
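If the pool were imported, removing the dead log device would normally be a single command. This is just a sketch, using the gptid from the original "zpool add" above and assuming the log vdev still shows up under that name:

Code:
# remove the (destroyed) SLOG vdev - only possible while the pool is imported
zpool remove hgstpool gptid/cf3dbeb8-c5dc-11ea-8b09-d05099d3fdfe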
I was thinking of unlocking my HDDs one by one using the command line. Perhaps then, even though I can't unlock the destroyed SLOG anymore, I'll be able to "see" and modify my unlocked RAIDZ2 pool and perhaps remove the SLOG, so that TrueNAS no longer expects a SLOG while unlocking. I found the following commands to do this (one such line per disk):

Code:
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
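If attaching the .eli providers works for all 8 disks (I assume geli will simply prompt for my passphrase), I imagine the rest of the plan would look roughly like this. Again only a sketch: I'm assuming that "zpool import -m" still allows importing a pool with a missing log device, and that the GUI unlock will behave normally afterwards:

Code:
# after attaching all 8 providers with geli attach as above:
zpool import -m hgstpool                                          # import despite the missing log device
zpool remove hgstpool gptid/cf3dbeb8-c5dc-11ea-8b09-d05099d3fdfe  # drop the destroyed SLOG
zpool export hgstpool                                             # hand the pool back to TrueNAS
# ...and then try unlocking the pool from the GUI again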
Is this a good idea?