ZFS 2x3tb as one raidz member

Is there a condition where the OS can deadlock when moving a page to swap?
Basically, instead of having swap on a separate partition you have a swap file. You start to have memory pressure, much of it coming from file system buffers, so you want to push things out to swap. But because the swap lives in a file, every pageout creates more filesystem buffer pressure, which in turn causes more things to go to swap.

Typically it won't deadlock, but it is a "positive feedback loop that causes the system to eventually crash".

I believe that is what gpw928 means by "deadly embrace"
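
For reference, this is roughly what the two setups look like in /etc/fstab on FreeBSD (a minimal sketch; the device name, md unit and file path are placeholders, not taken from this thread):
Code:
# swap on a dedicated partition - pageouts go straight to the device
/dev/ada0p3  none  swap  sw                       0  0

# swap on a file, attached as a memory disk - pageouts go through the filesystem holding the file
md99         none  swap  sw,file=/usr/swap0,late  0  0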
 
Hi mer,

thank you for the reply.


This is the crux, is it not? If one were to use a swap partition, the issue would not arise, correct?

Kindest regards,

M
Based on my understanding of the kernel, correct. A swap partition does not add pressure in low-memory conditions. I've always treated swapfiles as a temporary solution, as in "Oh crap, I'm starting to run out of memory, so add a swapfile until I add extra swap partitions or more physical RAM". But that's me; others may have different opinions.
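
A minimal sketch of that stop-gap swapfile approach on FreeBSD, assuming a 4 GB file at /usr/swap0 (path, size and md unit number are arbitrary choices):
Code:
# dd if=/dev/zero of=/usr/swap0 bs=1m count=4096   # create and zero a 4 GB backing file
# chmod 0600 /usr/swap0                            # swap files must not be readable by others
# mdconfig -a -t vnode -f /usr/swap0 -u 99         # attach the file as memory disk md99
# swapon /dev/md99                                 # start paging to it
# swapinfo -h                                      # confirm the extra swap space is visible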
 
Time zone differences mean that I generally get to these discussions late...

Your description is correct. The deadly embrace (deadlock) happens because you need more memory to get more memory. I had a look around for references, and Issue #7734 (Swap deadlock) remains open for OpenZFS. So it's a bug, it has been open for 4 years, and it is evidently diabolically difficult to fix. I have to admit that I don't know the exact current disposition with FreeBSD. However, the conventional wisdom is to avoid swap files resident within ZFS pools -- so a swap partition is fine (and use a GEOM mirror for reliability).
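
A rough sketch of the swap-partition-on-a-GEOM-mirror arrangement (partition names are placeholders, and this assumes a matching swap partition on each disk):
Code:
# kldload geom_mirror                                     # load the mirror class for this session
# gmirror label swap ada0p3 ada1p3                        # mirror the two swap partitions as /dev/mirror/swap
# echo 'geom_mirror_load="YES"' >> /boot/loader.conf      # make sure the class loads at boot
# echo '/dev/mirror/swap none swap sw 0 0' >> /etc/fstab  # use the mirror as swap
# swapon /dev/mirror/swap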
 
It's a money question. There is a price difference of almost 1:10, so when the question is "do I need that stuff?", the correct answer IMO is: you need it if there is somebody who requires that quality of service - and who pays for it.
I contest the 10:1 assertion. I'll agree 5:1 (for the very best SSDs). But a separate ZIL does not have to be large (certainly no larger than main memory). So modest size moderates the cost.

I run cheap SSD mirrors (250 GB Microns) for the root on several of my physical hosts (so I do get your "money question"). But I don't run databases or synchronous NFS mounts on these devices. I accept that if they crash, I will lose file system transactions (but don't generally expect to lose file system integrity).
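
For completeness, attaching a separate mirrored ZIL (SLOG) is a one-liner; "tank" and the device names here are only placeholders:
Code:
# zpool add tank log mirror ada1 ada2   # add a mirrored log vdev (SLOG) to the pool
# zpool status tank                     # a "logs" section should now list the mirror
# zpool iostat -v tank 5                # watch how much traffic the log devices actually see
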
There is a thing called "risk assessment", and that means figuring out which risks you can afford to take and which risks need a remedy - and that is individual to the use case.
I agree -- it's more than a simple purchase price question.

For me, an unreliable device that loses committed database (and other application) transactions has the potential for all sorts of downsides into the indefinite future: silent corruption (possibly polluting the backups), downtime, recoveries, rebuilds, ... The potential for untold grief is endless. All the ZFS documents say that the ZIL must be on reliable media, which is why I left my ZFS server's ZIL on (redundant) spinning disks for about 8 years. Then, when I moved to a separate ZIL, I put it on an enterprise-class SSD mirror.
 
I finally tried to figure out some deeper info on this:
The power-loss-protection feature is recommended for database use, but it normally exists only in devices that are explicitly marketed for datacenter/enterprise use.
It often does NOT exist in high-performance/professional/NAS devices - according to this database.

Samsung explains a bit about what it involves.
And Kingston states that everything is fine (the PLP feature is supposed to exist in their DC500R/M series - which would be one of the more affordable options if it lives up to the promises).

Then, if, according to Samsung above, the problem stems from the DRAM, the truenas people here had a cute idea: devices without DRAM should not suffer from the problem, provided the performance suffices for the use case.
I don't know whether that approach is recommendable, but it offers an explanation for why I have never noticed any problems, or even slight irregularities, although my devices show these counters:
Code:
 12 Power_Cycle_Count       -O--CK   100   100   000    -    989
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   000    -    756
 
The power-loss-protection feature is recommended for database use, but it normally exists only in devices that are explicitly marketed for datacenter/enterprise use.
It often does NOT exist in high-performance/professional/NAS devices - according to this database.
That database is composed and maintained by one person, who has placed no emphasis on Power Loss Protection.
In the Samsung document they say "To guarantee data integrity, Samsung’s enterprise SSDs feature full power loss protection with backup power circuitry in the form of tantalum capacitors"...
And Kingston states that everything is fine (the PLP feature is supposed to exist in their DC500R/M series - which would be one of the more affordable options if it lives up to the promises).
There are more than a few carefully chosen (weasel) words there.
As you observe, Kingston sell SSDs with PLP. If "everything is fine", there would be no market for them.

I live in a rural location, and the power drops unexpectedly quite often. I have many unsafe shutdowns on my ZFS server (these happened before I got my UPS, and one of the Intel DC SSDs is a lot older than the other):
Code:
[sherman.138] $ for d in /dev/*da?; do sudo smartctl -A $d | egrep "174|Unexpect"; done
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       6
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       58
And also on my Debian KVM server (with a cheap SSD boot disk mirror):
Code:
[orac.1144] $ for d in /dev/sd?; do sudo smartctl -A $d | egrep "^174|Unexpect"; done
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       24
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       25
I think the arguments are very similar to those for ECC memory. If you read the papers by Google, bit flipping is rampant.
However, the lived experience is that plenty of people don't have ECC and never notice a problem.
And the marketing is similar too. PLP, like ECC, costs a lot more and gives Intel and others a lucrative "Enterprise" market.

In the end you pays your money and takes your chances... So, it's back to risk management...
 