ZFS Recommendation for 100TB raid-z pool

jbo@

Developer
I need to provide a network drive for an SME with 100 TB usable capacity. I would like to use a ZFS raid-z3 pool consisting of 14x 12TB SAS or SATA drives (this will be a rare-write/many-reads situation).

I have been looking at chassis like the SuperMicro SC846BE1C-R1K03JBOD. Given that I'll run ZFS on this I don't want a raid controller but merely an HBA. Supermicro lists the AOC-SAS3-9300-8E as a supported HBA.
I could not find this exact model number listed in the hardware notes of FreeBSD 13.0. Is that going to be a problem?
Is there any reason to believe that an LSI HBA officially supported by FreeBSD would not work in a chassis like this?

Now, I have never worked with these JBOD chassis before. As I understand it, I slap in the HBA and a ton of drives, and connect this chassis to an application server. Does something like this work out of the box? Will the application server (also FreeBSD 13.0) just see those drives as individual drives, so that I can create a ZFS pool like I am used to with "local drives"?

How does the (physical) connection to the application server work? Do I just add another HBA with external SAS ports to it, connect it to the external SAS ports of the HBA in the JBOD chassis and that's it?

Anything else you'd like to share in terms of advice, experience or similar?
 
Now, I have never worked with these JBOD chassis before.
I don't think a JBOD chassis is useful in most cases.
You are buying a rack mount that is 26" deep and all it contains is disk drives.
That is silly. Now, perhaps the backplane is segmented into two zones and you could use it for cluster storage, etc.


Supermicro lists the AOC-SAS3-9300-8E as
If you drill down on this, it is probably just an OEM LSI SAS9300 controller, with 8E meaning 2 external SAS connectors. So take their recommendation with a grain of salt. Any LSI card with the right connectors will work.
You will probably have an easier time finding an LSI SAS9400 controller with 2 external connectors.
 
You have a few mentioned options (and some not-mentioned options) that I think might have different impacts. What comes to my mind:
  1. As for Supermicro, I'd consider a chassis with a built-in backplane/expander, i.e. not a JBOD: you reduce the number of SAS/SATA cables and connectors, because one cable handles 4 HDD connections at a time.
  2. When using SAS instead of SATA, in combination with the correct Supermicro HBA, you have the option of dual data paths to individual SAS disks. For that you'll need extra connections at the motherboard/controller side. For SATA there is no use for dual data paths.
  3. Using SAS instead of SATA, with an appropriate SAS controller and Supermicro HBA, you could have twice (thrice?) the maximum transfer speed to and from each disk.
  4. Try to estimate the maximum time your use case allows the system to be in resilver mode when one (two?) disks have failed. Big disks (12 TB is big) take a long time to rebuild, under heavy load and under not-heavy load alike. For that case SAS may have some advantages; I mentioned the previous point especially for this item, not so much for normal operations. A quick way to monitor a resilver is sketched below.
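For item 4, the pool itself reports rebuild progress. A minimal sketch of checking on a resilver, assuming a pool named tank (a placeholder name):

  # prints "resilver in progress", the percentage done and an estimated time to go
  zpool status tank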

Obviously SAS, instead of SATA, adds substantially to the cost of the HDDs and therefore of the system.
Hopefully superfluous: take into consideration the maximum space utilization of such a total amount of HDD terabytes. Typically, I read that going over roughly 80% full makes ZFS performance suffer.
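A minimal sketch of watching that limit, assuming a pool named tank (a placeholder) and a ceiling picked by hand; the 80T figure is purely hypothetical, choose roughly 80% of whatever usable space zfs list reports for your pool:

  # the capacity column shows the percentage of raw pool space already allocated
  zpool list -o name,size,allocated,free,capacity tank
  # optionally enforce a hard ceiling by putting a quota on the pool's root dataset
  zfs set quota=80T tank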
 
How does the (physical) connection to the application server work? Do I just add another HBA with external SAS ports to it, connect it to the external SAS ports of the HBA in the JBOD chassis and that's it?
Well, you need to investigate how it's wired. It says 4 SAS connectors.
I would bet it breaks down into 2 backplanes, each feeding an expander card, so 2 redundant disk subsystems are possible.
And yes, just connect a SAS9300-equipped client box to the JBOD and the drives will appear.
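A quick sanity check on the FreeBSD host once the cable is in; a minimal sketch, with device names that will of course differ on your system:

  # every drive behind the expander shows up as its own daN device
  camcontrol devlist
  # sizes, serial numbers and ident strings for each of those disks
  geom disk list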

It's back to the use case here. Are you planning on two separate systems hooking up to this JBOD?
If not, I would get this chassis with a motherboard: a self-sufficient storage server. Use 10G fiber to the clients.
 
They make chassis that can hold a motherboard plus a dozen or two dozen disks. That saves you a second set of power supplies and external connections. Most any chassis (JBOD or internal) that holds a dozen or more disks will have internal multiplexers (typically SAS expanders, also doing SATA translation). With this many disks, if you are interested in using the bandwidth, I would go for two SAS HBAs instead of just one, and dual-path every SAS expander to both HBAs. That's a little extra hassle (you have to configure multipath), but it makes the whole system more resilient.
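If you do go dual-HBA, FreeBSD's GEOM multipath class is one way to set that up. A minimal sketch, assuming each disk is visible over two paths (say da0 via the first HBA and da24 via the second); the device and label names are placeholders:

  # load the multipath class (or put geom_multipath_load="YES" in /boot/loader.conf)
  kldload geom_multipath
  # tie both paths to the same physical disk together under one name
  gmultipath label -v disk01 /dev/da0 /dev/da24
  # verify that both paths are seen
  gmultipath status
  # build the pool on the multipath devices instead of the raw daN names
  zpool create tank raidz3 multipath/disk01 multipath/disk02 multipath/disk03 multipath/disk04

Repeat the labeling for every disk and list all the multipath/ names in the zpool create; the four shown here are just to keep the sketch short.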

If you can afford 18 or 20TB disks, that might help make the whole system easier and cheaper, fewer disks: 9 disks are easier to manage than 14. But I don't know whether 18 and 20TB are reasonably affordable in the consumer market yet. For the disks, look at the relative cost of SAS and SATA; many disk models are available at nearly identical prices. When in doubt, I would go for SAS disks, because a seamless SCSI stack just works more smoothly.

Finally, one problem you should probably solve is identifying which disk is which (so you can say "3rd disk from left"), and being able to turn trouble/replace indicators on. That can be done with SES utils; it might be easy, it might take years to engineer.
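On FreeBSD the base system's sesutil(8) handles the basics, assuming the backplane presents a standard SES enclosure; a minimal sketch, with da7 as a placeholder device name:

  # show each enclosure slot and which daN device sits in it
  sesutil map
  # blink the locate LED on the slot holding da7, then turn it off again
  sesutil locate da7 on
  sesutil locate da7 off
  # light the fault indicator for a disk you are about to pull
  sesutil fault da7 on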
 
If you have an eye on performance you really need to investigate the backplane and expander wiring.

Here is my example: two Supermicro 2U chassis, each with 24 2.5" drive bays.
One has an expander card mounted on the rear of the backplane, outputting only 2 SAS connectors.
So speedy drives are bottlenecked by the expander.

My other 24-bay chassis has 2 backplanes, each requiring 3 SAS connectors, for a total of 6 connectors.
These backplanes provide a direct connection to the controller card. No expander card.

These are the details you need to seek out, depending on your needs.
 
Is there any reason to believe that an LSI HBA officially supported by FreeBSD would not work in a chassis like this?
You cannot add cards to a JBOD like this. There is no motherboard in this unit.
They usually use an expander card mounted where PCI cards would go.
There you will find the external SAS connectors on the rear panel.
 
I have been looking at chassis like the SuperMicro SC846BE1C-R1K03JBOD.

Given that I'll run ZFS on this I don't want a raid controller but merely an HBA. Supermicro lists the AOC-SAS3-9300-8E as a supported HBA.

You specifically mentioned an HBA controller with external ports. Given that the mentioned chassis does not come with a motherboard, and the previous comments, please describe your intended system in more detail. Do you want just HDDs with some sort of external connection (i.e. a shelf of disks), connected to a nearby system via an external SAS cable? Or do you want a self-contained system, including a motherboard, that interfaces with the rest of your environment via ethernet, probably via an optical interface?
 
 

Supermicro lists the AOC-SAS3-9300-8E as a supported HBA.
I want to point out that this recommendation is for a client machine with a SAS3-9300-8e used to connect to this JBOD box.
So that is the Supermicro-qualified adapter to use for the interface; many others should work as well.

The literature indicates this JBOD uses a management card for monitoring power and IPMI.
It uses a chip similar to Aspeed BMC as they mention redmine.
It has an ethernet interface as well and can monitor temps.
 
In the same vein of 24 bay chassis I like this:

It gives you 16 SAS/SATA bays and 8 NVMe bays. That is the ultimate in my book.
Your needs might vary.

They are recommending an X12 board for this chassis, but I would seriously consider a Supermicro AMD EPYC board.

Take a look at the pdf for the SC-846 Chassis. In the appendix it shows various backplane configurations.
In section E-9, figures E-5 and E-6 show the differences between very similar backplanes: single port versus dual port.
So you need to research the backplanes carefully by looking at the backplane part number on the chassis list.
These are expensive chassis so you must define your needs and shop wisely.
 
As you may have noticed I am more experienced in hardware and try to stay out of the ZFS fray.

That said 14 drives in raid-z3 is not how I would roll. Too many drives in one array.
Mirrored raid-z2 with 7 drives each. Use the split backplane to your advantage. Separate controllers to each mirror.
1 hot spare and one shelf drive minimum.

Do notice that I prefer separate OS drives, while many marvel at Boot Environments.
There are so many ways to do things and none are wrong.
 
That said 14 drives in raid-z3 is not how I would roll. Too many drives in one array.
Mirrored raid-z2 with 7 drives each. Use the split backplane to your advantage. Separate controllers to each mirror.
1 hot spare and one shelf drive minimum.
I would do 3 RAID-Z vdevs of 4 drives each, plus the spare and shelf drive like you say. This is off the cuff, though. I'd do some more reading before settling on it.
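For illustration only, a layout like that would be created roughly as below; a minimal sketch assuming the twelve data disks appear as da0 through da11 and da12 is the hot spare (all placeholder names, not a recommendation over the other layouts discussed):

  # three 4-disk raidz vdevs striped into one pool, plus a hot spare
  zpool create tank \
      raidz da0 da1 da2 da3 \
      raidz da4 da5 da6 da7 \
      raidz da8 da9 da10 da11 \
      spare da12
  zpool status tank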
 
I need to provide a network drive for an SME with 100 TB usable capacity. I would like to use a ZFS raid-z3 pool consisting of 14x 12TB SAS or SATA drives (this will be a rare-write/many-reads situation).

I have been looking at chassis like the SuperMicro SC846BE1C-R1K03JBOD. Given that I'll run ZFS on this I don't want a raid controller but merely an HBA. Supermicro lists the AOC-SAS3-9300-8E as a supported HBA.
I could not find this exact model number listed in the hardware notes of FreeBSD 13.0. Is that going to be a problem?
Is there any reason to believe that an LSI HBA officially supported by FreeBSD would not work in a chassis like this?

Now, I have never worked with these JBOD chassis before. As I understand it, I slap in the HBA and a ton of drives, and connect this chassis to an application server. Does something like this work out of the box? Will the application server (also FreeBSD 13.0) just see those drives as individual drives, so that I can create a ZFS pool like I am used to with "local drives"?

How does the (physical) connection to the application server work? Do I just add another HBA with external SAS ports to it, connect it to the external SAS ports of the HBA in the JBOD chassis and that's it?

Anything else you'd like to share in terms of advice, experience or similar?

I'm not sure what you mean with (my emphasis) "I have been looking at chassis like [...]". If you mean specifically a type of chassis that is basically a shelf of disks, then this enclosure is such a one. Given this, it's not really surprising that these are also labelled as JBOD. More common are Supermicro chassis that have or need a motherboard installed. Then, in the case of JBOD, you have backplanes for a JBOD version that has 24 separate HDD SAS/SATA connectors. That is not the case here: the backplane is already an expander-type backplane. Earlier I made the mistake of assuming that this JBOD enclosure had separate SAS/SATA connectors for each individual disk, thus without an expander chip; sorry for that.

This means that with these enclosure-type chassis you have:
  1. a shelf, i.e. a separate enclosure chassis
  2. external SAS cables, connecting 1 with 3
  3. an HBA with external connections, housed in a separate server (chassis)
A fully populated enclosure and its setup needs 4 separate single SFF-8644 external cables and 2 AOC-SAS3-9300-8E HBAs. The expander backplane in the enclosure means you are multiplexing. For an overview of this multiplexing: SAS-2 Multiplexing.

Because of the SAS cable lengths, your physical placement options in relation to the server where the HBA is located are limited. Supermicro provides cables with SFF-8644 connectors at both ends, in lengths of 1, 2, and 3 meters: SAS external cables. I'm not sure if all those lengths are capable of SAS3 speeds[*].

How does the (physical) connection to the application server work? Do I just add another HBA with external SAS ports to it, connect it to the external SAS ports of the HBA in the JBOD chassis and that's it?
This "another HBA with external SAS ports" refers to one of two recommended "Qualified SAS HBA / controllers" as mentioned on SuperMicro SC846BE1C-R1K03JBOD. You do not have an equivalent controller in this particular enclosure: the expander-backplane connects to the rear of the chassis: see Figure 4-2. Rear View on p. 4-1 of its manual. Here you have 8 SFF-8644 connectors of the E2C variant; that is the variant that has two expander chips enabling dual SAS-datapaths. Your E1C variant does not have that [*].

The AOC-SAS3-9300-8E looks like "the same" controller as the SAS 9300-8e. If that's the case, then you'll find some more info here: Broadcom sas-9300-8e HBA. No clear answers here: truenas-forum, but there is a link to their SAS primer. More info on SAS expanders: SAS expanders and FAQ SAS expanders. As also stated by others, I think that every FreeBSD-supported SAS3 HBA (SAS2 when your drives' highest supported speed is SAS2) would suffice, with external connectors of course.

With these enclosures the compute/control burden sits at the external server. That can make sense if, for example, that server is more or less dedicated to this task. This is indicated in Figure 4-4, Sample Cascading Storage, Single HBA (p. 4-5) of its manual. Video examples:
  1. Level1: We bootstrapped our own ZFS storage server: 172tb, extremely low cost.
  2. SAS Expander JBOD Daisy Chain Instructions
In this case it's still one (FreeBSD) server, but with lots of external disks and not a separate server for each enclosure/shelf to manage. If you don't have a server available for this, then the question comes to mind why not a chassis with its own motherboard and (optical) ethernet interface. That could be a more flexible setup, especially distance-wise. You'll have to decide and weigh the different aspects of each solution.

___
[*] You could also consider contacting Supermicro Europe in the Netherlands with any technical questions you might have, and perhaps about alternative chassis. No FreeBSD-specific drivers, I'm afraid.
 
Thank you for all the valuable information you guys provided - much appreciated!

As I understand it, I basically need (or want) to choose between one of these two options:
  1. A bunch of disks in a separate chassis connected to the server via external SAS.
  2. A dedicated storage server running FreeBSD providing access to the storage via something like 10Gbps network.
Could you guys elaborate on the (dis)advantages of each option in terms of performance? Assuming that the "main server" can be as beefy as needed I would assume that solution 1 with the disks connected via external SAS might be the more favorable option - is that correct?

In case of option 2, I take it that I would want to have the dedicated storage server act as an NFS host to which the other server can connect?

One more question: Given that performance is not of the greatest priority: Can I put SATA disks into a JBOD enclosure and still use the external SAS interface? As per my understanding that would work, as the backplane in the JBOD chassis would do the SAS/SATA translation, correct?
 
Should this server be mainly file storage? Or will RDBMS work on it? What's exactly the main purpose of it?

My advice, if you really need to build such a storage unit yourself instead of buying one off the shelf (which makes sense, because DIY is much, much cheaper): it can never hurt to have a look at companies which have tons of experience building such units themselves and make their designs as well as their experiences public.

Backblaze, a backup company (not affiliated with them in any way), has been building its own storage units for ages, in the hundreds and above. They always give you detailed lists of what they use to build their storage servers, down to the screw, including a parts list, wiring diagrams, where to get the chassis and so on. You get the idea. It even comes with detailed building instructions.

Their latest iteration of such a storage pod can be found here. I figure that even if you are not building a machine like that, it might give you some interesting insight from people who by now have years of experience in such things. I mean, these guys are literally running their business on these machines.


The linked design is a 480 TB storage server in 4U, where 1 GB comes in at around $0.05 (2016 prices).

They also make their drive failure stats public quarterly, in case you wonder which HDD brand to choose. And their sample size is big enough to make statistically significant statements as well. In short: don't use Seagate.

 
Assuming that the "main server" can be as beefy as needed I would assume that solution 1 with the disks connected via external SAS might be the more favorable option - is that correct?
OK, now we are drilling down to the core tenets. You have a main server and want to add a storage component.

To me Option 1 is a safe option but not the best. Why?
#1 Almost all of these use expanders. They blow chunks. You want a direct connection to the controller.
For 24 drives you need 6 channels of SAS.
Notice how the JBOD you asked about only has 4 SAS channels. Right there you are shortchanging your disks.

So for only one "Main Server" I can see why you might buy the JBOD you linked to.
You could add two SAS9300-8E cards to the "Main Server" and have 4 channels to the JBOD. Single consumer.
Additionally, you could go with the newer SAS9400-16E and knock it down to a single-slot solution.

I don't like the fact that the JBOD case is totally empty except for the front bays. It's ridiculous for a 26"-deep chassis.

Personally I prefer Option #2. It will add an extra box, but it is a dedicated storage server.
For the "Main Server" connection I would recommend a quad-port Chelsio 10G NIC, and LAGG the 4 ports to the "storage box" using the same hardware.
That is, unless you have a 10G switch. Then you could distribute it differently: LAGG to trunk ports on the main switch.
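A minimal /etc/rc.conf sketch of such an LACP lagg on FreeBSD, assuming the Chelsio ports show up as cxl0 through cxl3 (names vary by card generation) and a made-up address; the storage box or the switch has to speak LACP on its end as well:

  # bring the physical ports up and aggregate them into lagg0
  ifconfig_cxl0="up"
  ifconfig_cxl1="up"
  ifconfig_cxl2="up"
  ifconfig_cxl3="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport cxl0 laggport cxl1 laggport cxl2 laggport cxl3 192.168.10.2/24"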
 
One more question: Given that performance is not of the greatest priority: Can I put SATA disks into a JBOD enclosure and still use the external SAS interface? As per my understanding that would work, as the backplane in the JBOD chassis would do the SAS/SATA translation, correct?
Yes, that will work. You can buy enterprise SATA drives that work fine.
In case of option 2, I take it that I would want to have the dedicated storage server act as an NFS host to which the other server can connect?
This is where ZFS shines. You can run multiple export formats from the same storage array without separate storage pools.
So depending on the clients: for Windows you might want Samba, for Linux iSCSI, and for FreeBSD NFS.
All possible from the same disk pool.
With UFS you have to set aside a chunk of disk for each share, even if it is barely used. Rigid versus flexible.
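A minimal sketch of what that looks like on a single pool, assuming a pool named tank and hypothetical dataset names; Samba and the iSCSI target (ctld) still need their own configuration on top of this:

  # an NFS export, shared via the ZFS property (nfsd/mountd must be enabled)
  zfs create tank/nfs
  zfs set sharenfs=on tank/nfs
  # a zvol to hand to ctld(8) as an iSCSI LUN for the Linux clients
  zfs create -V 2T tank/iscsi0
  # a plain dataset to point a Samba [share] path at for the Windows clients
  zfs create tank/smb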
 
This is where ZFS shines. You can run multiple export formats from the same storage array without separate storage pools.
So depending on the clients: for Windows you might want Samba, for Linux iSCSI, and for FreeBSD NFS.
All possible from the same disk pool.
Can you be a bit more specific here? When you say "Multiple export formats" you're basically talking about serving multiple services (e.g. NFS, Samba, ...) on the storage server, right? You're not referring to some magic which allows the "main server" to access the ZFS storage pool of the "storage server" over the network through a mechanism other than the typical NFS, SMB, ... and so on. Or is there some magic "remote ZFS" feature that you're referring to?
 
you're basically talking about serving multiple services (e.g. NFS, Samba, ...) on the storage server, right?
Yes, but ZFS allows you to combine the storage part of the services: one pool for all.
Whereas on UFS you must define a chunk of disk to allocate to each service.
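A minimal sketch of that flexibility, with hypothetical pool and dataset names: everything draws from the same pool, and the limits can be changed at any time rather than being fixed when the disks were partitioned:

  # soft-partition one pool between services without carving up the disks
  zfs set quota=20T tank/smb             # cap the Samba share at 20T
  zfs set quota=10T tank/nfs             # cap the NFS export at 10T
  zfs create -o reservation=5T tank/db   # guarantee 5T for a new database dataset
  zfs list -o name,used,avail,quota,reservation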
 