ZFS data subsetting in the form of redaction

Poll: Using the ZFS redact feature is reasonably easy. (Yes / No)

Hi y’all, I am trying to wrap my head around the zfs redact feature (com.delphix:redaction_bookmarks and com.delphix:redacted_datasets, see zpool-features(7)). I am having a bit of trouble following the example in zfs-send(8), § Redaction. Let me quote the text, each part followed by my interpretation:
[TABLE=noborder,center]
[TR]
[TH]description[/TH][TH]action[/TH]
[/TR]
[TR]
[TD]
In order to make the purpose of the feature more clear, an example is provided. Consider a zfs filesystem containing four files. These files represent information for an online shopping service.
[/TD]
[TD]
Bash:
zfs create tank/onlineshop
cd /tank/onlineshop
[/TD]
[/TR]
[TR]
[TD]
One file contains a list of usernames and passwords,
[/TD]
[TD]
Bash:
cat > usernames_and_passwords << 'EOT'
Kai Burghardt,$1$$I2o9Z7NcvQAKp7wyCTlia0
EOT
[/TD]
[/TR]
[TR]
[TD]
another contains purchase histories,
[/TD]
[TD]
Bash:
cat > purchase_history << 'EOT'
Kai Burghardt,2023-11-08T18:38Z,10 boxes of Oreos
EOT
[/TD]
[/TR]
[TR]
[TD]
a third contains click tracking data,
[/TD]
[TD]
Bash:
cat > click_tracking_data << 'EOT'
2607:fc50:1:c600:216:3eff:fe15:195,2023-11-08T18:38Z,dick_towel_promo
EOT
[/TD]
[/TR]
[TR]
[TD]
and a fourth contains user preferences.
[/TD]
[TD]
Bash:
cat > user_preferences << 'EOT'
Kai Burghardt,no_newsletter
EOT
[/TD]
[/TR]
[TR]
[TD]
The owner of this data wants to make it available for their development teams to test against, and their market research teams to do analysis on. The development teams need information about user preferences and the click tracking data, while the market research teams need information about purchase histories and user preferences. Neither needs access to the usernames and passwords. However, because all of this data is stored in one ZFS filesystem, it must all be sent and received together. In addition, the owner of the data wants to take advantage of features like compression, checksumming, and snapshots, so they do want to continue to use ZFS to store and transmit their data. Redaction can help them do so.
[/TD][TD]
N/A
[/TD]
[/TR]
[TR]
[TD]
First, they would make two clones of a snapshot of the data on the source.
[/TD]
[TD]
Bash:
zfs snapshot tank/onlineshop@2023-11-08T18:40Z
zfs clone tank/onlineshop@2023-11-08T18:40Z tank/onlineshop_marketresearchteam
zfs clone tank/onlineshop@2023-11-08T18:40Z tank/onlineshop_developmentteam
[/TD]
[/TR]
[TR]
[TD]
In one clone, they create the setup they want their market research team to see; they delete the usernames and passwords file, and overwrite the click tracking data with dummy information.
[/TD]
[TD]
Bash:
rm /tank/onlineshop_marketresearchteam/usernames_and_passwords
cat > /tank/onlineshop_marketresearchteam/click_tracking_data << 'EOT'
1000,1970-01-01T00:00Z,dick_towel_promo
EOT
[/TD]
[/TR]
[TR]
[TD]
In another, they create the setup they want the development teams to see, by replacing the passwords with fake information and replacing the purchase histories with randomly generated ones.
[/TD]
[TD]
Bash:
cat > /tank/onlineshop_developmentteam/usernames_and_passwords << 'EOT'
Johnny Doe1,fakepassword
EOT
cat > /tank/onlineshop_developmentteam/purchase_history << 'EOT'
Johnny Doe1,2345-12-08T14:14Z,NaN boxes of Hydrox
EOT
[/TD]
[/TR]
[TR]
[TD]
They would then create a redaction bookmark on the parent snapshot, using snapshots on the two clones as redaction snapshots.
[/TD]
[TD]
Bash:
zfs snapshot tank/onlineshop_marketresearchteam@2023-11-08T18:42Z
zfs snapshot tank/onlineshop_developmentteam@2023-11-08T18:42Z
zfs redact tank/onlineshop@2023-11-08T18:40Z redacted \
           tank/onlineshop_marketresearchteam@2023-11-08T18:42Z \
           tank/onlineshop_developmentteam@2023-11-08T18:42Z
[/TD]
[/TR]
[TR]
[TD]
The parent can then be sent, redacted, to the target server where the research and development teams have access.
[/TD]
[TD]
Bash:
zfs send --redact redacted tank/onlineshop@2023-11-08T18:40Z | \
zfs receive tank/onlineshop_redacted # on target server
[/TD]
[/TR]
[/TABLE]
Finally, incremental sends from the parent snapshot to each of the clones can be sent to and received on the target server; these snapshots are identical to the ones on the source, and are ready to be used, while the parent snapshot on the target contains none of these snapshots are identical to the ones on the source, and are ready to be used, while the parent snapshot on the target contains none of the username and password data present on the source, because it was removed by the redacted send operation.
Now I’m lost. How is it ensured that the market research team and the development team have access to their respective sets of data? How can I grant the market research team access to the data they are supposed to read? I understand that a redacted data set cannot be mounted. Can the development team obtain read/write access to (their derivative of) the onlineshop data? How? Are my premises (the code shown) correct? Thank you for your assistance.
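The final step of the manual's example is presumably what gives each team access to its own data: every clone snapshot is sent as an incremental from the original parent snapshot and received on the target as a clone of the redacted parent. zfs-send(8) lists this (receiving, as a clone, an incremental from the original snapshot to one of the snapshots it was redacted with respect to) among the permitted uses of a redacted snapshot, and says it produces a normal snapshot. A minimal sketch, assuming the names from the table above, that the redacted parent was received as tank/onlineshop_redacted, and a hypothetical ssh host in TARGET_HOST; `zfs receive -o origin=…` names the clone origin explicitly. The helper functions are made up for illustration, and by default the script only prints the two pipelines.

```bash
set -eu

parent_snap='tank/onlineshop@2023-11-08T18:40Z'             # original, unredacted (source)
target_origin='tank/onlineshop_redacted@2023-11-08T18:40Z'  # redacted copy (target)
target_host=${TARGET_HOST:-target.example}                  # hypothetical ssh host

# Print the "send | receive" pipeline for one clone snapshot.
pipeline_for() {
    clone_fs=${1%@*}    # strip "@snapshot" to get the dataset name
    printf 'zfs send -i %s %s | ssh %s zfs receive -o origin=%s %s\n' \
        "$parent_snap" "$1" "$target_host" "$target_origin" "$clone_fs"
}

# Only print by default; actually execute the pipeline when DRY_RUN=0.
run_or_print() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

for snap in \
    'tank/onlineshop_marketresearchteam@2023-11-08T18:42Z' \
    'tank/onlineshop_developmentteam@2023-11-08T18:42Z'
do
    run_or_print "$(pipeline_for "$snap")"
done
```

With DRY_RUN=0 the pipelines run for real. On the target, each team would then work on its own received clone rather than on the redacted parent.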
 
I am not familiar with this feature, and have never used it. Keeping that in mind, this is how I think it is supposed to work. I think there's an error in this part of the documentation:
...while the parent snapshot on the target contains none of these snapshots are identical to the ones on the source, and are ready to be used, while the parent snapshot on the target contains none of the username and password data present on the source, because it was removed by the redacted send operation.
That (run-on?) sentence doesn't make sense. I think what they meant to say is that the parent snapshot on the source has the unredacted data, but the parent snapshot on the target(s) does not.

Can you clone the parent snapshot on the target? Do you see the two redacted snapshots on the target? If so, can you clone them? If I understand ZFS correctly, that's the way to get read/write access to a snapshot; you clone it.

Awesome clear post with excellent formatting, BTW. Unfortunately it cannot be quoted properly, but that's a problem for the site admin.
 
The great thing about open source is that, well, you can dig into the source code. Fortunately, it is not necessary to look at the actual C code. There is a test suite for this feature in /usr/src/sys/contrib/openzfs/tests/zfs-tests/tests/functional/redacted_send/. The sequence of commands paired with comments gives me some idea of how things ought to work. In particular, it is necessary to set the vfs.zfs.allow_redacted_dataset_mount sysctl(8) to 1 (kind of mentioned in zfs(4)) and force the mount.
Bash:
sysctl vfs.zfs.allow_redacted_dataset_mount=1
zfs mount -f tank/onlineshop_redacted # this is readable _and_ writable
However, I can’t figure out the redaction mechanics. For starters, usernames_and_passwords, which was deleted in the market research redaction snapshot, exists but is a hole, so the development team could not work with it.
Bash:
hd usernames_and_passwords
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020
All other files are unredacted! I read up on the original OpenZFS pull request 7958. It turns out, quote, “[…] the granularity of redaction occurs on a per-block basis. Blocks cannot be partially redacted.” So I suppose the observed behavior should not be surprising: if there is at least one byte to redact, the entire block is redacted.
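Since redacted blocks read back as zeros once the dataset is force-mounted (as the hd dump above shows), one rough way to list files that came through fully redacted is to look for files consisting only of NUL bytes. A sketch under that assumption; the function name is made up, it cannot tell a redacted file from one that legitimately holds only zeros, and a large file with just one redacted record in the middle will not be flagged.

```bash
set -eu

# Print every non-empty regular file under directory $1 that contains only NUL bytes.
zero_filled_files() {
    find "$1" -type f -size +0 | while IFS= read -r f; do
        # tr deletes all NUL bytes; if nothing is left, the file was all zeros.
        if [ "$(tr -d '\000' < "$f" | wc -c)" -eq 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: pass the mountpoint of the force-mounted redacted dataset.
[ "$#" -eq 0 ] || zero_filled_files "$1"
```

For the setup above, the argument would be the mountpoint of tank/onlineshop_redacted.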

My (potential) application is just data subsetting, so I’m satisfied. I’m curious, though, how the redact feature could be used to actually redact sensitive information and only that. The manual page suggests this is possible (at least that is how I construe it). However, you have no control over which block data is written to. :-/
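Record boundaries inside a file sit at multiples of the dataset's recordsize, so the records a given byte range falls into can at least be computed: a change to bytes [offset, offset + length) touches records offset / recordsize through (offset + length - 1) / recordsize, using integer division. A sketch assuming fixed-size records and the default recordsize of 128 KiB; the function names are hypothetical. Files smaller than one record, like the four in this example, are stored as a single block, so any change redacts the whole file.

```bash
set -eu

# records_touched OFFSET LENGTH [RECORDSIZE]
# Print the index of the first and last record that bytes
# [OFFSET, OFFSET+LENGTH) fall into (LENGTH must be at least 1).
records_touched() {
    rs=${3:-131072}    # 131072 bytes = 128 KiB, the ZFS default recordsize
    printf '%s %s\n' "$(( $1 / rs ))" "$(( ($1 + $2 - 1) / rs ))"
}

# redacted_bytes OFFSET LENGTH [RECORDSIZE]
# Print how many bytes are withheld when that range must be redacted,
# given that only whole records can be redacted.
redacted_bytes() {
    rs=${3:-131072}
    first=$(( $1 / rs ))
    last=$(( ($1 + $2 - 1) / rs ))
    printf '%s\n' "$(( (last - first + 1) * rs ))"
}

# 20 bytes straddling the first record boundary touch records 0 and 1.
records_touched 131060 20
redacted_bytes 131060 20
```

In that example, 20 changed bytes cost two full records, i.e. 256 KiB of withheld data.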
Can you clone the parent snapshot on the target? Do you see the two redacted snapshots on the target? If so, can you clone them? If I understand ZFS correctly, that's the way to get read/write access to a snapshot; you clone it.
Yeah, this is a full zfs-send(8) stream. zstreamdump(8) shows a REDACT record for usernames_and_passwords, but other than that the target receives a full file system. Therefore there is no need to take the zfs-clone(8) detour. The redaction snapshots are not present, only indicated in the redact_snaps property (zfsprops(7)), and the original branching snapshot (the very first zfs-snapshot(8)) wasn’t transferred (I mean, that would defeat the purpose).
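The observed result matches the rule zfs-send(8) gives under --redact: the redacted snapshot contains every block referenced by at least one redaction snapshot and none of the blocks referenced by none of them. Put differently, a block is withheld only if all redaction snapshots changed or deleted it. Here only usernames_and_passwords was altered in both clones (deleted in one, overwritten in the other); click_tracking_data and purchase_history were each changed in only one clone, so the other clone still needs the original blocks. A toy model of that rule, assuming one block per file (true for these tiny files) and hand-written lists of which files each clone left unchanged; the function name is made up.

```bash
set -eu

# Print each file of the parent that no redaction snapshot left unchanged.
# $1: the parent's files; $2...: per redaction snapshot, the files it kept as-is.
redacted_files() {
    all_files=$1; shift
    for f in $all_files; do
        kept=no
        for unchanged in "$@"; do
            case " $unchanged " in
                *" $f "*) kept=yes; break ;;
            esac
        done
        [ "$kept" = yes ] || printf '%s\n' "$f"
    done
}

parent='usernames_and_passwords purchase_history click_tracking_data user_preferences'
# market research clone: deleted usernames_and_passwords, rewrote click_tracking_data
marketresearch='purchase_history user_preferences'
# development clone: rewrote usernames_and_passwords and purchase_history
development='click_tracking_data user_preferences'

redacted_files "$parent" "$marketresearch" "$development"
```

Dropping the development clone from the set would also withhold click_tracking_data, since then no redaction snapshot references its original block.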

PS: Currently you do not want to receive a redacted dataset into a pool you boot from; see PR 264174.
 