Hi y’all, I am trying to wrap my head around the zfs redact feature (com.delphix:redaction_bookmarks and com.delphix:redacted_datasets in zpool-features(7)). I have a bit of trouble following the example in zfs-send(8), § Redaction. Let me quote the text, each step followed by my interpretation:
[TABLE=noborder,center]
[TR]
[TH]zfs-send(8) text[/TH][TH]my interpretation[/TH]
[/TR]
[TR]
[TD]In order to make the purpose of the feature more clear, an example is provided. Consider a zfs filesystem containing four files. These files represent information for an online shopping service.[/TD]
[TD][CODE=bash]zfs create tank/onlineshop
# with the default mountpoint the filesystem appears at /tank/onlineshop
cd /tank/onlineshop
[/CODE][/TD]
[/TR]
[TR]
[TD]One file contains a list of usernames and passwords,[/TD]
[TD][CODE=bash]cat > usernames_and_passwords << 'EOT'
Kai Burghardt,$1$$I2o9Z7NcvQAKp7wyCTlia0
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]another contains purchase histories,[/TD]
[TD][CODE=bash]cat > purchase_history << 'EOT'
Kai Burghardt,2023-11-08T18:38Z,10 boxes of Oreos
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]a third contains click tracking data,[/TD]
[TD][CODE=bash]cat > click_tracking_data << 'EOT'
2607:fc50:1:c600:216:3eff:fe15:195,2023-11-08T18:38Z,dick_towel_promo
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]and a fourth contains user preferences.[/TD]
[TD][CODE=bash]cat > user_preferences << 'EOT'
Kai Burghardt,no_newsletter
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]The owner of this data wants to make it available for their development teams to test against, and their market research teams to do analysis on. The development teams need information about user preferences and the click tracking data, while the market research teams need information about purchase histories and user preferences. Neither needs access to the usernames and passwords. However, because all of this data is stored in one ZFS filesystem, it must all be sent and received together. In addition, the owner of the data wants to take advantage of features like compression, checksumming, and snapshots, so they do want to continue to use ZFS to store and transmit their data. Redaction can help them do so.[/TD]
[TD]N/A[/TD]
[/TR]
[TR]
[TD]First, they would make two clones of a snapshot of the data on the source.[/TD]
[TD][CODE=bash]zfs snapshot tank/onlineshop@2023-11-08T18:40Z
# two writable clones of the same parent snapshot, one per team
zfs clone tank/onlineshop@2023-11-08T18:40Z tank/onlineshop_marketresearchteam
zfs clone tank/onlineshop@2023-11-08T18:40Z tank/onlineshop_developmentteam
[/CODE][/TD]
[/TR]
[TR]
[TD]In one clone, they create the setup they want their market research team to see; they delete the usernames and passwords file, and overwrite the click tracking data with dummy information.[/TD]
[TD][CODE=bash]# market research clone: no passwords, dummy click tracking data
rm /tank/onlineshop_marketresearchteam/usernames_and_passwords
cat > /tank/onlineshop_marketresearchteam/click_tracking_data << 'EOT'
1000,1970-01-01T00:00Z,dick_towel_promo
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]In another, they create the setup they want the development teams to see, by replacing the passwords with fake information and replacing the purchase histories with randomly generated ones.[/TD]
[TD][CODE=bash]# development clone: fake credentials, made-up purchase history
cat > /tank/onlineshop_developmentteam/usernames_and_passwords << 'EOT'
Johnny Doe1,fakepassword
EOT
cat > /tank/onlineshop_developmentteam/purchase_history << 'EOT'
Johnny Doe1,2345-12-08T14:14Z,NaN boxes of Hydrox
EOT
[/CODE][/TD]
[/TR]
[TR]
[TD]They would then create a redaction bookmark on the parent snapshot, using snapshots on the two clones as redaction snapshots.[/TD]
[TD][CODE=bash]zfs snapshot tank/onlineshop_marketresearchteam@2023-11-08T18:42Z
zfs snapshot tank/onlineshop_developmentteam@2023-11-08T18:42Z
# creates the redaction bookmark tank/onlineshop#redacted
zfs redact tank/onlineshop@2023-11-08T18:40Z redacted \
  tank/onlineshop_marketresearchteam@2023-11-08T18:42Z \
  tank/onlineshop_developmentteam@2023-11-08T18:42Z
[/CODE][/TD]
[/TR]
[TR]
[TD]The parent can then be sent, redacted, to the target server where the research and development teams have access.[/TD]
[TD][CODE=bash]zfs send --redact redacted tank/onlineshop@2023-11-08T18:40Z | \
  zfs receive tank/onlineshop_redacted # on target server
[/CODE][/TD]
[/TR]
[/TABLE]
Finally, incremental sends from the parent snapshot to each of the clones can be sent to and received on the target server; these snapshots are identical to the ones on the source, and are ready to be used, while the parent snapshot on the target contains none of the username and password data present on the source, because it was removed by the redacted send operation.
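The man page gives no commands for this last step. Here is a sketch of how it might look, assuming the redacted send from the table was received on the target as tank/onlineshop_redacted and the target is reachable over ssh under the placeholder name backuphost. Each clone's redaction snapshot is sent with zfs send -i from the parent snapshot (zfs-send(8) accepts the origin snapshot as the incremental source when the destination is a clone, provided it is fully specified). It is then received with zfs receive -o origin= so that it becomes a clone of the redacted parent snapshot on the target. This matches the use the --redact description allows for a redacted snapshot: receiving, as a clone, an incremental send from the original snapshot to one of the snapshots it was redacted with respect to. send_clone, DRY_RUN and backuphost are hypothetical names introduced for this sketch. It is also unverified whether -o origin= is strictly required or whether zfs receive would locate the origin by itself.

```bash
#!/usr/bin/env bash
# Sketch only: the last step of the zfs-send(8) "Redaction" example.
# Assumptions (placeholders, not from the man page): the target host is
# reachable via ssh as "backuphost", and the redacted send was received
# there as tank/onlineshop_redacted.
set -euo pipefail

PARENT_SNAP='tank/onlineshop@2023-11-08T18:40Z'
TARGET_HOST='backuphost'                  # hypothetical host name
TARGET_PARENT='tank/onlineshop_redacted'  # receive target of the redacted send

# send_clone (hypothetical helper): send one clone's redaction snapshot
# incrementally from the parent snapshot and receive it on the target as a
# clone of the redacted parent. With DRY_RUN=1 the pipeline is only printed.
send_clone() {
  local clone_snap=$1                               # e.g. tank/onlineshop_developmentteam@2023-11-08T18:42Z
  local clone_fs=${clone_snap%@*}                   # dataset part, reused as the name on the target
  local origin="$TARGET_PARENT@${PARENT_SNAP#*@}"   # redacted parent snapshot on the target

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'zfs send -i %s %s | ssh %s zfs receive -o origin=%s %s\n' \
      "$PARENT_SNAP" "$clone_snap" "$TARGET_HOST" "$origin" "$clone_fs"
  else
    # -i with the fully specified origin snapshot yields a clone stream;
    # -o origin= names the snapshot the received dataset is cloned from.
    zfs send -i "$PARENT_SNAP" "$clone_snap" |
      ssh "$TARGET_HOST" zfs receive -o "origin=$origin" "$clone_fs"
  fi
}
```

To use it, append send_clone tank/onlineshop_marketresearchteam@2023-11-08T18:42Z and send_clone tank/onlineshop_developmentteam@2023-11-08T18:42Z to the script and run it on the source; prefixing a call with DRY_RUN=1 only prints the pipeline it would run.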
Now I’m lost. How is it ensured that the market research team and the development team each have access to their respective set of data? How can I grant the market research team access to the data they are supposed to read? I understand that a redacted dataset cannot be mounted. Can the development team obtain read/write access to (their derivative of) the onlineshop data? How? Are my premises (the code shown above) correct? Thank you for your assistance.