There's still some silliness here, but there is now only one opcode
open-splice-and-close, that writes a single chunk from the payload.
This is really all we need for metadata, and small content objects are
also fine with this.
We get some deduplication between content objects by creating a
dictionary for (uid,gid,mode) tuples and xattrs.
This still keeps the operation/payload code in, so we could do
rollsums in a future update easily.
We have a chain of checksums from the root up until here. While doing
checksums of the objects individually would be a good redundancy check
for test cases and the like, when doing a pull there's no good reason
to burn cycles on SHA256.
This caused deadlocks and/or EMFILE due to the interaction between
threads and fds. What we really want here is a better pull-based
model for parsing content objects.
Another idea would be to change static deltas so that content objects
have a special opcode that includes their metadata first, and then do
rollsums etc. only over actual content.
This will avoid too many open files at the same time that could cause
an EMFILE error.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
(cherry picked from commit bc092b06f0e34e93f7d6102957bf55fd7ffd1b9e)
This has a very basic level of functionality (deltas can be generated,
and applied offline). There is only some stubbed out pull code to
fetch them via HTTP.
But, better to commit this now and improve it from a known starting
point, rather than have it languish in a branch.