If pull is interrupted, we may have downloaded an arbitrary subset of
the requested objects. Previously, we handled this by scanning for
all objects each time.
However, there's an easy optimization - this patch creates a lock file
in the repo. If we don't see that file when starting a pull, we know
we don't need to stat() every file; presence of a dirtree object for
example implies the existence of everything it references.
Originally, the idea was that clients would replicate "OS/tree"s from
a build server, but we'd run things like "ldconfig" on the client.
This was to allow adding e.g. the nVidia binary driver.
However, the triggers were the only thing in the system at the moment
that really had expected knowledge of the *contents* of the OS, like
the location of binaries.
For now, it's architecturally cleaner if we move the burden of
triggers to the tree builder (e.g. gnome-ostree or RPM). Eventually
we may want OSTree to assist with this type of thing (perhaps
something like RPM %ghost), but this is the right thing to do now.
When multiple threads need to uncompress an object, there was
a race condition where thread A could get EEXIST, unlink,
then thread B calls linkat(), then thread A tries to link() but
fails.
We can just loop in this case.
Previously we'd open(path, O_NOATIME) and do a series of small read()
calls to just parse the header. I think this will trigger kernel readahead
into the compressed portion, but we don't care about that.
This should be more efficient.
This seems to work around a likely Linux kernel VFS bug, where I
randomly see ENOENT on link() when we *definitely* called mkdir() at
an earlier point in time.
This is an incompatible change to archive-z, thus it is now renamed to
archive-z2 and ostree will no longer parse archive-z.
I noticed in perf that we were spending some time zlib-decompressing
file headers, which is just inefficient. Rather than do this, keep
the headers uncompressed, and just zlib-compress content.
This gives us something closer to the advantages of archive and
archive-z when using the latter. Concretely we get deduplication
among multiple checkouts, along with the "devino" hash table trick
during commits to avoid checksumming content again.
This is enabled by default.
For similar reasons as metadata, this avoids having the main thread
blocked in fdatasync(), and even better - we can achieve much higher
parallelism if we have multiple threads blocked on fdatasync().
This is where loose content objects are stored as one compressed file,
instead of the two separate ones for regular archive mode. This mode
would be suitable for HTTP servers, beause only one HTTP request is
necessary, and the result would be compressed.
Cleanly separate metadata/content APIs, rather than defaulting to
raw streams. This helps most use cases.
Also, drop support for staging content without knowing the total
length. This complicated the code, and for things like streaming
HTTP, we should be able to figure this out from Content-Length.